Re: [Nagios-devel] nagios-cvs: Too many open files?
Posted: Thu Feb 08, 2007 9:26 am
Gerd Mueller wrote:
>> I should have mentioned that all of my hosts and services are passive. Maybe the "passive host bug" wasn't completely solved.
>
> Now I am sure it also happens on active services/hosts.
>
> Far too many files:
>
> nag179:/tmp# rm nagios*
> -bash: /bin/rm: Argument list too long
>
> Gerd
>
I think there are two problems here:
1. Too many physical temp files.
2. Too many open files that were deleted, but still have kernel
references.
For #1, could you sort the files by modification time and see how they
look? If you've got a lot of "old" files (> 1 hour), there's a problem.
Some of these older files are normal, as I've mentioned before, and
it's best to run something like tmpwatch on the directory to remove them.
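Since a plain "rm nagios*" already fails with "Argument list too long", one way to do the check for #1 is to let find match the names itself instead of having the shell expand the glob. This is only a rough sketch: the function names are made up for illustration, it assumes the temp files match nagios* in the directory you pass in (as in Gerd's /tmp listing), and -maxdepth, -mmin and -delete assume a GNU or BSD find.

```shell
# Hypothetical helpers, not part of Nagios.

# Print the N oldest nagios temp file names in DIR, oldest first.
# Usage: list_oldest_nagios_tmp DIR N
list_oldest_nagios_tmp() {
    ls -tr "$1" | grep '^nagios' | head -n "$2"
}

# Count nagios temp files in DIR last modified more than MINUTES ago.
# Usage: count_old_nagios_tmp DIR MINUTES
count_old_nagios_tmp() {
    find "$1" -maxdepth 1 -type f -name 'nagios*' -mmin +"$2" | wc -l | tr -d ' '
}

# Delete those files. find removes them one by one, so the kernel's
# argument-length limit that breaks "rm nagios*" never comes into play.
# Usage: remove_old_nagios_tmp DIR MINUTES
remove_old_nagios_tmp() {
    find "$1" -maxdepth 1 -type f -name 'nagios*' -mmin +"$2" -delete
}

# Example (files older than one hour in /tmp):
#   count_old_nagios_tmp /tmp 60
#   remove_old_nagios_tmp /tmp 60
```

Running the count first shows how many files would go before anything is removed.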
For #2...
lsof reports a number of temp files that are still open, but were
deleted. You can see if this is your problem by running:
lsof | grep nagios | grep DEL
I did some digging and this was caused by mmap() and munmap() when
Nagios encountered a temp file of 0 byte size, which will happen when
checks have no output. I changed the code to skip mmap()ing altogether
when it encounters 0 byte files, and that solved the problem for me. A
patch will be in CVS shortly for this...
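If lsof isn't handy, the deleted-but-still-open files from #2 can also be spotted on Linux through /proc, where a descriptor whose file has been unlinked shows a target ending in " (deleted)". The sketch below is an assumption-laden alternative to the lsof check, not something taken from the Nagios code: the function name is hypothetical, and it relies on the Linux /proc/<pid>/fd layout.

```shell
# Hypothetical helper, not part of Nagios.

# Count entries in an fd directory (e.g. /proc/<pid>/fd) that point at
# deleted nagios temp files.
# Usage: count_deleted_fds FD_DIR
count_deleted_fds() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    ls -l "$1" 2>/dev/null | grep -c 'nagios.*(deleted)$' || true
}

# Example: one line per running nagios process.
#   for pid in $(pgrep nagios); do
#       echo "$pid: $(count_deleted_fds "/proc/$pid/fd")"
#   done
```

A count that keeps growing between runs would point to the leak described above.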
Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]