> 1. Too many physical temp files.
> 2. Too many open files that were deleted files, but still have kernel
> references
>
> For #1, could you sort the files by modification time and see how they
> look. If you've got a lot of "old" files (> 1 hour), there's a problem.
> Some of these older files are normal, as I've mentioned before, and
> it's best to run something like tmpwatch on the directory to remove them.
Yes, a few of them (about 150) have content and are older than 1 hour. OK, I
will take care of them. By the way, Debian's tmpwatch is called tmpreaper.
nag01:/tmp# ls -al | grep nagios | grep -e "nagios\s*0\s*" -v | wc -l
150
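For #1, here is a minimal sketch of the same check as two shell functions. It assumes GNU ls/find (as on Debian) and that the Nagios temp files sit directly in the directory with names starting with "nagios", as in the listing above. The function names are made up for illustration. Unlike the grep pipeline, `find -size +0 -mmin +60` also applies the "older than 1 hour" filter, so it counts only non-empty files that have not been touched in the last hour.

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU coreutils/findutils and that Nagios
# temp file names start with "nagios".

# Show Nagios temp files in directory $1, newest first (ls -t sorts by mtime).
list_nagios_tmp_by_mtime() {
    ls -lt "$1" | grep nagios
}

# Count non-empty Nagios temp files in $1 last modified more than 60 minutes ago.
count_old_nagios_tmp() {
    find "$1" -maxdepth 1 -type f -name 'nagios*' -size +0 -mmin +60 | wc -l
}
```

Usage: `count_old_nagios_tmp /tmp`. Actually removing the old files is still a job for tmpwatch/tmpreaper.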
> For #2...
> lsof reports a number of temp files that are still open, but were
> deleted. You can see if this is your problem by running:
>
> lsof | grep nagios | grep DEL
>
> I did some digging and this was caused by mmap() and munmap() when
> Nagios encountered a 0-byte temp file, which will happen when
> checks have no output. I changed the code to skip mmap()ing altogether
> when it encounters 0 byte files, and that solved the problem for me. A
> patch will be in CVS shortly for this...
It's true: we are checking hundreds of services on unreachable networks
to test the new host checking logic, and those checks end with
"(Service Check Timed Out)". That must be what causes these 0-byte files.
nag01:/tmp# ls -al | grep nagios | grep -e "nagios\s*0\s*" | wc -l
21473
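For #2, a minimal sketch of both checks as shell functions. One counts the empty Nagios temp files (roughly what the `grep "nagios\s*0\s*"` pipeline above measures, but matching on file size rather than on the ls text); the other counts the deleted-but-still-mapped files that lsof marks DEL. It assumes GNU find and that the temp file names start with "nagios"; `lsof -c nagios` selects processes whose command name begins with "nagios". The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU findutils and that Nagios temp file
# names start with "nagios".

# Count zero-byte Nagios temp files in directory $1 (e.g. from checks with no output).
count_empty_nagios_tmp() {
    find "$1" -maxdepth 1 -type f -name 'nagios*' -empty | wc -l
}

# Count lsof entries marked DEL (deleted files still mapped) for processes
# whose command name starts with "nagios". Requires lsof; run as root to see all.
count_deleted_mapped_nagios() {
    lsof -c nagios 2>/dev/null | grep -w DEL | wc -l
}
```

Usage: `count_empty_nagios_tmp /tmp` and `count_deleted_mapped_nagios`; with the 0-byte mmap() fix in place, the second number should stop growing.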
I will check your patch on my nagios-cvs installations as soon as it is
available.
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]