Could not stat() check result file

paulds · Post by **paulds** » Thu Aug 01, 2013 7:51 am

Running Nagios 3.5.0 on EL6, using packages from the EPEL repo. We monitor a few hundred hosts and services. We are getting sporadic warnings in our nagios.log every few days like the following:

[1375153859] Warning: Could not stat() check result file '/var/log/nagios/spool/checkresults/cpkYQTf'.

Aside from these occasional warnings, everything appears to be functioning correctly. We've been trying to track down the cause of these warnings, with no success. FWIW, at any given time there appear to be on the order of a dozen or so check result files in that directory, all of them with very recent timestamps (less than 1 min), and they appear to be created and removed fairly rapidly.

Can anyone shed any light on the possible cause of these warnings, or suggest any diagnostic steps? Is there any reason to be concerned about this, or are these warnings utterly trivial and harmless?

Of possible significance is that this system was upgraded a while back from EL5, which had Nagios 2.12 in EPEL, and all of the old Nagios config files were migrated and then manually tweaked to get things working after the upgrade, since apparently a lot had changed. There was a lot of trial-and-error involved, but we eventually got everything working correctly, with only these sporadic warnings remaining. Just mentioning this in case it may shed any light on what may be going on here.

yancy · Post by **yancy** » Thu Aug 01, 2013 12:17 pm

paulds,

Is it always the same file (cpkYQTf)?

Can you check what the permissions are on the file in question?

-Yancy

abrist · Post by **abrist** » Thu Aug 01, 2013 12:20 pm

Could you check the following permissions:

Code:

ls -lad /usr/local/nagios/var/spool/checkresults/
ls -la /usr/local/nagios/var/spool/checkresults/

Additionally, do you notice any open file limits errors in /var/log/messages ?

Code:

grep "open files" /var/log/messages

Also check the groups:

Code:

grep nag /etc/group

paulds · Post by **paulds** » Mon Aug 05, 2013 8:22 am

yancy, No, it's a different random filename each time.

I can't report on the permissions of the file in question, because by the time we check, it is already gone. I doubt it is a permissions issue, however, since if that's what it was, I'd expect we'd be getting hundreds of these warnings every minute, instead of one every couple of days.
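Since the file is gone by the time anyone looks, one option would be to record periodic listings of the spool directory and search them later for the name that shows up in a warning. The following is only a rough sketch: `snapshot_spool` and the `/tmp/spool-snapshots.log` path are made-up names, and because these files come and go within a second or so, a polling loop like this can easily miss the one you care about.

```shell
#!/bin/sh
# snapshot_spool DIR LOG (hypothetical helper): append one timestamped
# long listing of DIR to LOG. The epoch timestamp is written in the same
# seconds-since-1970 form that nagios.log uses in its [...] prefix.
snapshot_spool() {
    dir=$1
    log=$2
    {
        echo "== $(date +%s)"
        ls -la "$dir" 2>&1     # keep errors too, in case DIR is unreadable
    } >> "$log"
}

# Example polling loop, left commented out so the sketch does not run forever.
# The log path is an assumption; pick any location with enough free space.
#
# while :; do
#     snapshot_spool /var/log/nagios/spool/checkresults /tmp/spool-snapshots.log
#     sleep 1
# done
```

After the next warning, something like `grep cpkYQTf /tmp/spool-snapshots.log` (with the real file name from the warning) would show whether the file was ever captured, and with what owner and mode.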

$ ls -lad /var/log/nagios/spool/checkresults/
drwxr-x--- 2 nagios nagios 20480 Aug 5 09:17 /var/log/nagios/spool/checkresults/

$ ls -la /var/log/nagios/spool/checkresults/
total 48
drwxr-x--- 2 nagios nagios 20480 Aug 5 09:15 .
drwxr-x--- 3 nagios nagios 4096 Apr 24 21:05 ..
-rw------- 1 nagios nagios 436 Aug 5 09:15 c0kUn3T
-rw------- 1 nagios nagios 0 Aug 5 09:15 c0kUn3T.ok
-rw------- 1 nagios nagios 445 Aug 5 09:15 c92Z7l6
-rw------- 1 nagios nagios 0 Aug 5 09:15 c92Z7l6.ok
-rw------- 1 nagios nagios 469 Aug 5 09:15 cMmGQdO
-rw------- 1 nagios nagios 0 Aug 5 09:15 cMmGQdO.ok
-rw------- 1 nagios nagios 445 Aug 5 09:15 cPGw7Ai
-rw------- 1 nagios nagios 0 Aug 5 09:15 cPGw7Ai.ok
-rw------- 1 nagios nagios 443 Aug 5 09:15 cQ6rcQu
-rw------- 1 nagios nagios 0 Aug 5 09:15 cQ6rcQu.ok
-rw------- 1 nagios nagios 403 Aug 5 09:15 cV8GlRH
-rw------- 1 nagios nagios 0 Aug 5 09:15 cV8GlRH.ok

$ grep "open files" /var/log/messages*
$
[i.e., no results]

Nagios runs as user "nagios", and is a member of group "nagios".

(Sorry for the delayed response. I expected to be notified when someone replied to my post, but I guess the forum doesn't do that. I'll try to remember to check back here at least daily.)

scottwilkerson · Post by **scottwilkerson** » Mon Aug 05, 2013 9:14 am

If I had to guess, I would say this is caused by more than one nagios binary running at the same time.

Code:

ps -ef|grep bin/nagios

Code:

service nagios stop
killall -9 nagios
service nagios start

paulds · Post by **paulds** » Mon Aug 05, 2013 9:33 am

(Oh hey, it sent me an e-mail notification this time! Cool.)

scottwilkerson, Nope, only one instance running.

$ ps -ef|grep bin/nagios
nagios 24121 1 0 Jul22 ? 00:11:58 /usr/sbin/nagios -d /etc/nagios/nagios.cfg

scottwilkerson · Post by **scottwilkerson** » Mon Aug 05, 2013 10:07 am

paulds wrote: Can anyone shed any light on the possible cause of these warnings, or suggest any diagnostic steps? Is there any reason to be concerned about this, or are these warnings utterly trivial and harmless?

Basically, this warning is saying that the file disappeared (it was processed by another thread) before it could be read. It really isn't much to be concerned about if you have verified that you do not have multiple instances running (which would be the most common cause of this).
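To make the "file disappeared before it could be read" idea concrete, here is a minimal shell sketch of that kind of race, run entirely in a throwaway temp directory. It is an illustration only, not Nagios's actual code: `race_demo` and the file names `cAAAAAA` and `cBBBBBB` are invented for the example.

```shell
#!/bin/sh
# Illustration only: a list-then-check race in a throwaway directory.
# race_demo, cAAAAAA and cBBBBBB are made-up names, not Nagios internals.
race_demo() {
    dir=$(mktemp -d) || return 1
    touch "$dir/cAAAAAA" "$dir/cBBBBBB"

    names=$(ls "$dir")          # step 1: snapshot the directory contents
    rm -f "$dir/cAAAAAA"        # meanwhile, something else consumes one file

    for name in $names; do      # step 2: check each name from the snapshot
        if [ -e "$dir/$name" ]; then
            echo "ok: $name"
        else
            echo "gone before it could be checked: $name"
        fi
    done
    rm -rf "$dir"
}

race_demo
```

The name was valid when the directory was listed, but the file no longer exists when it is checked, so the second step fails for that one entry while the other file is handled normally.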

paulds · Post by **paulds** » Mon Aug 05, 2013 10:32 am

Thanks for the additional details, Scott.

I really don't like ignoring mysteries like this, as they're often a subtle indication that something else is wrong.

abrist · Post by **abrist** » Mon Aug 05, 2013 10:44 am

Do you see any "defunct" nagios processes that hang around for longer than a few moments?

Code:

watch "ps -aef| grep defunct"

You may want to run the commands Scott suggested anyway:

Code:

service nagios stop
killall -9 nagios
service nagios start

paulds · Post by **paulds** » Mon Aug 05, 2013 11:40 am

abrist, Ok, running that via watch -n1, and I am seeing one or more defunct nagios processes periodically, maybe every 5-10 seconds (it's irregular). Occasionally there are a bunch all at once, maybe 8-10 of them. But none ever linger for more than a second.
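As an alternative to grepping for the word "defunct", the process state column can be filtered directly, which also shows the parent PID of each zombie. This is a sketch under a couple of assumptions: `list_defunct` is a made-up helper name, and the `stat` output column is not part of POSIX `ps`, although procps on Linux provides it.

```shell
#!/bin/sh
# list_defunct (hypothetical helper): read "STAT PID PPID COMMAND" lines on
# stdin and print "PID PPID COMMAND" for entries whose state starts with Z,
# which is how ps marks zombie (defunct) processes.
list_defunct() {
    awk '$1 ~ /^Z/ { print $2, $3, $4 }'
}

# One-off check of the live process table; the trailing "=" on each column
# name suppresses the header line.
ps -A -o stat= -o pid= -o ppid= -o comm= | list_defunct
```

If the PPID column always shows the PID of the single running nagios daemon, the zombies are just that daemon's own finished children waiting to be reaped.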

I'll restart the service just for the heck of it, but I really don't think this is going to be very helpful. We already restart it about once a week on average, because we do a restart whenever we add/change/remove any systems or services, and these warnings have been ongoing for many months.
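One way to test whether the warnings have anything to do with restarts would be to pull every occurrence out of nagios.log, with its timestamp, and compare those times against when the service was restarted. A minimal sketch follows; `stat_warnings` is a made-up helper name, and the pattern is based only on the single warning line quoted at the top of this thread.

```shell
#!/bin/sh
# stat_warnings (a made-up helper name): read nagios.log-style lines on stdin
# and print "EPOCH PATH" for every "Could not stat()" warning.
stat_warnings() {
    sed -n "s/^\[\([0-9]*\)] Warning: Could not stat() check result file '\(.*\)'\.\$/\1 \2/p"
}

# Sample line copied from the first post in this thread.
echo "[1375153859] Warning: Could not stat() check result file '/var/log/nagios/spool/checkresults/cpkYQTf'." | stat_warnings
```

Against the real log it would be run as `stat_warnings < /var/log/nagios/nagios.log`, where that log path is an assumption based on the spool location and may differ on a given system.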

Nagios Support Forum
