Search found 13 matches

by vinzclortho
Thu Apr 02, 2015 3:47 pm
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

Sure, my problem is fixed since we're not using the command pipe anymore. I still think it would be good to patch nagios to fix this command pipe behavior.
by vinzclortho
Mon Mar 30, 2015 1:46 pm
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

I just wanted to post an update. We have been running since Thursday using the new check_results strategy and have not seen any invalid alarms. It seems that the issue was related to that command pipe corruption that I found. I think it would be prudent to clear out the contents of the buf variable ...
by vinzclortho
Thu Mar 26, 2015 11:23 am
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

After some spelunking in the nagios source, I think I have discovered how this was happening, if not necessarily why. I set debug_verbosity=2, which adds "Raw command entry:" logging. From the source, nagios logs Raw Commands before it processes any of the command entries. I noticed that t...
by vinzclortho
Wed Mar 25, 2015 9:54 am
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

We do have NTP configured and monitored with the built-in check_mk agent checks. None of the hosts are off more than a few milliseconds. Even if they were, that would not explain checks that have a 0 timestamp or the repeated checks.
by vinzclortho
Mon Mar 23, 2015 12:45 pm
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

I have modified my script to tail the nagios.debug log and watch for old (> 150 seconds) commands being pushed in. It has turned up lots of stuff:
[1427128588.816659] [128.1] [pid=9556] Command Entry Time: 1427128385 large diff : 203 - 1427128588 1427128385
[1427131034.299819] [128.1] [pid=9556] C...
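
The check described in this post can be sketched roughly as follows. This is not the poster's actual script, which is not shown; it is a minimal Python sketch that assumes debug lines have the shape quoted above (a bracketed epoch timestamp followed later by "Command Entry Time: <epoch>"). The names `find_stale_commands` and `MAX_AGE_SECONDS` are hypothetical.

```python
import re

# Matches debug lines of the shape quoted in the post, e.g.
# [1427128588.816659] [128.1] [pid=9556] Command Entry Time: 1427128385
ENTRY_RE = re.compile(r"^\[(\d+)(?:\.\d+)?\]\s.*Command Entry Time:\s*(\d+)")

MAX_AGE_SECONDS = 150  # threshold mentioned in the post


def find_stale_commands(lines, max_age=MAX_AGE_SECONDS):
    """Yield (logged_at, entry_time, diff) for commands older than max_age.

    logged_at is the whole-second part of the debug log timestamp and
    entry_time is the command's own timestamp; diff is their difference.
    """
    for line in lines:
        match = ENTRY_RE.match(line)
        if not match:
            continue  # not a "Command Entry Time" line
        logged_at = int(match.group(1))
        entry_time = int(match.group(2))
        diff = logged_at - entry_time
        if diff > max_age:
            yield logged_at, entry_time, diff
```

The function only filters lines it is given; it does not follow the file itself. One way to use it on a live log would be to pipe the output of `tail -F` on the debug file into a small wrapper that passes `sys.stdin` to `find_stale_commands` and prints each hit.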
by vinzclortho
Mon Mar 23, 2015 10:37 am
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

I have been keeping an eye on stuff in check_result_path. There is generally nothing in there that's more than a minute or so old. A few of the servers did have some very old files - weeks or months old. I suspect these were from crashes - we previously had issues with livestatus causing nagios to c...
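
Watching a directory such as check_result_path for leftover files could be done along these lines. This is only a sketch under stated assumptions: the function name `old_result_files` and the 60-second default are hypothetical (chosen to match "a minute or so"), and file age is judged by modification time alone.

```python
import os
import time


def old_result_files(directory, max_age_seconds=60, now=None):
    """Return (path, age_in_seconds) for regular files older than max_age_seconds.

    Results are sorted oldest first. `now` can be supplied for testing;
    otherwise the current time is used.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue  # skip subdirectories and symlinks
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            stale.append((entry.path, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling it with the real check_result_path and printing the result would list any files that have been sitting there longer than expected, such as the weeks-old leftovers mentioned above.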
by vinzclortho
Fri Mar 20, 2015 4:23 pm
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

And one more note - we have the default max_check_result_file_age value (3600). Shouldn't this prevent these really old check results from being processed?
by vinzclortho
Fri Mar 20, 2015 3:44 pm
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

One more note on the 12-hour-old command: it does not show up in nagios.debug at the time that it was sent in by gostatus. So it's not as if it is an old result that got accepted once and then replayed somehow. It appears to have been submitted, then sat in the cmd file for 12+ hours, then picked u...
by vinzclortho
Fri Mar 20, 2015 3:20 pm
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

OK, got another alarm today, along with a bunch of very interesting info:
https://gist.githubusercontent.com/enichols/b298954839ed81eb913b/raw/8ccfc6a5cfb75e3257d33926da4a2f55e0370935/gistfile1.txt
As you can see from the nagios.debug log, somehow gostatus pushed a command 12 hours ago and it just showe...
by vinzclortho
Thu Mar 19, 2015 11:15 am
Forum: Open Source Nagios Projects
Topic: Invalid Freshness Alarms
Replies: 23
Views: 9779

Re: Invalid Freshness Alarms

FYI, we had another bogus alarm last night, so the ramdisk did not help. Unfortunately, that server did not have a nagios debug_level set, so I don't have full debugging info, just cron, nagios.log and gostatus logs:
https://gist.githubusercontent.com/enichols/899822f3f56df77f448b/raw/abc697328ad08...