Service checks go stale 30s after passive check received.

invade
Posts: 29
Joined: Thu Nov 16, 2017 7:45 am

Re: Service checks go stale 30s after passive check received

Post by invade »

tgriep wrote:Do you ever see that service go in to an OK state after receiving the Passive check?
Isn't that what the log shows? For example:

[Wed Nov 22 22:39:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
Passive check received OK.

[Wed Nov 22 22:39:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
Service set to an OK state.

[Wed Nov 22 22:40:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s). I'm forcing an immediate check of the service.
Warning about service checks being stale.

[Wed Nov 22 22:40:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
Service set to a CRITICAL state.
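One thing the staleness warning itself gives away is roughly how old Nagios believed the result to be. The following is a minimal sketch, not anything taken from Nagios itself: it parses the warning line quoted above and subtracts the "stale by" duration from the log timestamp. The function names are made up for illustration, and the log does not say whether "stale by" is measured from the last check or from the moment the threshold expired, so the result should be read only as a latest-possible time for the check Nagios was looking at.

```python
import re
from datetime import datetime, timedelta

# The warning line exactly as it appears in the log above.
LINE = ("[Wed Nov 22 22:40:01 2017] Warning: The results of service "
        "'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s "
        "(threshold=0d 0h 14m 0s). I'm forcing an immediate check of the service.")

# Matches durations written like "0d 1h 15m 30s".
DURATION = r"(\d+)d (\d+)h (\d+)m (\d+)s"


def to_delta(days, hours, minutes, seconds):
    """Convert the four captured number strings into a timedelta."""
    return timedelta(days=int(days), hours=int(hours),
                     minutes=int(minutes), seconds=int(seconds))


def latest_possible_last_check(line):
    """Hypothetical helper: log time minus the reported 'stale by' duration."""
    stamp = re.match(r"\[(.+?)\]", line).group(1)
    # Assumes English day and month names (the default C locale).
    logged_at = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    stale_by = to_delta(*re.search(r"stale by " + DURATION, line).groups())
    return logged_at - stale_by


bound = latest_possible_last_check(LINE)
print(bound)  # 2017-11-22 21:24:31
```

That is more than an hour before the passive check logged at 22:39:21, which suggests Nagios was not using the arrival time of that check as the time of the last result.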

Apologies if I've misunderstood.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

Sorry, I wasn't clear. I meant in the web interface.
invade

Post by invade »

Problem is resolved. The issue was that the clock was wrong on all the hosts where the service checks kept going stale.

The attached screenshot was taken just after a check was successfully received from the host:
[Fri Dec 8 09:31:18 2017] PASSIVE SERVICE CHECK: ccc-gr-aio;Service-Asterisk;0;OK - Asterisk Service is running / Dec-08 08:50:01 EET
[Attachment: Screenshot from 2017-12-08 09-32-12.png]
So the check was received on the Nagios server at 09:31:18 GMT (11:31:18 EET), but the clock on the host read 08:50:01 EET. This seems to have set the "Last Check" time to 06:50:01 GMT, which is older than the freshness threshold, so the check was immediately treated as stale.

So, although the Nagios server was receiving checks regularly, the check result must carry some sort of timestamp taken from the sending host. Because that timestamp was wrong, the Nagios server concluded that each check it received was already hours old and therefore stale.
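To make the arithmetic concrete, here is a rough sketch of the comparison as I understand it. This is not Nagios code: `is_stale` is a made-up helper, EET is treated as a fixed UTC+2 offset (correct for December), and the 14 minute threshold is borrowed from the earlier System-Partitions warning on the assumption that this service uses the same value.

```python
from datetime import datetime, timedelta, timezone

# EET modelled as a fixed UTC+2 offset, which holds in December.
EET = timezone(timedelta(hours=2), "EET")


def is_stale(last_check, now, threshold):
    """Hypothetical freshness test: stale when the result is older than threshold."""
    age = now - last_check
    return age > threshold, age


# Time the Nagios server logged the passive check (GMT).
received_at = datetime(2017, 12, 8, 9, 31, 18, tzinfo=timezone.utc)
# Time reported by the host's wrong clock.
host_clock = datetime(2017, 12, 8, 8, 50, 1, tzinfo=EET)
# Assumed threshold, taken from the earlier log excerpt.
threshold = timedelta(minutes=14)

stale, age = is_stale(host_clock, received_at, threshold)
print(stale, age)  # True 2:41:17
```

With the host clock about 2h 41m behind, a result is already far past the threshold the moment it arrives. With a correct clock the age would be close to zero and the same test would pass.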

We have now corrected the time on the hosts and the checks are no longer going stale.

Many thanks for the assistance with this problem. We now have a much quieter alerting system.
kyang

Post by kyang »

Glad you resolved your issue!

Did you have any more questions or are we okay to lock this up?
invade

Post by invade »

All sorted now. This thread can be marked as resolved. Thanks.
kyang

Post by kyang »

Sounds good! I'll be closing this thread!

If you have any more questions, feel free to create another thread.

Thanks for using the Nagios Support Forum!