Soft vs Hard States
Posted: Tue Feb 27, 2018 4:42 am
Hi.
Could someone explain to me the following events from the log:
[Mon Feb 26 04:50:11 2018] PASSIVE HOST CHECK: host1;0;OK - Feb-26 04:50:01 GMT
[Mon Feb 26 04:50:30 2018] PASSIVE SERVICE CHECK: host1;System-Partitions;0;DISK OK
[Mon Feb 26 05:50:51 2018] Warning: The results of service 'System-Partitions' on host 'host1' are stale by 0d 0h 0m 35s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the service.
[Mon Feb 26 05:50:51 2018] Warning: The results of host 'host1' are stale by 0d 0h 0m 50s (threshold=0d 1h 0m 0s). I'm forcing an immediate check of the host.
[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.
[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Mon Feb 26 05:51:01 2018] SERVICE NOTIFICATION: host1;host1;System-Partitions;CRITICAL;notifyservice-host1;CRITICAL: No Recent Passive Service Checks.
We have a Nagios system that receives passive-only checks from numerous hosts. The checks run every five minutes, have max_check_attempts set to 1, and have a freshness threshold of one hour.
In the example above, the host went down just after 04:50, so an hour later Nagios ran the check command (check_dummy), resulting in a critical alert status.
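To sanity-check the timing, here is a minimal Python sketch of the staleness arithmetic. It assumes the freshness clock starts from the check time embedded in the passive host result (04:50:01 GMT) rather than from the time the log line was written; that is my reading of the log, not something I have confirmed. The function name is just for illustration.

```python
from datetime import datetime, timedelta


def stale_by(last_check, now, threshold):
    """Return how far past the freshness threshold a result is at time `now`."""
    return now - (last_check + threshold)


# Assumption: last check time taken from the passive host result output.
last_host_check = datetime(2018, 2, 26, 4, 50, 1)
# Time of the "results ... are stale" warning in the log.
warning_time = datetime(2018, 2, 26, 5, 50, 51)
# Freshness threshold of one hour, as configured.
threshold = timedelta(hours=1)

overdue = stale_by(last_host_check, warning_time, threshold)
print(int(overdue.total_seconds()))  # 50
```

Under that assumption the result matches the "stale by 0d 0h 0m 50s" figure in the host warning.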
What I don't understand is why the host check went into a soft state but the service alert went into a hard state (resulting in a notification). The same thing happens for every host and service check on all our systems.
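To compare the state types across a larger log, something like the following Python sketch could be used. It assumes the ALERT field layout seen in the entries above (host;state;state type;attempt;output for hosts, with the service name inserted after the host for services); the function and variable names are hypothetical.

```python
import re

# Matches HOST ALERT and SERVICE ALERT entries; other log lines are ignored.
ALERT_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] (?P<kind>HOST|SERVICE) ALERT: (?P<fields>.*)$"
)


def parse_alert(line):
    """Return a dict describing an ALERT line, or None for any other line."""
    m = ALERT_RE.match(line)
    if m is None:
        return None
    parts = m.group("fields").split(";")
    # Service alerts carry one extra leading field (the service name).
    offset = 1 if m.group("kind") == "SERVICE" else 0
    if len(parts) < 4 + offset:
        return None
    return {
        "when": m.group("when"),
        "kind": m.group("kind"),
        "host": parts[0],
        "service": parts[1] if offset else None,
        "state": parts[1 + offset],
        "state_type": parts[2 + offset],
        "attempt": int(parts[3 + offset]),
        # Re-join the remainder in case the plugin output contains ';'.
        "output": ";".join(parts[4 + offset:]),
    }


LOG = [
    "[Mon Feb 26 05:51:01 2018] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL: No Recent Passive Host Checks.",
    "[Mon Feb 26 05:51:01 2018] SERVICE ALERT: host1;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.",
]

for line in LOG:
    alert = parse_alert(line)
    if alert is not None:
        name = alert["service"] or alert["host"]
        print(alert["kind"], name, alert["state"], alert["state_type"], alert["attempt"])
```

For the two entries above this reports SOFT for the host and HARD for the service, both at attempt 1, which is the mismatch I am asking about.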
Ideally I would like both host and service alerts to use the same state in the event that no checks are received.
I did look through the forums and found what appear to be similar questions from other people, but no conclusive explanation. Apologies if the answer is already out there.
If further configuration details are required then I can add them to the thread.
Thanks in advance.