Nagios XI Critical HARD 1/5 Followed By OK SOFT
Posted: Thu Aug 01, 2019 8:00 pm
Hi,
I'm seeing some unexpected behaviour with SOFT and HARD states in the State History report. One of my services shows a Critical HARD state on check attempt 1 of 5, and then recovers with an OK SOFT state. I would expect a Critical HARD state to be followed by an OK HARD state.
This severely affects SLA reporting on HARD states: with the outage below, the monthly availability is reported as 98.655%, whereas the real figure is 99.989%.
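As a rough sanity check on those two percentages, here is a small sketch of the arithmetic. It assumes the report covers a 31-day month (July 2019) and that the real outage is the 5 minutes between the 13:59 and 14:04 entries in the State History further down. The function names are made up for illustration and are not part of Nagios XI.

```python
# Assumption: the SLA report covers a 31-day month (July 2019).
MINUTES_IN_MONTH = 31 * 24 * 60  # 44,640 minutes


def availability_pct(downtime_minutes: float) -> float:
    """Availability as a percentage for a given amount of downtime."""
    return 100.0 * (1 - downtime_minutes / MINUTES_IN_MONTH)


def implied_downtime_minutes(avail_pct: float) -> float:
    """Downtime in minutes implied by a reported availability percentage."""
    return MINUTES_IN_MONTH * (1 - avail_pct / 100.0)


# Assumption: the real outage lasted 5 minutes (13:59 to 14:04).
actual = availability_pct(5)
reported_downtime = implied_downtime_minutes(98.655)

print(f"{actual:.3f}%")                       # 99.989%
print(f"{reported_downtime:.0f} minutes")     # 600 minutes
```

Under these assumptions, the 98.655% figure corresponds to roughly 10 hours of downtime. That would fit a report that does not count the 14:04 OK SOFT entry as the end of the outage, although that is only a guess.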
From reading the State Types document, the only way a service could go Critical HARD on the first of 5 max_check_attempts is if the host was DOWN. However, the host was OK during this time.
Even if the service was correctly shown as Critical HARD on check 1/5, my understanding is that it should always be followed by an OK HARD.
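To make that understanding concrete, here is a simplified sketch of the rules as I read them in the State Types document. It is a toy model, not Nagios Core's actual logic, and `CheckResult` and `expected_next` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    state: str       # "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    state_type: str  # "SOFT" or "HARD"
    attempt: int     # current check attempt


def expected_next(prev: CheckResult, new_state: str, host_up: bool,
                  max_attempts: int = 5) -> CheckResult:
    """Simplified model (an assumption) of what the next history entry should be."""
    if new_state == "OK":
        # Recovery from a SOFT problem is a SOFT recovery;
        # recovery from a HARD problem (or OK -> OK) is a HARD OK.
        if prev.state != "OK" and prev.state_type == "SOFT":
            return CheckResult("OK", "SOFT", 1)
        return CheckResult("OK", "HARD", 1)

    # Non-OK result while the host is down: HARD straight away.
    if not host_up:
        return CheckResult(new_state, "HARD", 1)

    # Already in a HARD problem state: stay HARD.
    if prev.state != "OK" and prev.state_type == "HARD":
        return CheckResult(new_state, "HARD", prev.attempt)

    # Otherwise retry as SOFT until max_check_attempts is reached.
    attempt = 1 if prev.state == "OK" else prev.attempt + 1
    state_type = "HARD" if attempt >= max_attempts else "SOFT"
    return CheckResult(new_state, state_type, attempt)


# The 13:59 entry, followed by an OK result at 14:04.
previous = CheckResult("CRITICAL", "HARD", 1)
print(expected_next(previous, "OK", host_up=True).state_type)  # HARD
```

Under this model the 14:04 entry would be OK HARD, whereas the State History shows OK SOFT, which is the discrepancy I am asking about.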
Here is the State History of the service in question.
Date / Time      | Host    | Service      | State    | State Type | Attempt | Information
28/07/2019 14:04 | My Host | Availability | OK       | SOFT       | 1 of 5  | MULTIPLE CHECK OK: 0 failed 4 succeeded
28/07/2019 13:59 | My Host | Availability | CRITICAL | HARD       | 1 of 5  | MULTIPLE CHECK CRITICAL: 2 failed 2 succeeded
I'm running Nagios XI version 5.6.4.
Could you please advise whether this behaviour is expected, and if so, under what conditions?
Thanks in advance.
Kind regards,
Justin