Every now and then I see some odd state behavior. We want to be sure there is always an OK HARD state after a CRITICAL HARD state has recovered.
Below is an example where services went CRITICAL HARD and then logged OK SOFT when they recovered. Between the two checks the remote host was reported as down. In our core config we have host_down_disable_service_checks=1, so service checks do not run while a host is down, and that seems to be interfering with the OK recovery state. Have you seen this before, and is there anything we can do to resolve it?
Date / Time | Host | Service | State | State Type | Attempt
2019-12-12 07:29:42 | <remote host> | Service status for: <service> | OK | SOFT | 1 of 5
2019-12-12 07:29:41 | <remote host> | Swap Usage | OK | SOFT | 1 of 5
2019-12-12 07:29:31 | <remote host> | | UP | HARD | 1 of 5
2019-12-12 07:24:29 | <remote host> | | DOWN | HARD | 5 of 5
2019-12-12 07:23:21 | <remote host> | | DOWN | SOFT | 4 of 5
2019-12-12 07:22:12 | <remote host> | | DOWN | SOFT | 3 of 5
2019-12-12 07:21:04 | <remote host> | | DOWN | SOFT | 2 of 5
2019-12-12 07:20:43 | <remote host> | Service status for: <service> | CRITICAL | HARD | 1 of 5
2019-12-12 07:20:42 | <remote host> | Swap Usage | CRITICAL | HARD | 1 of 5
2019-12-12 07:19:55 | <remote host> | | DOWN | SOFT | 1 of 5
2019-12-12 06:01:04 | <remote host> | CPU Usage | OK | SOFT | 2 of 5
2019-12-12 06:00:03 | <remote host> | | CRITICAL | SOFT | 1 of 5
2019-12-11 16:57:13 | <remote host> | | OK | SOFT | 2 of 5
2019-12-11 16:56:12 | <remote host> | CPU Usage | CRITICAL | SOFT | 1 of 5
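For anyone who wants to scan their own state history for this pattern, here is a minimal Python sketch. It assumes the rows have already been exported into simple records; the `Event` type and the `find_soft_recoveries` helper are hypothetical names for illustration and not part of Nagios. It flags any OK SOFT entry that directly follows a non-OK HARD entry for the same service.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """One row of service state history (hypothetical record layout)."""
    time: str        # "YYYY-MM-DD HH:MM:SS", sorts correctly as a string
    service: str
    state: str       # OK, WARNING, CRITICAL, ...
    state_type: str  # SOFT or HARD


def find_soft_recoveries(events):
    """Return OK SOFT events that directly follow a non-OK HARD event
    for the same service."""
    last_seen = {}
    flagged = []
    for ev in sorted(events, key=lambda e: e.time):
        prev = last_seen.get(ev.service)
        if (
            prev is not None
            and prev.state != "OK"
            and prev.state_type == "HARD"
            and ev.state == "OK"
            and ev.state_type == "SOFT"
        ):
            flagged.append(ev)
        last_seen[ev.service] = ev
    return flagged


# Rows taken from the history above (service rows only).
events = [
    Event("2019-12-11 16:56:12", "CPU Usage", "CRITICAL", "SOFT"),
    Event("2019-12-12 06:01:04", "CPU Usage", "OK", "SOFT"),
    Event("2019-12-12 07:20:42", "Swap Usage", "CRITICAL", "HARD"),
    Event("2019-12-12 07:29:41", "Swap Usage", "OK", "SOFT"),
]

for ev in find_soft_recoveries(events):
    print(ev.time, ev.service, ev.state, ev.state_type)
```

The CPU Usage pair is not flagged because its CRITICAL state never reached HARD, so an OK SOFT recovery is expected there.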
HARD state behavior
Re: HARD state behavior
You appear to be running into the same issue mentioned here:
https://support.nagios.com/forum/viewto ... 03#p287003
and the documented bug here:
https://github.com/NagiosEnterprises/na ... issues/651
This is fixed in the 4.4.4 release of Core:
https://www.nagios.org/projects/nagios-core/history/4x/
Are you running an XI version older than 5.6.7? 5.6.7 installs with Core 4.4.5 which should have this fixed. The current version of XI is 5.6.9 and would have the fix as well. I'd recommend upgrading if you're on an older version.
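If you are checking several servers, a quick Python sketch of the version comparison might look like the following. The `needs_upgrade` helper is a hypothetical name, and treating 5.6.7 as the cutoff is an assumption based only on the versions mentioned in this post. Versions are compared numerically, part by part, so that a release such as 5.6.10 would sort after 5.6.9.

```python
def parse_version(version):
    """Split a dotted version string such as "5.6.7" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(xi_version, cutoff="5.6.7"):
    """Return True if xi_version is older than the cutoff release.

    The default cutoff of 5.6.7 is an assumption taken from this thread.
    """
    return parse_version(xi_version) < parse_version(cutoff)


for version in ("5.6.6", "5.6.7", "5.6.9"):
    print(version, "needs upgrade:", needs_upgrade(version))
```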
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
meganwilliford (Posts: 101, Joined: Tue Aug 06, 2019 7:49 am)
Re: HARD state behavior
Excellent, thanks! We are on 5.6.6 with plans to upgrade soon. You can lock this post.
Re: HARD state behavior
Sounds good!