Socket Timeouts immediately going into a HARD state
Posted: Thu Jan 17, 2019 2:01 pm
by hbouma
I have been noticing that when a check fails due to a socket timeout, the check doesn't retry, and instead immediately goes into a Hard failure state. Even more odd is that, in this instance, 5 disk checks failed but only 2 sent notifications.
For instance, I had a team reboot a server without putting it into downtime. The socket timeout notification emails went out on check failure 1 of 5.
[Attachment: 2019-01-17 13_56_58-Document1 - Word.png]
We are running Nagios XI 5.5.7 on Red Hat 7.6 64-bit VMs, with NRPE v3.2.1.
Re: Socket Timeouts immediately going into a HARD state
Posted: Thu Jan 17, 2019 5:27 pm
by npolovenko
@hbouma, was the host in a Critical state when the services started going into hard states? This sounds like the issue from this thread:
https://support.nagios.com/forum/viewto ... 16&t=52032
Re: Socket Timeouts immediately going into a HARD state
Posted: Thu Jan 17, 2019 5:28 pm
by ssax
This is likely intended functionality: the host was in a problem state.
The way it's supposed to work is this: when a service check detects a problem, Nagios then checks the host. If the host is in a down state (hard or soft), the service goes straight into a hard problem state without passing through the soft states. I'm referring to this specifically:
When a service check results in a non-OK state, Nagios will check the host that the service is associated with to determine whether or not is UP. If the host is not UP (i.e. it is either down or unreachable), Nagios will immediately put the service into a hard non-OK state and it will reset the current attempt number to 1. Since the service is in a hard non-OK state, the service check will be rescheduled at the normal frequency specified by the check_interval option instead of the retry_interval option.
Taken from here:
https://assets.nagios.com/downloads/nag ... uling.html
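To make the quoted rule concrete, here is a minimal Python sketch of the decision it describes. It is a simplified model and not Nagios Core's actual code: the `ServiceState` class, the `process_result` function, and the string labels are hypothetical names invented for illustration, and the branch for a host that is UP follows the usual soft-retry behaviour as an assumption rather than something stated in the quote above.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical, simplified view of a service's check state."""
    ok: bool = True
    state_type: str = "HARD"              # "SOFT" or "HARD"
    attempt: int = 1
    next_interval: str = "check_interval"  # which interval schedules the next check


def process_result(svc, check_ok, host_up, max_check_attempts=5):
    """Return the new state after one service check result (simplified model)."""
    if check_ok:
        # Recovery or steady OK: back to attempt 1 at the normal interval.
        return ServiceState(ok=True, state_type="HARD", attempt=1,
                            next_interval="check_interval")

    if not host_up:
        # The rule from the quoted documentation: host not UP means the
        # service goes HARD immediately and the attempt number resets to 1.
        return ServiceState(ok=False, state_type="HARD", attempt=1,
                            next_interval="check_interval")

    if not svc.ok and svc.state_type == "HARD":
        # Assumption of this sketch: an already-HARD problem stays HARD.
        return ServiceState(ok=False, state_type="HARD", attempt=svc.attempt,
                            next_interval="check_interval")

    # Host is UP: count this failure and retry until the attempts run out.
    attempt = 1 if svc.ok else svc.attempt + 1
    if attempt >= max_check_attempts:
        return ServiceState(ok=False, state_type="HARD", attempt=attempt,
                            next_interval="check_interval")
    return ServiceState(ok=False, state_type="SOFT", attempt=attempt,
                        next_interval="retry_interval")
```

In this model, a single failed check while the host is down returns a HARD state at attempt 1, which matches the "1 of 5" notifications described in the first post. With the host up and `max_check_attempts=5`, the same service would stay SOFT for attempts 1 through 4 and only go HARD on the fifth consecutive failure.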
One thing you could do is add host_down_disable_service_checks=1 to your /usr/local/nagios/etc/nagios.cfg and then restart the nagios service.
With that option set, Nagios will not even run the service checks while the host is in a problem state (hard or soft).
So the functionality was actually broken in earlier versions, and it is working as intended now in XI 5.5+ with the upgraded Core backend.
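If it helps to confirm that the directive actually landed in the file before restarting, a small Python sketch like the one below can read it back. The `directive_value` helper is a hypothetical name, and the parsing is an assumption: it treats the file as simple `key=value` lines, skips lines starting with `#`, and returns the last assignment it finds, which may not match how Nagios itself resolves duplicates.

```python
def directive_value(cfg_text, name):
    """Return the last value assigned to `name` in key=value text, or None."""
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


def host_down_checks_disabled(cfg_text):
    """True if host_down_disable_service_checks is set to 1 in the text."""
    return directive_value(cfg_text, "host_down_disable_service_checks") == "1"
```

To use it against the real file, read the path mentioned above and pass the contents in, for example `host_down_checks_disabled(open("/usr/local/nagios/etc/nagios.cfg").read())`.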
Re: Socket Timeouts immediately going into a HARD state
Posted: Mon Jan 21, 2019 8:45 am
by hbouma
Thank you.
This change has been added to our system and should resolve my issue.
Re: Socket Timeouts immediately going into a HARD state
Posted: Mon Jan 21, 2019 5:22 pm
by npolovenko
@hbouma, please let us know if it's OK to close this thread as resolved.
Re: Socket Timeouts immediately going into a HARD state
Posted: Mon Jan 21, 2019 5:33 pm
by hbouma
Yes, you can consider this as resolved.