Socket Timeouts immediately going into a HARD state

hbouma · Post by **hbouma** » Thu Jan 17, 2019 2:01 pm

I have been noticing that when a check fails due to a socket timeout, the check doesn't retry, and instead immediately goes into a Hard failure state. Even more odd is that, in this instance, 5 disk checks failed but only 2 sent notifications.

For instance, I had a team reboot a server without putting it into downtime. The socket timeout emails were sent on check attempt 1 of 5.

2019-01-17 13_56_58-Document1 - Word.png

We are running Nagios XI 5.5.7 on Red Hat 7.6 64-bit VMs, with NRPE v3.2.1.

npolovenko · Post by **npolovenko** » Thu Jan 17, 2019 5:27 pm

@hbouma, Was the host in a Critical state when services started going into hard states? This sounds like an issue from this thread:
https://support.nagios.com/forum/viewto ... 16&t=52032

ssax · Post by **ssax** » Thu Jan 17, 2019 5:28 pm

This is likely intended functionality: the host was in a problem state.

The way it's supposed to work is that when a service check detects a problem, Nagios then checks the host. If the host is in a down state (hard or soft), the service goes straight into a hard problem state; it does not go through the soft states while the host is down. I'm referring to this specifically:

When a service check results in a non-OK state, Nagios will check the host that the service is associated with to determine whether or not is UP. If the host is not UP (i.e. it is either down or unreachable), Nagios will immediately put the service into a hard non-OK state and it will reset the current attempt number to 1. Since the service is in a hard non-OK state, the service check will be rescheduled at the normal frequency specified by the check_interval option instead of the retry_interval option.

Taken from here:

https://assets.nagios.com/downloads/nag ... uling.html
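To make the quoted rule concrete, here is a rough sketch as a shell function. It is a simplified model, not Nagios source code: the function name `service_problem_transition` and its output format are made up for illustration, and the host-UP branches (soft states retried at retry_interval until max_check_attempts is reached) reflect the usual retry behaviour rather than anything in the quote itself.

```shell
# Simplified model of what happens when a service check returns non-OK.
# Hypothetical helper for illustration only; not part of Nagios.
# Args:   host_state (UP|DOWN|UNREACHABLE), current_attempt, max_check_attempts
# Prints: "<state_type> <attempt> <interval used for the next check>"
service_problem_transition() {
    host_state="$1"
    attempt="$2"
    max_attempts="$3"

    if [ "$host_state" != "UP" ]; then
        # Host is down or unreachable: the service goes straight to a hard
        # state, the attempt number is reset to 1, and the next check is
        # scheduled at the normal check_interval.
        echo "HARD 1 check_interval"
    elif [ "$attempt" -ge "$max_attempts" ]; then
        # Host is up and the retries are exhausted: hard state.
        echo "HARD $attempt check_interval"
    else
        # Host is up and retries remain: soft state, retried sooner.
        echo "SOFT $attempt retry_interval"
    fi
}

# The rebooted-server case from this thread: first failure, host already down.
service_problem_transition DOWN 1 5
```

With the host down, the very first failed attempt already reports a hard state, which matches the "1 of 5" notifications described in the original post.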

One thing that you could do would be to add host_down_disable_service_checks=1 in your /usr/local/nagios/etc/nagios.cfg and then restart the nagios service:

service nagios restart

With that option set, Nagios will not even run the service checks while the host is in a problem state (hard or soft).
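For anyone who wants to script the nagios.cfg change, here is a minimal sketch. The helper name `enable_host_down_skip` is hypothetical, it assumes GNU sed (as shipped on Red Hat), and the commented usage assumes the Nagios binary lives at /usr/local/nagios/bin/nagios, which may differ on your install.

```shell
# Hypothetical helper: make sure host_down_disable_service_checks=1 is set
# in the config file passed as the first argument.
enable_host_down_skip() {
    cfg="$1"
    if grep -q '^host_down_disable_service_checks=' "$cfg"; then
        # Option already present: force its value to 1 (GNU sed in-place edit).
        sed -i 's/^host_down_disable_service_checks=.*/host_down_disable_service_checks=1/' "$cfg"
    else
        # Option missing: append it on its own line.
        printf '%s\n' 'host_down_disable_service_checks=1' >> "$cfg"
    fi
}

# Possible usage on the XI server (paths are assumptions, adjust as needed):
#   enable_host_down_skip /usr/local/nagios/etc/nagios.cfg
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   # verify config
#   service nagios restart
```

Running the config verification before the restart is optional, but it catches a typo in nagios.cfg before the service goes down.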

So the functionality was actually broken in earlier versions and it is working as intended now in XI 5.5+ with the upgraded Core backend.

hbouma · Post by **hbouma** » Mon Jan 21, 2019 8:45 am

Thank you.

This change has been added to our system and should resolve my issue.

npolovenko · Post by **npolovenko** » Mon Jan 21, 2019 5:22 pm

@hbouma, please let us know if it's OK to close this thread as resolved.

hbouma · Post by **hbouma** » Mon Jan 21, 2019 5:33 pm

Yes, you can consider this as resolved.

Nagios Support Forum
