Socket Timeouts immediately going into a HARD state

hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Socket Timeouts immediately going into a HARD state

Post by hbouma »

I have been noticing that when a check fails due to a socket timeout, the check doesn't retry and instead goes straight into a HARD failure state. Odder still, in this instance 5 disk checks failed but only 2 sent notifications.

For instance, a team rebooted a server without putting it into downtime. The socket timeout notification email was sent on check attempt 1 of 5.
[Attachment: 2019-01-17 13_56_58-Document1 - Word.png]
We are running Nagios XI 5.5.7 on Red Hat 7.6 64-bit VMs, with NRPE v3.2.1.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Socket Timeouts immediately going into a HARD state

Post by npolovenko »

@hbouma, was the host itself in a problem state (Down or Unreachable) when the services started going into hard states? This sounds like an issue from this thread:
https://support.nagios.com/forum/viewto ... 16&t=52032
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Socket Timeouts immediately going into a HARD state

Post by ssax »

This is likely intended functionality, since the host was in a problem state.

The way it's supposed to work is this: when a service check detects a problem, Nagios then checks the host. If the host is in a down state (hard or soft), the service goes straight into a hard problem state and does not pass through the soft states. I'm referring to this specifically:
When a service check results in a non-OK state, Nagios will check the host that the service is associated with to determine whether or not is UP. If the host is not UP (i.e. it is either down or unreachable), Nagios will immediately put the service into a hard non-OK state and it will reset the current attempt number to 1. Since the service is in a hard non-OK state, the service check will be rescheduled at the normal frequency specified by the check_interval option instead of the retry_interval option.
Taken from here:

https://assets.nagios.com/downloads/nag ... uling.html
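
To make the quoted rule concrete, here is a minimal sketch of it as a shell function. It is only an illustration of the behaviour described in the documentation excerpt, not Nagios Core's actual code. The function name next_service_state and its arguments are hypothetical, and it only models what happens after a non-OK service result.

```shell
#!/usr/bin/env bash
# Simplified model of the quoted rule; NOT Nagios Core's implementation.
# The function name and argument layout are hypothetical.
#
# Arguments (for a service check that just returned a non-OK result):
#   $1 host_state    UP, DOWN, or UNREACHABLE
#   $2 attempt       attempt number of this non-OK result (1-based)
#   $3 max_attempts  the service's max_check_attempts
# Prints: "<state_type> <attempt> <interval used for the next check>"
next_service_state() {
    local host_state=$1 attempt=$2 max_attempts=$3

    if [ "$host_state" != "UP" ]; then
        # Host is down or unreachable: hard state at once, attempt reset to 1,
        # next check scheduled at the normal check_interval.
        echo "HARD 1 check_interval"
    elif [ "$attempt" -ge "$max_attempts" ]; then
        # Host is up and the retries are exhausted: hard state.
        echo "HARD $attempt check_interval"
    else
        # Host is up and retries remain: soft state, retry sooner.
        echo "SOFT $attempt retry_interval"
    fi
}

next_service_state DOWN 1 5   # prints: HARD 1 check_interval
next_service_state UP 1 5     # prints: SOFT 1 retry_interval
next_service_state UP 5 5     # prints: HARD 5 check_interval
```

The first call matches the situation in the original post: the server was rebooting, so the host was not UP and the service reported a hard state on attempt 1 of 5.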

One thing you could do is add host_down_disable_service_checks=1 to your /usr/local/nagios/etc/nagios.cfg and then restart the nagios service:

Code:

service nagios restart
With that option set, Nagios will not even run the service checks while the host is in a problem state (hard or soft).
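
If you would rather script the config change than edit the file by hand, something like the following could work. This is a rough sketch: the helper name set_directive is hypothetical, it assumes GNU sed (as shipped with Red Hat), and it assumes the file ends with a newline. The binary path /usr/local/nagios/bin/nagios in the comments is also an assumption based on the config path above, so check it on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a Nagios-style config file.
# Updates the line if the key already exists, otherwise appends it,
# so running it twice does not create duplicate entries.
set_directive() {
    local cfg=$1 key=$2 value=$3

    if grep -q "^${key}=" "$cfg"; then
        # Key already present: rewrite its value in place (GNU sed -i).
        sed -i "s|^${key}=.*|${key}=${value}|" "$cfg"
    else
        # Key absent: append it as a new line.
        printf '%s\n' "${key}=${value}" >> "$cfg"
    fi
}

# Example usage (run as root; back up first, verify before restarting):
#   cp -p /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/nagios.cfg.bak
#   set_directive /usr/local/nagios/etc/nagios.cfg host_down_disable_service_checks 1
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg && service nagios restart
```

Running the config verification (-v) before the restart means a typo in the file is caught while the running instance is still up.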


So the functionality was actually broken in earlier versions and it is working as intended now in XI 5.5+ with the upgraded Core backend.
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Socket Timeouts immediately going into a HARD state

Post by hbouma »

Thank you.

This change has been added to our system and should resolve my issue.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Socket Timeouts immediately going into a HARD state

Post by npolovenko »

@hbouma, please let us know if it's OK to close this thread as resolved.
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Socket Timeouts immediately going into a HARD state

Post by hbouma »

Yes, you can consider this as resolved.