Service check timed out after 60.01 seconds as SOFT checks

meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Post by meganwilliford »

Is it possible to have "Service check timed out after 60.01 seconds" results treated as SOFT instead of HARD? We are trying to keep these out of our ServiceNow feed. We only read HARD states, and these are creating a lot of noise.

Or is there another way to avoid the following scenario?

We have host_down_disable_service_checks=1 in our core config file, but sometimes the timing is not quite right: a service check runs and reports "Service check timed out after 60.01 seconds", and then a minute later the host is reported as down, which is why the service check timed out in the first place.
meganwilliford
Posts: 101
Joined: Tue Aug 06, 2019 7:49 am

Re: Service check timed out after 60.01 seconds as SOFT chec

Post by meganwilliford »

I see this is the expected behavior from this link: https://assets.nagios.com/downloads/nag ... uling.html

"When a service check results in a non-OK state, Nagios will check the host that the service is associated with to determine whether or not is UP. If the host is not UP (i.e. it is either down or unreachable), Nagios will immediately put the service into a hard non-OK state and it will reset the current attempt number to 1. Since the service is in a hard non-OK state, the service check will be rescheduled at the normal frequency specified by the check_interval option instead of the retry_interval option."

It seems to be a timing issue. Are there best practices on how to get the timing right? Here is an example:

2019-12-18 12:39:23 [remotehost] DOWN HARD 5 of 5
2019-12-18 12:38:14 [remotehost] DOWN SOFT 4 of 5
2019-12-18 12:37:06 [remotehost] DOWN SOFT 3 of 5
2019-12-18 12:35:59 [remotehost] DOWN SOFT 2 of 5
2019-12-18 12:35:13 [remotehost] Disk Usage on C:/ CRITICAL HARD 1 of 5 (Service check timed out after 60.01 seconds)
2019-12-18 12:34:51 [remotehost] DOWN SOFT 1 of 5
2019-12-18 12:34:41 [remotehost] CPU Usage CRITICAL SOFT 1 of 5 (Service check timed out after 60.01 seconds)
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Service check timed out after 60.01 seconds as SOFT chec

Post by ssax »

You are correct, that is expected.

Set the check_interval lower on the host than on the services; that's how you'd do that (or increase it on the services, either way works).
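
As a rough sketch of what that could look like in the object definitions (the interval values here are illustrative assumptions, not taken from your config, and other required directives are left out for brevity; in XI you would normally set these through the Core Config Manager rather than by editing files):

```
# Host is checked more often than its services (hypothetical values)
define host {
    host_name               remotehost
    check_interval          1       ; assumed: check the host every minute
    retry_interval          1       ; assumed: retry every minute while SOFT
    max_check_attempts      5       ; matches the "5 of 5" in the log above
}

# Service is checked less often than the host
define service {
    host_name               remotehost
    service_description     CPU Usage
    check_interval          5       ; assumed: higher than the host's check_interval
    retry_interval          1
    max_check_attempts      5
}
```

The idea is only that the host check gets a chance to notice the outage before the next scheduled service check runs. It narrows the window but may not close it completely.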

Let me know if you have any questions or if I can clarify anything.