command check timeout disables active checks

jrdoubleu
Posts: 3
Joined: Wed May 10, 2023 9:26 am

Post by jrdoubleu »

Running Nagios XI 5.9.2

Is it expected that an active service check will be automatically disabled by Nagios when a command (in this case, NRPE) times out?

Over time, active checks have been automatically disabled for a significant percentage of my monitored hosts. The service check status for those sites still shows OK/green in the dashboard, but "Service Status Detail" shows Next Check set to "Not Scheduled" and the Active Check flag set to Disabled. If I enable active checks, the next check gets scheduled and things resume as expected (for a while).

nagios.log shows this:

[1683717541] SERVICE ALERT: host2;sit_devices;UNKNOWN;HARD;1;(No output on stdout) stderr: connect to address 10.192.26.1 port 5666: Connection timed out
...
[1683717767] Error: External command failed -> DISABLE_SVC_CHECK;host2;sit_devices

I interpret this to mean that because the active check command timed out once, Nagios has decided to place this host-service check combination in disabled status.
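
To see how often this happens and which host/service pairs are affected, the log can be scanned for DISABLE_SVC_CHECK entries. This is only a rough sketch: the regular expression is based solely on the two log lines quoted above, and the function name `find_disabled_checks` is made up for illustration.

```python
import re

# Matches entries shaped like the one quoted from nagios.log:
# [1683717767] Error: External command failed -> DISABLE_SVC_CHECK;host2;sit_devices
DISABLE_RE = re.compile(r"^\[(\d+)\].*\bDISABLE_SVC_CHECK;([^;]+);(.+?)\s*$")


def find_disabled_checks(log_lines):
    """Return (timestamp, host, service) for every DISABLE_SVC_CHECK entry."""
    hits = []
    for line in log_lines:
        match = DISABLE_RE.match(line)
        if match:
            hits.append((int(match.group(1)), match.group(2), match.group(3)))
    return hits


# The two lines from the post, used as sample input.
sample = [
    "[1683717541] SERVICE ALERT: host2;sit_devices;UNKNOWN;HARD;1;"
    "(No output on stdout) stderr: connect to address 10.192.26.1 "
    "port 5666: Connection timed out",
    "[1683717767] Error: External command failed -> "
    "DISABLE_SVC_CHECK;host2;sit_devices",
]

for timestamp, host, service in find_disabled_checks(sample):
    print(timestamp, host, service)
```

Comparing the timestamps of these entries with nearby HOST ALERT or SERVICE ALERT lines may help show what is issuing the command.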

It's not uncommon for our monitored sites to have unexpected network outages or server downtime.

Is there a config setting that prevents the active check from being disabled when a command times out? If this is intended behavior to prevent thread overload, is there a way to get Nagios to automatically re-enable checks that end up in this state?

This behavior lowers the integrity of our Nagios dashboards, as users stop trusting the statuses being reported.

Thanks in advance,

John
kg2857
Posts: 235
Joined: Wed Apr 12, 2023 5:48 pm

Re: command check timeout disables active checks

Post by kg2857 »

What is host_down_disable_service_checks set to in nagios.cfg?
You might look at the nagios.cfg doc and search for orphan to see if it might apply, as well as looking for other settings that might be the cause.
You might look at the service check timeout setting in nagios.cfg and make sure that the check_nrpe commands defined have a timeout less than the global timeout.
jrdoubleu
Posts: 3
Joined: Wed May 10, 2023 9:26 am

Re: command check timeout disables active checks

Post by jrdoubleu »

Thanks for the response.
kg2857 wrote: Thu May 11, 2023 5:43 pm What is host_down_disable_service_checks set to in nagios.cfg?
/usr/local/nagios/etc/nagios.cfg has no entry for host_down_disable_service_checks
kg2857 wrote: Thu May 11, 2023 5:43 pm You might look at the nagios.cfg doc and search for orphan to see if it might apply, as well as looking for other settings that might be the cause.
Will do.
kg2857 wrote: Thu May 11, 2023 5:43 pm You might look at the service check timeout setting in nagios.cfg and make sure that the check_nrpe commands defined have a timeout less than the global timeout.
/usr/local/nagios/etc/nagios.cfg has service_check_timeout=500

Core Config Manager shows the following for check_nrpe command:
$USER1$/check_nrpe -2 -H $HOSTADDRESS$ -u -t 300 -c $ARG1$ $ARG2$ $ARG3$

All monitored hosts have the same nrpe.cfg settings:
command_timeout=300
connection_timeout=300
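
For anyone wanting to repeat the comparison kg2857 suggested, here is a minimal sketch that pulls the two numbers out and checks that the plugin timeout is below the global one. The helper names are hypothetical, and it assumes the `-t` value is a plain number of seconds, as in the command above.

```python
import re


def service_check_timeout(nagios_cfg_text):
    """Return service_check_timeout from nagios.cfg text, or None if absent."""
    match = re.search(
        r"^\s*service_check_timeout\s*=\s*(\d+)\s*$",
        nagios_cfg_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def plugin_timeout(command_line):
    """Return the number following -t in a command definition, or None."""
    tokens = command_line.split()
    for index, token in enumerate(tokens[:-1]):
        if token == "-t":
            return int(tokens[index + 1])
    return None


def plugin_times_out_first(nagios_cfg_text, command_line):
    """True when the plugin's own timeout is below the global timeout."""
    global_timeout = service_check_timeout(nagios_cfg_text)
    nrpe_timeout = plugin_timeout(command_line)
    if global_timeout is None or nrpe_timeout is None:
        return False  # cannot compare when either value is missing
    return nrpe_timeout < global_timeout


# Values reported in this thread.
cfg = "service_check_timeout=500\n"
cmd = "$USER1$/check_nrpe -2 -H $HOSTADDRESS$ -u -t 300 -c $ARG1$ $ARG2$ $ARG3$"
print(plugin_times_out_first(cfg, cmd))
```

With the values above (300 against 500), the plugin timeout is already lower than the global timeout, so this setting does not look like the cause.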

Thanks for the direction. I'll dig some.

John
jrdoubleu
Posts: 3
Joined: Wed May 10, 2023 9:26 am

Re: command check timeout disables active checks

Post by jrdoubleu »

FYI, I determined the cause of the issue. Figured I'd provide a resolution for others in a similar boat.

I was converting the service checks in question from PASSIVE checks to ACTIVE checks.

/usr/local/nagios/libexec/eventhandlers/disable-service-checks.sh
- this script fires when a host's status changes
- it contained pre-existing custom code that submitted the external command DISABLE_SVC_CHECK when the host status changed to UP
- I'm not quite sure what DISABLE_SVC_CHECK does on a passive check, so this could have been a mistake or intentional
- I changed those lines to submit ENABLE_SVC_CHECK when the host status changes to UP
- this may be redundant, as there is an ENABLE_HOST_SVC_CHECKS above it
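
For illustration, the corrected logic looks roughly like the sketch below. It is not the actual contents of disable-service-checks.sh, which is a shell script that is not shown in this thread. The function names and the service list are hypothetical, and the command file path is the usual default for a source install, so check `command_file` in nagios.cfg before relying on it.

```python
import time

# Assumed default location; confirm against command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_enable_svc_check(host, service, now=None):
    """Build one ENABLE_SVC_CHECK line in external command format."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] ENABLE_SVC_CHECK;{host};{service}\n"


def commands_for_host_state(state, host, services, now=None):
    """Return the commands to submit for a host state change.

    Only a change to UP re-enables checks; other states submit nothing.
    """
    if state != "UP":
        return []
    return [format_enable_svc_check(host, svc, now) for svc in services]


def submit(commands, command_file=COMMAND_FILE):
    """Write each command to the Nagios external command file."""
    with open(command_file, "w") as handle:
        for command in commands:
            handle.write(command)


# Example: print what would be submitted when host2 comes back UP.
for line in commands_for_host_state("UP", "host2", ["sit_devices"]):
    print(line, end="")
```

The example only prints the command. Calling `submit()` is what would write it to the command file, and that needs to run as a user with write access to that file.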

Service checks are staying current now.
namdosan1409
Posts: 4
Joined: Tue Jun 27, 2023 3:50 am

Re: command check timeout disables active checks

Post by namdosan1409 »

I've just installed and configured the checkmk agent on some AIX hosts. My issue is that the checks are taking too long.