command check timeout disables active checks

Posted: Wed May 10, 2023 10:15 am
by jrdoubleu
Running Nagios XI 5.9.2

Is it expected that an active service check will be automatically disabled by Nagios when a command (in this case, NRPE) times out?

Over time, active checks have been automatically disabled for a significant percentage of my monitored hosts. The service check status for those sites still shows OK (green) in the dashboard, but "Service Status Detail" shows Next Check set to "Not Scheduled" and the Active Check flag set to Disabled. If I re-enable active checks, the next check gets scheduled and things resume as expected (for a while).

nagios.log shows this:

[1683717541] SERVICE ALERT: host2;sit_devices;UNKNOWN;HARD;1;(No output on stdout) stderr: connect to address 10.192.26.1 port 5666: Connection timed out
...
[1683717767] Error: External command failed -> DISABLE_SVC_CHECK;host2;sit_devices

I interpret this to mean that because the active check command timed out once, Nagios has decided to place this host-service check combination in disabled status.
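To see how many host/service pairs are affected, something like the following could pull every DISABLE_SVC_CHECK entry out of the log. This is only a rough sketch: the function name `list_disabled_checks` is made up for illustration, and the default log path is an assumption about a standard Nagios XI install. It matches any line of the shape shown above and prints `timestamp;host;service`.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios.
# Assumption: default log location on a Nagios XI install; override as needed.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

# Print "timestamp;host;service" for each log line that mentions
# DISABLE_SVC_CHECK;<host>;<service> (the format seen in the excerpt above).
list_disabled_checks() {
    sed -n 's/^\[\([0-9]*\)\].*DISABLE_SVC_CHECK;\([^;]*\);\([^;]*\)$/\1;\2;\3/p' "$1"
}
```

Calling `list_disabled_checks "$NAGIOS_LOG" | cut -d';' -f2,3 | sort -u` would then give the distinct host/service pairs.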

It's not uncommon for our monitored sites to have unexpected network outages or server downtime.

Is there a config setting that prevents the active check from being disabled when a command times out? If this is intended behavior to prevent thread overload, is there a way to get Nagios to automatically re-enable checks that end up in this state?

This behavior lowers the integrity of our Nagios dashboards, as users stop trusting the statuses being reported.

Thanks in advance,

John

Re: command check timeout disables active checks

Posted: Thu May 11, 2023 5:43 pm
by kg2857
What is host_down_disable_service_checks set to in nagios.cfg?
You might look at the nagios.cfg doc and search for orphan to see if it might apply, as well as looking for other settings that might be the cause.
You might look at the service check timeout setting in nagios.cfg and make sure that the check_nrpe commands defined have a timeout less than the global timeout.

Re: command check timeout disables active checks

Posted: Fri May 12, 2023 7:26 am
by jrdoubleu
Thanks for the response.
kg2857 wrote: Thu May 11, 2023 5:43 pm What is host_down_disable_service_checks set to in nagios.cfg?
/usr/local/nagios/etc/nagios.cfg has no entry for host_down_disable_service_checks
kg2857 wrote: Thu May 11, 2023 5:43 pm You might look at the nagios.cfg doc and search for orphan to see if it might apply, as well as looking for other settings that might be the cause.
Will do.
kg2857 wrote: Thu May 11, 2023 5:43 pm You might look at the service check timeout setting in nagios.cfg and make sure that the check_nrpe commands defined have a timeout less than the global timeout.
/usr/local/nagios/etc/nagios.cfg has service_check_timeout=500

Core Config Manager shows the following for check_nrpe command:
$USER1$/check_nrpe -2 -H $HOSTADDRESS$ -u -t 300 -c $ARG1$ $ARG2$ $ARG3$

All monitored hosts have the same settings in nrpe.cfg:
command_timeout=300
connection_timeout=300
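
The comparison suggested above (plugin timeout below the global timeout) could be checked with a small script along these lines. It is a sketch only: the function names `global_timeout` and `plugin_timeout_ok` are invented for illustration, and the plugin timeout is passed in by hand rather than parsed from the command definition.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nagios or NRPE.
# Path taken from this thread; override via the environment if yours differs.
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Print the numeric value of service_check_timeout from a config file.
# Commented-out lines are ignored; if the key appears twice, the last one is used.
global_timeout() {
    sed -n 's/^[[:space:]]*service_check_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Succeed only when the plugin timeout ($1) is strictly below the global timeout ($2).
plugin_timeout_ok() {
    [ "$1" -lt "$2" ]
}
```

With the values above, `plugin_timeout_ok 300 "$(global_timeout "$NAGIOS_CFG")"` succeeds, since 300 is below 500.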

Thanks for the direction. I'll dig some.

John

Re: command check timeout disables active checks

Posted: Tue Jun 20, 2023 2:48 pm
by jrdoubleu
FYI, I determined the cause of the issue. Figured I'd provide a resolution for others in a similar boat.

I was converting the service checks in question from PASSIVE checks to ACTIVE checks.

/usr/local/nagios/libexec/eventhandlers/disable-service-checks.sh
- this script fires when a host status changes
- it contained pre-existing custom code that submits the external command DISABLE_SVC_CHECK when the host status changes to UP
- I'm not quite sure what DISABLE_SVC_CHECK does on a passive check, so this could have been a mistake or intentional
- I changed those lines to submit ENABLE_SVC_CHECK when the host status changes to UP
- this may be redundant, as there is an ENABLE_HOST_SVC_CHECKS above it
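
For anyone writing a similar handler, here is a minimal sketch of the idea. It is not the actual script from my system: the argument order, the function names, and the command file path (the usual default for a source install) are assumptions. The only parts taken from Nagios itself are the ENABLE_SVC_CHECK external command and the `[timestamp] COMMAND;host;service` line format.

```shell
#!/bin/sh
# Hypothetical host event handler sketch; not the original disable-service-checks.sh.
# Assumed invocation: handler.sh <host_state> <host_name> <service_description>

# Assumption: default external command file location; override via the environment.
COMMAND_FILE="${COMMAND_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Build one external command line: "[epoch] COMMAND;host;service".
format_command() {
    printf '[%s] %s;%s;%s\n' "$(date +%s)" "$1" "$2" "$3"
}

# Re-enable active checks for a service once its host is back UP.
handle_host_state() {
    state="$1"; host="$2"; service="$3"
    case "$state" in
        UP)
            format_command ENABLE_SVC_CHECK "$host" "$service" >> "$COMMAND_FILE"
            ;;
        *)
            # Leave check scheduling untouched for any other host state.
            ;;
    esac
}

# Only act when called with all three arguments.
if [ "$#" -ge 3 ]; then
    handle_host_state "$1" "$2" "$3"
fi
```

In a command definition this would typically be called with the $HOSTSTATE$ and $HOSTNAME$ macros plus the service description.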

Service checks are staying current now.

Re: command check timeout disables active checks

Posted: Tue Jun 27, 2023 3:51 am
by namdosan1409
I've just installed and configured the checkmk agent on some AIX hosts. My issue is that the checks are taking too long.