Nagios Support Forum

Posted: **Fri Aug 11, 2017 7:06 am**

Hi,

Running Nagios XI 5.4.4 on RHEL 6 with NCPA 2.0.3 (Windows, Linux).

Some of our service checks call an NCPA plugin (a PowerShell script) that sometimes times out. Strangely, this results in a CRITICAL alert for the service and an accompanying notification. Based on previous experience and on the code of check_ncpa.py in Git, I would expect an UNKNOWN alert instead.

I have done the following debugging:

1. Run the script from the command line and, when it times out, record the return values from check_ncpa.py. This looks good: the exit value is 3 and the stdout text contains UNKNOWN:

bash-4.1$ time /usr/local/nagios/libexec/check_ncpa.py -H <host> -t '<token>' -P 5693 -M 'plugins/Nagios_Plugin_eventfinder_Application_log.ps1/3221242535/Error'
UNKNOWN: Execution exceeded timeout threshold of 60s

real 1m0.050s
user 0m0.046s
sys 0m0.022s

bash-4.1$ echo $?
3

2. Run the same from the NCPA GUI. Exit value and stdout text differ:

https://<host>:5693/api/plugins/Nagios_Plugin_eventfinder_Application_log.ps1/3221242535/Error

{ "returncode": 1, "stdout": "Error: Plugin command timed out. (60 sec)" }

3. Check the Nagios Event Log: for these timeouts a Warning is thrown for the service check, followed by a CRITICAL service alert. See attached Hc_3151.jpg.

4. Check Nagios Notifications: a CRITICAL notification is sent out. See attached Hc_3150.jpg.
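To make the return values in the steps above easier to compare, here is a minimal sketch that translates them into state names using the standard Nagios plugin return-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The helper name `state_name` and the fallback to UNKNOWN for out-of-range codes are assumptions made for illustration and are not taken from check_ncpa.py.

```python
# Standard Nagios plugin return codes (plugin convention).
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(returncode):
    """Translate a plugin return code into its Nagios state name."""
    # Assumption of this sketch: anything outside 0-3 is shown as UNKNOWN.
    return NAGIOS_STATES.get(returncode, "UNKNOWN")


# Return values recorded in debugging steps 1 and 2.
observations = [
    ("check_ncpa.py on the command line", 3),
    ("NCPA API returncode field", 1),
]

for source, code in observations:
    print("{0}: {1} -> {2}".format(source, code, state_name(code)))
```

Neither value maps to CRITICAL, which is why the CRITICAL alert in steps 3 and 4 looks surprising.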

Here is one of the service definitions that exhibits this behaviour:

define service {
        service_description             MSSQL Windows application log event ID 17063
        use                             xiwizard_ncpa_service
        hostgroup_name                  ACC SQL Server hosts,PRD SQL Server hosts
        display_name                    MSSQL event 17063
        servicegroups                   MS SQL Server services
        check_command                   check_xi_ncpa_agent!-t '<token>' -P 5693 -M 'plugins/Nagios_Plugin_eventfinder_Application_log.ps1/3221242535/Error'!!!!!!!
        max_check_attempts              1
        check_interval                  4
        retry_interval                  1
        check_period                    xi_timeperiod_24x7
        notification_options            c,
        contact_groups                  ISD SQL Server team
        register                        1
        }

Why are CRITICAL service notifications sent out for a service check that times out? Is there a way to suppress them?

Edit: I know how to increase the timeout value, and I know the check should be made quicker, but that's not the issue.

Posted: **Fri Aug 11, 2017 3:02 pm**

You can add the following directive in the nagios.cfg file:

service_check_timeout_state=u

so that service check timeouts are reported as UNKNOWN instead of CRITICAL.

https://assets.nagios.com/downloads/nag ... gmain.html

Alternatively, you could set a timeout on the plugin itself with the -T option, using a value lower than the default of 60 seconds in the main config file (service_check_timeout=60):

Example:

check_command                   check_xi_ncpa_agent!-t '<token>' -P 5693 -T 50 -M 'plugins/Nagios_Plugin_eventfinder_Application_log.ps1/3221242535/Error'
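The interaction between the two timeouts can be pictured with a small model. This is a simplified sketch, not Nagios code: the function `reported_state` and its parameters are hypothetical, the default of `c` for the timeout state is assumed from the behaviour described in this thread, and a tie between the two timeouts is assumed to go to Nagios itself.

```python
# Letters accepted by service_check_timeout_state, mapped to state names.
TIMEOUT_STATE_NAMES = {"c": "CRITICAL", "u": "UNKNOWN", "w": "WARNING", "o": "OK"}


def reported_state(runtime_s, plugin_timeout_s, core_timeout_s=60,
                   timeout_state="c", plugin_state="OK"):
    """Return the state that ends up on the service in this simplified model.

    runtime_s        -- how long the remote check would take if never interrupted
    plugin_timeout_s -- the -T value passed to the plugin
    core_timeout_s   -- service_check_timeout from nagios.cfg
    timeout_state    -- service_check_timeout_state from nagios.cfg
    plugin_state     -- the state the plugin returns when it finishes in time
    """
    # The check finishes before either timeout fires: the plugin result stands.
    if runtime_s < min(plugin_timeout_s, core_timeout_s):
        return plugin_state
    # The plugin gives up first and exits with its own UNKNOWN (exit value 3).
    if plugin_timeout_s < core_timeout_s:
        return "UNKNOWN"
    # Otherwise Nagios ends the check and applies the configured timeout state.
    return TIMEOUT_STATE_NAMES[timeout_state]


print(reported_state(90, 60, 60, "c"))  # both timeouts at 60 s, state c
print(reported_state(90, 60, 60, "u"))  # service_check_timeout_state=u
print(reported_state(90, 50, 60, "c"))  # -T 50, below service_check_timeout
```

In this model, either change alone turns the timeout into UNKNOWN: setting the timeout state to `u`, or letting the plugin time out before Nagios does.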

Posted: **Sun Aug 13, 2017 3:30 am**

Hi,

service_check_timeout_state=u

... does the trick.

Thanks. You may close the thread.

NCPA timeout on check results in CRITICAL alert
