
Service check timed out after 30.01 seconds

Posted: Fri May 20, 2016 1:49 am
by PFSit
Hi support team

I have a check which monitors directory size via the NCPA agent and a PowerShell script.

Code:

define service {
        host_name                       TEST-Server
        service_description              OS Dir size c:test01:test
        display_name                    test01
        check_command                   check_ncpa_pfs_FolderSize!xxx!$USERxxx$!-Path 'c:\test01\test'!!!!!
        max_check_attempts              2
        check_interval                  120
        retry_interval                  2
        check_period                    24x7
        notification_period             24x7
        notifications_enabled           0
        register                        1
        }

define service {
        host_name                       TEST-Server
        service_description              OS Dir size c:test02:test
        display_name                    test02
        check_command                   check_ncpa_pfs_FolderSize!xxx!$USERxxx$!-Path 'c:\test02\test'!!!!!
        max_check_attempts              2
        check_interval                  120
        retry_interval                  2
        check_period                    24x7
        notification_period             24x7
        notifications_enabled           0
        register                        1
        }
Command:

Code:

define command {
       command_name                             check_ncpa_pfs_FolderSize
       command_line                             $USER1$/check_ncpa.py -H $HOSTADDRESS$ -P $ARG1$ -t $ARG2$ -T 140 -M "agent/plugin/check_ms_FolderSize.ps1" -a "$ARG3$"
}
Check output via CLI:

Code:

[root@nagiosserver libexec]# ./check_ncpa.py -H x.x.x.x -P xxxx -t xxxx -M agent/plugin/check_ms_FolderSize.ps1 -a "-Path 'c:\test01\test'" -T 180
OK: Velikost adresare c:\test01\test je 300.6399 Gb. | 'Size'=300.6399GB;;; 'Files'=203533;;; 'Dirs'=38225;;; 'Time'=79.8469s;;;

[root@nagiosserver libexec]# ./check_ncpa.py -H x.x.x.x -P xxxx -t xxxx -M agent/plugin/check_ms_FolderSize.ps1 -a "-Path 'c:\test02\test'" -T 180          
OK: Velikost adresare c:\test02\test je 21.2187 Gb. | 'Size'=21.2187GB;;; 'Files'=179;;; 'Dirs'=8;;; 'Time'=0.2969s;;;
When the plugin exceeds its own timeout, the output looks like this:

Code:

UNKNOWN: Execution exceeded timeout threshold of xxxs
But in the Nagios UI, the check for directory test01 (300 GB, execution time about 80 s) returns this:

Code:

Critical (Service check timed out after 30.01 seconds)
Which timeout is this, and where can I change it?

./check_ncpa.py --help
-T TIMEOUT, --timeout=TIMEOUT
Enforced timeout, will terminate plugins after this
amount of seconds. [15]

By the way, I also tried this via "Tools by Box293" using the Test command, and the result is the same as through the CLI.

Michal

Re: Service check timed out after 30.01 seconds

Posted: Fri May 20, 2016 1:57 am
by Box293
This sounds like the global service check timeout in nagios.cfg.

Configure > Core Configuration Manager
Advanced > Nagios Core Main Config
service_check_timeout=xxx
Adjust the value
Click Save and then Apply Configuration
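
To confirm the value currently in effect before changing it, the setting can also be read straight from the config file. This is only a minimal sketch: the helper name `read_service_check_timeout` is hypothetical, and the path in the usage note is an assumption about a typical install, not something stated in this thread.

```python
def read_service_check_timeout(cfg_text, default=None):
    """Return service_check_timeout from nagios.cfg-style text, or default.

    Hypothetical helper for illustration; it only understands plain
    key=value lines and skips blank lines and '#' comments.
    """
    value = default
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "service_check_timeout":
            value = int(val.strip())  # if repeated, the last occurrence is kept
    return value


sample = "# main config\nservice_check_timeout=30\n"
print(read_service_check_timeout(sample))  # prints 30
```

On a real server you would pass in the contents of nagios.cfg, for example `open("/usr/local/nagios/etc/nagios.cfg").read()` (assumed path; adjust to your installation).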

Re: Service check timed out after 30.01 seconds

Posted: Fri May 20, 2016 3:12 am
by PFSit
Value:
service_check_timeout=30

Is it a good idea to change the global timeout (for all checks) when I have already set -T 180 for this command?

Re: Service check timed out after 30.01 seconds

Posted: Fri May 20, 2016 10:48 am
by tgriep
The global timeout will always override the timeout settings for a plugin.
This is the maximum number of seconds that Nagios will allow service checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned. A timeout error will also be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each service check normally finishes executing within this time limit. If a service check runs longer than this limit, Nagios will kill it off, thinking it is a runaway process.
If your check consistently takes longer to run than that setting, you will have to raise the global service timeout above the time it takes that check to run.
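
The interaction between the two timeouts can be sketched as a small model. This is a simplified illustration, not Nagios or NCPA code: the function name `predict_check_result` and the returned strings are hypothetical, and it assumes both timers start at the same moment.

```python
def predict_check_result(runtime_s, plugin_timeout_s, service_check_timeout_s):
    """Say which limit ends a check first (simplified model, hypothetical names)."""
    if runtime_s <= min(plugin_timeout_s, service_check_timeout_s):
        return "check completes"
    if service_check_timeout_s <= plugin_timeout_s:
        # Nagios kills the plugin before the plugin's own -T can fire.
        return "CRITICAL: killed by Nagios service_check_timeout"
    return "UNKNOWN: plugin -T timeout exceeded"


# The test01 case from this thread: about 80 s runtime, -T 140, global limit 30.
print(predict_check_result(80, 140, 30))
# prints "CRITICAL: killed by Nagios service_check_timeout"

# Same check after raising the global limit above the runtime.
print(predict_check_result(80, 140, 200))
# prints "check completes"
```

In this model the plugin's -T only matters when it is lower than service_check_timeout, which matches the behavior described above.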