check_esx3.pl - timeout is being ignored

op-team · Post by **op-team** » Tue Feb 27, 2018 8:29 am

Hi Guys,

We are using check_esx3.pl to monitor our ESX hosts.
[root@nagios: /usr/local/nagios/libexec]# ./check_esx3.pl -V
check_esx3.pl 0.2.1

We have set a timeout to avoid a 'CRITICAL' exit with "(Service check timed out after 60.01 seconds)", but unfortunately it has no effect.

Running the script on CLI:
[root@nagios: /usr/local/nagios/libexec]# time ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime -t 40
ESX3 OK - 3/3 VMs up, overall status=green, connection state=connected, maintenance=no, 171 health issue(s), no config issues | vmcount=3units;;

real 1m14.092s
user 0m1.766s
sys 0m0.080s

Thanks in advance for your help

B.Regards

kyang · Post by **kyang** » Tue Feb 27, 2018 11:39 am

Is it in the XI GUI that it's failing with service check timed out after 60.01 seconds?

Where did you set the timeout value? Can you provide a screenshot of this and of your check_command from the XI GUI?

op-team · Post by **op-team** » Wed Feb 28, 2018 5:21 am

Hi,

My service configuration

check_esx3.pl_timeout.PNG

For some ESX hosts the check takes too long to complete, which is why I would like to set a timeout lower than "service_check_timeout", so that the check exits with "UNKNOWN" instead of "CRITICAL".

[root@nagios: ~]# grep service_check_timeout /usr/local/nagios/etc/nagios.cfg
service_check_timeout=60

The timeout is being ignored both in Nagios and from the CLI.
Please have a look at the check_esx3.pl script. I tried editing the script by adding the following lines, without any success:

my $timeout = $np->opts->timeout;
$SIG{'ALRM'} = sub {
die "script timed out";
};
alarm $timeout;

B.Regards

Post by **mcapra** » Wed Feb 28, 2018 10:19 am

Thanks for sharing the specific version of check_esx3.pl; that's super helpful.

So what you're saying is check_esx3.pl does not seem to be respecting the timeout argument as demonstrated here:

[root@nagios: /usr/local/nagios/libexec]# time ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime -t 40
ESX3 OK - 3/3 VMs up, overall status=green, connection state=connected, maintenance=no, 171 health issue(s), no config issues | vmcount=3units;;

real 1m14.092s
user 0m1.766s
sys 0m0.080s

Where the execution time (1m 14s) is way past the -t value of 40 seconds.

Interestingly, the 0.2.0 version of that script (included with the VMWare Config Wizard) doesn't seem to actually be doing anything with the timeout value. Neither does the 0.5.0 version.

The 0.7.1 version does, though:
https://github.com/shinken-monitoring/p ... ck_esx3.pl

Maybe give the 0.7.1 version a try and see if it does what you'd like? All it really does is leverage alarm after establishing the local variables. You could probably make that same change to your current plugin, but also create a subroutine that exits 3 as the alarm's handler to get "Unknown" for the status. I forget how Nagios Core interprets signals, but having your script exit gracefully is probably safer.
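
If patching the plugin isn't an option, a wrapper is another way to get an "Unknown" before Nagios kills the check. This is only a rough sketch, not something taken from the plugin: it assumes GNU coreutils timeout is installed on the Nagios server, and the function name run_with_unknown_timeout is made up for illustration. It runs the command under timeout and turns a timed-out run into exit code 3 (UNKNOWN), passing everything else through unchanged:

```shell
# run_with_unknown_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under coreutils `timeout` and maps a
# timed-out run to Nagios UNKNOWN (exit code 3).
run_with_unknown_timeout() {
    limit=$1
    shift

    # Capture the plugin output so only one status line is printed.
    output=$(timeout "$limit" "$@")
    rc=$?

    # coreutils `timeout` exits with 124 when the time limit was reached.
    if [ "$rc" -eq 124 ]; then
        echo "UNKNOWN - check did not finish within ${limit}s"
        return 3
    fi

    # Otherwise pass the plugin's own output and exit code through.
    if [ -n "$output" ]; then
        printf '%s\n' "$output"
    fi
    return "$rc"
}
```

Called as run_with_unknown_timeout 40 ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime, it should come back with UNKNOWN after roughly 40 seconds, provided the limit stays below service_check_timeout (60 in this case).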

I would assert this is a bug in the VMWare Configuration Wizard.

kyang · Post by **kyang** » Wed Feb 28, 2018 1:47 pm

Thanks @mcapra!

Let us know if the newer plugin works.
