check_esx3.pl - timeout is being ignored

Posted: Tue Feb 27, 2018 8:29 am
by op-team
Hi Guys,

We are using check_esx3.pl to monitor our ESX hosts.
[root@nagios: /usr/local/nagios/libexec]# ./check_esx3.pl -V
check_esx3.pl 0.2.1

We have set a timeout to avoid a 'CRITICAL' exit with "(Service check timed out after 60.01 seconds)", but unfortunately it has no effect.

Running the script on CLI:
[root@nagios: /usr/local/nagios/libexec]# time ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime -t 40
ESX3 OK - 3/3 VMs up, overall status=green, connection state=connected, maintenance=no, 171 health issue(s), no config issues | vmcount=3units;;

real 1m14.092s
user 0m1.766s
sys 0m0.080s


Thanks in advance for your help

B.Regards

Re: check_esx3.pl - timeout is being ignored

Posted: Tue Feb 27, 2018 11:39 am
by kyang
Is it in the XI GUI that it's failing with service check timed out after 60.01 seconds?

Where did you set the timeout value? Can you provide a screenshot of this and of your check_command from the XI GUI?

Re: check_esx3.pl - timeout is being ignored

Posted: Wed Feb 28, 2018 5:21 am
by op-team
Hi,

My service configuration is in the attached screenshot:
check_esx3.pl_timeout.PNG

For some ESX hosts the check takes too long to complete, which is why I would like to set a plugin timeout lower than "service_check_timeout", so that the check exits with "UNKNOWN" instead of "CRITICAL".

[root@nagios: ~]# grep service_check_timeout /usr/local/nagios/etc/nagios.cfg
service_check_timeout=60


The timeout is being ignored both in Nagios and from the CLI.
Please have a look at the check_esx3.pl script. I tried editing it by adding the following lines, without any success:

my $timeout = $np->opts->timeout;
$SIG{'ALRM'} = sub {
die "script timed out";
};
alarm $timeout;


B.Regards

Re: check_esx3.pl - timeout is being ignored

Posted: Wed Feb 28, 2018 10:19 am
by mcapra
Thanks for sharing the specific version of check_esx3.pl; that's super helpful.

So what you're saying is check_esx3.pl does not seem to be respecting the timeout argument as demonstrated here:

[root@nagios: /usr/local/nagios/libexec]# time ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime -t 40
ESX3 OK - 3/3 VMs up, overall status=green, connection state=connected, maintenance=no, 171 health issue(s), no config issues | vmcount=3units;;

real 1m14.092s
user 0m1.766s
sys 0m0.080s

Here the execution time (1m 14s) is way past the -t value of 40 seconds.

Interestingly, the 0.2.0 version of that script (included with the VMWare Config Wizard) doesn't seem to actually be doing anything with the timeout value. Neither does the 0.5.0 version.

The 0.7.1 version does, though:
https://github.com/shinken-monitoring/p ... ck_esx3.pl

Maybe give the 0.7.1 version a try and see if it does what you'd like? All it really does is leverage alarm after establishing the local variables. You could probably make that same change to your current plugin, but also create a subroutine that exits 3 as the alarm's handler to get "Unknown" for the status. I forget how Nagios Core interprets signals, but having your script exit gracefully is probably safer.
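
For what it's worth, a minimal sketch of that alarm-plus-handler idea might look like the following. The 40-second value, the run_check placeholder and the message text are my own stand-ins for illustration, not code taken from any released check_esx3.pl, so treat it as a starting point only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical value: in the real plugin this would come from the -t option
# (e.g. $np->opts->timeout, as in the snippet earlier in the thread).
my $timeout = 40;

# On SIGALRM, print a plugin-style status line and exit 3 (UNKNOWN)
# instead of calling die.
$SIG{ALRM} = sub {
    print "ESX3 UNKNOWN - plugin timed out after $timeout seconds\n";
    exit 3;
};

alarm($timeout);    # arm the timer before any slow call

run_check();        # placeholder for the slow part of the check

alarm(0);           # disarm the timer once the check has finished
print "ESX3 OK - check completed\n";
exit 0;

# Stand-in only: the real work would be connecting to and querying the host.
sub run_check {
    return;
}
```

The timer is armed before the slow calls and cleared afterwards, so a check that finishes in time still reports its normal status. One thing to keep in mind is that Perl runs signal handlers between operations, so if the script is stuck inside a single long-running call the handler may fire later than the -t value suggests.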

I would assert this is a bug in the VMWare Configuration Wizard.

Re: check_esx3.pl - timeout is being ignored

Posted: Wed Feb 28, 2018 1:47 pm
by kyang
Thanks @mcapra!

Let us know if the newer plugin works.