check_esx3.pl - timeout is being ignored

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
op-team
Posts: 50
Joined: Fri Jun 02, 2017 6:19 am

check_esx3.pl - timeout is being ignored

Post by op-team »

Hi Guys,

We are using check_esx3.pl to monitor our ESX hosts.
[root@nagios: /usr/local/nagios/libexec]# ./check_esx3.pl -V
check_esx3.pl 0.2.1

We have set a timeout to avoid a 'CRITICAL' exit with "(Service check timed out after 60.01 seconds)", but unfortunately it has no effect.

Running the script on CLI:
[root@nagios: /usr/local/nagios/libexec]# time ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime -t 40
ESX3 OK - 3/3 VMs up, overall status=green, connection state=connected, maintenance=no, 171 health issue(s), no config issues | vmcount=3units;;

real 1m14.092s
user 0m1.766s
sys 0m0.080s


Thanks in advance for your help

B.Regards
kyang

Re: check_esx3.pl - timeout is being ignored

Post by kyang »

Is it in the XI GUI that it's failing with service check timed out after 60.01 seconds?

Where did you set the timeout value? Can you provide a screenshot of this and of your check_command from the XI GUI?
op-team
Posts: 50
Joined: Fri Jun 02, 2017 6:19 am

Re: check_esx3.pl - timeout is being ignored

Post by op-team »

Hi,

My service configuration
check_esx3.pl_timeout.PNG
For some ESX hosts the check takes too long to complete. That is why I would like to set a plugin timeout lower than "service_check_timeout", so that the check exits with "UNKNOWN" instead of "CRITICAL".

[root@nagios: ~]# grep service_check_timeout /usr/local/nagios/etc/nagios.cfg
service_check_timeout=60


The timeout is ignored both when the check is run by Nagios and when it is run from the CLI.
Please have a look at the check_esx3.pl script. I tried adding the following lines to the script, without success:

my $timeout = $np->opts->timeout;
$SIG{'ALRM'} = sub {
die "script timed out";
};
alarm $timeout;


B.Regards
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: check_esx3.pl - timeout is being ignored

Post by mcapra »

Thanks for sharing the specific version of check_esx3.pl; that's super helpful.

So what you're saying is check_esx3.pl does not seem to be respecting the timeout argument as demonstrated here:

[root@nagios: /usr/local/nagios/libexec]# time ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime -t 40
ESX3 OK - 3/3 VMs up, overall status=green, connection state=connected, maintenance=no, 171 health issue(s), no config issues | vmcount=3units;;

real 1m14.092s
user 0m1.766s
sys 0m0.080s
Where the execution time (1m 14s) is way past the -t value of 40 seconds.

Interestingly, the 0.2.0 version of that script (included with the VMware Config Wizard) doesn't seem to do anything with the timeout value. Neither does the 0.5.0 version.

The 0.7.1 version does, though:
https://github.com/shinken-monitoring/p ... ck_esx3.pl

Maybe give the 0.7.1 version a try and see if it does what you'd like? All it really does is leverage alarm after establishing the local variables. You could probably make that same change to your current plugin, but also create a subroutine that exits 3 as the alarm's handler to get "Unknown" for the status. I forget how Nagios Core interprets signals, but having your script exit gracefully is probably safer.
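
For reference, the handler change described above could look roughly like the sketch below. It is not taken from any version of check_esx3.pl: run_with_timeout is a hypothetical helper, the message text is made up, and the select call only stands in for a slow ESX query. In the real plugin the timeout value would come from the -t option.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nagios plugin exit code for UNKNOWN.
use constant UNKNOWN => 3;

# Hypothetical helper (not part of check_esx3.pl): run the code ref $work
# under an alarm of $timeout seconds and return ($exit_code, $message).
sub run_with_timeout {
    my ($timeout, $work) = @_;
    my @result;
    my $ok = eval {
        # The handler only dies; the eval turns that into a normal return.
        local $SIG{ALRM} = sub { die "plugin timeout\n" };
        alarm $timeout;
        @result = $work->();
        alarm 0;    # check finished in time, cancel the alarm
        1;
    };
    alarm 0;        # make sure no alarm stays pending if $work died
    return @result if $ok;
    die $@ if $@ ne "plugin timeout\n";    # re-raise unrelated errors
    return (UNKNOWN, "ESX3 UNKNOWN - plugin timed out after $timeout seconds");
}

# Stand-in for a slow check: waits 3 seconds against a 1 second timeout.
# (select is used instead of sleep, which should not be mixed with alarm.)
my ($code, $message) = run_with_timeout(1, sub {
    select(undef, undef, undef, 3);
    return (0, "ESX3 OK");
});
print "$message\n";
```

In the plugin itself the last step would be `exit $code;`, so Nagios sees exit status 3 (UNKNOWN) as long as the -t value stays below service_check_timeout. Keep in mind that Perl only runs signal handlers between opcodes, so the alarm can still fire late if the plugin is stuck inside one long-running library call.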

I would assert this is a bug in the VMware Configuration Wizard.
Former Nagios employee
https://www.mcapra.com/
kyang

Re: check_esx3.pl - timeout is being ignored

Post by kyang »

Thanks @mcapra!

Let us know if the newer plugin works.