Hi Guys,
We are using check_esx3.pl to monitor our ESX hosts.
[root@nagios: /usr/local/nagios/libexec]# ./check_esx3.pl -V
check_esx3.pl 0.2.1
We have set a timeout in order to avoid a 'CRITICAL' exit with "(Service check timed out after 60.01 seconds)", but unfortunately this doesn't have any effect.
Running the script from the CLI:
[root@nagios: /usr/local/nagios/libexec]# time ./check_esx3.pl -H xx.xx.xx.xx -f ../var/check_cache/.esxicred5 -l runtime -t 40
ESX3 OK - 3/3 VMs up, overall status=green, connection state=connected, maintenance=no, 171 health issue(s), no config issues | vmcount=3units;;
real 1m14.092s
user 0m1.766s
sys 0m0.080s
Thanks in advance for your help
B.Regards
check_esx3.pl - timeout is being ignored
kyang
Re: check_esx3.pl - timeout is being ignored
Is it in the XI GUI that it's failing with service check timed out after 60.01 seconds?
Where did you set the timeout value? Can you provide a screenshot of this and of your check_command from the XI GUI?
Re: check_esx3.pl - timeout is being ignored
Hi,
My service configuration is attached. For some ESX hosts the check takes too long to complete, which is why I would like to set a timeout lower than "service_check_timeout", so that the check exits with "UNKNOWN" instead of "CRITICAL".
[root@nagios: ~]# grep service_check_timeout /usr/local/nagios/etc/nagios.cfg
service_check_timeout=60
The timeout is being ignored both in Nagios and from the CLI.
Please have a look at the script check_esx3.pl. I tried editing the script by adding the following lines, without any success:
my $timeout = $np->opts->timeout;
$SIG{'ALRM'} = sub {
die "script timed out";
};
alarm $timeout;
B.Regards
Re: check_esx3.pl - timeout is being ignored
Thanks for sharing the specific version of check_esx3.pl; that's super helpful.
So what you're saying is that check_esx3.pl does not seem to be respecting the timeout argument, as demonstrated by your CLI run, where the execution time (1m 14s) is way past the -t value of 40 seconds.
Interestingly, the 0.2.0 version of that script (included with the VMware Config Wizard) doesn't seem to actually be doing anything with the timeout value. Neither does the 0.5.0 version.
The 0.7.1 version does, though:
https://github.com/shinken-monitoring/p ... ck_esx3.pl
Maybe give the 0.7.1 version a try and see if it does what you'd like? All it really does is leverage alarm after establishing the local variables. You could probably make that same change to your current plugin, but also create a subroutine that exits 3 as the alarm's handler to get "Unknown" for the status. I forget how Nagios Core interprets signals, but having your script exit gracefully is probably safer.
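For what it's worth, here is a minimal sketch of the "alarm plus a handler that ends in exit code 3" idea. It is not taken from any version of check_esx3.pl: run_with_timeout is a hypothetical helper name, and the slow check is simulated with a blocking select call instead of a real ESX query. One caveat I'm fairly sure of: Perl defers signal handlers until the current operation returns, so a call that blocks deep inside a library may still overrun the alarm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nagios plugin exit code for UNKNOWN.
use constant UNKNOWN => 3;

# Hypothetical helper (not part of check_esx3.pl): runs $check under an
# alarm and returns (exit_code, message). $check must return the same pair.
sub run_with_timeout {
    my ($timeout, $check) = @_;
    my @result = eval {
        # The handler only dies; the eval turns that into a clean return.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        my @r = $check->();
        alarm 0;    # cancel the pending alarm on success
        @r;
    };
    if ($@) {
        my $err = $@;
        alarm 0;    # make sure no alarm is left pending after a failure
        chomp $err;
        return (UNKNOWN, "UNKNOWN - check timed out after ${timeout}s")
            if $err eq 'timeout';
        return (UNKNOWN, "UNKNOWN - $err");
    }
    return @result;
}

# Simulated slow check: blocks for 3 seconds, then would report OK.
my ($code, $message) = run_with_timeout(1, sub {
    select(undef, undef, undef, 3);
    return (0, "OK - simulated check finished");
});
print "$message\n";
```

In a real plugin the last step would be to exit with $code, so Nagios sees 3 (UNKNOWN) rather than killing the check at service_check_timeout and reporting CRITICAL.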
I would assert this is a bug in the VMware Configuration Wizard.
Former Nagios employee
https://www.mcapra.com/