I have a number of ESXi hosts and run various checks on them using this script. My issue is that when a command times out, the script raises a critical alert, which to me should be unknown. It was a timeout, not an actual issue.
The other trouble is that when the networking check times out, it returns critical, with CRITICAL in the message, but the text says "all 8 NICs are connected," which is misleading, since what it really means is that the check timed out:
/usr/local/nagios/libexec/check_esx3-0.5.pl -H "cocsm2mvesx002" -t 45 -f "/usr/local/nagiosxi/etc/components/vmware/vmware_auth.txt" -l NET
CHECK_ESX3-0.5.PL CRITICAL - all 8 NICs are connected
I can't force all criticals to warnings because that will hide real issues.
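One way to get unknown for timeouts without hiding real criticals is to wrap the plugin. The sketch below is hypothetical (the script name check_with_timeout.sh and the function run_check_with_timeout are made up for illustration) and assumes GNU coreutils `timeout`, which exits with status 124 when it has to kill the command. For it to help, its limit has to be shorter than the plugin's own `-t` value (45 in the command above); otherwise the plugin's built-in timeout fires first and still reports critical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of check_esx3 or check_vmware_api):
# run a Nagios plugin under GNU coreutils `timeout` and report UNKNOWN
# (exit 3) if the plugin has to be killed for running too long.
# Usage: check_with_timeout.sh SECONDS PLUGIN [ARGS...]

run_check_with_timeout() {
    local limit=$1
    shift
    local output rc
    # Assign separately from `local` so $? is the command's status.
    # stderr is merged so any error text still shows up in Nagios.
    output=$(timeout "$limit" "$@" 2>&1)
    rc=$?
    # GNU timeout exits 124 when it killed the command.
    if [ "$rc" -eq 124 ]; then
        echo "UNKNOWN - check did not finish within ${limit}s: $1"
        return 3
    fi
    # Otherwise pass the plugin's own output and exit code through unchanged,
    # so genuine CRITICAL/WARNING results are not masked.
    printf '%s\n' "$output"
    return "$rc"
}

# Run only when executed directly with arguments; sourcing just defines the function.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -ge 2 ]; then
    run_check_with_timeout "$@"
    exit $?
fi
```

Assuming it is saved as /usr/local/nagios/libexec/check_with_timeout.sh, the check above would become `check_with_timeout.sh 40 /usr/local/nagios/libexec/check_esx3-0.5.pl -H "cocsm2mvesx002" -t 45 -f "/usr/local/nagiosxi/etc/components/vmware/vmware_auth.txt" -l NET`. Only a run that exceeds 40 seconds is turned into unknown; every other result is passed through as-is.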
check_esx3-0.5.pl behavior
- Box293
Re: check_esx3-0.5.pl behavior
I know that the check_esx3.pl script has a few bugs. The developers have since renamed it check_vmware_api.
It can be downloaded from here:
http://git.op5.org/p/system-addons/plug ... re_api.git
I think this is the actual URL of the plugin:
I believe you can just rename it to check_esx3.pl and it should slot right in.
Once you've done that, does it fix your NIC issue?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: check_esx3-0.5.pl behavior
My Nagios system had a birthday, so my subscriptions had expired.
While I eventually found check_vmware_api, it wasn't at that link. It is much newer though. The one I found was in a fork of check_vmware_api. I think I'll keep looking.
How would fixing the NIC issue fix the general misbehavior of the script when it times out?
Re: check_esx3-0.5.pl behavior
Following this link, http://git.op5.org/p/system-addons/plug ... re_api.git, leads to an empty list.
Using the download link https://exchange.nagios.org/directory/P ... pi/details leads to a not found message.
Google leads me to https://github.com/BaldMansMojo/check_vmware_esx, which is a fork of check_vmware_api.
I finally found this, which seems to be the original, non-fork version.
http://git.op5.org/gitweb?p=system-addo ... ;a=summary
OK, it's running with the newer version, and I'll wait to see what it reports.
Thanks!
Re: check_esx3-0.5.pl behavior
Are you having better luck with the "check_vmware_api.pl" plugin? What's the status on timeouts?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: check_esx3-0.5.pl behavior
I checked the log, and for some reason today is a slow day for timeouts; that is, there aren't that many: ~30 so far on the day (UTC) vs. >100 yesterday. I see the timeouts themselves as a network problem, but the plugin is (or was) returning the wrong status for them.
On my twin site w/ the same Nagios config, the counts are zero.
I haven't seen anything since starting to use the new script. Have to wait a bit more, but it looks promising...
# grep "SERVICE ALERT" /usr/local/nagios/var/archives/nagios-03-16-2016-00.log | grep -v OK | grep -i esx | cut -c 1-132 | perl -pe 's/(\d+)/localtime($1)/e' | grep -v 127 | wc -l
111
# grep "SERVICE ALERT" /usr/local/nagios/var/nagios.log | grep -v OK | grep -i esx | cut -c 1-132 | perl -pe 's/(\d+)/localtime($1)/e' | grep -v 127 | wc -l
32
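To see whether these alerts come through as critical or unknown, the same log can be broken down by state instead of just counted. A minimal sketch, assuming the standard Nagios Core log layout `[epoch] SERVICE ALERT: host;service;STATE;STATETYPE;attempt;output`; the function and script names are hypothetical, and it drops the `grep -v 127` filter from the commands above, since its purpose isn't stated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count non-OK ESX service alerts in a Nagios log,
# grouped by state, so timeouts reported as CRITICAL vs UNKNOWN stand out.
# Assumes lines like:
#   [epoch] SERVICE ALERT: host;service;STATE;STATETYPE;attempt;output

count_esx_alerts_by_state() {
    local logfile=$1
    # Field 3 (split on ';') is the state; skip OK recoveries.
    grep 'SERVICE ALERT' "$logfile" \
        | grep -i esx \
        | awk -F';' '$3 != "OK" { n[$3]++ } END { for (s in n) print s, n[s] }' \
        | sort
}

# Run only when executed directly with a log file argument.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -ge 1 ]; then
    count_esx_alerts_by_state "$1"
fi
```

For example, `./esx_alert_states.sh /usr/local/nagios/var/nagios.log` prints one line per state with its count, which makes it easy to compare before and after switching plugins.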
Re: check_esx3-0.5.pl behavior
Both plugins (check_esx3.pl & check_vmware_api.pl) work fine on my test box, but I don't have any timeouts.
Let us know how it goes. We will keep this thread open.
Re: check_esx3-0.5.pl behavior
The timeouts are due to a larger problem. Try to focus on the issue described.
Re: check_esx3-0.5.pl behavior
The issue described is that the plugin, when timing out, is exiting with Critical instead of Unknown, which you suggest would be a better exit status for a timeout. Please correct me if I am wrong.
If this is correct, then from a Nagios standpoint you can take a look at this option in your nagios.cfg:
https://assets.nagios.com/downloads/nag ... eout_state
However, this only applies to a service check that hits the global timeout (set by the service_check_timeout parameter, which defaults to 60 seconds). If the plugin itself has timeout logic built in, you will need to contact the original author of that plugin for a fix.
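For reference, the two nagios.cfg settings involved look like this (Nagios Core 4.x; `u` makes a check that Nagios itself kills report unknown instead of the default critical). In the command from the first post the plugin runs with `-t 45`, below the 60-second global default, so the plugin's own timeout would fire first and this setting alone would not change what it reports. Nagios has to be reloaded or restarted for the change to take effect.

```
# nagios.cfg: how long Nagios lets a service check run before killing it
service_check_timeout=60
# State reported when that happens: o = OK, w = WARNING, c = CRITICAL (default), u = UNKNOWN
service_check_timeout_state=u
```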
Former Nagios employee