check_esx3-0.5.pl behavior

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

check_esx3-0.5.pl behavior

Post by gormank »

I have a number of ESXi hosts and run various checks against them using this script. My issue is that when a command times out, the script issues a critical alert, which to me should be unknown. It was a timeout, not an actual issue.

The other trouble is that when the networking check times out, it returns critical, with CRITICAL in the message, but the text is "all 8 NICs are connected," which is misleading, since what actually happened is that the check timed out.

/usr/local/nagios/libexec/check_esx3-0.5.pl -H "cocsm2mvesx002" -t 45 -f "/usr/local/nagiosxi/etc/components/vmware/vmware_auth.txt" -l NET

CHECK_ESX3-0.5.PL CRITICAL - all 8 NICs are connected

I can't force all criticals to warnings because that will hide real issues.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: check_esx3-0.5.pl behavior

Post by Box293 »

I know that the check_esx3.pl script has a few bugs. The developers have since renamed it check_vmware_api.

It can be downloaded from here:

http://git.op5.org/p/system-addons/plug ... re_api.git

I think this is the actual URL of the plugin:

I believe you can just rename it to check_esx3.pl and it should slot right in.

Once you've done that, does it fix your NIC issue?
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: check_esx3-0.5.pl behavior

Post by gormank »

My Nagios system had a birthday, so my subscriptions had expired.
While I eventually found check_vmware_api, it wasn't at that link. It is much newer though. The one I found was in a fork of check_vmware_api. I think I'll keep looking.

How would fixing the NIC issue fix the general misbehavior of the script when it times out?
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: check_esx3-0.5.pl behavior

Post by gormank »

Following this http://git.op5.org/p/system-addons/plug ... re_api.git leads to an empty list.
Using the download link https://exchange.nagios.org/directory/P ... pi/details leads to a not found message.
Google leads me to https://github.com/BaldMansMojo/check_vmware_esx, which is a fork of check_vmware_api.

I finally found this, which seems to be the original, non-fork version.
http://git.op5.org/gitweb?p=system-addo ... ;a=summary

OK, it's running with the newer version and I'll wait to see what it says.

Thanks!
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: check_esx3-0.5.pl behavior

Post by lmiltchev »

Are you having better luck with the "check_vmware_api.pl" plugin? What's the status on timeouts?
Be sure to check out our Knowledgebase for helpful articles and solutions!
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: check_esx3-0.5.pl behavior

Post by gormank »

I checked the log and for some reason today is a slow day for timeouts. That is, there aren't that many: ~30 on the day (UTC), vs. >100 yesterday. I see the timeouts as a network problem, but the return values are/were using an incorrect value.

On my twin site w/ the same Nagios config, the counts are zero.

I haven't seen anything since starting to use the new script. Have to wait a bit more, but it looks promising...

# grep "SERVICE ALERT" /usr/local/nagios/var/archives/nagios-03-16-2016-00.log | grep -v OK | grep -i esx | cut -c 1-132 | perl -pe 's/(\d+)/localtime($1)/e' | grep -v 127 | wc -l
111

# grep "SERVICE ALERT" /usr/local/nagios/var/nagios.log | grep -v OK | grep -i esx | cut -c 1-132 | perl -pe 's/(\d+)/localtime($1)/e' | grep -v 127 | wc -l
32
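
To see how those non-OK alerts break down by state (for example, how many arrive as CRITICAL versus UNKNOWN), the same log can be grouped on the state field. This is only a sketch: it assumes the usual `SERVICE ALERT: host;service;STATE;...` log line layout, and the function name `count_esx_alerts_by_state` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count ESX-related service alerts per state.
# Assumes log lines of the form:
#   [epoch] SERVICE ALERT: host;service;STATE;SOFT|HARD;attempt;plugin output
count_esx_alerts_by_state() {
    local logfile="$1"
    grep "SERVICE ALERT" "$logfile" \
        | grep -i esx \
        | awk -F';' '{ count[$3]++ } END { for (s in count) print s, count[s] }' \
        | sort
}

# Example (log path taken from the commands above):
# count_esx_alerts_by_state /usr/local/nagios/var/nagios.log
```

Unlike the one-liners above, this keeps the OK lines, so recoveries show up as their own row.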
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: check_esx3-0.5.pl behavior

Post by lmiltchev »

Both plugins (check_esx3.pl & check_vmware_api.pl) work fine on my test box but I don't have any timeouts. :)
gormank wrote: I haven't seen anything since starting to use the new script. Have to wait a bit more, but it looks promising...
Let us know how it goes. We will keep this thread open.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: check_esx3-0.5.pl behavior

Post by gormank »

The timeouts are due to a larger problem. Try to focus on the issue described.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: check_esx3-0.5.pl behavior

Post by tmcdonald »

The issue described is that the plugin, when timing out, is exiting with Critical instead of Unknown, which you suggest would be a better exit status for a timeout. Please correct me if I am wrong.

If this is correct, then from a Nagios standpoint you can take a look at this option in your nagios.cfg:

https://assets.nagios.com/downloads/nag ... eout_state
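
As a rough sketch of checking that option: the directive is, as far as I know, `service_check_timeout_state` (values `c`, `u`, `w`, or `o`), and it sits alongside `service_check_timeout` in nagios.cfg. The helper name `show_timeout_settings` and the config path in the example are assumptions, so adjust them for your install.

```shell
#!/bin/bash
# Hypothetical helper: print the active (uncommented) service check timeout
# directives from a nagios.cfg-style file.
show_timeout_settings() {
    local cfg="$1"
    grep -E '^[[:space:]]*service_check_timeout(_state)?[[:space:]]*=' "$cfg"
}

# Example (assumed default path; yours may differ):
# show_timeout_settings /usr/local/nagios/etc/nagios.cfg
#
# To have service checks that hit the global timeout reported as UNKNOWN,
# the line in nagios.cfg would read:
#   service_check_timeout_state=u
```

Nagios needs to be restarted after a nagios.cfg change for it to take effect.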

However, this will only apply to a service that hits the global timeout (defaults to 60 seconds, defined by the service_check_timeout parameter). If the plugin itself has timeout logic built in, then you will need to contact the original author of that plugin for a fix.
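
If the plugin's own timeout handling is what returns Critical, one possible workaround until the plugin is fixed is to wrap it: run it under coreutils `timeout` with a limit shorter than the plugin's `-t` value, and translate a timeout into exit code 3 (Unknown). A minimal sketch, where the function name `run_with_unknown_on_timeout` is hypothetical and GNU `timeout` (which exits with status 124 when the limit is hit) is assumed to be installed:

```shell
#!/bin/bash
# Hypothetical wrapper: run a plugin under coreutils `timeout` and report
# UNKNOWN (exit code 3) if the limit is hit, instead of whatever the plugin
# itself would have returned for a timeout.
run_with_unknown_on_timeout() {
    local limit="$1"; shift
    local output status=0

    # Capture plugin output; remember its exit status if it is non-zero.
    output="$(timeout "$limit" "$@" 2>&1)" || status=$?

    # GNU timeout exits with 124 when it had to kill the command.
    if [ "$status" -eq 124 ]; then
        echo "UNKNOWN - check timed out after ${limit}s"
        return 3
    fi

    # Otherwise pass the plugin's own output and exit code through unchanged.
    printf '%s\n' "$output"
    return "$status"
}

# Example: wrapper limit (40s) is below the plugin's -t 45, so the wrapper fires first.
# run_with_unknown_on_timeout 40 /usr/local/nagios/libexec/check_esx3-0.5.pl \
#     -H "cocsm2mvesx002" -t 45 \
#     -f "/usr/local/nagiosxi/etc/components/vmware/vmware_auth.txt" -l NET
```

In a standalone wrapper script you would follow the call with `exit $?` so Nagios sees the translated status. Real criticals still come through as Critical; only the timeout case is remapped.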
Former Nagios employee