Trying to use check_vmware_api.pl

Posted: Thu Oct 31, 2019 5:05 pm
by damindd
I am attempting to use the check_vmware_api.pl plugin with the storagehealth runtime suboption. Here is the command I am running (from the CLI on the Nagios XI server):

./check_vmware_api.pl -H esx2a.dc.pud -f nagioscheck.txt -l runtime -s storagehealth

This gives the following output:

[root@nagiosxi libexec]# ./check_vmware_api.pl -H esx2a.dc.pud -f nagioscheck.txt -l runtime -s storagehealth
CHECK_VMWARE_API.PL UNKNOWN - 204 health issue(s) found:
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 1 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit #30: Cannot report on the current status of the physical element
UNKNOWN: Status of Proc 2 Level-3 Cache: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 3 Thermal Control #21: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation C610/X99 series chipset PCI Express Root Port #8 #28: Cannot report on the current status of the physical element
.
.
.
.
OK: Status of Disk 5 on HPSA1 : Port 1I Box 1 Bay 5 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 12 on HPSA1 : Port 1I Box 1 Bay 18 : 3576GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 7 on HPSA1 : Port 1I Box 1 Bay 7 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Memory: Physical element is functioning as expected
OK: Status of Disk 10 on HPSA1 : Port 1I Box 1 Bay 10 : 1490GB : Data Disk : OK: Physical element is functioning as expected | Alerts=204;;
[root@nagiosxi libexec]#

I have been trying without any success to filter out the UNKNOWN and OK status lines using the blacklist option. Does anyone have any experience with this plugin?

Here is the help output for the Host runtime section:
* runtime - shows runtime info
    + con - connection state
    + health - checks cpu/storage/memory/sensor status and propagates worst state
        o listitems - list all available sensors(use for listing purpose only)
        o blackregexpflag - whether to treat blacklist as regexp
        b - blacklist status objects
    + storagehealth - storage status check
        o blackregexpflag - whether to treat blacklist as regexp
        b - blacklist status objects

Thanks.
Damin

Re: Trying to use check_vmware_api.pl

Posted: Fri Nov 01, 2019 11:24 am
by damindd
I have found that I can add "| grep -v UNKNOWN" as an $ARG$ to filter out the results for the UNKNOWN hardware. Is grepping an acceptable way to filter results?

Re: Trying to use check_vmware_api.pl

Posted: Fri Nov 01, 2019 11:55 am
by mbellerue
It would have to be a little more complicated than just a grep -v. Nagios bases its alerts on the plugin's exit code. If you added | grep -v UNKNOWN, first you would lose valid UNKNOWN responses, and second the exit code would be that of the grep command, which will almost always be 0, or OK as far as Nagios is concerned.
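A minimal sketch of the exit-code problem described above. It uses a hypothetical stand-in function, `fake_plugin`, instead of the real check_vmware_api.pl: it prints one UNKNOWN and one OK line and exits 3 (UNKNOWN). In bash, a pipeline's status is that of its last command unless `pipefail` is set, while `PIPESTATUS` keeps each stage's own status.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the plugin: prints mixed results and exits 3 (UNKNOWN).
fake_plugin() {
  echo "UNKNOWN: Status of Proc 2 Level-3 Cache: Cannot report on the current status of the physical element"
  echo "OK: Status of Memory: Physical element is functioning as expected"
  return 3
}

# Without pipefail, the pipeline reports grep's status (0), not the plugin's.
fake_plugin | grep -v UNKNOWN
echo "pipeline status: $?"

# PIPESTATUS[0] still holds the plugin's own exit code (3).
fake_plugin | grep -v UNKNOWN
echo "plugin status: ${PIPESTATUS[0]}"

# With pipefail, the rightmost non-zero status (the plugin's 3) is returned.
set -o pipefail
fake_plugin | grep -v UNKNOWN
echo "pipefail status: $?"
```

Either way, filtering the text does not change the state the plugin computed, so grep alone cannot turn this check green without also hiding real problems.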

We'll have to dig into why the plugin is returning that information. Is it possible that it's an issue with the login's permissions on the ESXi machine? Is that just a typical low-permission service account you're using? Can you try an administrative account, just to see whether the output changes?

Re: Trying to use check_vmware_api.pl

Posted: Fri Nov 01, 2019 4:24 pm
by damindd
I have made some progress on my monitoring. When I query the host directly with -l vmfs, the values are very slow to update.

./check_vmware_api.pl -H vmhost -f creds.txt -l vmfs -o used (does not update values quickly, hours/days)

However, if I target the vCenter server instead (-D) with -l vmfs, the volume name (-s) and -o used, the values update very quickly. (I am only checking the values every 180 minutes.)

./check_vmware_api.pl -D vcenter -f vcentercreds.txt -l vmfs -s VolumeName -o used -w 85% -c 90% (These values update quickly)

So I have created a service with the above check for each volume in our infrastructure. I have only needed read-only permission on the vCenter server.
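With many volumes, writing one service per volume by hand gets tedious. The sketch below prints a Nagios service definition for each volume name passed on the command line. The host object `vcenter`, the `generic-service` template and the command name `check_vmware_vmfs` are all assumptions, not taken from the thread; the assumed command definition is shown as a comment and wraps the vCenter command above with the volume name as `$ARG1$`.

```bash
#!/usr/bin/env bash
# Sketch: emit one Nagios service definition per VMFS volume name.
# All object names below are assumptions; adjust them to your configuration.
VCENTER_HOST="vcenter"            # assumed host object for the vCenter server
CHECK_COMMAND="check_vmware_vmfs" # hypothetical command; $ARG1$ = volume name

# Assumed matching command definition (hypothetical):
# define command {
#     command_name  check_vmware_vmfs
#     command_line  $USER1$/check_vmware_api.pl -D vcenter -f vcentercreds.txt -l vmfs -s $ARG1$ -o used -w 85% -c 90%
# }

# check_interval is in units of interval_length (60 seconds by default),
# so 180 means one check every 180 minutes.
emit_vmfs_service() {
  local volume="$1"
  cat <<EOF
define service {
    use                     generic-service
    host_name               ${VCENTER_HOST}
    service_description     VMFS ${volume} used
    check_command           ${CHECK_COMMAND}!${volume}
    check_interval          180
}
EOF
}

# One definition per volume name given as an argument.
for volume in "$@"; do
  emit_vmfs_service "$volume"
done
```

For example, `./vmfs_services.sh Datastore01 Datastore02 > vmfs_services.cfg` (script and volume names hypothetical) produces two definitions to review before loading them.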

Still working on the storagehealth suboption. Thank you for the info about using grep.

./check_vmware_api.pl -H host -f nagioscheck.txt -l runtime -s storagehealth

Thank you.

Re: Trying to use check_vmware_api.pl

Posted: Mon Nov 04, 2019 3:22 pm
by ssax

-o blacklistregexp -x 'skip1|skip2|skip3|etc'
It's likely slower when querying the host because the statistics are stored in vCenter; as far as I'm aware, querying the host has to call back to the vCenter database for them. Querying vCenter directly removes that extra request, so it should be faster. Always query vCenter if possible.
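The `-x` value in the blacklist snippet above is a single regular expression whose alternatives are separated by `|`. If the list of objects to skip grows, a small shell helper can assemble it. This is only a sketch; `join_blacklist` is a hypothetical helper, not part of the plugin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join its arguments with '|' into one alternation regexp.
join_blacklist() {
  local IFS='|'        # "$*" joins the arguments with the first character of IFS
  printf '%s\n' "$*"
}

pattern=$(join_blacklist skip1 skip2 skip3)
echo "$pattern"        # prints: skip1|skip2|skip3

# Then pass it quoted (host and credentials file are placeholders):
# ./check_vmware_api.pl -H host -f creds.txt -l runtime -s storagehealth \
#     -o blacklistregexp -x "$pattern"
```

Object names containing regexp metacharacters (for example the `+` in `SFP+`) would need escaping before being joined.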

Re: Trying to use check_vmware_api.pl

Posted: Wed Nov 13, 2019 12:24 pm
by damindd
I tried running this from the command line. The blacklist is still not working for me. Does the syntax I used look correct?

./check_vmware_api.pl -H esx1a.dc.pud -f /usr/local/nagiosxi/etc/components/vmware/nagioscheck.txt -l runtime -s storagehealth -o blacklistregexp -x 'UNKNOWN:'

Results
......
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 2/3 Broadcast #22: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 4 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation C610/X99 series chipset PCI Express Root Port #5 #28: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 #3: Cannot report on the current status of the physical element
UNKNOWN: Status of Proc 1 Level-3 Cache: Cannot report on the current status of the physical element
UNKNOWN: Status of Battery 40.1: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast #22: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug #5: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 2 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Ethernet Controller X710 for 10GbE SFP+ #0: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 Debug #18: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 0 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 7 #4: Cannot report on the current status of the physical element
OK: Status of Disk 8 on HPSA1 : Port 1I Box 1 Bay 8 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 12 on HPSA1 : Port 1I Box 1 Bay 26 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 11 on HPSA1 : Port 1I Box 1 Bay 25 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 6 on HPSA1 : Port 1I Box 1 Bay 6 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 5 on HPSA1 : Port 1I Box 1 Bay 5 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 1 on HPSA1 : RAID 1 : 111GB : Disk 11,12 : Ok: Physical element is functioning as expected
OK: Status of Disk 4 on HPSA1 : Port 1I Box 1 Bay 4 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 2 on HPSA1 : RAID 6 : 11923GB : Disk 1,2,3,4,5,6,7,8,9,10 : Ok: Physical element is functioning as expected
OK: Status of Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 7 on HPSA1 : Port 1I Box 1 Bay 7 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Memory: Physical element is functioning as expected
OK: Status of Disk 1 on HPSA1 : Port 1I Box 1 Bay 1 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 3 on HPSA1 : Port 1I Box 1 Bay 3 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 10 on HPSA1 : Port 1I Box 1 Bay 10 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 9 on HPSA1 : Port 1I Box 1 Bay 9 : 1490GB : Data Disk : OK: Physical element is functioning as expected | Alerts=200;;

Re: Trying to use check_vmware_api.pl

Posted: Wed Nov 13, 2019 1:54 pm
by damindd
I discovered what I was doing incorrectly:

This works:

-x singleitem (no single quotes)

and this works

-x 'item1|item2' (with single quotes)

This does not work:

-x UNKNOWN: (presumably because UNKNOWN: is the status, not the status message)
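Why the quotes matter for `-x 'item1|item2'`: without them, the shell (not the plugin) interprets `|` as a pipe. In this sketch the hypothetical `show_args` function stands in for the plugin and simply prints the arguments it receives.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the plugin: report how many arguments arrived
# and show each one in brackets.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '  [%s]\n' "$@"
}

show_args -x singleitem      # two arguments: -x and singleitem
show_args -x 'item1|item2'   # two arguments: -x and item1|item2

# Unquoted, "show_args -x item1|item2" would be split by the shell at '|':
# the output of "show_args -x item1" would be piped into a command named
# item2, so item2 would never reach the plugin as part of the pattern.
```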

I was able to get it to work with:

[root@nagiosxi libexec]# ./check_vmware_api.pl -H esx1a.dc.pud -f user.txt -l runtime -s storagehealth -o blacklistregexp -x 'Intel|Proc|iLO|Smart|Battery'
CHECK_VMWARE_API.PL OK - All 17 Storage health checks are GREEN:
OK: Status of Disk 8 on HPSA1 : Port 1I Box 1 Bay 8 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 12 on HPSA1 : Port 1I Box 1 Bay 26 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 11 on HPSA1 : Port 1I Box 1 Bay 25 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 6 on HPSA1 : Port 1I Box 1 Bay 6 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 5 on HPSA1 : Port 1I Box 1 Bay 5 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 1 on HPSA1 : RAID 1 : 111GB : Disk 11,12 : Ok: Physical element is functioning as expected
OK: Status of Disk 4 on HPSA1 : Port 1I Box 1 Bay 4 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 2 on HPSA1 : RAID 6 : 11923GB : Disk 1,2,3,4,5,6,7,8,9,10 : Ok: Physical element is functioning as expected
OK: Status of Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 7 on HPSA1 : Port 1I Box 1 Bay 7 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Memory: Physical element is functioning as expected
OK: Status of Disk 1 on HPSA1 : Port 1I Box 1 Bay 1 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 3 on HPSA1 : Port 1I Box 1 Bay 3 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 10 on HPSA1 : Port 1I Box 1 Bay 10 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 9 on HPSA1 : Port 1I Box 1 Bay 9 : 1490GB : Data Disk : OK: Physical element is functioning as expected | Alerts=0;;

Thank you for the help.

Damin

Re: Trying to use check_vmware_api.pl

Posted: Wed Nov 13, 2019 3:36 pm
by mbellerue
Thanks for posting that solution!

Re: Trying to use check_vmware_api.pl

Posted: Wed Nov 13, 2019 3:39 pm
by tgriep
That is correct. The UNKNOWN in the output is the status the plugin assigns to those devices, so the blacklist option needs the name of the device instead.
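To preview which objects a blacklist pattern would drop before putting it into a service definition, the same pattern can be applied to the object names with `grep -E`. This sketch assumes that the plugin matches the pattern against the object name (as described above) and that a simple alternation behaves the same in `grep -E` as in the plugin's Perl regexp matching. The names are copied from the output in this thread, with the status prefix removed.

```bash
#!/usr/bin/env bash
# Sample object names from the output in this thread (status prefix removed).
names=(
  "Intel Corporation Ethernet Controller X710 for 10GbE SFP+ #0"
  "Proc 1 Level-3 Cache"
  "Battery 40.1"
  "Disk 8 on HPSA1 : Port 1I Box 1 Bay 8 : 1490GB : Data Disk"
  "Memory"
)

# Print the names that the given pattern would NOT exclude.
preview_remaining() {
  local pattern="$1"
  printf '%s\n' "${names[@]}" | grep -Ev -- "$pattern"
}

echo "== 'UNKNOWN:' matches no object name, so nothing is excluded =="
preview_remaining 'UNKNOWN:'
echo "== 'Intel|Proc|iLO|Smart|Battery' leaves only the disk and memory =="
preview_remaining 'Intel|Proc|iLO|Smart|Battery'
```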