Trying to use check_vmware_api.pl

damindd
Posts: 9
Joined: Tue Mar 28, 2017 1:13 pm

Post by damindd »

I am attempting to use the check_vmware_api.pl plugin with the storagehealth suboption of the runtime command. Here is the command I am running (on the Nagios XI server CLI):

./check_vmware_api.pl -H esx2a.dc.pud -f nagioscheck.txt -l runtime -s storagehealth

This gives the following output:

[root@nagiosxi libexec]# ./check_vmware_api.pl -H esx2a.dc.pud -f nagioscheck.txt -l runtime -s storagehealth
CHECK_VMWARE_API.PL UNKNOWN - 204 health issue(s) found:
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 1 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit #30: Cannot report on the current status of the physical element
UNKNOWN: Status of Proc 2 Level-3 Cache: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 3 Thermal Control #21: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation C610/X99 series chipset PCI Express Root Port #8 #28: Cannot report on the current status of the physical element
.
.
.
.
OK: Status of Disk 5 on HPSA1 : Port 1I Box 1 Bay 5 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 12 on HPSA1 : Port 1I Box 1 Bay 18 : 3576GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 7 on HPSA1 : Port 1I Box 1 Bay 7 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Memory: Physical element is functioning as expected
OK: Status of Disk 10 on HPSA1 : Port 1I Box 1 Bay 10 : 1490GB : Data Disk : OK: Physical element is functioning as expected | Alerts=204;;
[root@nagiosxi libexec]#

I have been trying without any success to filter out the UNKNOWN and OK status lines using the blacklist option. Does anyone have any experience with this plugin?

Here is the help output for the Host runtime section:
* runtime - shows runtime info
    + con - connection state
    + health - checks cpu/storage/memory/sensor status and propagates worst state
        o listitems - list all available sensors(use for listing purpose only)
        o blackregexpflag - whether to treat blacklist as regexp
        b - blacklist status objects
    + storagehealth - storage status check
        o blackregexpflag - whether to treat blacklist as regexp
        b - blacklist status objects

Thanks.
Damin
damindd
Posts: 9
Joined: Tue Mar 28, 2017 1:13 pm

Re: Trying to use check_vmware_api.pl

Post by damindd »

I have found that I can add "| grep -v UNKNOWN" as an $ARG$ to filter out the results for the UNKNOWN hardware. Is grepping an acceptable way to filter results?
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Trying to use check_vmware_api.pl

Post by mbellerue »

It would have to be a little more complicated than just a grep -v. Nagios bases its alerts on the exit code of the plugin. If you added the | grep -v UNKNOWN, first you would lose valid UNKNOWN responses, and second, the exit code would be that of the grep command, which is 0 whenever at least one line gets past the filter, and 0 means OK as far as Nagios is concerned.
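
To illustrate the exit-code point, here is a minimal shell sketch. fake_plugin is a hypothetical stand-in for the real plugin (it is not part of check_vmware_api.pl): it prints two status lines and exits with 3, the Nagios code for UNKNOWN. The sketch assumes the shell's default behavior, with pipefail not set.

```shell
# Hypothetical stand-in for a plugin: prints status lines, exits 3 (UNKNOWN).
fake_plugin() {
    echo "UNKNOWN: Status of sensor A"
    echo "OK: Status of Disk 1"
    return 3
}

# Piped through grep: the pipeline reports grep's exit status, not the plugin's.
piped_status=0
fake_plugin | grep -v UNKNOWN || piped_status=$?

# Run directly: the plugin's own exit status is what the caller sees.
direct_status=0
fake_plugin > /dev/null || direct_status=$?

echo "piped=$piped_status direct=$direct_status"
```

With these stand-in lines the pipeline prints the OK line and reports 0, while the direct call reports 3, so a check wrapped in grep -v would look OK to Nagios even though the plugin itself returned UNKNOWN.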

We'll have to dig into why the plugin is returning that information. Is it possible that it's an issue with the login's permissions on the ESXi machine? Is that just a typical, low-permission service account that you're using? Can you use an administrative account just to see if it changes the output?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
damindd
Posts: 9
Joined: Tue Mar 28, 2017 1:13 pm

Re: Trying to use check_vmware_api.pl

Post by damindd »

I have made some progress on my monitoring. It seems that when I query the host directly with -l vmfs, the values are very slow to update.

./check_vmware_api.pl -H vmhost -f creds.txt -l vmfs -o used (does not update values quickly, hours/days)

However, when I target the vCenter server with -l vmfs -s VolumeName -o used, the values update very quickly. (I only run the check every 180 minutes.)

./check_vmware_api.pl -D vcenter -f vcentercreds.txt -l vmfs -s VolumeName -o used -w 85% -c 90% (These values update quickly)

So, I have created a service with the above check for each volume in our infrastructure. I have only needed read-only permission on the vCenter server.
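
As a rough sketch of the one-service-per-volume approach, a small shell loop can print the check command for each volume. The volume names Datastore01 and Datastore02 are hypothetical placeholders; the flags and thresholds are the ones from the vCenter command above. It assumes volume names without spaces.

```shell
# Hypothetical volume names; replace with the VMFS volumes in your environment.
volumes='Datastore01 Datastore02'

# Build one check command per volume (flags and thresholds as in the vCenter check).
commands=$(for volume in $volumes; do
    printf '%s\n' "./check_vmware_api.pl -D vcenter -f vcentercreds.txt -l vmfs -s $volume -o used -w 85% -c 90%"
done)

# Print the commands so they can be reviewed before being turned into services.
printf '%s\n' "$commands"
```

This only prints the commands; it does not run the plugin or create anything in Nagios XI.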

Still working on the storagehealth suboption. Thank you for the info about using grep.

./check_vmware_api.pl -H host -f nagioscheck.txt -l runtime -s storagehealth

Thank you.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Trying to use check_vmware_api.pl

Post by ssax »

-o blacklistregexp -x 'skip1|skip2|skip3|etc'

It's likely slower when querying the host because, as far as I'm aware, the statistics are stored in vCenter, so a query against the host has to call back to the vCenter DB for them. Querying vCenter directly removes that extra request, so it should be faster. Always query vCenter if possible.
damindd
Posts: 9
Joined: Tue Mar 28, 2017 1:13 pm

Re: Trying to use check_vmware_api.pl

Post by damindd »

I tried running this from the command line. The blacklist is still not working for me. Does the syntax I used look correct?

./check_vmware_api.pl -H esx1a.dc.pud -f /usr/local/nagiosxi/etc/components/vmware/nagioscheck.txt -l runtime -s storagehealth -o blacklistregexp -x 'UNKNOWN:'

Results
......
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 2/3 Broadcast #22: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 4 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation C610/X99 series chipset PCI Express Root Port #5 #28: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 #3: Cannot report on the current status of the physical element
UNKNOWN: Status of Proc 1 Level-3 Cache: Cannot report on the current status of the physical element
UNKNOWN: Status of Battery 40.1: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast #22: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug #5: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 2 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Ethernet Controller X710 for 10GbE SFP+ #0: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 Debug #18: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 0 #4: Cannot report on the current status of the physical element
UNKNOWN: Status of Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 7 #4: Cannot report on the current status of the physical element
OK: Status of Disk 8 on HPSA1 : Port 1I Box 1 Bay 8 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 12 on HPSA1 : Port 1I Box 1 Bay 26 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 11 on HPSA1 : Port 1I Box 1 Bay 25 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 6 on HPSA1 : Port 1I Box 1 Bay 6 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 5 on HPSA1 : Port 1I Box 1 Bay 5 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 1 on HPSA1 : RAID 1 : 111GB : Disk 11,12 : Ok: Physical element is functioning as expected
OK: Status of Disk 4 on HPSA1 : Port 1I Box 1 Bay 4 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 2 on HPSA1 : RAID 6 : 11923GB : Disk 1,2,3,4,5,6,7,8,9,10 : Ok: Physical element is functioning as expected
OK: Status of Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 7 on HPSA1 : Port 1I Box 1 Bay 7 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Memory: Physical element is functioning as expected
OK: Status of Disk 1 on HPSA1 : Port 1I Box 1 Bay 1 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 3 on HPSA1 : Port 1I Box 1 Bay 3 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 10 on HPSA1 : Port 1I Box 1 Bay 10 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 9 on HPSA1 : Port 1I Box 1 Bay 9 : 1490GB : Data Disk : OK: Physical element is functioning as expected | Alerts=200;;
damindd
Posts: 9
Joined: Tue Mar 28, 2017 1:13 pm

Re: Trying to use check_vmware_api.pl

Post by damindd »

I discovered what I was doing incorrectly:

This works:

-x singleitem (no single quotes)

And this works:

-x 'item1|item2' (with single quotes)

This does not work:

-x UNKNOWN: (presumably because UNKNOWN: is the status, not the status message)

I was able to get it to work with:

[root@nagiosxi libexec]# ./check_vmware_api.pl -H esx1a.dc.pud -f user.txt -l runtime -s storagehealth -o blacklistregexp -x 'Intel|Proc|iLO|Smart|Battery'
CHECK_VMWARE_API.PL OK - All 17 Storage health checks are GREEN:
OK: Status of Disk 8 on HPSA1 : Port 1I Box 1 Bay 8 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 12 on HPSA1 : Port 1I Box 1 Bay 26 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 11 on HPSA1 : Port 1I Box 1 Bay 25 : 111GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 6 on HPSA1 : Port 1I Box 1 Bay 6 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 5 on HPSA1 : Port 1I Box 1 Bay 5 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 1 on HPSA1 : RAID 1 : 111GB : Disk 11,12 : Ok: Physical element is functioning as expected
OK: Status of Disk 4 on HPSA1 : Port 1I Box 1 Bay 4 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Logical Volume 2 on HPSA1 : RAID 6 : 11923GB : Disk 1,2,3,4,5,6,7,8,9,10 : Ok: Physical element is functioning as expected
OK: Status of Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 7 on HPSA1 : Port 1I Box 1 Bay 7 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Memory: Physical element is functioning as expected
OK: Status of Disk 1 on HPSA1 : Port 1I Box 1 Bay 1 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 3 on HPSA1 : Port 1I Box 1 Bay 3 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 10 on HPSA1 : Port 1I Box 1 Bay 10 : 1490GB : Data Disk : OK: Physical element is functioning as expected
OK: Status of Disk 9 on HPSA1 : Port 1I Box 1 Bay 9 : 1490GB : Data Disk : OK: Physical element is functioning as expected | Alerts=0;;

Thank you for the help.

Damin
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Trying to use check_vmware_api.pl

Post by mbellerue »

Thanks for posting that solution!
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Trying to use check_vmware_api.pl

Post by tgriep »

That is correct. The UNKNOWN in the output is the status the plugin assigns to each device, so the blacklist option needs the name of a device, not its status.
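
A small shell sketch of why the name-based pattern works and the status-based one does not. The sample names are taken from the output earlier in this thread. Treating the text after "Status of" as the name the blacklist is compared against is an assumption, and grep -E is only a stand-in for the plugin's own matching.

```shell
# Sample element names (assumed to be the text after "Status of" in the output).
names='Proc 1 Level-3 Cache
Battery 40.1
Intel Corporation Ethernet Controller X710 for 10GbE SFP+ #0
Disk 8 on HPSA1 : Port 1I Box 1 Bay 8 : 1490GB : Data Disk
Memory'

# Pattern from the working command; quoted so the shell does not treat | as a pipe.
blacklist='Intel|Proc|iLO|Smart|Battery'

# Names the pattern matches (skipped) and names it leaves alone (still checked).
skipped=$(printf '%s\n' "$names" | grep -E "$blacklist")
kept=$(printf '%s\n' "$names" | grep -Ev "$blacklist")

# The status word is not part of any name, so UNKNOWN: matches nothing.
status_hits=$(printf '%s\n' "$names" | grep -c 'UNKNOWN:' || true)

printf 'skipped:\n%s\nkept:\n%s\nUNKNOWN: matches: %s\n' "$skipped" "$kept" "$status_hits"
```

In this sample the Proc, Battery, and Intel entries are skipped, the Disk and Memory entries are kept, and the UNKNOWN: pattern matches zero names.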