
CHECK_VMWARE_API.PL CRITICAL - health issue(s) found

Posted: Mon Oct 22, 2018 2:43 am
by rohithroki
Hi Team,

We recently upgraded some of our hosts from ESXi 5.x to 6.5.

We also upgraded the VMware ESXi check plugin to the latest version (https://github.com/op5/check_vmware_api).

Since then, the ESXi runtime health check has been returning the alerts below:

CHECK_VMWARE_API.PL CRITICAL - 60 health issue(s) found in 219 checks:
1) UNKNOWN[systemBoard] Status of System Board 1 Power Optimized 0 --- 0.7.1.118: Cannot report on the current health state of the element
2) UNKNOWN[memory] Status of BIOS 1 Mem Fatal NB CRC 65 --- 0.34.1.34: Cannot report on the current health state of the element
3) UNKNOWN[memory] Status of BIOS 1 Mem Fatal SB CRC 65 --- 0.34.1.33: Cannot report on the current health state of the element
4) UNKNOWN[memory] Status of BIOS 1 iDPT Mem Fail 65 --- 0.34.1.43: Cannot report on the current health state of the element
5) UNKNOWN[memory] Status of BIOS 1 Mem Overtemp 65 --- 0.34.1.32: Cannot report on the current health state of the element
6) UNKNOWN[memory] Status of BIOS 1 USB Over-current 65 --- 0.34.1.29: Cannot report on the current health state of the element
7) UNKNOWN[memory] Status of BIOS 1 Mem CRC Err 65 --- 0.34.1.28: Cannot report on the current health state of the element
8) UNKNOWN[memory] Status of BIOS 1 Mem ECC Warning 65 --- 0.34.1.27: Cannot report on the current health state of the element
9) UNKNOWN[memory] Status of BIOS 1 Mem Redun Gain 65 --- 0.34.1.23: Cannot report on the current health state of the element
10) UNKNOWN[memory] Status of BIOS 1 Memory Cfg Err 65 --- 0.34.1.22: Cannot report on the current health state of the element
11) UNKNOWN[memory] Status of BIOS 1 Memory Removed 65 --- 0.34.1.21: Cannot report on the current health state of the element
12) UNKNOWN[memory] Status of BIOS 1 Memory Added 65 --- 0.34.1.20: Cannot report on the current health state of the element
13) UNKNOWN[memory] Status of BIOS 1 Memory RAID 65 --- 0.34.1.19: Cannot report on the current health state of the element
14) UNKNOWN[memory] Status of BIOS 1 Memory Mirrored 65 --- 0.34.1.18: Cannot report on the current health state of the element
15) UNKNOWN[memory] Status of BIOS 1 Memory Spared 65 --- 0.34.1.17: Cannot report on the current health state of the element
16) UNKNOWN[memory] Status of BIOS 1 ECC Uncorr Err 65 --- 0.34.1.2: Cannot report on the current health state of the element
17) UNKNOWN[memory] Status of BIOS 1 ECC Corr Err 65 --- 0.34.1.1: Cannot report on the current health state of the element
18) UNKNOWN[watchdog] Status of BIOS 1 OS Watchdog Time 65 --- 0.34.1.113: Cannot report on the current health state of the element
19) UNKNOWN[bios] Status of BIOS 1 NonFatalPCIExpEr 65 --- 0.34.1.64: Cannot report on the current health state of the element
20) UNKNOWN[bios] Status of BIOS 1 FatalPCIExpEr 65 --- 0.34.1.63: Cannot report on the current health state of the element
21) UNKNOWN[bios] Status of BIOS 1 NonFatalPCIErARI 65 --- 0.34.1.62: Cannot report on the current health state of the element
22) UNKNOWN[bios] Status of BIOS 1 FatalPCIErARI 65 --- 0.34.1.61: Cannot report on the current health state of the element
23) UNKNOWN[bios] Status of BIOS 1 TXT Status 65 --- 0.34.1.42: Cannot report on the current health state of the element
24) UNKNOWN[bios] Status of BIOS 1 MSR Info Log 65 --- 0.34.1.40: Cannot report on the current health state of the element
25) UNKNOWN[bios] Status of BIOS 1 Link Error 65 --- 0.34.1.52: Cannot report on the current health state of the element
26) UNKNOWN[bios] Status of BIOS 1 Interconnect Err 65 --- 0.34.1.45: Cannot report on the current health state of the element
27) UNKNOWN[bios] Status of BIOS 1 Fatal IO Error 65 --- 0.34.1.39: Cannot report on the current health state of the element
28) UNKNOWN[bios] Status of BIOS 1 NonFatalSSDEr 65 --- 0.34.1.59: Cannot report on the current health state of the element
29) UNKNOWN[bios] Status of BIOS 1 NonFatalPCIErBus 65 --- 0.34.1.57: Cannot report on the current health state of the element
30) UNKNOWN[bios] Status of BIOS 1 Link Warning 65 --- 0.34.1.51: Cannot report on the current health state of the element
31) UNKNOWN[bios] Status of BIOS 1 Link Warning 65 --- 0.34.1.50: Cannot report on the current health state of the element
32) UNKNOWN[bios] Status of BIOS 1 CPU TDP 65 --- 0.34.1.47: Cannot report on the current health state of the element
33) UNKNOWN[bios] Status of BIOS 1 Interconnect Err 65 --- 0.34.1.44: Cannot report on the current health state of the element
34) UNKNOWN[bios] Status of BIOS 1 Non Fatal PCI Er 65 --- 0.34.1.38: Cannot report on the current health state of the element
35) UNKNOWN[bios] Status of BIOS 1 Chassis Mismatch 65 --- 0.34.1.55: Cannot report on the current health state of the element
36) UNKNOWN[bios] Status of BIOS 1 Hdwr version err 65 --- 0.34.1.31: Cannot report on the current health state of the element
37) UNKNOWN[bios] Status of BIOS 1 POST Err 65 --- 0.34.1.30: Cannot report on the current health state of the element
38) UNKNOWN[bios] Status of BIOS 1 Additional Info 65 --- 0.34.1.46: Cannot report on the current health state of the element
39) UNKNOWN[bios] Status of BIOS 1 Err Reg Pointer 65 --- 0.34.1.26: Cannot report on the current health state of the element
40) UNKNOWN[bios] Status of BIOS 1 Chipset Err 65 --- 0.34.1.25: Cannot report on the current health state of the element
41) UNKNOWN[bios] Status of BIOS 1 Fatal PCI SSD Er 65 --- 0.34.1.58: Cannot report on the current health state of the element
42) UNKNOWN[bios] Status of BIOS 1 FatalPCIErrOnBus 65 --- 0.34.1.56: Cannot report on the current health state of the element
43) UNKNOWN[bios] Status of BIOS 1 PCIE Fatal Err 65 --- 0.34.1.24: Cannot report on the current health state of the element
44) UNKNOWN[bios] Status of BIOS 1 Unknown 65 --- 0.34.1.8: Cannot report on the current health state of the element
45) UNKNOWN[bios] Status of BIOS 1 Logging Disabled 65 --- 0.34.1.7: Cannot report on the current health state of the element
46) UNKNOWN[bios] Status of BIOS 1 SBE Log Disabled 65 --- 0.34.1.6: Cannot report on the current health state of the element
47) UNKNOWN[bios] Status of BIOS 1 PCI System Err 65 --- 0.34.1.5: Cannot report on the current health state of the element
48) UNKNOWN[bios] Status of BIOS 1 PCI Parity Err 65 --- 0.34.1.4: Cannot report on the current health state of the element
49) UNKNOWN[bios] Status of BIOS 1 I/O Channel Chk 65 --- 0.34.1.3: Cannot report on the current health state of the element
50) UNKNOWN[bios] Status of BIOS 1 MRC Warning 65 --- 0.34.1.54: Cannot report on the current health state of the element
51) UNKNOWN[bios] Status of BIOS 1 MRC Warning 65 --- 0.34.1.53: Cannot report on the current health state of the element
52) UNKNOWN[bios] Status of BIOS 1 QPIRC Warning 65 --- 0.34.1.49: Cannot report on the current health state of the element
53) UNKNOWN[bios] Status of BIOS 1 QPIRC Warning 65 --- 0.34.1.48: Cannot report on the current health state of the element
54) UNKNOWN[other] Status of Add-in Card 3 SD2 0 --- 0.11.3.245: Cannot report on the current health state of the element
55) UNKNOWN[other] Status of Add-in Card 3 SD1 0 --- 0.11.3.244: Cannot report on the current health state of the element
56) UNKNOWN[processor] Status of BIOS 1 CPUMachineCheck 65 --- 0.34.1.60: Cannot report on the current health state of the element
57) UNKNOWN[processor] Status of BIOS 1 CPU Machine Chk 65 --- 0.34.1.13: Cannot report on the current health state of the element
58) UNKNOWN[processor] Status of BIOS 1 CPU Init Err 65 --- 0.34.1.12: Cannot report on the current health state of the element
59) UNKNOWN[processor] Status of BIOS 1 CPU Bus PERR 65 --- 0.34.1.11: Cannot report on the current health state of the element
60) UNKNOWN[processor] Status of BIOS 1 CPU Protocol Err 65 --- 0.34.1.10: Cannot report on the current health state of the element

Code:

[root@INFSGDCNGOS03 libexec]# ./check_vmware_api.pl -H <IP Address> -u nagiosuser -p <password> -l runtime health
CHECK_VMWARE_API.PL OK - 7/7 VMs up, overall status=green, connection state=connected, maintenance=no, 60 health issue(s), 1 config issue(s) | vmcount=7units;; health_issues=60;; config_issues=1;;
Please help us resolve these health issues, or let us know whether there is a way to ignore specific health issues from the command line.

Regards,
Simbu S

Re: CHECK_VMWARE_API.PL CRITICAL - health issue(s) found

Posted: Mon Oct 22, 2018 3:45 pm
by lmiltchev
Try running the following command:

Code:

./check_vmware_api.pl -H <IP Address> -u nagiosuser -p <password> -l runtime -s health -o blacklistregexpflag -x 'UNKNOWN.*'
to see whether it solves your issue.
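A minimal sketch of what the regex has to match, assuming (not verified against the plugin source) that check_vmware_api.pl tests the -x pattern against each sensor's name, e.g. `BIOS 1 Mem Fatal NB CRC 65 --- 0.34.1.34`, rather than against the `UNKNOWN[memory] Status of ...` line it prints. The helper `matches_blacklist` is hypothetical, and `grep -E` stands in for the plugin's own regex match. Quoting the pattern also keeps the shell from glob-expanding `UNKNOWN.*`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if extended regex $1 matches sensor name $2.
# grep -E stands in for the plugin's own (Perl) regex match; this is an assumption.
matches_blacklist() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Assumed sensor name: the text between "Status of" and the colon in the output.
name='BIOS 1 Mem Fatal NB CRC 65 --- 0.34.1.34'

for pattern in 'UNKNOWN.*' '^BIOS'; do
    if matches_blacklist "$pattern" "$name"; then
        echo "$pattern -> would be ignored"
    else
        echo "$pattern -> still reported"
    fi
done
```

If that assumption holds, a quoted, name-based pattern such as `-o blacklistregexpflag -x '^BIOS'` would be the next thing to try. Keep in mind that a pattern that broad would also hide real faults reported by those sensors.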

Re: CHECK_VMWARE_API.PL CRITICAL - health issue(s) found

Posted: Thu Oct 25, 2018 1:45 am
by rohithroki
Hi lmiltchev,

We tried that option, but the unwanted alerts are still displayed:

Code:

[root@INFSGDCNGOS03 libexec]# ./check_vmware_api.pl -H <IPAddress> -u nagiosuser -p <password> -l runtime -s health -o blacklistregexpflag -x UNKNOWN.*
CHECK_VMWARE_API.PL OK - 60 health issue(s) found in 219 checks:
1) UNKNOWN[systemBoard] Status of System Board 1 Power Optimized 0 --- 0.7.1.118: Cannot report on the current health state of the element
2) UNKNOWN[memory] Status of BIOS 1 Mem Fatal NB CRC 65 --- 0.34.1.34: Cannot report on the current health state of the element
3) UNKNOWN[memory] Status of BIOS 1 Mem Fatal SB CRC 65 --- 0.34.1.33: Cannot report on the current health state of the element
4) UNKNOWN[memory] Status of BIOS 1 iDPT Mem Fail 65 --- 0.34.1.43: Cannot report on the current health state of the element
5) UNKNOWN[memory] Status of BIOS 1 Mem Overtemp 65 --- 0.34.1.32: Cannot report on the current health state of the element
6) UNKNOWN[memory] Status of BIOS 1 USB Over-current 65 --- 0.34.1.29: Cannot report on the current health state of the element
7) UNKNOWN[memory] Status of BIOS 1 Mem CRC Err 65 --- 0.34.1.28: Cannot report on the current health state of the element
8) UNKNOWN[memory] Status of BIOS 1 Mem ECC Warning 65 --- 0.34.1.27: Cannot report on the current health state of the element
9) UNKNOWN[memory] Status of BIOS 1 Mem Redun Gain 65 --- 0.34.1.23: Cannot report on the current health state of the element
10) UNKNOWN[memory] Status of BIOS 1 Memory Cfg Err 65 --- 0.34.1.22: Cannot report on the current health state of the element
11) UNKNOWN[memory] Status of BIOS 1 Memory Removed 65 --- 0.34.1.21: Cannot report on the current health state of the element
12) UNKNOWN[memory] Status of BIOS 1 Memory Added 65 --- 0.34.1.20: Cannot report on the current health state of the element
13) UNKNOWN[memory] Status of BIOS 1 Memory RAID 65 --- 0.34.1.19: Cannot report on the current health state of the element
14) UNKNOWN[memory] Status of BIOS 1 Memory Mirrored 65 --- 0.34.1.18: Cannot report on the current health state of the element
15) UNKNOWN[memory] Status of BIOS 1 Memory Spared 65 --- 0.34.1.17: Cannot report on the current health state of the element
16) UNKNOWN[memory] Status of BIOS 1 ECC Uncorr Err 65 --- 0.34.1.2: Cannot report on the current health state of the element
17) UNKNOWN[memory] Status of BIOS 1 ECC Corr Err 65 --- 0.34.1.1: Cannot report on the current health state of the element
18) UNKNOWN[watchdog] Status of BIOS 1 OS Watchdog Time 65 --- 0.34.1.113: Cannot report on the current health state of the element
19) UNKNOWN[bios] Status of BIOS 1 NonFatalPCIExpEr 65 --- 0.34.1.64: Cannot report on the current health state of the element
20) UNKNOWN[bios] Status of BIOS 1 FatalPCIExpEr 65 --- 0.34.1.63: Cannot report on the current health state of the element
21) UNKNOWN[bios] Status of BIOS 1 NonFatalPCIErARI 65 --- 0.34.1.62: Cannot report on the current health state of the element
22) UNKNOWN[bios] Status of BIOS 1 FatalPCIErARI 65 --- 0.34.1.61: Cannot report on the current health state of the element
23) UNKNOWN[bios] Status of BIOS 1 TXT Status 65 --- 0.34.1.42: Cannot report on the current health state of the element
24) UNKNOWN[bios] Status of BIOS 1 MSR Info Log 65 --- 0.34.1.40: Cannot report on the current health state of the element
25) UNKNOWN[bios] Status of BIOS 1 Link Error 65 --- 0.34.1.52: Cannot report on the current health state of the element
26) UNKNOWN[bios] Status of BIOS 1 Interconnect Err 65 --- 0.34.1.45: Cannot report on the current health state of the element
27) UNKNOWN[bios] Status of BIOS 1 Fatal IO Error 65 --- 0.34.1.39: Cannot report on the current health state of the element
28) UNKNOWN[bios] Status of BIOS 1 NonFatalSSDEr 65 --- 0.34.1.59: Cannot report on the current health state of the element
29) UNKNOWN[bios] Status of BIOS 1 NonFatalPCIErBus 65 --- 0.34.1.57: Cannot report on the current health state of the element
30) UNKNOWN[bios] Status of BIOS 1 Link Warning 65 --- 0.34.1.51: Cannot report on the current health state of the element
31) UNKNOWN[bios] Status of BIOS 1 Link Warning 65 --- 0.34.1.50: Cannot report on the current health state of the element
32) UNKNOWN[bios] Status of BIOS 1 CPU TDP 65 --- 0.34.1.47: Cannot report on the current health state of the element
33) UNKNOWN[bios] Status of BIOS 1 Interconnect Err 65 --- 0.34.1.44: Cannot report on the current health state of the element
34) UNKNOWN[bios] Status of BIOS 1 Non Fatal PCI Er 65 --- 0.34.1.38: Cannot report on the current health state of the element
35) UNKNOWN[bios] Status of BIOS 1 Chassis Mismatch 65 --- 0.34.1.55: Cannot report on the current health state of the element
36) UNKNOWN[bios] Status of BIOS 1 Hdwr version err 65 --- 0.34.1.31: Cannot report on the current health state of the element
37) UNKNOWN[bios] Status of BIOS 1 POST Err 65 --- 0.34.1.30: Cannot report on the current health state of the element
38) UNKNOWN[bios] Status of BIOS 1 Additional Info 65 --- 0.34.1.46: Cannot report on the current health state of the element
39) UNKNOWN[bios] Status of BIOS 1 Err Reg Pointer 65 --- 0.34.1.26: Cannot report on the current health state of the element
40) UNKNOWN[bios] Status of BIOS 1 Chipset Err 65 --- 0.34.1.25: Cannot report on the current health state of the element
41) UNKNOWN[bios] Status of BIOS 1 Fatal PCI SSD Er 65 --- 0.34.1.58: Cannot report on the current health state of the element
42) UNKNOWN[bios] Status of BIOS 1 FatalPCIErrOnBus 65 --- 0.34.1.56: Cannot report on the current health state of the element
43) UNKNOWN[bios] Status of BIOS 1 PCIE Fatal Err 65 --- 0.34.1.24: Cannot report on the current health state of the element
44) UNKNOWN[bios] Status of BIOS 1 Unknown 65 --- 0.34.1.8: Cannot report on the current health state of the element
45) UNKNOWN[bios] Status of BIOS 1 Logging Disabled 65 --- 0.34.1.7: Cannot report on the current health state of the element
46) UNKNOWN[bios] Status of BIOS 1 SBE Log Disabled 65 --- 0.34.1.6: Cannot report on the current health state of the element
47) UNKNOWN[bios] Status of BIOS 1 PCI System Err 65 --- 0.34.1.5: Cannot report on the current health state of the element
48) UNKNOWN[bios] Status of BIOS 1 PCI Parity Err 65 --- 0.34.1.4: Cannot report on the current health state of the element
49) UNKNOWN[bios] Status of BIOS 1 I/O Channel Chk 65 --- 0.34.1.3: Cannot report on the current health state of the element
50) UNKNOWN[bios] Status of BIOS 1 MRC Warning 65 --- 0.34.1.54: Cannot report on the current health state of the element
51) UNKNOWN[bios] Status of BIOS 1 MRC Warning 65 --- 0.34.1.53: Cannot report on the current health state of the element
52) UNKNOWN[bios] Status of BIOS 1 QPIRC Warning 65 --- 0.34.1.49: Cannot report on the current health state of the element
53) UNKNOWN[bios] Status of BIOS 1 QPIRC Warning 65 --- 0.34.1.48: Cannot report on the current health state of the element
54) UNKNOWN[other] Status of Add-in Card 3 SD2 0 --- 0.11.3.245: Cannot report on the current health state of the element
55) UNKNOWN[other] Status of Add-in Card 3 SD1 0 --- 0.11.3.244: Cannot report on the current health state of the element
56) UNKNOWN[processor] Status of BIOS 1 CPUMachineCheck 65 --- 0.34.1.60: Cannot report on the current health state of the element
57) UNKNOWN[processor] Status of BIOS 1 CPU Machine Chk 65 --- 0.34.1.13: Cannot report on the current health state of the element
58) UNKNOWN[processor] Status of BIOS 1 CPU Init Err 65 --- 0.34.1.12: Cannot report on the current health state of the element
59) UNKNOWN[processor] Status of BIOS 1 CPU Bus PERR 65 --- 0.34.1.11: Cannot report on the current health state of the element
60) UNKNOWN[processor] Status of BIOS 1 CPU Protocol Err 65 --- 0.34.1.10: Cannot report on the current health state of the element | Alerts=60;;
Is there another argument we need to add to ignore specific health issues?

Regards,
Simbu

Re: CHECK_VMWARE_API.PL CRITICAL - health issue(s) found

Posted: Thu Oct 25, 2018 10:02 am
by lmiltchev
If this doesn't work, I'm not sure what else to suggest. :( This is a third-party plugin that is NOT developed or maintained by Nagios. I recommend contacting the plugin's maintainer to find out whether these options work at all.

Note: If check_vmware_api.pl is based on check_esx3.pl, this option may not work, because it does NOT work in check_esx3.pl either.

For example, running the command below works:

Code:

./check_esx3.pl -H xxxxxx.corp -f /tmp/pass -l runtime -s health -x "item1,item2,item3"
However, these don't work:

Code:

./check_esx3.pl -H xxxxxx.corp -f /tmp/pass -l runtime -s health -x "item*" -o blacklistregexpflag
or

Code:

./check_esx3.pl -H xxxxxx.corp -f /tmp/pass -l runtime -s health -o blacklistregexpflag -x "item*"
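To make the difference between the working and non-working commands concrete, here is a hypothetical sketch of the two matching modes: plain `-x` treated as a comma-separated list of exact sensor names, and `-o blacklistregexpflag` treating `-x` as one regular expression. This models the described behavior of the options under that assumption; it is not check_esx3.pl's actual Perl code, and `is_blacklisted` is a made-up name.

```shell
#!/bin/sh
# Hypothetical sketch of two blacklist modes (not the plugin's real code).
# Usage: is_blacklisted MODE BLACKLIST NAME
#   MODE=list  : BLACKLIST is a comma-separated list of exact sensor names
#   MODE=regex : BLACKLIST is a single extended regular expression
# The body runs in a subshell so set -f and IFS changes do not leak out.
is_blacklisted() (
    mode=$1 blacklist=$2 name=$3
    case $mode in
        list)
            set -f      # disable globbing so 'item*' is not expanded to file names
            IFS=,       # split the list on commas only
            for entry in $blacklist; do
                [ "$entry" = "$name" ] && exit 0
            done
            exit 1
            ;;
        regex)
            printf '%s\n' "$name" | grep -Eq -- "$blacklist"
            ;;
        *)
            exit 2      # unknown mode
            ;;
    esac
)

# The commands from the post, reduced to "would item1 be hidden?"
is_blacklisted list 'item1,item2,item3' item1 && echo "list 'item1,item2,item3': item1 ignored"
is_blacklisted list 'item*' item1 || echo "list 'item*': no literal match, item1 still reported"
is_blacklisted regex 'item*' item1 && echo "regex 'item*': matches item1"
```

Under these assumptions the regex `item*` ("ite" followed by zero or more "m") does match `item1`, so if the regex form still fails to hide an item, the more likely culprit is the plugin not honoring `blacklistregexpflag`, which is another reason to check with the plugin's maintainer.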