box293_check_vmware plugin issue with Cluster HA Status

Posted: Mon Jul 01, 2019 7:35 am
by ghugon
Hi,

I'm having an issue with box293's check_vmware plugin.
We have 2 gearman workers dedicated to vmware monitoring with the vmware SDK.
The Cluster HA Status check works fine for all but two of our VCSAs.
On those two I get this error: [Undefined subroutine &ClusterFailoverLevelAdmissionControlPolicy::cpuFailoverResourcesPercent called at /usr/local/nagios/libexec/check_vmware.pl line 1912.]
The nagios service accounts for vmware have the same rights and are in the same groups.
The other Cluster checks from box293's plugin are working fine on those 2 VCSAs (Cluster CPU Usage, Cluster Memory Usage ...).

Do you have any idea how I could fix this?

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Mon Jul 01, 2019 3:19 pm
by cdienger
Try running the command directly on the command line of the two gearman workers as well as on the XI command line. Do they all fail? What is the full command?

Are the machines that fail on a different version from the rest?

Please provide a copy of the /usr/local/nagios/libexec/check_vmware.pl.

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Tue Jul 02, 2019 9:09 am
by ghugon
I ran the commands as you asked; they all failed.
See attachments for details.
VCSA 01 & VCSA02 - Worker 1.png
VCSA 01 & VCSA02 - Worker 2.png
All machines are VMware VCSA 6.5.

The only thing that has been modified in the check_vmware.pl file is the number of concurrent checks.

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Tue Jul 02, 2019 3:24 pm
by cdienger
Compare the working and non-working cluster settings. What is defined for "Define host failover capacity by"? Is the non-working one set to something other than "Cluster resource percentage"?

https://docs.vmware.com/en/VMware-vSphe ... B060A.html

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Wed Jul 03, 2019 3:15 am
by ghugon
On the working ones, admission control is either disabled (with the --ha_admission_control disabled option in the command so that the check returns OK) or uses Cluster resource percentage, set at 25% CPU & Memory.
The non-working ones both use Dedicated failover hosts, set at 1.

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Wed Jul 03, 2019 11:04 am
by cdienger
Try running the commands again with the --ha_state, --ha_host_monitoring, and --ha_admission_control options one at a time. It could be that one of them is causing the failure and can be excluded.

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Thu Jul 04, 2019 3:15 am
by ghugon
I tried the commands with each option for both hosts on both workers, and I still get the same error whichever option I use.
Example with one host on one worker:
options.png

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Mon Jul 08, 2019 3:56 pm
by cdienger
Run the commands with the --debug option. This should create a box293_check_vmware_debug_log.txt file in /home/nagios/ or /root/ depending on which account you run it with.

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Tue Jul 09, 2019 3:28 am
by ghugon
Alright so I ran the command with the debug option.

sudo -u naemon ./check_vmware.pl --server GV1-FF-VXR-VCSA-01 --check Cluster_HA_Status --cluster VXMA-Cluster --debug

Here is the output:
GV1-FF-VXR-VCSA-01.txt
I get the exact same output for both non-working hosts.

Here is the output for a host where the check works as intended:

sudo -u naemon ./check_vmware.pl --server GV2-FF-VXR-VCSA-02 --check Cluster_HA_Status --cluster GV2-FF-CL01 --debug
GV2-FF-VXR-VCSA-02.txt

Re: box293_check_vmware plugin issue with Cluster HA Status

Posted: Tue Jul 09, 2019 1:32 pm
by cdienger
The plugin falls back to code meant for the cpuFailoverResourcesPercent policy when none of the conditions above it match. In this case, the policy key being returned is resourceReductionToToleratePercent, which doesn't match any of these:

if ($cluster_ha_config_admission_control_policy_key eq 'failoverLevel') {

elsif ($cluster_ha_config_admission_control_policy_key eq 'slotPolicy') {

or

elsif ($cluster_ha_config_admission_control_policy_key eq 'failoverHosts') {

To get it to match the last of these conditions, change line 1872 to:

elsif ($cluster_ha_config_admission_control_policy_key eq 'failoverHosts' || $cluster_ha_config_admission_control_policy_key eq 'resourceReductionToToleratePercent') {

Hopefully this does it. Let us know your results.
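
For anyone following along, here is a minimal standalone sketch of why the unmatched key ends up in the cpuFailoverResourcesPercent code and what the change to the last condition does. It is not the plugin's actual code: the branch_for_policy_key sub and its $patched flag are made up for illustration, and only the policy key strings are taken from the conditions quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of check_vmware.pl. It returns the name of the
# branch that would handle a given admission control policy key.
# $patched = 1 simulates the modified condition suggested for line 1872.
sub branch_for_policy_key {
    my ($key, $patched) = @_;

    if ($key eq 'failoverLevel') {
        return 'failoverLevel';
    }
    elsif ($key eq 'slotPolicy') {
        return 'slotPolicy';
    }
    elsif ($key eq 'failoverHosts'
        || ($patched && $key eq 'resourceReductionToToleratePercent')) {
        return 'failoverHosts';
    }
    else {
        # Nothing matched: fall back to the percentage-based policy code,
        # which is where the "Undefined subroutine" error is raised.
        return 'cpuFailoverResourcesPercent';
    }
}

# Show where the problem key lands before and after the change.
for my $patched (0, 1) {
    my $branch = branch_for_policy_key('resourceReductionToToleratePercent', $patched);
    printf "patched=%d -> %s branch\n", $patched, $branch;
}
```

Without the change the key falls through to the cpuFailoverResourcesPercent branch; with it, the key is handled by the failoverHosts branch instead.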