box293_check_vmware plugin issue with Cluster HA Status

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
ghugon
Posts: 23
Joined: Tue May 07, 2019 7:55 am

box293_check_vmware plugin issue with Cluster HA Status

Post by ghugon »

Hi,

I'm having an issue with box293's check_vmware plugin.
We have 2 Gearman workers dedicated to VMware monitoring with the VMware SDK.
The Cluster HA Status check works fine for all of our VCSAs except two.
On those two I get this error: [Undefined subroutine &ClusterFailoverLevelAdmissionControlPolicy::cpuFailoverResourcesPercent called at /usr/local/nagios/libexec/check_vmware.pl line 1912.]
The nagios service accounts for vmware have the same rights and are in the same groups.
The other Cluster checks from box293's plugin are working fine on those 2 VCSAs (Cluster CPU Usage, Cluster Memory Usage ...).

Do you have any idea how I could fix this?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by cdienger »

Try running the command directly on the command line of the two Gearman workers as well as on the XI server's command line. Do they all fail? What is the full command?

Are the machines that fail running a different version from the rest?

Please provide a copy of /usr/local/nagios/libexec/check_vmware.pl.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by ghugon »

I ran the commands as you asked; they all failed.
See attachments for details.
VCSA 01 & VCSA02 - Worker 1.png
VCSA 01 & VCSA02 - Worker 2.png
All machines are VMware VCSA 6.5.

The only thing that has been modified in the check_vmware.pl file is the number of concurrent checks.

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by cdienger »

Compare the settings of the working and non-working clusters. What is selected for "Define host failover capacity by"? Is the non-working one set to something other than "Cluster resource percentage"?

https://docs.vmware.com/en/VMware-vSphe ... B060A.html

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by ghugon »

On the working clusters, admission control is either disabled (we pass the --ha_admission_control disabled option in the command so that the check returns OK) or set to Cluster resource percentage at 25% for CPU and memory.
The non-working ones both use Dedicated failover hosts, set to 1.

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by cdienger »

Try running the commands again with the --ha_state, --ha_host_monitoring, and --ha_admission_control options one at a time. It could be that one of them is causing the failure and can be excluded.
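
A minimal sketch of that one-at-a-time test, under a few assumptions: the server and cluster names are the ones posted later in this thread, and the value "disabled" is only confirmed here for --ha_admission_control, so it may need adjusting for the other two options. By default the script only prints the commands; the HA_RUN, HA_VALUE, PLUGIN, SERVER and CLUSTER variables are hypothetical names introduced for this example.

```shell
# Sketch: try each HA option on its own against one cluster.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_vmware.pl}"
SERVER="${SERVER:-GV1-FF-VXR-VCSA-01}"
CLUSTER="${CLUSTER:-VXMA-Cluster}"
# Assumption: "disabled" is only confirmed in this thread for
# --ha_admission_control; adjust the value for the other two options.
HA_VALUE="${HA_VALUE:-disabled}"
# Set HA_RUN=1 to execute the commands instead of only printing them.
HA_RUN="${HA_RUN:-0}"

# Print the full command line for a single HA option.
build_cmd() {
    printf '%s --server %s --check Cluster_HA_Status --cluster %s %s %s\n' \
        "$PLUGIN" "$SERVER" "$CLUSTER" "$1" "$HA_VALUE"
}

# Loop over the three options, one command per option.
run_each_option() {
    for opt in --ha_state --ha_host_monitoring --ha_admission_control; do
        cmd=$(build_cmd "$opt")
        echo "== $opt =="
        if [ "$HA_RUN" = "1" ]; then
            # Unquoted on purpose: relies on word splitting, so the
            # names above must not contain spaces.
            $cmd
            echo "exit code: $?"
        else
            echo "$cmd"
        fi
    done
}

run_each_option
```

If all three runs fail with the same error, the failure is probably not tied to any single option.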

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by ghugon »

I tried the commands with each option for both hosts on both workers, and I still get the same error whichever option I use.
Example with one host on one worker:
options.png

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by cdienger »

Run the commands with the --debug option. This should create a box293_check_vmware_debug_log.txt file in /home/nagios/ or /root/ depending on which account you run it with.
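
A small sketch for checking which of those locations actually received the log after a --debug run. The find_debug_log helper is a hypothetical name introduced for this example; only the file name and the two directories come from the post above.

```shell
# Name of the debug log as given in the post above.
LOG_NAME="box293_check_vmware_debug_log.txt"

# Print the path of every debug log found in the given directories.
find_debug_log() {
    for dir in "$@"; do
        if [ -f "$dir/$LOG_NAME" ]; then
            echo "$dir/$LOG_NAME"
        fi
    done
}

# Check the two locations mentioned above (reading /root needs root).
find_debug_log /home/nagios /root
```

If the plugin is run under a different account, that account's home directory may be where the file ends up (an assumption), and it can be passed as an extra argument.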

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by ghugon »

Alright so I ran the command with the debug option.

Code: Select all

sudo -u naemon ./check_vmware.pl --server GV1-FF-VXR-VCSA-01 --check Cluster_HA_Status --cluster VXMA-Cluster --debug
Here is the output:
GV1-FF-VXR-VCSA-01.txt
I get exactly the same output for both non-working hosts.

Here is the output for a host where the check works as intended:

Code: Select all

sudo -u naemon ./check_vmware.pl --server GV2-FF-VXR-VCSA-02 --check Cluster_HA_Status --cluster GV2-FF-CL01 --debug
GV2-FF-VXR-VCSA-02.txt

Re: box293_check_vmware plugin issue with Cluster HA Status

Post by cdienger »

The plugin falls back to code meant for the cpuFailoverResourcesPercent policy when none of the conditions above it match. In this case, the policy being returned is resourceReductionToToleratePercent, which doesn't match any of them:

Code: Select all

if ($cluster_ha_config_admission_control_policy_key eq 'failoverLevel') {

Code: Select all

elsif ($cluster_ha_config_admission_control_policy_key eq 'slotPolicy') 
or

Code: Select all

elsif ($cluster_ha_config_admission_control_policy_key eq 'failoverHosts') {
To get it to match the last of these conditions, change line 1872 to:

Code: Select all

elsif ($cluster_ha_config_admission_control_policy_key eq 'failoverHosts' || $cluster_ha_config_admission_control_policy_key eq 'resourceReductionToToleratePercent' ) {
Hopefully this should do it. Let us know your results.
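
Because the plugin file in this thread was modified locally (the number of concurrent checks), the branch may not sit exactly on line 1872. A minimal sketch for locating it by the variable name quoted above; list_policy_branches is a hypothetical helper name, and the default path is the one given earlier in the thread.

```shell
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_vmware.pl}"

# List every line (with its line number) that compares the admission
# control policy key, so the 'failoverHosts' branch can be found.
list_policy_branches() {
    grep -n 'cluster_ha_config_admission_control_policy_key eq' "$1"
}

# Only search when the plugin is present and readable on this machine.
if [ -r "$PLUGIN" ]; then
    list_policy_branches "$PLUGIN"
fi
```

Keeping a backup copy of the file before editing, and running perl -c on it afterwards to check the syntax, is a reasonable precaution.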