Check_VMWARE_runtime_health status change issue

Suganya
Posts: 3
Joined: Tue Jul 12, 2016 1:44 am

Check_VMWARE_runtime_health status change issue

Post by Suganya »

Hi Team,

In our environment, we use Nagios Core 3.5.0 to monitor all assets. Our ESXi servers are configured in Nagios, and all of their services are monitored with the check_vmware_api.pl plugin.

All the other services report status changes correctly except Check_runtime_health. For this service, when there is a failure, the status does not change from OK to Critical/Warning; instead, the error message appears in the status information while the status stays OK.

As there is no Critical/Warning alert, we will not learn of the failure unless we look at the status information.

The error looks like below:

CHECK_VMWARE_API.PL OK - 2 health issue(s) found in 371 checks

Kindly help us resolve the issue. We need an alert to be generated whenever there are any health issues on the server.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Check_VMWARE_runtime_health status change issue

Post by rkennedy »

Can you show us the full command definition associated with the service? You may just need to define a warning / critical threshold.
Former Nagios Employee
Suganya
Posts: 3
Joined: Tue Jul 12, 2016 1:44 am

Re: Check_VMWARE_runtime_health status change issue

Post by Suganya »

Thank you for your response.

The command definition is given as below:

/usr/local/nagios/libexec/check_vmware_api.pl -H $Hostaddress$ -f /home/nagios/.nagios_user -l runtime -s health
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Check_VMWARE_runtime_health status change issue

Post by rkennedy »

Got it.

[ -w <warn_range> ] [ -c <crit_range> ]
It looks like the plugin supports a warning / critical threshold. What happens if you alter your command to include a -w and -c?
Former Nagios Employee
Suganya
Posts: 3
Joined: Tue Jul 12, 2016 1:44 am

Re: Check_VMWARE_runtime_health status change issue

Post by Suganya »

Thank you for your response!

Warning and critical thresholds are applicable to the command, but there are two issues here.

1. Nagios needs to report a failure if there is any. With thresholds, we would have to give the number of health checks minus 1 as warning and minus 2 as critical.
For example,

CHECK_VMWARE_API.PL OK - All 331 health checks are GREEN

For the above service, 331 health checks give OK, which means 330 should be warning and 329 should be critical (this might be a bad idea). Kindly suggest an alternative.

2. Also, the number of health checks differs from server to server, so it is hard to create a separate service with different thresholds for every server (100+) in our environment.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Check_VMWARE_runtime_health status change issue

Post by lmiltchev »

This is a single check that monitors many different things... You need to consider the number of "Alerts", not the number of "health checks" when setting up your warning and critical thresholds. For example, when I try a similar check, my output is:
CHECK_VMWARE_API.PL OK - All 212 health checks are GREEN: fan (5x); system (1x); CPU (2x); Cable/Interconnect (2x); Watchdog (4x); voltage (21x); Battery (3x); Processors (12x); Software Components (96x); Memory (1x); Storage (56x); power (7x); Chassis (1x); temperature (1x); | Alerts=0;;
In your case, you probably have Alerts=2, so you may use something like this:

/usr/local/nagios/libexec/check_vmware_api.pl -H $Hostaddress$ -f /home/nagios/.nagios_user -l runtime -s health -w 1 -c 2
This check should give you a "warning" if the number of alerts is greater than 1, and "critical" if the number of alerts is greater than 2. I believe you could set up these same thresholds for all of your checks. This way, you will be notified in case of a warning/critical issue.
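To make the threshold behaviour concrete, here is a minimal Python sketch of how a bare threshold N is evaluated under the standard Nagios plugin range convention (alert when the value falls outside 0..N). It assumes the plugin applies -w and -c to the Alerts perfdata value in that way; the function names `parse_alerts`, `outside_simple_range`, and `evaluate` are hypothetical and are not part of check_vmware_api.pl.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_alerts(plugin_output: str) -> int:
    """Extract the Alerts value from the perfdata that follows the '|'."""
    _, _, perfdata = plugin_output.partition("|")
    match = re.search(r"Alerts=(\d+)", perfdata)
    if match is None:
        raise ValueError("no Alerts perfdata found in plugin output")
    return int(match.group(1))


def outside_simple_range(value: int, limit: int) -> bool:
    """A bare threshold N means the range 0..N; alert when value is outside it."""
    return value < 0 or value > limit


def evaluate(alerts: int, warn: int, crit: int) -> int:
    """Return the exit code for an alert count (assumed plugin behaviour)."""
    if outside_simple_range(alerts, crit):
        return CRITICAL
    if outside_simple_range(alerts, warn):
        return WARNING
    return OK


if __name__ == "__main__":
    sample = "CHECK_VMWARE_API.PL OK - All 212 health checks are GREEN | Alerts=0;;"
    print(evaluate(parse_alerts(sample), warn=1, crit=2))  # healthy host
    print(evaluate(2, warn=1, crit=2))  # two alerts with -w 1 -c 2
    print(evaluate(1, warn=0, crit=1))  # one alert with -w 0 -c 1
```

Under these assumptions, -w 1 -c 2 leaves a single alert at OK and reports two alerts as WARNING. If any health issue at all should raise an alert, lower values such as -w 0 -c 1 may be closer to what is wanted, and because they depend on the alert count rather than the number of health checks, the same pair can be reused on every server.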