Issue with false critical values on Virtual Machines

Ivajlo911
Posts: 38
Joined: Thu Mar 23, 2017 3:19 am

Post by Ivajlo911 »

Hi,
we have a problem with false critical values coming from Nagios on a few virtual machines. Here is one example:

Service: L3CSGHVI CPU Usage
Host: vCSSGH
Address: ......
State: CRITICAL
Info:
ESX3 CRITICAL - L3CSGHVI cpu usage=-0.01 %
Date/Time: 2018-06-07 08:25:1

Service: L3CSGHVI Memory
Host: vCSSGH
Address: ......
State: CRITICAL
Info:
ESX3 CRITICAL - L3CSGHVI mem usage=-0.01 %
Date/Time: 2018-05-25 05:43:17

Details of our implementation:
CentOS Linux release 7.4.1708 (Core)
Manual Install of Nagios XI
No special configurations on our system
We are not using a proxy
We are using Nagios XI 5.4.13.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

Can you show the full command for your services?

I think this may be an anomaly in the ESX API, which you can probably get around by setting your warning and critical thresholds so that negative numbers are treated as OK.

So, for example, if you had a warning or critical threshold set to 80, change it to ~:80. This means values from negative infinity to 80 are OK and anything above is not.
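
To illustrate why a plain threshold of 80 trips on a reading of -0.01 while ~:80 does not, here is a minimal Python sketch of the threshold range syntax described in the Nagios plugin development guidelines. It is not taken from check_esx3.pl; the helper names `parse_range`, `alerts`, and `status` are hypothetical, and the sketch takes bare numbers rather than values with a trailing % sign.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range ("80", "10:", "~:80", "@10:20").

    Returns (start, end, inside), where inside=True means the alert
    fires when the value falls inside the range (the "@" form).
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        # A bare number N is shorthand for the range 0..N.
        start_s, end_s = "0", spec
    start = -math.inf if start_s == "~" else float(start_s or 0)
    end = math.inf if end_s == "" else float(end_s)
    return start, end, inside


def alerts(value, spec):
    """Return True if value should raise an alert for this range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


def status(value, warn, crit):
    """Map a value to OK/WARNING/CRITICAL, checking critical first."""
    if alerts(value, crit):
        return "CRITICAL"
    if alerts(value, warn):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # A bare "90" means 0..90, so a slightly negative reading is outside it.
    print(status(-0.01, "80", "90"))
    # "~:90" means negative infinity..90, so the same reading is inside it.
    print(status(-0.01, "~:80", "~:90"))
```

With the bare form, the range implicitly starts at 0, which is why a reading just below zero lands outside it. The ~: form removes that lower bound and leaves the upper limit unchanged.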
Ivajlo911
Posts: 38
Joined: Thu Mar 23, 2017 3:19 am

Post by Ivajlo911 »

Hi,
I would avoid such workarounds if possible.

These are the commands:

/usr/local/nagios/libexec/check_esx3.pl -H "10.10.7.220" -f "/usr/local/nagiosxi/etc/components/vmware/vCSSGH_auth.txt" -N "L3CSGHVI" -l "CPU" -s usage -w 80% -c 90%
ESX3 OK - "L3CSGHVI" cpu usage=18.50 % | cpu_usage=18.50%;80;90+

/usr/local/nagios/libexec/check_esx3.pl -H "10.10.7.220" -f "/usr/local/nagiosxi/etc/components/vmware/vCSSGH_auth.txt" -N "L3CSGHVI" -l "MEM" -s usage -w 80% -c 90%
ESX3 OK - "L3CSGHVI" mem usage=71.99 % | mem_usage=71.99%;80;90
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Post by cdienger »

Is the behavior frequent or consistent? The output in the last post looks normal. I tested with the 6.7.0 VMware SDK and haven't been able to reproduce it yet. Run "/usr/bin/vmware-cmd --version" to find the version that is installed on the XI system.

Also, here is a screenshot of my check settings just to make sure you're running something similar.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Ivajlo911
Posts: 38
Joined: Thu Mar 23, 2017 3:19 am

Post by Ivajlo911 »

Hi,

The behavior is more consistent than frequent. It happens from time to time, every two or three days, on two particular VMs.

Also, we use the 6.5.0 VMware SDK. Do you think we should update?

Our settings are the same as yours.
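
Since the bad readings only show up every few days, it may help to keep the raw plugin output and pick out the anomalous samples afterwards. The following is a minimal Python sketch that assumes the "cpu usage=... %" and "mem usage=... %" wording seen in the output in this thread; the helper name `find_negative_usage` is hypothetical and is not part of Nagios XI or check_esx3.pl.

```python
import re

# Matches the human-readable part of the plugin output,
# e.g. "cpu usage=-0.01 %" or "mem usage=71.99 %".
USAGE_RE = re.compile(r"(?P<metric>cpu|mem) usage=(?P<value>-?\d+(?:\.\d+)?) %")


def find_negative_usage(lines):
    """Return (metric, value, line) for every line reporting a negative usage."""
    hits = []
    for line in lines:
        match = USAGE_RE.search(line)
        if match and float(match.group("value")) < 0:
            hits.append(
                (match.group("metric"), float(match.group("value")), line.strip())
            )
    return hits


if __name__ == "__main__":
    sample = [
        "ESX3 CRITICAL - L3CSGHVI cpu usage=-0.01 %",
        'ESX3 OK - "L3CSGHVI" cpu usage=18.50 % | cpu_usage=18.50%;80;90',
        "ESX3 CRITICAL - L3CSGHVI mem usage=-0.01 %",
    ]
    for metric, value, line in find_negative_usage(sample):
        print(metric, value, line)
```

Feeding it saved check output along with timestamps could show whether the negative values cluster around particular times for the two affected VMs.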
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Post by cdienger »

Yes, I think it would be worth it to update. As was pointed out, this is likely something related to the API and is hopefully addressed in the update.
Ivajlo911
Posts: 38
Joined: Thu Mar 23, 2017 3:19 am

Post by Ivajlo911 »

Hi,
After the update, the problem continues:

***** Nagios XI Alert *****

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: L3CNS02 CPU Usage
Host: vCSSGH
Address:
State: CRITICAL
Info:
ESX3 CRITICAL - L3CNS02 cpu usage=-0.01 %
Date/Time: 2018-06-18 04:57:47
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Post by cdienger »

Please confirm the plugin and wizard version. The plugin version can be seen with:

/usr/local/nagios/libexec/check_esx3.pl --version

and the wizard version can be found under Admin > System Extensions > Manage Config Wizards > VMware. The latest wizard version is 1.7.1, and you can upgrade to it by clicking the Check for Updates and Install Updates buttons at the top of the page.
Ivajlo911
Posts: 38
Joined: Thu Mar 23, 2017 3:19 am

Post by Ivajlo911 »

Hi,

Wizard version is: 1.6.9
The version of the plugin is:
[root@l3cnagint ~]# /usr/local/nagios/libexec/check_esx3.pl --version
check_esx3.pl 0.2.1
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Post by cdienger »

Is that a typo, or does it actually show 0.2.1? My machine shows 0.7.1. In either case, try upgrading the wizard, which should include the latest plugin as well.