Performance Graphs

operations_asavie · Post by **operations_asavie** » Tue Mar 29, 2016 6:57 am

Hi,

I am running the standard VMware wizard against an ESXi host. For the CPU usage check, the data was being graphed in the Performance Graph, but since I added warning and critical threshold % values to this service, the Performance Graph is no longer being populated.

See images attached.

I would like to have the % CPU usage graphed.

rkennedy · Post by **rkennedy** » Tue Mar 29, 2016 12:16 pm

Can you navigate to the advanced tab for the service in question, and post a screenshot of it? I'd like to see what perfdata it's returning.

Also, can you run the check over the CLI, and post the full input / output for the check? Do this once without the warning / critical defined, and once again with them defined.

operations_asavie · Post by **operations_asavie** » Wed Mar 30, 2016 4:34 am

Hi,

See below.

[root@NAG-IXDUB-02 libexec]# ./check_esx3.pl -H 172.17.4.39 -f /usr/local/nagiosxi/etc/components/vmware/4F0BC5J_mgmt_auth.txt -l CPU
ESX3 OK - cpu usage=1160.00 MHz (3.62%) | cpu_usagemhz=1160.00Mhz;; cpu_usage=3.62%;;
[root@NAG-IXDUB-02 libexec]# ./check_esx3.pl -H 172.17.4.39 -f /usr/local/nagiosxi/etc/components/vmware/4F0BC5J_mgmt_auth.txt -l CPU -s usage -w 80 -c 90
ESX3 OK - cpu usage=3.12 % | cpu_usage=3.12%;80;90

Post by **lmiltchev** » Wed Mar 30, 2016 1:02 pm

What happens if you move the RRD and the XML file for this service out of the "/usr/local/nagios/share/perfdata/<hostname>/" directory (to let's say "/tmp/"), and wait for 15-20 min? Do the graphs show up? The RRD/XML files should get recreated.

ssax · Post by **ssax** » Wed Mar 30, 2016 1:09 pm

This is not working because you changed the command, which changed the perfdata output.

The RRD file is expecting two parameters (called datasources) because that is what it was originally built with:

Code:

cpu_usagemhz=1160.00Mhz;; cpu_usage=3.62%;;

When you modified the check command, it dropped the cpu_usagemhz datasource. The check now tries to insert one datasource value while the RRD expects two, which causes an error, so no data is inserted.
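
A quick way to see the mismatch is to count the datasources in each perfdata string. This is a minimal sketch, not part of the plugin: `count_datasources` is a hypothetical helper, and it assumes perfdata labels contain no spaces (quoted labels with spaces would be miscounted).

```shell
# Count the datasources (label=value entries) in a perfdata string.
# Assumption: entries are separated by spaces and labels contain no spaces.
count_datasources() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -c '='
}

# Perfdata strings taken from the two check runs posted above.
before='cpu_usagemhz=1160.00Mhz;; cpu_usage=3.62%;;'
after='cpu_usage=3.12%;80;90'

echo "before: $(count_datasources "$before") datasource(s)"
echo "after:  $(count_datasources "$after") datasource(s)"
```

If the two counts differ, the existing RRD was built for a different layout than the one the check now returns.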

You can remove the RRD and XML files for this service in /usr/local/nagios/share/perfdata/HOSTNAME/ so that it can rebuild them if you don't care about the historical data.

Note: You will lose all historical performance graph information for this service if you delete those files.
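
If you would rather keep a copy than delete outright, the files can be moved aside instead. The following is a rough sketch: `stash_perfdata` is a hypothetical helper, and the backup location in the example is an assumption. HOSTNAME and SERVICENAME are placeholders.

```shell
# Move SERVICE.rrd and SERVICE.xml out of a perfdata directory into a
# backup directory, so they can be rebuilt by the next checks.
# Usage: stash_perfdata PERFDATA_DIR SERVICE BACKUP_DIR
stash_perfdata() {
    perfdata_dir="$1"
    service="$2"
    backup_dir="$3"
    mkdir -p "$backup_dir" || return 1
    for ext in rrd xml; do
        f="$perfdata_dir/$service.$ext"
        # Skip quietly if one of the two files does not exist.
        if [ -f "$f" ]; then
            mv "$f" "$backup_dir/"
        fi
    done
}

# Example (placeholders, adjust before running):
# stash_perfdata /usr/local/nagios/share/perfdata/HOSTNAME SERVICENAME /tmp/perfdata_backup
```

Moving the files back restores the old graphs, as long as the check command is also changed back to match them.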

The only ways to get it graphing again WITHOUT losing the historical data would be to change the command back to what it was before OR follow this sweet guide that I wrote:

How to delete a datasource from an RRD

Here is how I did it:

First, open up the /usr/local/nagios/share/perfdata/HOSTNAME/SERVICE.xml file and find the <DS>NUMBER</DS> entry (this means data source number) for the one you want to remove.

Code:

  <DATASOURCE>
    <TEMPLATE>check_isis</TEMPLATE>
    <RRDFILE>/usr/local/nagios/share/perfdata/UVN-DCD-SD02/Performance_Data.rrd</RRDFILE>
    <RRD_STORAGE_TYPE>SINGLE</RRD_STORAGE_TYPE>
    <RRD_HEARTBEAT>8460</RRD_HEARTBEAT>
    <IS_MULTI>0</IS_MULTI>
    <DS>4</DS>
    <NAME>MESSAGESPS</NAME>
    <LABEL>MESSAGESPS</LABEL>
    <UNIT></UNIT>
    <ACT>2324</ACT>
    <WARN></WARN>
    <WARN_MIN></WARN_MIN>
    <WARN_MAX></WARN_MAX>
    <WARN_RANGE_TYPE></WARN_RANGE_TYPE>
    <CRIT></CRIT>
    <CRIT_MIN></CRIT_MIN>
    <CRIT_MAX></CRIT_MAX>
    <CRIT_RANGE_TYPE></CRIT_RANGE_TYPE>
    <MIN></MIN>
    <MAX></MAX>
  </DATASOURCE>

Since you want to remove MESSAGESPS we can see that the DS number is 4.
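
For XML files with many datasources, the lookup can be scripted. This is a sketch only: `ds_number_for` is a hypothetical helper, and it assumes the layout shown above, where each `<DS>` line comes before the `<NAME>` line of the same datasource.

```shell
# Print the <DS> number of the datasource whose <NAME> equals NAME.
# Usage: ds_number_for XML_FILE NAME
# Assumption: <DS> appears before <NAME> inside each <DATASOURCE> block.
ds_number_for() {
    awk -v want="$2" '
        /<DS>/ {
            # Remember the most recent DS number (digits only).
            ds = $0
            gsub(/[^0-9]/, "", ds)
        }
        /<NAME>/ {
            # Strip the surrounding tags and compare the bare name.
            name = $0
            sub(/.*<NAME>/, "", name)
            sub(/<\/NAME>.*/, "", name)
            if (name == want) print ds
        }
    ' "$1"
}

# Example (HOSTNAME and SERVICE are placeholders):
# ds_number_for /usr/local/nagios/share/perfdata/HOSTNAME/SERVICE.xml MESSAGESPS
```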

Then remove MESSAGESPS from the script/plugin/perfdata output.

Run these commands to install the tool that we will use to delete the data source:

Code:

cd /tmp
wget "http://downloads.sourceforge.net/project/pnp4nagios/PNP-0.6/pnp4nagios-0.6.25.tar.gz?r=&ts=1452788875&use_mirror=iweb" -O /tmp/pnp4nagios-0.6.25.tar.gz
tar zxf /tmp/pnp4nagios-0.6.25.tar.gz
cd /tmp/pnp4nagios-0.6.25
./configure
make all
mkdir -p /root/scripts
cp /tmp/pnp4nagios-0.6.25/scripts/rrd_modify.pl /root/scripts/
chmod +x /root/scripts/rrd_modify.pl

Run these commands to delete the data source:
*** NOTE: Make sure to change DATASOURCENUM, HOSTNAME, and SERVICENAME to the proper values.
*** NOTE: SERVICENAME should be changed to the actual service name from the filename (validate what it should be, it may differ from what you have set in Nagios)

Code:

/root/scripts/rrd_modify.pl /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd delete DATASOURCENUM
mv /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd.bak
mv /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd.chg /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd
chown nagios:nagios /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd
chmod 775 /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd

Now it should start graphing properly when the new checks come in (may take 15 to 20 minutes).
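
One way to confirm the change before waiting for new checks is to count the datasources left in the RRD. `rrdtool info` prints one `ds[N].index` line per datasource, so counting those lines gives the number the RRD expects. This is a sketch: `count_rrd_ds` is a hypothetical helper, and it assumes `rrdtool` is on the PATH.

```shell
# Count the datasources reported by "rrdtool info" (read from stdin).
# Each datasource contributes exactly one "ds[N].index = ..." line.
count_rrd_ds() {
    grep -c '^ds\[.*]\.index = '
}

# Example (HOSTNAME and SERVICENAME are placeholders):
# rrdtool info /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd | count_rrd_ds
```

The result should match the number of label=value entries in the perfdata the check now returns.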

operations_asavie · Post by **operations_asavie** » Thu Mar 31, 2016 9:11 am

Thank you for your help. I was able to just remove the RRD and XML files as the history was not important. This service is graphing the percentage now.

Another quick question relating to the same script/wizard.

I'm running a service check on the datastore of an ESXi host. I can get the value without a problem, and the check behaves correctly when I add warning and critical thresholds (see attached). My problem is that the 90.63% shown for the datastore in the attached image is the % space free, not the % space used, so the warning and critical values I need are -w 10% -c 5%. That won't work as-is, because the check alerts when the value is greater than the warning and critical thresholds, not less than. Can you tell me how to change this in the script?

Post by **lmiltchev** » Thu Mar 31, 2016 1:05 pm

You could use ":" after the threshold for "less than" logic. See the "Threshold and ranges" section in the "Nagios Plugins Development Guidelines" here:

https://nagios-plugins.org/doc/guidelin ... HOLDFORMAT

Here are some examples with and without ":" after the threshold.

"Greater than" examples:

Code:

[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 90% -c 95%
ESX3 OK - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;90;95
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 80% -c 95%
ESX3 WARNING - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;80;95
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 70% -c 80%
ESX3 CRITICAL - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;70;80

"Less than" examples:

Code:

[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 80%: -c 70%:
ESX3 OK - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;80:;70:
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 90%: -c 80%:
ESX3 WARNING - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;90:;80:
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 95%: -c 90%:
ESX3 CRITICAL - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;95:;90:
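
The results above follow from the range rules in the plugin guidelines: a plain "N" alerts when the value is below 0 or above N, while "N:" alerts when the value is below N. The sketch below reproduces that logic for these two forms only. It is an illustration, not the plugin's own code; `range_alerts` and `check_state` are hypothetical names.

```shell
# Succeeds (exit 0) when VALUE lies outside RANGE, i.e. the range should alert.
# Only the "N" and "N:" forms are handled; "%" signs are stripped first.
range_alerts() {
    value="$1"
    range="$(printf '%s' "$2" | tr -d '%')"
    case "$range" in
        *:) # "N:" means alert when value < N
            awk -v v="$value" -v lo="${range%:}" 'BEGIN { exit !(v < lo) }' ;;
        *)  # "N" means alert when value < 0 or value > N
            awk -v v="$value" -v hi="$range" 'BEGIN { exit !(v < 0 || v > hi) }' ;;
    esac
}

# Usage: check_state VALUE WARN_RANGE CRIT_RANGE
# The critical range is tested first, then the warning range.
check_state() {
    if range_alerts "$1" "$3"; then echo CRITICAL
    elif range_alerts "$1" "$2"; then echo WARNING
    else echo OK
    fi
}

check_state 82.21 '80%:' '70%:'   # OK
check_state 82.21 '90%:' '80%:'   # WARNING
check_state 82.21 '95%:' '90%:'   # CRITICAL
```

For a "% free" value, this is why -w 10%: -c 5%: warns once free space drops below 10% and goes critical below 5%.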

Hope this helps.

operations_asavie · Post by **operations_asavie** » Tue Apr 05, 2016 11:24 am

Thank you for your help with this. The thresholds -w 20%: -c 10%: are now working fine, but the check evaluates every datastore and I am only concerned with 9f0bc5j.datastore=791910.00 MB (93.29%). Is there a way to modify the command so that only this datastore is checked against the defined thresholds, and ultimately only this one is alerted on?

Post by **lmiltchev** » Tue Apr 05, 2016 1:48 pm

You can modify the existing check (or create a new one) and put the following in the "$ARG3$" field:

Code:

-s 9f0bc5j.datastore -w 20%: -c 10%:

When you pass only "vmfs" to the plugin, it shows info for all datastores. If you want to check a specific datastore, you can pass a sub-command: "-s <datastore name>".

Hope this helps.

Nagios Support Forum
