Performance Graphs
-
operations_asavie
- Posts: 33
- Joined: Tue Dec 22, 2015 7:07 am
Performance Graphs
Hi,
I am running the standard VMware wizard on an ESXi host. For the CPU usage check, the data was graphed in the Performance Graph, but since I added warning and critical threshold % values for this service, the Performance Graph is no longer being populated.
See images attached.
I would like to have the % CPU usage graphed.
Re: Performance Graphs
Can you navigate to the advanced tab for the service in question, and post a screenshot of it? I'd like to see what perfdata it's returning.
Also, can you run the check over the CLI, and post the full input / output for the check? Do this once without the warning / critical defined, and once again with them defined.
Former Nagios Employee
-
operations_asavie
- Posts: 33
- Joined: Tue Dec 22, 2015 7:07 am
Re: Performance Graphs
Hi,
See below.
[root@NAG-IXDUB-02 libexec]# ./check_esx3.pl -H 172.17.4.39 -f /usr/local/nagiosxi/etc/components/vmware/4F0BC5J_mgmt_auth.txt -l CPU
ESX3 OK - cpu usage=1160.00 MHz (3.62%) | cpu_usagemhz=1160.00Mhz;; cpu_usage=3.62%;;
[root@NAG-IXDUB-02 libexec]# ./check_esx3.pl -H 172.17.4.39 -f /usr/local/nagiosxi/etc/components/vmware/4F0BC5J_mgmt_auth.txt -l CPU -s usage -w 80 -c 90
ESX3 OK - cpu usage=3.12 % | cpu_usage=3.12%;80;90
Re: Performance Graphs
What happens if you move the RRD and the XML file for this service out of the "/usr/local/nagios/share/perfdata/<hostname>/" directory (to let's say "/tmp/"), and wait for 15-20 min? Do the graphs show up? The RRD/XML files should get recreated.
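A minimal sketch of the move suggested above, in case it helps. The function name `move_perfdata` and the example service file name are hypothetical; check the actual file names in the perfdata directory before running anything like this.

```shell
#!/bin/sh
# Move the .rrd and .xml files for one service out of a host's perfdata
# directory so that they get recreated on the next checks.
# $1 = perfdata directory for the host (e.g. /usr/local/nagios/share/perfdata/<hostname>)
# $2 = service file name without extension (hypothetical example: CPU_Usage)
# $3 = destination directory (e.g. /tmp)
move_perfdata() {
    for ext in rrd xml; do
        if [ -f "$1/$2.$ext" ]; then
            mv "$1/$2.$ext" "$3/"
        fi
    done
}

# Example call (paths are placeholders, adjust before use):
# move_perfdata "/usr/local/nagios/share/perfdata/<hostname>" CPU_Usage /tmp
```

Moving rather than deleting keeps a copy of the old files in case the history is needed later.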
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Performance Graphs
This is not working because you changed the command, which changed the perfdata output.
The RRD file expects two values (called datasources), because that is what it was originally built with:
Code: Select all
cpu_usagemhz=1160.00Mhz;; cpu_usage=3.62%;;
When you modified the check command, it removed the cpu_usagemhz datasource. The check now tries to insert one datasource value while the RRD expects two, which causes an error, so the data is not inserted.
If you don't care about the historical data, you can remove the RRD and XML files for this service in /usr/local/nagios/share/perfdata/HOSTNAME/ so that they can be rebuilt.
Note: You will lose all historical performance graph information for this service if you delete those files.
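To see the datasource mismatch for yourself, you can compare the perfdata labels in the two plugin outputs posted earlier in this thread. The helper below is only a sketch: `perfdata_labels` is a made-up name, and it assumes labels contain no spaces, which is true for this check but not for every plugin.

```shell
#!/bin/sh
# Print one perfdata label per line from a full plugin output line.
# Perfdata is everything after the "|"; each metric looks like label=value;warn;crit
# Assumption: labels do not contain spaces (quoted labels are not handled).
perfdata_labels() {
    printf '%s\n' "$1" | awk -F'|' 'NF > 1 { print $2 }' | tr ' ' '\n' | sed -n 's/=.*//p'
}

# The two outputs from earlier in the thread:
before='ESX3 OK - cpu usage=1160.00 MHz (3.62%) | cpu_usagemhz=1160.00Mhz;; cpu_usage=3.62%;;'
after='ESX3 OK - cpu usage=3.12 % | cpu_usage=3.12%;80;90'

echo "Original command:"
perfdata_labels "$before"
echo "Command with -s usage -w 80 -c 90:"
perfdata_labels "$after"
```

The first output lists two labels and the second lists one, which is the difference the existing RRD cannot accept.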
The only ways to get it graphing again WITHOUT losing the historical data would be to change the command back to what it was before OR follow this sweet guide that I wrote:
How to delete a datasource from an RRD
Here is how I did it:
First, open up the /usr/local/nagios/share/perfdata/HOSTNAME/SERVICE.xml file and find the <DS>NUMBER</DS> entry (this means data source number) for the one you want to remove.
Code: Select all
<DATASOURCE>
<TEMPLATE>check_isis</TEMPLATE>
<RRDFILE>/usr/local/nagios/share/perfdata/UVN-DCD-SD02/Performance_Data.rrd</RRDFILE>
<RRD_STORAGE_TYPE>SINGLE</RRD_STORAGE_TYPE>
<RRD_HEARTBEAT>8460</RRD_HEARTBEAT>
<IS_MULTI>0</IS_MULTI>
<DS>4</DS>
<NAME>MESSAGESPS</NAME>
<LABEL>MESSAGESPS</LABEL>
<UNIT></UNIT>
<ACT>2324</ACT>
<WARN></WARN>
<WARN_MIN></WARN_MIN>
<WARN_MAX></WARN_MAX>
<WARN_RANGE_TYPE></WARN_RANGE_TYPE>
<CRIT></CRIT>
<CRIT_MIN></CRIT_MIN>
<CRIT_MAX></CRIT_MAX>
<CRIT_RANGE_TYPE></CRIT_RANGE_TYPE>
<MIN></MIN>
<MAX></MAX>
</DATASOURCE>
Since you want to remove MESSAGESPS, we can see that the DS number is 4.
Then remove MESSAGESPS from the script/plugin/perfdata output.
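If the XML file has many datasources, the DS lookup described above can be scripted. This is a rough sketch, assuming the XML is laid out like the excerpt above, with one tag per line and `<DS>` appearing before `<NAME>` inside each `<DATASOURCE>` block. The function name `ds_number_for` is made up for illustration.

```shell
#!/bin/sh
# Print the <DS> number of the datasource whose <NAME> matches.
# $1 = path to the service XML file, $2 = datasource name (e.g. MESSAGESPS)
# Assumption: one tag per line, and <DS> precedes <NAME> in each block.
ds_number_for() {
    awk -v want="$2" '
        /<DS>/   { gsub(/.*<DS>|<\/DS>.*/, ""); ds = $0 }
        /<NAME>/ { gsub(/.*<NAME>|<\/NAME>.*/, ""); if ($0 == want) print ds }
    ' "$1"
}

# Example call (path is a placeholder):
# ds_number_for /usr/local/nagios/share/perfdata/HOSTNAME/SERVICE.xml MESSAGESPS
```

It is still worth opening the file and confirming the number by eye before deleting anything.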
Run these commands to install the tool that we will use to delete the data source:
Code: Select all
cd /tmp
wget "http://downloads.sourceforge.net/project/pnp4nagios/PNP-0.6/pnp4nagios-0.6.25.tar.gz?r=&ts=1452788875&use_mirror=iweb" -O /tmp/pnp4nagios-0.6.25.tar.gz
tar zxf /tmp/pnp4nagios-0.6.25.tar.gz
cd /tmp/pnp4nagios-0.6.25
./configure
make all
cp /tmp/pnp4nagios-0.6.25/scripts/rrd_modify.pl /root/scripts/
chmod +x /root/scripts/rrd_modify.pl
Run these commands to delete the data source:
*** NOTE: Make sure to change DATASOURCENUM, HOSTNAME, and SERVICENAME to the proper values.
*** NOTE: SERVICENAME should be changed to the actual service name from the filename (validate what it should be, it may differ from what you have set in Nagios)
Code: Select all
/root/scripts/rrd_modify.pl /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd delete DATASOURCENUM
mv /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd.bak
mv /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd.chg /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd
chown nagios:nagios /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd
chmod 775 /usr/local/nagios/share/perfdata/HOSTNAME/SERVICENAME.rrd
Now it should start graphing properly when the new checks come in (may take 15 to 20 minutes).
-
operations_asavie
- Posts: 33
- Joined: Tue Dec 22, 2015 7:07 am
Re: Performance Graphs
Thank you for your help. I was able to just remove the RRD and XML file as the history was not important. I have this service graphing the percentage now.
Another quick question relating to the same script/wizard.
I'm running a service check on the datastore on an ESXi host. I can get the value without problems, and the checks behave correctly when I set the warning and critical thresholds (see attached). My problem is that the 90.63% shown for the datastore in the attached image is the % space free, not the % space used, so the warning and critical values I need are -w 10% -c 5%. This won't work as-is, because the check alerts when the value is greater than the defined thresholds, not less than. Can you tell me how to change this in the script?
Re: Performance Graphs
You could use ":" after the threshold for "less than" logic. See the "Threshold and ranges" section in the "Nagios Plugins Development Guidelines" here:
https://nagios-plugins.org/doc/guidelin ... HOLDFORMAT
Here are some examples with and without ":" after the threshold.
"Greater than" examples:
Code: Select all
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 90% -c 95%
ESX3 OK - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;90;95
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 80% -c 95%
ESX3 WARNING - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;80;95
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 70% -c 80%
ESX3 CRITICAL - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;70;80
"Less than" examples:
Code: Select all
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 80%: -c 70%:
ESX3 OK - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;80:;70:
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 90%: -c 80%:
ESX3 WARNING - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;90:;80:
[root@localhost ~]# /usr/local/nagios/libexec/check_esx3.pl -H "x.x.x.x" -f "/usr/local/nagiosxi/etc/components/vmware/MyHost_auth.txt" -l "VMFS" -s Datastore1 -w 95%: -c 90%:
ESX3 CRITICAL - Datastore1=10102235.00 MB (82.21%) | Datastore1=82.21%;95:;90:
Hope this helps.
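For anyone who wants to reason about the ":" suffix without running the plugin, here is a simplified sketch of the two threshold forms used in the examples above. It is not the plugin's own code: it only handles "N" (alert when the value is above N) and "N:" (alert when the value is below N), ignores the rest of the range syntax in the guidelines, and the function names are made up.

```shell
#!/bin/sh
# Return success (0) if the value breaches the threshold.
# $1 = value, $2 = threshold: "N" means alert when value > N,
#                             "N:" means alert when value < N.
# Percent signs are stripped. Other range forms are NOT handled.
breaches() {
    value=$1
    thr=$(printf '%s' "$2" | tr -d '%')
    case $thr in
        *:) awk -v v="$value" -v t="${thr%:}" 'BEGIN { exit !(v < t) }' ;;
        *)  awk -v v="$value" -v t="$thr" 'BEGIN { exit !(v > t) }' ;;
    esac
}

# Print OK, WARNING or CRITICAL; critical is checked first.
# $1 = value, $2 = warning threshold, $3 = critical threshold
check_state() {
    if breaches "$1" "$3"; then
        echo CRITICAL
    elif breaches "$1" "$2"; then
        echo WARNING
    else
        echo OK
    fi
}

# Same numbers as the first "less than" example above:
check_state 82.21 80%: 70%:
```

Feeding it the six value and threshold combinations from the examples above gives the same states the plugin reported.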
-
operations_asavie
- Posts: 33
- Joined: Tue Dec 22, 2015 7:07 am
Re: Performance Graphs
Thank you for your help with this. The thresholds -w 20%: -c 10%: are now working fine, but the command checks every datastore. I am only concerned with 9f0bc5j.datastore=791910.00 MB (93.29%). Is there a way to modify the command so that only this datastore is checked against the defined thresholds, and only it triggers alerts?
Re: Performance Graphs
You can modify the existing check (or create a new one) so that the "$ARG3$" field contains:
Code: Select all
-s 9f0bc5j.datastore -w 20%: -c 10%:
When you pass only "vmfs" to the plugin, it shows info for all datastores. To check a specific datastore, pass a sub-command: "-s <datastore name>".
Hope this helps.