Nagios Support Forum

Posted: **Thu Jan 10, 2013 3:48 pm**

Hi,

I'm having some issues with the check_esx3 plugin.

I added some ESXi hosts with the VMware wizard. The plugin collects data with no problem and gets all the metrics for CPU, memory, datastores and others, in their different units (%, MHz, etc.). I can see the performance graphs with no problem.

The issue happens when I specify a single value of a metric to collect. For example, when I enter "usage" as the SUBCOMMAND (-s) value for the CPU metric, I get the specific CPU usage value, and I use it to define the warning and critical parameters for email notifications. But when I do this I lose the performance graph for this metric. When I leave the SUBCOMMAND value blank, the performance graph comes back.

Why is this happening?

Thanks a lot!
Mauricio

Posted: **Thu Jan 10, 2013 4:00 pm**

When RRDs are generated, they are created for a specific number and type of metrics. When you change the metrics for a check, the existing RRD will no longer populate. You usually have to rename or remove the corresponding RRD and let a new one be generated after the second check.

If you are having issues with RRDs after you changed metrics, you will find the related errors in /usr/local/nagios/var/perfdata.log.
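As a minimal sketch of the rename step, assuming the per-host layout under /usr/local/nagios/share/perfdata/ that is mentioned later in this thread: the helper below renames the .rrd and .xml files of one service so that new ones can be generated. The function name, the `service_stem` parameter and the `.bak-<timestamp>` suffix are hypothetical; check the actual file names in the host directory before running anything like this.

```python
import shutil
import time
from pathlib import Path

# Directory taken from this thread; adjust if your install differs.
PERFDATA_ROOT = Path("/usr/local/nagios/share/perfdata")


def set_aside_perfdata(host, service_stem, root=PERFDATA_ROOT):
    """Rename <service_stem>.rrd and .xml for one host.

    Renaming instead of deleting keeps the old history on disk.
    `service_stem` is the file name without extension (assumption:
    you look it up in the host directory yourself).
    Returns the list of new paths for the files that were renamed.
    """
    host_dir = Path(root) / host
    stamp = time.strftime("%Y%m%d%H%M%S")
    moved = []
    for ext in (".rrd", ".xml"):
        src = host_dir / (service_stem + ext)
        if src.exists():
            dst = host_dir / (service_stem + ext + ".bak-" + stamp)
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

Calling `set_aside_perfdata("HOSTNAME", "SERVICE_FILE_STEM")` leaves every other service of that host untouched. The graph history of the renamed service starts over once the new RRD is created.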

Posted: **Mon Jan 14, 2013 8:56 am**

Hi,

When I look at perfdata.log I see this:
2013-01-03 13:00:29 [12161] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-01-03 13:00:29 [12161] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-01-03 13:00:29 [12161] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-03 13:00:29 [12161] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1357228817.perfdata.
2013-01-03 13:00:29 [12161] [0] *** Timeout while processing Host: "esx096-s1-cl96-16.cloud.sonda.com"
2013-01-03 13:00:29 [12161] [0] *** process_perfdata.pl terminated on signal ALRM

I looked in the directory /usr/local/nagios/share/perfdata/"hostname" and found the .rrd and .xml files.
What happens if I delete these files for a specific service? Is this what you said would be generated again?

When I leave the subcommand blank, the service brings me all the metrics (e.g. CPU %, CPU MHz, etc.). Is there a way to define an alarm for CPU % and still get all those metrics too?
Or do I have to create an individual service for each metric and define notifications for each one?

Thanks for your help!
Mauricio

Posted: **Mon Jan 14, 2013 10:41 am**

Try running through this document. Given the errors we can see in your log, it should solve your issue:

http://support.nagios.com/wiki/index.ph ... ve_No_Data

In answer to your question about deleting RRDs: yes, once you remove them you will lose the past performance data permanently, but they will be regenerated.

Regarding adding more metrics, how do you have your command defined? Also, did you run the ESX monitoring wizard to set these checks up? The wizard should allow you to choose to monitor everything the plugin allows.

Posted: **Mon Jan 14, 2013 1:18 pm**

Thanks, I will look at the document.

The first time I ran the VMware wizard to add the ESX hosts, it brought in all the metrics with their respective performance graphs. Then I looked into how to configure warning and critical parameters for notifications, which led me to the specific $ARG for the SUBCOMMAND of the command. I did this because I don't know how to set W and C for the metrics when they are all added under one service by the wizard.

That means, when I add an ESX host, the wizard gives me overhead, swapped and usage under "MEM", while the command only lets me enter one W and one C threshold in the args.

I hope my explanation is clear.

In summary, I want to have all the metrics for an ESX service (CPU, MEM, etc.) and generate C and W notifications (emails) with per-metric thresholds (usage, swap, MHz, etc.).

Thanks a lot again!
Mauricio

Posted: **Wed Jan 16, 2013 12:15 pm**

If you want to separate these, you can do so by adding subcommands with the -s flag. In the help output, items marked with + are subcommands:

# /usr/local/nagios/libexec/check_esx3.pl -h

...

Supported commands(^ means blank or not specified parameter) :
    Common options for VM, Host and DC :
        * cpu - shows cpu info
            + usage - CPU usage in percentage
            + usagemhz - CPU usage in MHz
            ^ all cpu info
        * mem - shows mem info
            + usage - mem usage in percentage
            + usagemb - mem usage in MB
            + swap - swap mem usage in MB
            + overhead - additional mem used by VM Server in MB
            + overall - overall mem used by VM Server in MB
            ^ all mem info
        * net - shows net info
            + usage - overall network usage in KB/s
            + receive - receive in KB/s
            + send - send in KB/s
            ^ all net info
        * io - shows disk io info
            + read - read latency in ms
            + write - write latency in ms
            ^ all disk io info
        * runtime - shows runtime info
            + status - overall host status (gray/green/red/yellow)
            + issues - all issues for the host
            ^ all runtime info
    VM specific :
        * cpu - shows cpu info
            + wait - CPU wait in ms
        * mem - shows mem info
            + swapin - swapin mem usage in MB
            + swapout - swapout mem usage in MB
            + active - active mem usage in MB
        * io - shows disk I/O info
            + usage - overall disk usage in MB/s
        * runtime - shows runtime info
            + con - connection state
            + cpu - allocated CPU in MHz
            + mem - allocated mem in MB
            + state - virtual machine state (UP, DOWN, SUSPENDED)
            + consoleconnections - console connections to VM
            + guest - guest OS status, needs VMware Tools
            + tools - VMWare Tools status
    Host specific :
        * net - shows net info
            + nic - makes sure all active NICs are plugged in
        * io - shows disk io info
            + aborted - aborted commands count
            + resets - bus resets count
            + kernel - kernel latency in ms
            + device - device latency in ms
            + queue - queue latency in ms
        * vmfs - shows Datastore info
            + (name) - info for datastore with name (name)
            ^ all datastore info
        * runtime - shows runtime info
            + con - connection state
            + health - checks cpu/storage/memory/sensor status
            + maintenance - shows whether host is in maintenance mode
            + list(vm) - list of VMWare machines and their statuses
        * service - shows Host service info
            + (names) - check the state of one or several services specified by (names), syntax for (names):<service1>,<service2>,...,<serviceN>
            ^ show all services
    DC specific :
        * io - shows disk io info
            + aborted - aborted commands count
            + resets - bus resets count
            + kernel - kernel latency in ms
            + device - device latency in ms
            + queue - queue latency in ms
        * vmfs - shows Datastore info
            + (name) - info for datastore with name (name)
            ^ all datastore info
        * runtime - shows runtime info
            + list(vm) - list of VMWare machines and their statuses
            + listhost - list of VMWare esx host servers and their statuses
        * recommendations - shows recommendations for cluster
            + (name) - recommendations for cluster with name (name)
            ^ all clusters recommendations

Posted: **Mon Feb 18, 2013 4:35 pm**

I would like to continue this topic with the following:

When I add an ESX host for memory, the plugin returns "CHECK_ESX3.PL OK - mem usage=5761.18 MB (4.39%), overhead=312.89 MB, swapped=0.00 MB, memctl=0.00 MB"

This is when the command is:
./check_esx3.pl -H "HOSTNAME" -f "File" -l "MEM" -s "" -C -W

But how do I configure the alarms (-C and -W), given that the command returns 4 variables in different units?

Thanks,

Posted: **Mon Feb 18, 2013 4:55 pm**

Looking at the usage information for the plugin, the warning and critical thresholds are measured in MB or %. The metric affected by the thresholds is overall memory (as this is an ESX host check). If you want to measure individual VMs, you would have to specify the -N switch with the vm_name.

Posted: **Mon Feb 18, 2013 4:59 pm**

You would need to specify a subcommand, e.g.:

./check_esx3.pl -H "HOSTNAME" -f "File" -l "MEM" -s "usage" -w 80 -c 90
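If several memory metrics each need their own thresholds, one way to read the advice above is to run one check per subcommand. The sketch below only builds the command lines; it uses the flags shown in this thread (-H, -f, -l, -s, -w, -c) and subcommand names from the plugin help output. The threshold numbers, the `MEM_THRESHOLDS` table and the `build_commands` helper are placeholders for illustration, not recommended values or part of the plugin.

```python
import shlex

# Hypothetical per-subcommand thresholds: (warning, critical).
# Units follow the plugin help: usage in %, swap and overhead in MB.
MEM_THRESHOLDS = {
    "usage": (80, 90),
    "swap": (100, 500),
    "overhead": (500, 1000),
}


def build_commands(host, auth_file, command, thresholds):
    """Return one check_esx3.pl command line per subcommand."""
    commands = []
    for subcommand, (warn, crit) in thresholds.items():
        argv = [
            "./check_esx3.pl",
            "-H", host,
            "-f", auth_file,
            "-l", command,
            "-s", subcommand,
            "-w", str(warn),
            "-c", str(crit),
        ]
        # Quote each argument so the line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands
```

Each resulting command would back its own service, so each service gets its own RRD from the start and does not reuse a file created for a different set of metrics.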

Posted: **Tue Feb 19, 2013 8:12 am**

OK,

But to define critical and warning thresholds for an ESXi host, for example, I need to use this command:

./check_esx3.pl -H "HOSTNAME" -l "MEM" -s "usage" -w 80 -c 90

The problem is that with this command I lose all the other metrics (overhead, swapped and memctl), and I need those metrics too.

I then tried adding them one by one, creating new services with different subcommands (e.g. -s swap), but it doesn't work.
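One possible workaround, sketched here as an assumption rather than as a feature of check_esx3: keep the subcommand blank so the plugin returns every memory metric, then let a small wrapper apply a separate threshold to each value. The example parses the output line quoted earlier in this thread; the function names, the `usage_pct` key and the threshold numbers are hypothetical. The return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import re

# Output line quoted earlier in this thread (blank subcommand, -l MEM).
SAMPLE = ("CHECK_ESX3.PL OK - mem usage=5761.18 MB (4.39%), "
          "overhead=312.89 MB, swapped=0.00 MB, memctl=0.00 MB")

METRIC_RE = re.compile(r"(\w+)=([0-9.]+) MB")
PERCENT_RE = re.compile(r"\(([0-9.]+)%\)")


def parse_mem_output(text):
    """Extract the MB values by name, plus the usage percentage if present."""
    metrics = {name: float(value) for name, value in METRIC_RE.findall(text)}
    percent = PERCENT_RE.search(text)
    if percent:
        metrics["usage_pct"] = float(percent.group(1))
    return metrics


def evaluate(metrics, thresholds):
    """Compare each metric with its (warning, critical) pair.

    Returns (worst_state, messages). Metrics that are missing from the
    parsed output are skipped.
    """
    worst = 0
    messages = []
    for name, (warn, crit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue
        if value >= crit:
            state = 2
        elif value >= warn:
            state = 1
        else:
            state = 0
        if state:
            messages.append("%s=%s (warn %s, crit %s)" % (name, value, warn, crit))
        worst = max(worst, state)
    return worst, messages
```

For example, `evaluate(parse_mem_output(SAMPLE), {"usage_pct": (80, 90), "swapped": (100, 500)})` gives state 0 for the sample line. A real wrapper would also have to print the plugin's performance data unchanged so that the graphs keep all four metrics.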

specific metric selection (check_esx3) delet parameter graph
