Hello,
How do I define thresholds to raise an alert for ESX runtime status, disk I/O, and networking? The check just prints the detected issues with an OK status and never goes to a critical/warning state.
check_esx3 : unable to raise alert for health & IO
scottwilkerson
Re: check_esx3 : unable to raise alert for health & IO
You would need to run the subcommand. For example, on the io command add:
Code: Select all
-s read
Also, for your warning and critical values, if you just use 1 it means >1, so it would not alert if the value == 1.
Code: Select all
Usage: check_esx3.pl -D <data_center> | -H <host_name> [ -N <vm_name> ]
-u <user> -p <pass> | -f <authfile>
-l <command> [ -s <subcommand> ]
[ -t <timeout> ] [ -w <warn_range> ] [ -c <crit_range> ]
[ -V ] [ -h ]
-?, --usage
Print usage information
-h, --help
Print detailed help screen
-V, --version
Print version information
--extra-opts=[section][@file]
Read options from an ini file. See https://nagios-plugins.org/doc/extra-opts.html
for usage and examples.
-H, --host=<hostname>
ESX or ESXi hostname.
-D, --datacenter=<DCname>
Datacenter hostname.
-N, --name=<vmname>
Virtual machine name.
-u, --username=<username>
Username to connect with.
-p, --password=<password>
Password to use with the username.
-f, --authfile=<path>
Authentication file with login and password. File syntax :
username=<login>
password=<password>
-w, --warning=THRESHOLD
Warning threshold. See
http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT
for the threshold format.
-c, --critical=THRESHOLD
Critical threshold. See
http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT
for the threshold format.
-l, --command=COMMAND
Specify command type (CPU, MEM, NET, IO, VMFS, RUNTIME, ...)
-s, --subcommand=SUBCOMMAND
Specify subcommand
-S, --sessionfile=SESSIONFILE
Specify a filename to store sessions for faster authentication
-t, --timeout=INTEGER
Seconds before plugin times out (default: 30)
-v, --verbose
Show details for command-line debugging (can repeat up to 3 times)
Supported commands(^ means blank or not specified parameter) :
Common options for VM, Host and DC :
* cpu - shows cpu info
+ usage - CPU usage in percentage
+ usagemhz - CPU usage in MHz
^ all cpu info
* mem - shows mem info
+ usage - mem usage in percentage
+ usagemb - mem usage in MB
+ swap - swap mem usage in MB
+ overhead - additional mem used by VM Server in MB
+ overall - overall mem used by VM Server in MB
^ all mem info
* net - shows net info
+ usage - overall network usage in KB/s
+ receive - receive in KB/s
+ send - send in KB/s
^ all net info
* io - shows disk io info
+ read - read latency in ms
+ write - write latency in ms
^ all disk io info
* runtime - shows runtime info
+ status - overall host status (gray/green/red/yellow)
+ issues - all issues for the host
^ all runtime info
VM specific :
* cpu - shows cpu info
+ wait - CPU wait in ms
* mem - shows mem info
+ swapin - swapin mem usage in MB
+ swapout - swapout mem usage in MB
+ active - active mem usage in MB
* io - shows disk I/O info
+ usage - overall disk usage in MB/s
* runtime - shows runtime info
+ con - connection state
+ cpu - allocated CPU in MHz
+ mem - allocated mem in MB
+ state - virtual machine state (UP, DOWN, SUSPENDED)
+ consoleconnections - console connections to VM
+ guest - guest OS status, needs VMware Tools
+ tools - VMWare Tools status
Host specific :
* net - shows net info
+ nic - makes sure all active NICs are plugged in
* io - shows disk io info
+ aborted - aborted commands count
+ resets - bus resets count
+ kernel - kernel latency in ms
+ device - device latency in ms
+ queue - queue latency in ms
* vmfs - shows Datastore info
+ (name) - info for datastore with name (name)
^ all datastore info
* runtime - shows runtime info
+ con - connection state
+ health - checks cpu/storage/memory/sensor status
+ maintenance - shows whether host is in maintenance mode
+ list(vm) - list of VMWare machines and their statuses
* service - shows Host service info
+ (names) - check the state of one or several services specified by (names), syntax for (names):<service1>,<service2>,...,<serviceN>
^ show all services
DC specific :
* io - shows disk io info
+ aborted - aborted commands count
+ resets - bus resets count
+ kernel - kernel latency in ms
+ device - device latency in ms
+ queue - queue latency in ms
* vmfs - shows Datastore info
+ (name) - info for datastore with name (name)
^ all datastore info
* runtime - shows runtime info
+ list(vm) - list of VMWare machines and their statuses
+ listhost - list of VMWare esx host servers and their statuses
* recommendations - shows recommendations for cluster
+ (name) - recommendations for cluster with name (name)
^ all clusters recommendations
Copyright (c) 2008 op5
Re: check_esx3 : unable to raise alert for health & IO
But that way I would have to set up separate checks for disk read latency, write latency, and so on, which I don't want to configure.
Regarding runtime, I'm unable to generate an alert even when there is a health/config issue. It just prints the errors with OK status. Also, if I provide health as the subcommand, it prints the health issues with UNKNOWN status when it should be CRITICAL.
I want:
1) Disk I/O read and write latency in one check, which should raise an alert if any latency exceeds the given thresholds.
2) Runtime with the health/issues subcommand, which should generate a critical alert if any health/config issue is detected, rather than showing UNKNOWN/OK.
I think @lmiltchev has an idea about this. He solved my session file issue for check_esx3. So, @lmiltchev, could you please help me?
Re: check_esx3 : unable to raise alert for health & IO
Hi @sac1472,
I will try to help you with both issues.
Quote: 1) Disk I/O read, write latency in one check should raise alert if any latency time exceeds given thresholds.
The check_esx3.pl plugin is not designed to work with two sub-commands passed at the same time, e.g. "-s read -s write", "-s read,write", etc. However, you could still set up one check for both (read and write) via check_multi:
https://exchange.nagios.org/directory/P ... ti/details
You could set up a simple config in the libexec directory, e.g. "multi.cfg", something like this:
Code: Select all
command [ IO_read ] = check_esx3.pl -D <ip address> -f /path/to/the/auth.txt -l IO -s read -w <warning value> -c <critical value>
command [ IO_write ] = check_esx3.pl -D <ip address> -f /path/to/the/auth.txt -l IO -s write -w <warning value> -c <critical value>
Then, you could run:
Code: Select all
/usr/local/nagios/libexec/check_multi -f multi.cfg
Example:
Code: Select all
[nagios@main-nagios-xi libexec]$ ./check_multi -f multi.cfg
CRITICAL - 2 plugins checked, 1 critical (IO_write), 1 ok
[ 1] IO_read CHECK_ESX3.PL OK - io read latency=0 ms
[ 2] IO_write CHECK_ESX3.PL CRITICAL - io write latency=8 ms |check_multi::check_multi::plugins=2 time=1.771847 IO_read::check_esx3.pl::io_read=0ms;;1 IO_write::check_esx3.pl::io_write=8ms;;1
If you are unsure of how to set up your service check after you have tested the plugin from the command line, please follow the steps outlined in the document below:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Quote: 2) Runtime with health/issues subcommand - should generate an critical alert if there is any health issue/config issue detected rather showing UNKNOWN/ OK
Are you saying that you are seeing errors in the output, yet you are still getting an OK state? UNKNOWN errors are usually caused by mis-configuration...
It is possible that you are receiving an "OK" because (according to the output you have shown us) you have "1 health issue(s)" and your critical threshold is "-c 1". So you haven't exceeded the critical threshold, and the exit code must be "0". Can you run your command a few times with different thresholds (and without passing any thresholds), and show the exit code? I just want to make sure that the plugin exits with the correct codes for you.
Examples:
Code: Select all
./check_esx3.pl.newest -D xxx -f /tmp/VC6 -H xxx -l runtime
echo $?
./check_esx3.pl.newest -D xxx -f /tmp/VC6 -H xxx -l runtime -c 1
echo $?
./check_esx3.pl.newest -D xxx -f /tmp/VC6 -H xxx -l runtime -c 0
echo $?
Be sure to check out our Knowledgebase for helpful articles and solutions!
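To illustrate why "-c 1" stays OK with exactly one issue: a bare threshold N is treated as the range 0..N, and the check alerts only when the value falls outside that range. The following is a minimal shell sketch of that rule, assuming integer values; `alerts_on` is a hypothetical helper written for this illustration and is not part of check_esx3.pl.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_esx3.pl): mimics how a bare
# threshold N is interpreted, i.e. alert only when value < 0 or value > N.
# Assumes integer inputs.
alerts_on() {
    value=$1
    threshold=$2
    if [ "$value" -lt 0 ] || [ "$value" -gt "$threshold" ]; then
        echo "alert"
    else
        echo "no alert"
    fi
}

alerts_on 1 1   # 1 issue with -c 1: 1 is not greater than 1
alerts_on 1 0   # 1 issue with -c 0: 1 is greater than 0
alerts_on 8 1   # 8 ms latency with -c 1: 8 is greater than 1
```

Under this rule, one health issue only becomes an alert once the threshold is lowered to 0.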
Re: check_esx3 : unable to raise alert for health & IO
Thanks @lmiltchev for the detailed response.
1) I have configured check_multi and that's working fine.
2) I was unable to generate a config issue on ESX for testing, but I found one ESX host in a red state while check_esx3 still shows an OK state. See the screenshot below.
Re: check_esx3 : unable to raise alert for health & IO
OK, it seems that the runtime check includes various "sub-checks", e.g. con, health, storagehealth, etc. The threshold will not work unless you specify a sub-command.
Try:
Code: Select all
./check_esx3.pl.newest -D xxx -f /tmp/VC6 -H xxx -l runtime -s status
echo $?
You can also try passing some thresholds, for example "-c 1" or "-c 0", to see if the output will change.
Note: If you don't specify a sub-command, the check will return "all runtime info".
* runtime - shows runtime info
+ con - connection state
+ health - checks cpu/storage/memory/sensor status and propagates worst state
o listitems - list all available sensors(use for listing purpose only)
o blackregexpflag - whether to treat blacklist as regexp
b - blacklist status objects
+ storagehealth - storage status check
o blackregexpflag - whether to treat blacklist as regexp
b - blacklist status objects
+ temperature - temperature sensors
o blackregexpflag - whether to treat blacklist as regexp
b - blacklist status objects
+ sensor - threshold specified sensor
+ maintenance - shows whether host is in maintenance mode
+ list(vm) - list of VMWare machines and their statuses
+ status - overall object status (gray/green/red/yellow)
+ issues - all issues for the host
b - blacklist issues
^ all runtime info(health, storagehealth, temperature and sensor are represented as one value and no thresholds)
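When comparing runs with "echo $?", it can help to translate the number into the state name. Nagios plugins conventionally exit with 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. Here is a small sketch; `state_name` is a hypothetical helper, and `true` merely stands in for the real plugin command.

```shell
#!/bin/sh
# Map a plugin exit code to the conventional Nagios state name.
# state_name is a hypothetical helper for illustration only.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNKNOWN (non-standard exit code $1)" ;;
    esac
}

# Usage sketch: run a check, then translate its exit code.
# `true` is a stand-in for the actual check_esx3.pl command line.
true
state_name $?
```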
Re: check_esx3 : unable to raise alert for health & IO
Yes, I'm able to get an alert with sub-checks.
Actually, what I want is for it to raise an alert without specifying any sub-checks; that would be the best setup. Let me know if that's possible. Otherwise I will go with check_multi and different sub-commands.
Thanks for your help.
You can mark this topic as closed.
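For readers weighing the "several sub-commands in one check" approach, the core idea is to run each sub-check and report the most severe result. The sketch below shows only that aggregation step and is not how check_multi itself is implemented. The severity order (CRITICAL > WARNING > UNKNOWN > OK) is an assumption made for this example, and `severity` and `worst_state` are hypothetical helper names.

```shell
#!/bin/sh
# Rank exit codes by severity. Assumed order for this sketch:
# CRITICAL(2) > WARNING(1) > UNKNOWN(3) > OK(0).
severity() {
    case "$1" in
        2) echo 3 ;;
        1) echo 2 ;;
        0) echo 0 ;;
        *) echo 1 ;;   # 3 and any non-standard code are ranked as UNKNOWN
    esac
}

# Print the most severe exit code among the arguments.
worst_state() {
    worst=0
    for code in "$@"; do
        if [ "$(severity "$code")" -gt "$(severity "$worst")" ]; then
            worst=$code
        fi
    done
    echo "$worst"
}

# Hypothetical usage: run each sub-check, keep its exit code, then exit
# with the worst one, for example:
#   ./check_esx3.pl <options> -l io -s read  -w <warn> -c <crit>; read_rc=$?
#   ./check_esx3.pl <options> -l io -s write -w <warn> -c <crit>; write_rc=$?
#   exit "$(worst_state "$read_rc" "$write_rc")"
worst_state 0 2   # read OK, write CRITICAL
```

A wrapper like this loses the per-check output formatting and performance data handling that check_multi provides, so it is best read as an explanation of the idea rather than a replacement.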