We are attempting to monitor some specific database processes on one of our Linux systems via the Linux SNMP monitoring wizard.
What we have noticed is that the SNMP monitoring wizard queries HOST-RESOURCES-MIB::hrSWRunName. For this particular database (Oracle), every process it spawns is listed as 'oracle' (all 200 of them), which prevents us from targeting a specific process running under the 'oracle' name. After doing an SNMP walk of the system's MIB tree, we discovered that the object we actually want to query is HOST-RESOURCES-MIB::hrSWRunPath. This object holds the process command (as reported by 'top' or 'ps -ef') rather than the process name.
My question is: is there a simple way to point the Linux SNMP monitoring wizard's process monitor queries at hrSWRunPath rather than hrSWRunName, or are we stuck using a custom OID query?
Thanks
Linux SNMP process monitoring hrSWRunName vs. hrSWRunPath
msbensonstk
- Posts: 34
- Joined: Wed Apr 11, 2012 1:01 pm
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
You should be able to add -f to $ARG1$ of this service so the check matches on the full path instead of the process name. From the plugin's help output:
./check_snmp_process_wizard.pl -h
SNMP Process Monitor for Nagios version 1.10
GPL licence, (c)2004-2006 Patrick Proy
Usage: ./check_snmp_process_wizard.pl [-v] -H <host> -C <snmp_community> [-2] | (-l login -x passwd) [-p <port>] -n <name> [-w <min_proc>[,<max_proc>] -c <min_proc>[,max_proc] ] [-m<warn Mb>,<crit Mb> -a -u<warn %>,<crit%> -d<delta> ] [-t <timeout>] [-o <octet_length>] [-f -A -F ] [-r] [-V] [-g]
-v, --verbose
print extra debugging information (and lists all storages)
-h, --help
print this help message
-H, --hostname=HOST
name or IP address of host to check
-C, --community=COMMUNITY NAME
community name for the host's SNMP agent (implies SNMP v1 or v2c with option)
-l, --login=LOGIN ; -x, --passwd=PASSWD, -2, --v2c
Login and auth password for snmpv3 authentication
If no priv password exists, implies AuthNoPriv
-2 : use snmp v2c
-X, --privpass=PASSWD
Priv password for snmpv3 (AuthPriv protocol)
-L, --protocols=<authproto>,<privproto>
<authproto> : Authentication protocol (md5|sha : default md5)
<privproto> : Priv protocole (des|aes : default des)
-p, --port=PORT
SNMP port (Default 161)
-n, --name=NAME
Name of the process (regexp)
No trailing slash !
-r, --noregexp
Do not use regexp to match NAME in description OID
-f, --fullpath
Use full path name instead of process name
(Windows doesn't provide full path name)
-A, --param
Add parameters to select processes.
ex : "named.*-t /var/named/chroot" will only select named process with this parameter
-F, --perfout
Add performance output
outputs : memory_usage, num_process, cpu_usage
-w, --warn=MIN[,MAX]
Number of process that will cause a warning
-1 for no warning, MAX must be >0. Ex : -w-1,50
-c, --critical=MIN[,MAX]
number of process that will cause an error (
-1 for no critical, MAX must be >0. Ex : -c-1,50
Notes on warning and critical :
with the following options : -w m1,x1 -c m2,x2
you must have : m2 <= m1 < x1 <= x2
you can omit x1 or x2 or both
-m, --memory=WARN,CRIT
checks memory usage (default max of all process)
values are warning and critical values in Mb
-a, --average
makes an average of memory used by process instead of max
-u, --cpu=WARN,CRIT
checks cpu usage of all process
values are warning and critical values in % of CPU usage
if more than one CPU, value can be > 100% : 100%=1 CPU
-d, --delta=seconds
make an average of <delta> seconds for CPU (default 300=5min)
-g, --getall
In some cases, it is necessary to get all data at once because
process die very frequently.
This option eats bandwidth an cpu (for remote host) at breakfast.
-o, --octetlength=INTEGER
max-size of the SNMP message, usefull in case of Too Long responses.
Be carefull with network filters. Range 484 - 65535, default are
usually 1472,1452,1460 or 1440.
-t, --timeout=INTEGER
timeout for SNMP in seconds (Default: 5)
-V, --version
prints version number
Note :
CPU usage is in % of one cpu, so maximum can be 100% * number of CPU
example :
Browse process list : <script> -C <community> -H <host> -n <anything> -v
the -n option allows regexp in perl format :
All process of /opt/soft/bin : -n /opt/soft/bin/ -f
All 'named' process : -n named
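To illustrate why -f helps in the Oracle case, here is a minimal Python sketch of the difference between matching on the process name and matching on the full path. It does not talk to SNMP or call the plugin. The sample rows, the process names in them, and the count_matches helper are all invented for illustration, and the sketch assumes the -n pattern is applied as an unanchored regexp search.

```python
import re

# Hypothetical (hrSWRunName, hrSWRunPath) pairs. These values are invented
# for illustration and do not come from a real SNMP walk.
SAMPLE_PROCESSES = [
    ("oracle", "ora_pmon_PROD"),
    ("oracle", "ora_smon_PROD"),
    ("oracle", "ora_dbw0_PROD"),
    ("sshd", "/usr/sbin/sshd"),
]


def count_matches(pattern, processes, use_full_path=False):
    """Count rows whose name (or path, if use_full_path) matches pattern.

    Assumption: the pattern is used as an unanchored regexp search,
    which may differ from what the plugin does internally.
    """
    regex = re.compile(pattern)
    column = 1 if use_full_path else 0  # 0 = name column, 1 = path column
    return sum(1 for row in processes if regex.search(row[column]))


# Matching on the name cannot tell the Oracle processes apart.
print(count_matches("oracle", SAMPLE_PROCESSES))
# A specific process is invisible when only the name is searched...
print(count_matches("ora_pmon", SAMPLE_PROCESSES))
# ...but can be selected once the path column is searched (the -f idea).
print(count_matches("ora_pmon", SAMPLE_PROCESSES, use_full_path=True))
```

With the sample data above, the name-only search lumps all three Oracle rows together, while the path search isolates a single process. The same idea applies to the service: add -f and put a pattern in -n that is specific to the command line you want to count.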