Help to make check_snmp_temperature work

Posted: Thu Apr 01, 2021 11:11 am
by kCyborg
Hi, I recently installed Nagios Core 4.4.6 to monitor a lot of virtual servers on my network. Everything works flawlessly, and I'm able to check pretty much all of the stats/services I want to check. But I would also like to check the temperature of a physical server (which is a Dell server running Proxmox on top of Debian 10) and, of course, show the temperature on the Nagios local web page.

I found this amazing plugin, check_snmp_temperature (https://exchange.nagios.org/directory/P ... re/details), but although it seems very easy to use, I can't make it work. I keep getting this error message:

Code:

server:/usr/local/nagios/etc# /usr/lib/nagios/plugins/./check_snmp_temperature.pl -H 192.168.50.230 -C public -T dell -d .1.3.6.1.4.1.28402.3.3.3.1.5.1 -a'.' -o C -w 30 -c 35

Please either specify specify system type (-T) OR base SNMP OIDs for name (-N) and data (-D) tables OR exact list of sensor names (-n) and data OIDs (-d) !
Usage: /usr/lib/nagios/plugins/./check_snmp_temperature.pl [-v] -H <host> -C <snmp_community> [-2] | (-l login -x passwd [-X pass -L <authp>,<privp>])  [-p <port>] [-t <timeout>] -T dell|hp|cisco1|juniper|alteon|lmsensors | [-N <oid_attribnames> -D <oid_attribdata>] | [-n <list of sensor names> -d <list of sensor oids>] [-a <attributes to check> -w <warn levels> -c <crit levels> [-f]] [-A <attributes for perfdata>] [-o <out_temp_unit: C|F|K>] [-i <in_temp_unit>] [-u <unknown_default>] [-V]

(I also tried changing the -H option to the local IP 127.0.0.1, but I get the same response.)

I know I need to check this part:
Please either specify specify system type (-T) OR base SNMP OIDs for name (-N) and data (-D) tables OR exact list of sensor names (-n) and data OIDs (-d) !
But I dug into the GitHub page https://github.com/willixix/WL-NagiosPl ... erature.pl and apparently I'm way too dumb to find what I'm doing wrong.

Could you help me?

Re: Help to make check_snmp_temperature work

Posted: Thu Apr 01, 2021 12:58 pm
by mcapra
I read this as an exclusive or:

Code:

Please either specify specify system type (-T) OR base SNMP OIDs for name (-N) and data (-D) tables OR exact list of sensor names (-n) and data OIDs (-d) !

And the code verifies this behavior:
https://github.com/willixix/WL-NagiosPl ... #L580-L582

You're defining $o_type (the -T argument) in addition to one of either:
  • $oid_names (-N)
  • $oid_data (-D)
  • $o_sensornames (-n)
  • $o_sensorids (-d)
In your case, you're defining -T in addition to -d, which violates the exclusive or mentioned in the error message.
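
To make the rule concrete, here is a rough shell sketch of the exclusive or as I read it from the error message. It is not the plugin's Perl, and the function name selection_mode is made up for illustration; it only counts which of the three selection groups (-T, -N/-D, -n/-d) show up in an argument list.

```shell
#!/bin/sh
# Illustration only: count the sensor-selection groups used in an argument list.
# Prints "ok" when exactly one group is present, "invalid" otherwise.
selection_mode() {
    preset=0; tables=0; exact=0
    for arg in "$@"; do
        case $arg in
            -T)    preset=1 ;;   # pre-defined system type
            -N|-D) tables=1 ;;   # base OIDs for the name and data tables
            -n|-d) exact=1 ;;    # exact sensor names and data OIDs
        esac
    done
    if [ $((preset + tables + exact)) -eq 1 ]; then
        echo ok
    else
        echo invalid
    fi
}

# The original command mixes -T with -d:
selection_mode -H 192.168.50.230 -C public -T dell -d .1.3.6.1.4.1.28402.3.3.3.1.5.1   # prints "invalid"
# With -d removed, only one group is left:
selection_mode -H 192.168.50.230 -C public -T dell                                      # prints "ok"
```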

As to why that exclusive or exists, I imagine it's because, when the plugin is passed the -T argument, it attempts to map the provided string to a set of pre-defined values:
https://github.com/willixix/WL-NagiosPl ... #L243-L252

So if you pass -T dell, it's going to plug these in automatically based on the map I linked above and some default values:

Code:

-N 1.3.6.1.4.1.674.10892.1.700.20.1.8
-D 1.3.6.1.4.1.674.10892.1.700.20.1.6
-i 10C

According to this plugin, the $o_sensoroids (-d) value is useless without the $o_sensornames (-n) value, presumably because it's trying to match the provided OIDs (-d) to the provided names (-n).

I dunno anything about this plugin for the record. I'm just interpreting the Perl. Try removing the -d argument and see what you get.
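
Following that suggestion, two invocations that each stay inside a single selection group might look like the sketch below. Treat it as a guess under assumptions: Ambient is one of the Dell attribute names listed in the plugin's -h output, the sensor label Temp in the second variant is a made-up name, and whether the OID from the original command actually returns a temperature on this host is unknown. The script only prints the two command lines so they can be reviewed before running either one.

```shell
#!/bin/sh
PLUGIN=/usr/lib/nagios/plugins/check_snmp_temperature.pl
HOST=192.168.50.230

# Variant 1: rely on the pre-defined Dell type only (no -N, -D, -n or -d).
preset_cmd="$PLUGIN -H $HOST -C public -T dell -a Ambient -o C -w 30 -c 35"

# Variant 2: name the sensor and give its exact data OID (no -T).
# "Temp" is a hypothetical label chosen for this example.
exact_cmd="$PLUGIN -H $HOST -C public -n Temp -d .1.3.6.1.4.1.28402.3.3.3.1.5.1 -a Temp -o C -w 30 -c 35"

printf '%s\n' "$preset_cmd" "$exact_cmd"
```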

Re: Help to make check_snmp_temperature work

Posted: Thu Apr 01, 2021 1:35 pm
by kCyborg
mcapra wrote: I dunno anything about this plugin for the record. I'm just interpreting the Perl. Try removing the -d argument and see what you get.
Hi mate, thanks very much for your answer, I tried removing the -d option, and I get a different error:

Code:

ERROR: Alarm signal (Nagios time-out)

For what it's worth, this is how I defined the service in localhost.cfg (a similar definition is used for the physical server I'm trying to monitor):

Code:

define service {
    use                     local-service
    host_name               localhost
    service_description     Temperature
    check_command           check_temp!CPU,Ambient,Bottom!110,90,0!135,110,0
    notifications_enabled   1
}

And the commands.cfg:

Code:

define command {
    command_name    check_temp
    command_line    $USER1$/check_snmp_temperature.pl -H $HOSTADDRESS$ -C public -N .1.3.6.1.4.1.674.10892.1.700.20.1.8 -i 10C -o F -u 0 -a $ARG1$ -w $ARG2$ -c $ARG3$ -f
}
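
For anyone following along, Nagios splits the check_command value on "!": the first field is the command name and the remaining fields fill the $ARG1$, $ARG2$ and $ARG3$ macros in command_line. The small shell sketch below mimics that split for the service above. It is only an illustration of the mapping, not how Nagios implements it, and the variable names are made up.

```shell
#!/bin/sh
# check_command value copied from the service definition.
check_command='check_temp!CPU,Ambient,Bottom!110,90,0!135,110,0'

# Split on "!": the first field is the command name, the rest are arguments.
set -f                      # no filename globbing while splitting
old_ifs=$IFS
IFS='!'
set -- $check_command
IFS=$old_ifs
set +f

command_name=$1
arg1=$2     # attributes to check
arg2=$3     # warning levels
arg3=$4     # critical levels

echo "command_name = $command_name"
echo "-a $arg1 -w $arg2 -c $arg3"
```

So with this service definition the plugin should end up being called with -a CPU,Ambient,Bottom -w 110,90,0 -c 135,110,0.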

Re: Help to make check_snmp_temperature work

Posted: Fri Apr 02, 2021 3:01 pm
by benjaminsmith
Hi @kCyborg

kCyborg wrote: Hi mate, thanks very much for your answer, I tried removing the -d option, and I get a different error:
Code:
ERROR: Alarm signal (Nagios time-out)

I would recommend trying that again, but increase the timeout this time. The default is only 5 seconds. Also, add the --verbose option for extra debugging output.
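
Something along these lines is what I have in mind. It is a sketch, not a tested command: the 30 second timeout is an arbitrary value, Ambient is just one of the Dell attribute names from the help output, and run_or_show is a made-up helper that runs the plugin when it is installed at that path and otherwise only prints the command line.

```shell
#!/bin/sh
PLUGIN=/usr/lib/nagios/plugins/check_snmp_temperature.pl
HOST=192.168.50.230
TIMEOUT=30    # seconds; an arbitrary value well above the 5 second default

# Run the command if the program is executable, otherwise just show it.
run_or_show() {
    if [ -x "$1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# -v prints extra debugging information, -t sets the SNMP timeout.
run_or_show "$PLUGIN" -v -t "$TIMEOUT" -H "$HOST" -C public -T dell -a Ambient -o C -w 30 -c 35
```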

Plugin Options:
./check_snmp_temperature.pl -h

SNMP Temperature Monitor for Nagios version 0.34
by William Leibzon - william(at)leibzon.org

Usage: ./check_snmp_temperature.pl [-v] -H <host> -C <snmp_community> [-2] | (-l login -x passwd [-X pass -L <authp>,<privp>]) [-p <port>] [-t <timeout>] -T dell|hp|cisco1|juniper|alteon | [-N <oid_attribnames> -D <oid_attribdata>] | [-n <list of sensor names> -d <list of sensor oids>] [-a <attributes to check> -w <warn levels> -c <crit levels> [-f]] [-A <attributes for perfdata>] [-o <out_temp_unit: C|F|K>] [-i <in_temp_unit>] [-u <unknown_default>] [-V]

-v, --verbose
    print extra debugging information
-h, --help
    print this help message
-H, --hostname=HOST
    name or IP address of host to check
-C, --community=COMMUNITY NAME
    community name for the host's SNMP agent (implies v 1 protocol)
-2, --v2c
    Use snmp v2c
-l, --login=LOGIN ; -x, --passwd=PASSWD
    Login and auth password for snmpv3 authentication
    If no priv password exists, implies AuthNoPriv
-X, --privpass=PASSWD
    Priv password for snmpv3 (AuthPriv protocol)
-L, --protocols=<authproto>,<privproto>
    <authproto> : Authentication protocol (md5|sha : default md5)
    <privproto> : Priv protocole (des|aes : default des)
-P, --port=PORT
    SNMP port (Default 161)
-w, --warn=INT[,INT[,INT[..]]]
    warning temperature level(s) (if more then one attribute is checked, must have multiple values)
-c, --crit=INT[,INT[,INT[..]]]
    critical temperature level(s) (if more then one attribute is checked, must have multiple values)
-f, --perfdata
    Perfparse compatible output
-t, --timeout=INTEGER
    timeout for SNMP in seconds (Default: 5)
-V, --version
    prints version number
-N, --oidtable_attribnames=OID_STRING
    Base table OID to walk through to find names of those attributes supported and from that corresponding data OIDs
-D, --oidtable_attribdata=OID_STRING
    Base table OID for sensor attribute data, one number is added to that to make up full attribute OID
-n, --sensor_names=STRING[,STRING[..]]
    List of sensor names when -N is not used and sensors are specified with exeact oids
-d, --sensor_oids=OID_STRING[,OID_STRING[..]]
    List of exact data OIDs for sensors specified with -n (specify this when -N and -D are not used)
-a, --attributes=STRING[,STRING[..]]
    Which attribute(s) to check. This is used as regex to check if attribute is found in sensor names.
    As an example for Dell the attribute names to use are: PROC_1, PROC_2, Ambient, Planar, Riser
-A, --perf_attributes=STRING[,STRING[..]]
    Which attribute(s) to add to as part of performance data output. These names can be different then the
    ones listed in '-a' to only output attributes in perf data but not check. Special value of '*' gets them all.
-f, --perfparse
    Used only with '-a'. Causes to output data not only in main status line but also as perfparse output
-o --out_temp_unit=C|F|K
    What temperature measurement units are used for output and warning/critical - 'C', 'F' or 'K' - default is 'C'
-i --in_temp_unit=[num]C|F|K
    What temperature measurement reported by data OID - format is <num>C|F|K (default is 'C')
    where num is used if data is num*realdata, i.e. if reported data of 330 means 33C, then it is: -i 10C
-u, --unknown_default=INT
    If attribute is not found then report the output as this number (i.e. -u 0)
-T, --type=dell|hp|cisco1|juniper|alteon
    This allows to use pre-defined system type to set Base, Data OIDs and incoming temperature measurement type
    Currently support systems types are: dell, hp, cisco1 (7500, 5500, 2948, etc), juniper, alteon
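
The -i scaling described in the help text can be worked through with a little arithmetic, which also helps when picking -w and -c values for a given -o unit. The snippet below is only an illustration of the divisor and the Celsius to Fahrenheit conversion, not the plugin's own code, and raw_to_units is a made-up helper name.

```shell
#!/bin/sh
# Convert a raw SNMP reading to Celsius and Fahrenheit.
# $1 = raw value reported by the data OID, $2 = the number in front of "C" in -i (e.g. 10 for -i 10C)
raw_to_units() {
    awk -v raw="$1" -v scale="$2" 'BEGIN {
        c = raw / scale                       # 330 with -i 10C is 33 degrees C
        printf "%.1fC %.1fF\n", c, c * 9 / 5 + 32
    }'
}

raw_to_units 330 10    # prints "33.0C 91.4F"
```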

Let us know what you find out.