Can't get my ProCurve monitoring to work.

Posted: Tue Aug 14, 2012 5:15 pm
by jbruyet
Hey all, I just downloaded two .cfg files so I can see a little more deeply into my ProCurve switches. HOWEVER, I've hit a snag and I'm not sure what the problem is. Here's the Command Definition:

Code: Select all

define command{ 
  command_name check_hpmemoryfree 
  command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o 1.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1 -t 5 -w $ARG2$ -c $ARG3$ -u bytes -l free 
 } 
and here's the Service Definition:

Code: Select all

# Service definition MEM-FREE
define service{
        use                             generic-service         ; Name of service template to use
        hostgroup                       switches
        service_description             MEM-FREE
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           240
        notification_period             24x7
        notification_options            c,r
        check_command                   check_hpmemoryfree!nagios!2000:30000000!1000:30000000
        }
and here's what I see on the Nagios web page:

Code: Select all

MEM-FREE 	UNKNOWN 	08-14-2012 14:54:32 	0d 1h 48m 51s 	3/3 	External command error: Timeout: No Response from 192.168.2.20:161. 
BUT, if I run the command from the command line here's what I see:

Code: Select all

[root@link libexec]# ./check_snmp -H 192.168.2.20 -o 1.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1
SNMP OK - 24307576 | iso.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1=24307576
Any ideas on why it works from the command line and not from within Nagios? I copied the config lines from the downloaded file to my file.

Thanks,

Joe B

Re: Can't get my ProCurve monitoring to work.

Posted: Wed Aug 15, 2012 6:34 pm
by jbruyet
Quick question -- if there's a bad argument in the Command definition, would that give me the "No Response" error? If not, I can't see anything in either .cfg file that would cause the problem.

Thanks,

Joe B

Re: Can't get my ProCurve monitoring to work.

Posted: Fri Aug 17, 2012 9:58 am
by agriffin
The error message you're seeing is from the plugin, not from Nagios itself, so a bad argument is only the cause of the error if the error also happens at the command line. Try testing the plugin as the nagios user and seeing how long it takes to execute:

Code: Select all

# su nagios -s /bin/bash -c "time ./check_snmp -H 192.168.2.20 -o 1.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1"

Re: Can't get my ProCurve monitoring to work.

Posted: Fri Aug 17, 2012 2:07 pm
by jbruyet
Hi agriffin, running that command took a little less than one second.

Thanks,

Joe B

Re: Can't get my ProCurve monitoring to work.

Posted: Fri Aug 17, 2012 2:42 pm
by agriffin
And did it give you the output you expected above the timing information (it should have been the same as when you tested it as the root user)?

Re: Can't get my ProCurve monitoring to work.

Posted: Fri Aug 17, 2012 3:18 pm
by jbruyet
Here's something--I'm experimenting with the arguments, and after making the last change I was checking to see when the next check was due and noticed that the OID isn't getting passed along. Check out the Status information:

Code: Select all

Service State Information
Current Status:              UNKNOWN (for 3d 0h 5m 41s)
Status Information:          No OIDs specified
Performance Data:
Current Attempt:             3/3 (HARD state)
Last Check Time:             08-17-2012 13:10:36
Check Type:                  ACTIVE
Check Latency / Duration:    0.369 / 0.074 seconds
Next Scheduled Check:        08-17-2012 13:15:36
Last State Change:           08-14-2012 13:07:32
Last Notification:           N/A (notification 0)
Is This Service Flapping?    NO (0.00% state change)
In Scheduled Downtime?       NO
Last Update:                 08-17-2012 13:13:05 (0d 0h 0m 8s ago)
I'm not fluent enough in Nagios to see what the problem is. Can you or anyone else see why the OID wouldn't get passed along?

Oh yeah, running the command straight up from the command line gave me the correct information.

Thanks,

Joe B

Re: Can't get my ProCurve monitoring to work.

Posted: Tue Aug 21, 2012 4:55 pm
by jbruyet
I've tried moving $ARG1$ around in the command but no joy. I've even tried running no arguments but I still get the "No OIDs specified" error. I'm running out of things to experiment with.

Thanks,

Joe B

Re: Can't get my ProCurve monitoring to work.

Posted: Wed Aug 22, 2012 1:38 pm
by agriffin
Can you post an example of a host definition you're using for one of these procurve switches?

Re: Can't get my ProCurve monitoring to work.

Posted: Wed Aug 22, 2012 4:19 pm
by jbruyet
Hi agriffin, here's one of my Host Definitions:

Code: Select all

define host{
        use             generic-switch
        host_name       OS-2524KVMBch
        alias           ProCurve 2524 OS KVM Bench
        address         192.168.2.18
        hostgroups      switches
        }
I have a couple of basic checks using my Hostgroup definition, a check_ping and a check_snmp for uptime, and they work just fine. I can run the mem_free check from a command line and I get the free memory on the switch; I just can't seem to figure out why the OID isn't getting passed along from my switch.cfg file.

Thanks,

Joe B

Re: Can't get my ProCurve monitoring to work.

Posted: Fri Aug 24, 2012 7:00 pm
by jbruyet
I can get these checks to work from the command line so I'm thinking it's something with the syntax or the arguments in the command definition, or maybe in the service definition. I've been looking online trying to find some examples that I can compare my definitions to but I'm only finding very generic stuff. For example, one question I have is why is there a "nagios" parameter in my service definition? Does anyone know of any tutorials I can look at to see if my problem might be syntax- or argument-related?

Thanks,

Joe B
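
On the "nagios" parameter question: Nagios splits check_command on "!". The first piece is the command name, and the remaining pieces become $ARG1$, $ARG2$, $ARG3$ in the command_line. The sketch below is only an illustration of that substitution, not Nagios's own code. The function name expand_check_command is made up for this example, and the $USER1$ path is an assumption (a common default; the thread only shows a directory called libexec). It ignores escaped "\!" characters and does not blank out unused $ARGn$ macros.

```python
def expand_check_command(check_command, command_line, macros):
    """Substitute $ARGn$ and other $NAME$ macros into a command_line.

    Simplified illustration: no handling of escaped '!' and no cleanup
    of macros that have no value.
    """
    # "check_hpmemoryfree!nagios!..." -> command name plus positional args
    name, *args = check_command.split("!")

    values = dict(macros)
    for position, arg in enumerate(args, start=1):
        values[f"ARG{position}"] = arg

    expanded = command_line
    for key, value in values.items():
        expanded = expanded.replace(f"${key}$", value)
    return name, expanded


# command_line and check_command copied from the definitions in the first post
COMMAND_LINE = (
    "$USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ "
    "-o 1.3.6.1.4.1.11.2.14.11.5.1.1.2.1.1.1.6.1 -t 5 "
    "-w $ARG2$ -c $ARG3$ -u bytes -l free"
)
CHECK_COMMAND = "check_hpmemoryfree!nagios!2000:30000000!1000:30000000"

name, expanded = expand_check_command(
    CHECK_COMMAND,
    COMMAND_LINE,
    {
        "USER1": "/usr/local/nagios/libexec",  # assumed plugin directory
        "HOSTADDRESS": "192.168.2.20",         # switch address from the thread
    },
)
print(name)
print(expanded)
```

With these inputs, "nagios" lands after -C, so it is used as the SNMP community string, and the two ranges become the -w and -c thresholds. The manual test earlier in the thread left out -C, in which case check_snmp falls back to its default community, "public". A switch typically doesn't answer a request carrying a community string it doesn't accept, which would look like the "No Response" timeout, though nothing posted so far confirms that this is the cause. Running the fully expanded line by hand would be a closer match to what Nagios executes than the shorter test command.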