Problem monitoring core switch using snmp

Posted: Fri Oct 17, 2014 10:25 am
by chaosinverti
Hi,

On CentOS 6.5 64-bit (from a Nagios XI VMware image; see attached for the complete system profile), I am trying to monitor a Cisco core switch with the Network Switch / Router Monitoring Wizard.

The facts:
  • the wizard is giving me "No ports were detected on the switch."
  • the following command:

    /usr/bin/cfgmaker --show-op-down --noreversedns --zero-speed '100000000' 'commro@192.168.55.1:::::1'

    outputs 14486 lines and shows all ports and detailed information, as I would expect
  • deleting /usr/local/nagiosxi/tmp/mrtgscan-192.168.55.1* and retrying the wizard doesn't help
  • I am unable to get check_snmp_int.pl to work (hangs at some point):

    [root@nagios libexec]# ./check_snmp_int.pl -H 192.168.55.1 -C commro -n
    ERROR: Status table : Message size exceeded maxMsgSize.
    [root@nagios libexec]# ./check_snmp_int.pl -H 192.168.55.1 -C commro -o 65535 -n -2
    ERROR: Status table : No response from remote host '192.168.55.1'.
    [root@nagios libexec]# ./check_snmp_int.pl -o 65535 -H 192.168.55.1 -C commro -n -2 -v
    Alarm at 15 + 5
    SNMP v2c login
     actual max octets:: 1472
     new max octets:: 65535
    Filter : 
    OID : 1.3.6.1.2.1.2.2.1.2.214, Desc : TenGigabitEthernet2/2/8--Uncontrolled
    Name : TenGigabitEthernet2/2/8--Uncontrolled, Index : 214
    OID : 1.3.6.1.2.1.2.2.1.2.197, Desc : TenGigabitEthernet2/2/2--Controlled
    Name : TenGigabitEthernet2/2/2--Controlled, Index : 197
    OID : 1.3.6.1.2.1.2.2.1.2.112, Desc : TenGigabitEthernet2/1/4--Controlled
    [...]
    OID : 1.3.6.1.2.1.2.2.1.2.203, Desc : TenGigabitEthernet2/2/4--Controlled
    Name : TenGigabitEthernet2/2/4--Controlled, Index : 203
    ERROR: Status table : No response from remote host '192.168.55.1'.
    
  • Running check_snmp_int.pl with -t 60 yields the same output and hangs at the same interface, except that the error message is "No answer from host"
  • snmpwalk completes successfully and doesn't get interrupted, although it takes quite some time:

    [root@nagios libexec]# time snmpwalk -v2c -c commro 192.168.55.1 | wc -l
    19970
    
    real	1m42.172s
    user	0m1.846s
    sys	0m1.472s
    
Thoughts?

Thanks!
-Alex
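
One way to narrow this down, sketched here under assumptions rather than taken from the thread: the verbose output above lists OIDs under 1.3.6.1.2.1.2.2.1.2 (the ifDescr column of the IF-MIB ifTable), and the failing "Status table" step presumably reads the matching ifOperStatus column (1.3.6.1.2.1.2.2.1.8). Walking only those two columns shows whether the switch answers for the interface table on its own, separately from the full 19970-line walk. The function names and the -t 10 -r 1 timeout and retry values are illustrative choices.

```bash
#!/usr/bin/env bash
# Numeric OIDs of two IF-MIB ifTable columns.
IF_DESCR_OID=1.3.6.1.2.1.2.2.1.2
IF_OPER_STATUS_OID=1.3.6.1.2.1.2.2.1.8

# Read numeric snmpwalk output (-On) on stdin and count the rows
# that belong to the column OID given as $1.
count_rows() {
    awk -v prefix=".$1." 'index($0, prefix) == 1 { n++ } END { print n + 0 }'
}

# Walk a single column and print its row count.
# Usage: count_column HOST COMMUNITY OID
count_column() {
    snmpwalk -v2c -c "$2" -t 10 -r 1 -On "$1" "$3" | count_rows "$3"
}
```

For example, `time count_column 192.168.55.1 commro "$IF_OPER_STATUS_OID"` would time the status column alone. If that walk stalls too, the problem is more likely on the switch side than in the plugin.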

Re: Problem monitoring core switch using snmp

Posted: Fri Oct 17, 2014 11:46 am
by sreinhardt
Since it seems to work properly from command line, let's try the following:

/usr/bin/cfgmaker --show-op-down --noreversedns --zero-speed '100000000' 'commro@192.168.55.1:::::2' >> /usr/local/nagiosxi/tmp/mrtgscan-192.168.55.1
touch /usr/local/nagiosxi/tmp/mrtgscan-192.168.55.1.done

Go back to the Network Switch / Router wizard, make sure the IP address is entered as the device to scan, and uncheck the "Force scan" checkbox. This should cause it to read the output from your manual scan. I would point out, however, that it seems either your device has some issues with its SNMP daemon or the connection to the device may be having a few issues. So many tools having similar issues points pretty heavily at your device being a large part of the problem. Not that we can't work with it, just that there might be further issues going forward as well.
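
The two commands above can be wrapped in a small script that also checks whether the scan file contains any usable entries. This is a sketch, not the wizard's own logic: it assumes cfgmaker writes one uncommented Target[...] line per usable interface, and manual_scan and count_targets are hypothetical helper names. It writes with > rather than >> so that a rerun does not append a second copy of the config.

```bash
#!/usr/bin/env bash
# Count active (uncommented) Target[...] entries in a cfgmaker output file.
count_targets() {
    grep -c '^Target\[' "$1"
}

# Write a fresh scan file plus the matching .done marker, then report
# how many active targets the file contains.
# Usage: manual_scan HOST COMMUNITY SNMP_VERSION
manual_scan() {
    local scan="/usr/local/nagiosxi/tmp/mrtgscan-$1"
    /usr/bin/cfgmaker --show-op-down --noreversedns --zero-speed '100000000' \
        "$2@$1:::::$3" > "$scan" && touch "$scan.done"
    echo "active targets: $(count_targets "$scan")"
}
```

Running `manual_scan 192.168.55.1 commro 2` and seeing a count of zero would suggest the wizard has nothing to parse, whatever the file size.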

Re: Problem monitoring core switch using snmp

Posted: Fri Oct 17, 2014 11:59 am
by Box293
Also, can you please time the cfgmaker command:

time /usr/bin/cfgmaker --show-op-down --noreversedns --zero-speed '100000000' 'commro@192.168.55.1:::::1'

Re: Problem monitoring core switch using snmp

Posted: Fri Oct 17, 2014 1:53 pm
by chaosinverti
Thanks for your reply.

cfgmaker returns relatively quickly:

real	0m6.822s
user	0m1.793s
sys	0m0.163s

The only checkbox I have in the wizard is "Scan Interfaces". When it is unchecked, the next step only offers to ping the device, despite the fact that I generated the files manually:

[root@nagios ~]# ls -l /usr/local/nagiosxi/tmp/mrtgscan-192.168.55.1
-rw-r--r-- 1 apache nagios 319390 Oct 17 14:13 /usr/local/nagiosxi/tmp/mrtgscan-192.168.55.1
[root@nagios ~]# ls -l /usr/local/nagiosxi/tmp/mrtgscan-192.168.55.1.done 
-rw-r--r-- 1 apache nagios 0 Oct 17 14:14 /usr/local/nagiosxi/tmp/mrtgscan-192.168.55.1.done

Also, if it's an issue with the switch/network, wouldn't snmpwalk also run into issues?

Re: Problem monitoring core switch using snmp

Posted: Fri Oct 17, 2014 2:03 pm
by sreinhardt
"Also, if it's an issue with the switch/network, wouldn't snmpwalk also run into issues?"

Well, that really depends on how both apps scan things. Since mrtgscan seemed to work from the command line, I was thinking it was a timeout we may have applied to the wizards. The reason I suggested your device or the path to it was that multiple tools (cfgmaker, check_snmp_int.pl, and the slow snmpwalk) all had strange issues. Is this a stacked set of switches? Is there any model or config information that might be important for us to know as you go through configuring this?

Re: Problem monitoring core switch using snmp

Posted: Thu Oct 23, 2014 9:07 am
by chaosinverti
It is a Cisco Catalyst 4500-X core switch, not stacked per se, but using VSS with another 4500-X.

Just out of curiosity, are there known limitations to the amount of data check_snmp_int.pl can handle or anything else you can think of?

Re: Problem monitoring core switch using snmp

Posted: Thu Oct 23, 2014 10:59 am
by sreinhardt
The only limitation I'm curious about, but which we have not confirmed, would be 10Gb+ speeds on a single interface. 1Gb and lower should have absolutely no issues, and since each port is handled individually, it should not be a per-check issue. How many ports does that switch have? Have you moved the config from cfgmaker into the /etc/mrtg/conf.d/ directory and checked whether mrtg properly scans each interface and creates RRDs?
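
Whether this is the limitation in question is not confirmed in this thread, but one well-known constraint at 10 Gb/s is counter width: the standard ifInOctets/ifOutOctets counters are 32 bits wide, and SNMP v1 (the :::::1 suffix in the cfgmaker target) cannot return the 64-bit ifHC* counters. A rough calculation of how quickly an octet counter wraps at line rate, with wrap_seconds as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Seconds until an octet counter of the given width wraps at a given line rate.
# Usage: wrap_seconds COUNTER_BITS LINK_SPEED_IN_BITS_PER_SECOND
wrap_seconds() {
    # 2^bits octets times 8 bits per octet, divided by the link speed.
    awk -v bits="$1" -v bps="$2" 'BEGIN { printf "%.1f\n", (2 ^ bits) * 8 / bps }'
}

wrap_seconds 32 1000000000     # 1 Gb/s, 32-bit counter
wrap_seconds 32 10000000000    # 10 Gb/s, 32-bit counter
```

At 10 Gb/s a 32-bit counter can wrap in a few seconds, far shorter than a typical polling interval, so rates computed from 32-bit counters on a busy 10 Gb port would not be reliable.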

Re: Problem monitoring core switch using snmp

Posted: Thu Oct 23, 2014 3:05 pm
by chaosinverti
Each switch has 16 ports (all 10 Gb/s).

I was able to copy the file and run mrtg to generate rrd files for the switch. However it graphs 83 ports (it sees VLANs and port-channels too).

Re: Problem monitoring core switch using snmp

Posted: Fri Oct 24, 2014 11:54 am
by lmiltchev
"I was able to copy the file and run mrtg to generate rrd files for the switch. However it graphs 83 ports (it sees VLANs and port-channels too)."

That is expected behavior. You can remove the "extra" entries from the config and delete those checks in the CCM.
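
To see which entries are the "extra" ones before editing the config by hand, something like the following could list them. It is only a sketch: it assumes each interface in the cfgmaker output has a SetEnv[...] line carrying MRTG_INT_DESCR="...", and that the VLAN and port-channel interfaces are described as Vlan... and Port-channel... on this switch. list_descriptions and extra_entries are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print "target description" for every SetEnv line that carries MRTG_INT_DESCR.
list_descriptions() {
    sed -n 's/^SetEnv\[\([^]]*\)\]:.*MRTG_INT_DESCR="\([^"]*\)".*/\1 \2/p' "$1"
}

# Keep only the entries whose description starts with Vlan or Port-channel.
extra_entries() {
    list_descriptions "$1" | grep -Ei ' (vlan|port-channel)'
}
```

Calling `extra_entries` on the config file copied into /etc/mrtg/conf.d/ would then print the target names to remove from the config and from the CCM.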

Re: Problem monitoring core switch using snmp

Posted: Wed Oct 29, 2014 10:00 am
by chaosinverti
Extra checks would be good news IMO :)
I still can't add monitoring for the switch (Network Switch / Router Monitoring Wizard still displays "No ports were detected on the switch")...