Weird thing happened on the way to the VLAN.
Posted: Tue Apr 30, 2013 9:30 am
OK, strange subject, but it got you to look.
Running Nagios XI 2012R1.6.
I had one of our other monitoring tools perform a bulk SNMP get from 60 ESX hosts on a particular VLAN yesterday. The VLAN in question has a rate-limiting ACL on it for low-priority traffic (ICMP/SNMP/etc.) put in place by a former network admin.
When this ACL tripped because of that SNMP polling, both the other (unnamed) monitoring tool and Nagios XI raised host-down alerts for every device on that VLAN. The Nagios host checks all showed RTA NAN. The weird thing was that I could go to the host details screen and ping the hosts just fine, and traceroute worked as well. So ICMP from the Nagios server to every system on that VLAN was working, yet the host checks kept failing.
The other monitoring tool recovered within three minutes. Nagios kept reporting all hosts as down for more than 15 minutes. At that point I restarted the engine from the console, and host checks started succeeding again.
Any ideas? Could this indicate a bug in the polling engine that would prevent it from recovering these hosts?
There were 745 hosts being monitored on that VLAN.
Thanks,
AJ