Weird thing happened on the way to the VLAN.

Posted: Tue Apr 30, 2013 9:30 am
by vAJ
OK, strange subject, but it got you to look.

XI2012R1.6

I had one of our other monitoring tools perform a bulk SNMP get from 60 ESX hosts on a particular VLAN yesterday. The VLAN in question has a rate-limiting ACL on it for low-priority traffic (ICMP/SNMP/etc.) put in place by a former network admin.

When this ACL tripped due to that SNMP polling, both the other (unnamed) monitoring tool and NagiosXI alerted to host down for all devices on that VLAN. Nagios host checks all showed RTA NAN. Weird thing was, I could go to the host details screen and ping them just fine. I could traceroute just fine. ICMP from Nagios to all systems on that VLAN was technically working but host checks were offline.

The other monitoring tool recovered within three minutes. Nagios kept reporting all hosts as down for more than 15 minutes. At that point, I restarted the engine from the console and host checks started going through.

Any ideas? Could this indicate a bug in the polling engine that would prevent it from recovering these hosts?

There were 745 hosts being monitored on that VLAN.

Thanks,
AJ

Re: Weird thing happened on the way to the VLAN.

Posted: Tue Apr 30, 2013 10:38 am
by abrist
What interval are these host checks run at? What check/plugin are you using to check host status? Were there subsequent checks that failed during those 15 minutes (check the event log)?

Re: Weird thing happened on the way to the VLAN.

Posted: Tue Apr 30, 2013 11:33 am
by vAJ
All default: 5 min interval, no special check command. And no, only the host checks for all systems on this one VLAN.

Re: Weird thing happened on the way to the VLAN.

Posted: Tue Apr 30, 2013 1:59 pm
by scottwilkerson
Out of curiosity, do you have any custom broker modules installed (e.g. livestatus)?

cat /usr/local/nagios/etc/nagios.cfg|grep broker

Re: Weird thing happened on the way to the VLAN.

Posted: Tue Apr 30, 2013 2:49 pm
by vAJ

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1
That's all.

Re: Weird thing happened on the way to the VLAN.

Posted: Tue Apr 30, 2013 4:00 pm
by abrist
Are these hosts checked with an active icmp/ping/keep-host-alive check, or through passive means?

Re: Weird thing happened on the way to the VLAN.

Posted: Wed May 01, 2013 9:09 am
by vAJ
Active.

Re: Weird thing happened on the way to the VLAN.

Posted: Wed May 01, 2013 2:46 pm
by abrist
Well, what is the retry interval on those checks (and number of retries)?
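
For anyone following along, these are the host directives that question refers to. This is only a sketch of a host definition with illustrative values, not vAJ's actual configuration: the host name and address are made up, and the other required host directives are assumed to be inherited from a template that is not shown here.

```
define host {
    host_name            esx-example-01      ; hypothetical host name
    address              192.0.2.10          ; documentation-range address, not a real host
    check_command        check-host-alive    ; stock ping-based host check command
    check_interval       5                   ; normal interval, in interval_length units (60 seconds by default)
    retry_interval       1                   ; interval used while the host is in a soft non-UP state
    max_check_attempts   5                   ; consecutive failed attempts before the state becomes hard
}
```

As documented, a host in a soft non-UP state is rescheduled at retry_interval until it either recovers or reaches max_check_attempts; once the state is hard, checks go back to check_interval. With a 5-minute check interval, that alone could plausibly account for part of a slow recovery, which is presumably why the retry values matter here.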