Nagios Support Forum

Posted: **Tue Apr 30, 2013 9:30 am**

OK, strange subject, but it got you to look.

XI2012R1.6

I had one of our other monitoring tools perform a bulk SNMP get from 60 ESX hosts on a particular VLAN yesterday. The VLAN in question has a rate-limiting ACL on it for low-priority traffic (ICMP/SNMP/etc.) put in place by a former network admin.

When this ACL tripped due to that SNMP polling, both the other (unnamed) monitoring tool and Nagios XI raised host-down alerts for all devices on that VLAN. Nagios host checks all showed RTA NAN. The weird thing was, I could go to the host details screen and ping them just fine. I could traceroute just fine. ICMP from Nagios to all systems on that VLAN was technically working, but the host checks kept failing.
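To separate a plugin problem from a scheduling problem, the same check can be run by hand from the Nagios server. The following is a minimal sketch, assuming the host check is backed by `check_icmp` in the default libexec directory (an assumption; the thread does not say which plugin is configured). `probe_host` and the `PLUGIN` variable are illustrative names, not part of Nagios.

```shell
# Run the host-check plugin by hand and flag "rta nan" results.
# Assumption: check_icmp in the default libexec directory is the plugin
# behind the host check; override PLUGIN if your setup differs.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_icmp}"

probe_host() {
    # $1 = host address. Prints the plugin output and returns its exit code.
    out=$("$PLUGIN" -H "$1" 2>&1)
    rc=$?
    printf '%s\n' "$out"
    # "rta nan" means no round-trip time could be computed (no replies seen).
    if printf '%s\n' "$out" | grep -qi 'rta nan'; then
        echo "note: plugin received no replies from $1" >&2
    fi
    return "$rc"
}

# Example with a placeholder address:
# probe_host 192.0.2.10
```

If the plugin succeeds by hand while the scheduled host checks still show the hosts as down, that would point at the engine's scheduling or result handling rather than at ICMP itself.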

The other monitoring tool recovered within three minutes. Nagios kept reporting all hosts as down for more than 15 minutes. At that point, I restarted the engine from the console and host checks started going through.

Any ideas? Could this indicate a bug in the polling engine that would prevent it from recovering these hosts?

There were 745 hosts being monitored on that VLAN.

Thanks,
AJ

Posted: **Tue Apr 30, 2013 10:38 am**

What interval are these host checks run at? What check/plugin are you using to check host status? Were there subsequent checks that failed during those 15 minutes (check the event log)?

Posted: **Tue Apr 30, 2013 11:33 am**

All defaults: 5-minute interval, no special check command. And no, only host checks for all systems on this one VLAN.

Posted: **Tue Apr 30, 2013 1:59 pm**

Out of curiosity, do you have any custom broker_modules installed (e.g. livestatus)?

Code:

grep broker /usr/local/nagios/etc/nagios.cfg

Posted: **Tue Apr 30, 2013 2:49 pm**

Code:

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1

That's all.

Posted: **Tue Apr 30, 2013 4:00 pm**

Are these hosts checked with an active ICMP/ping/check-host-alive check, or through passive means?

Posted: **Wed May 01, 2013 9:09 am**

Active.

Posted: **Wed May 01, 2013 2:46 pm**

Well, what is the retry interval on those checks (and number of retries)?
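Those values normally live in the host definition or in the template it inherits from, as the `check_interval`, `retry_interval`, and `max_check_attempts` directives. A rough sketch for pulling them out with grep follows; the helper name `show_check_timing` is hypothetical, and the example path is an assumption about where the object configs are stored:

```shell
# Print the check-scheduling directives found in Nagios object config files.
# Assumption: the files passed in contain your host definitions or templates.
show_check_timing() {
    grep -HnE '^[[:space:]]*(check_interval|retry_interval|max_check_attempts)[[:space:]]' "$@"
}

# Example (path is an assumption; adjust to your installation):
# show_check_timing /usr/local/nagios/etc/hosts/*.cfg
```

Each match is printed with its file name and line number, so values set in a template can be told apart from values set on an individual host.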

Weird thing happened on the way to the VLAN.
