Weird thing happened on the way to the VLAN.

vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Weird thing happened on the way to the VLAN.

Post by vAJ »

OK, strange subject, but it got you to look.

XI2012R1.6

I had one of our other monitoring tools perform a bulk SNMP get from 60 ESX hosts on a particular VLAN yesterday. The VLAN in question has a rate-limiting ACL on it for low-priority traffic (ICMP/SNMP/etc.) put in place by a former network admin.

When this ACL tripped due to that SNMP polling, both the other (unnamed) monitoring tool and Nagios XI raised host-down alerts for all devices on that VLAN. Nagios host checks all showed RTA NAN. Weird thing was, I could go to the host details screen and ping them just fine. I could traceroute just fine. ICMP from Nagios to all systems on that VLAN was technically working, but the host checks still reported every host as down.

The other monitoring tool recovered within three minutes. Nagios kept reporting all hosts as down for more than 15 minutes. At that point, I restarted the engine from the console and host checks started going through again.

Any ideas? Could this indicate a bug in the polling engine that would prevent it from recovering these hosts?

There were 745 hosts being monitored on that VLAN.

Thanks,
AJ
Andrew J. - Do you even grok?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Weird thing happened on the way to the VLAN.

Post by abrist »

What interval are these host checks run at? What check/plugin are you using to check host status? Were there subsequent checks that failed during those 15 minutes (check the event log)?
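
If it helps with the event log question, the host alerts can also be pulled straight from the Nagios log on the command line. This is only a rough sketch: it assumes the default log path `/usr/local/nagios/var/nagios.log` and the usual `HOST ALERT: host;STATE;SOFT|HARD;attempt;output` line format, and the `host_alerts` function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): summarize HOST ALERT lines from a Nagios log.
# Assumption: default log location; pass a different path as the first argument.
host_alerts() {
    log="${1:-/usr/local/nagios/var/nagios.log}"
    # Expected line shape: [epoch] HOST ALERT: host;STATE;SOFT|HARD;attempt;plugin output
    awk -F';' '/ HOST ALERT: / {
        ts = substr($1, 2, index($1, "]") - 2)   # epoch seconds between [ and ]
        host = $1
        sub(/^.* HOST ALERT: /, "", host)        # keep only the host name
        print ts, host, $2, $3, "attempt=" $4
    }' "$log"
}
```

Running `host_alerts` with no argument reads the default log. Each output line gives the epoch timestamp, host, state, state type and attempt number, which should show whether checks kept failing during those 15 minutes or simply were not run.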
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: Weird thing happened on the way to the VLAN.

Post by vAJ »

All default: 5 min interval, no special check command. And no, only host checks for all systems on this one VLAN.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Weird thing happened on the way to the VLAN.

Post by scottwilkerson »

Out of curiosity, do you have any custom broker modules installed (e.g. livestatus)?

Code:

grep broker /usr/local/nagios/etc/nagios.cfg
Former Nagios employee

Re: Weird thing happened on the way to the VLAN.

Post by vAJ »

Code:

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1
That's all.

Re: Weird thing happened on the way to the VLAN.

Post by abrist »

Are these hosts checked with an active icmp/ping/keep-host-alive check, or through passive means?

Re: Weird thing happened on the way to the VLAN.

Post by vAJ »

Active.

Re: Weird thing happened on the way to the VLAN.

Post by abrist »

Well, what is the retry interval on those checks (and number of retries)?
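
For reference, a host only goes to a HARD DOWN state after `max_check_attempts` failed checks, with rechecks spaced by `retry_interval`, so those values decide how quickly the state can change. One way to read the effective values is from the object cache. A rough sketch, assuming the cache lives at the default `/usr/local/nagios/var/objects.cache` and uses the usual `define host { ... }` layout; the `host_check_timing` function name and the host name `esx01` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): print check timing directives for one host.
# Assumption: default object cache path; pass a different file as the second argument.
host_check_timing() {
    host="$1"
    cache="${2:-/usr/local/nagios/var/objects.cache}"
    awk -v host="$host" '
        # Start of a host definition: reset what we have collected so far.
        /^[ \t]*define host[ \t]*\{/ { in_host = 1; name = ""; out = ""; next }
        in_host && $1 == "host_name" { name = $2 }
        in_host && ($1 == "check_interval" || $1 == "retry_interval" || $1 == "max_check_attempts") {
            out = out $1 "=" $2 "\n"
        }
        # End of the definition: print only if it was the host we asked for.
        in_host && $1 == "}" {
            if (name == host) printf "%s", out
            in_host = 0
        }
    ' "$cache"
}
```

Usage would be something like `host_check_timing esx01`. The intervals are expressed in units of `interval_length`, which defaults to 60 seconds.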