Assistance with Flapping.

Bevlar
Posts: 3
Joined: Fri Oct 21, 2011 3:48 am

Assistance with Flapping.

Post by Bevlar »

Hi,

I have installed Nagios on Ubuntu Server 10.04. It is running fine and monitoring all of our switches and two Windows servers.

I'd like to know how to stop services flapping once I have been notified.

The only reference to flapping that I have been able to find is 'Detection and Handling of State Flapping' in the Nagios documentation.

Is there a different term for flapping in Windows, so I can find out how to resolve the issue?

Any info will be greatly appreciated.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Assistance with Flapping.

Post by jsmurphy »

Flapping means that the service is switching between states on a regular basis. This could mean one of two things:

1. Your flapping threshold in Nagios for that service/host is set too low, so it is not tolerant enough to account for minor hiccups.

2. The device it is querying is experiencing network connectivity problems, or the service being monitored is unreliable and keeps failing and restarting.

So to diagnose flapping you need to work out why there are dropouts in communication between Nagios and whatever it's monitoring.
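
For point 1, the thresholds can be raised per service in the object definition. The fragment below is only a sketch: the host name, the template name, and the threshold values are placeholders I have assumed for illustration, not values taken from your setup. The `flap_detection_enabled`, `low_flap_threshold` and `high_flap_threshold` directives are the ones described in the Nagios 3 object definition documentation.

```
define service {
    use                     generic-service        ; assumed template name
    host_name               winserver              ; hypothetical host name
    service_description     CPU Load
    check_command           check_nt!CPULOAD!-l 5,80,90
    flap_detection_enabled  1                      ; set to 0 to turn flap detection off for this service
    low_flap_threshold      10.0                   ; illustrative value (percent state change)
    high_flap_threshold     30.0                   ; illustrative value (percent state change)
}
```

Raising the high threshold makes Nagios more tolerant of occasional state changes before it marks the service as flapping. Nagios needs to re-read its configuration before a change like this takes effect.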
Bevlar
Posts: 3
Joined: Fri Oct 21, 2011 3:48 am

Re: Assistance with Flapping.

Post by Bevlar »

Hi jsmurphy,

thanks for the reply.

If it was a connectivity issue, would it not show all monitored services as flapping instead of just two (CPU & NSClient++)?
I've just checked the logs and there have been no switch drop-outs since I started monitoring.

Also, of the two servers, the one showing flapping services is a physical host whilst the other is an ESXi host. I'm not sure if that would make a difference.

That said, the one with issues is running Sage 200 and receives more traffic than the other, which is a domain controller.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Assistance with Flapping.

Post by jsmurphy »

I think what may help you most is reading this: http://nagios.sourceforge.net/docs/3_0/flapping.html

The services crossing the threshold are changing state between critical/warning/ok at a rate above the flap threshold. That means either your flap threshold is too low, OR your warning/critical thresholds are too low, OR the services really are not reliable...

The answer really depends on how you expect your hardware/software to perform and what you consider to be a rate of change that is too high.
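
To make "a rate above the flap threshold" more concrete, here is a rough Python sketch of the idea described in the flapping documentation linked above: look at the recent check history, count the state changes (weighting newer ones more heavily), and compare the resulting percentage against a low and a high threshold. The function names, the 0.8 to 1.2 weighting range, and the 5.0/20.0 thresholds are assumptions for illustration only; this is not Nagios's actual code.

```python
def percent_state_change(history, low_weight=0.8, high_weight=1.2):
    """Return a weighted percent state change for a list of states.

    history is ordered oldest first, e.g. the last 21 check results.
    The linear weighting range is an assumed, illustrative choice.
    """
    transitions = len(history) - 1
    if transitions < 1:
        return 0.0
    weighted = 0.0
    for i in range(transitions):
        if history[i] != history[i + 1]:
            # Newer transitions (larger i) count for more than older ones.
            frac = i / (transitions - 1) if transitions > 1 else 1.0
            weighted += low_weight + frac * (high_weight - low_weight)
    return 100.0 * weighted / transitions


def update_flapping(is_flapping, pct, low=5.0, high=20.0):
    """Apply the two-threshold rule; low and high are assumed values."""
    if not is_flapping and pct > high:
        return True   # started flapping
    if is_flapping and pct < low:
        return False  # stopped flapping
    return is_flapping  # in between: keep the previous state


if __name__ == "__main__":
    # Hypothetical history: a service that bounces between OK and CRITICAL.
    history = ["OK", "CRITICAL"] * 10 + ["OK"]
    pct = percent_state_change(history)
    print(pct, update_flapping(False, pct))
```

The gap between the two thresholds is what stops a service from bouncing in and out of the flapping state: once flagged, it has to calm down below the low threshold before the flag is cleared.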