
Service never enters RECOVERED state - No notification

Posted: Sat Aug 03, 2013 9:30 am
by dervari
We had a disk space alert last night that went into critical and issued the proper escalation alerts. After I resolved the issue, which involved restarting a service so the queued files could be processed, the service went into a FLAPPING STARTED and then a FLAPPING STOPPED state. This was expected: as it neared the threshold level, it bounced back and forth a few times as old files were processed and new files were created. However, it never went into a RECOVERED state, so the recovery notification was never sent out. I assume this was caused by the starting and stopping of flapping. Is this expected behavior, or should it have entered a recovered state after the flapping stopped? I would think that it should have recovered.

Re: Service never enters RECOVERED state - No notification

Posted: Mon Aug 05, 2013 10:08 am
by lmiltchev
When the service enters a "FLAPPING STOPPED" state, you should be notified of every state change afterwards. Was the service in a "CRITICAL" state AFTER the flapping stopped? If it was, you should have received an alert on the following recovery.
I would go to: Reports->State History->disk space service->select the period, and review the state changes after the flapping stopped.
Then go to: Reports->Notifications, and check alerts that you received for this service/time period, after the flapping stopped.

Re: Service never enters RECOVERED state - No notification

Posted: Mon Aug 05, 2013 12:18 pm
by dervari
No, it transitioned as follows:

Warning
Critical
Warning
Flapping Start
Flapping Stop

It went into an OK state during the flapping and, when the flapping stopped, remained in an OK state. Should it have entered a RECOVERED state? It appears that, since the state went OK during the flapping, it never saw a WARNING-to-RECOVERED transition and never sent the alert that the service had recovered.

Re: Service never enters RECOVERED state - No notification

Posted: Mon Aug 05, 2013 1:17 pm
by lmiltchev
Notifications are not sent while the service is in a flapping state. An alert is sent only after the service exits the flapping state AND the service's state changes AFTER the "FLAPPINGSTOP" alert has been sent. If your service's state changed to "OK" while it was flapping and then remained "OK" (no further state change), you would not get an alert. This is the default behavior.
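To illustrate the rule above, here is a minimal sketch that replays the transitions listed earlier in the thread. It assumes a greatly simplified model: the `ServiceModel` class and its method names are hypothetical, and this is not Nagios source code. It only encodes "no notifications while flapping" and "no re-check of the state when flapping stops."

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceModel:
    """Hypothetical, simplified model of the default notification rule.

    Not Nagios code: it only mirrors the behavior described in this thread.
    """
    state: str = "OK"                               # current hard state
    flapping: bool = False                          # flap detection active?
    sent: List[str] = field(default_factory=list)   # notifications "sent"

    def on_state_change(self, new_state: str) -> None:
        old_state, self.state = self.state, new_state
        if new_state == old_state:
            return          # no state change, nothing to notify
        if self.flapping:
            return          # notifications are suppressed while flapping
        self.sent.append("RECOVERY" if new_state == "OK" else new_state)

    def on_flapping_start(self) -> None:
        self.flapping = True
        self.sent.append("FLAPPINGSTART")

    def on_flapping_stop(self) -> None:
        self.flapping = False
        self.sent.append("FLAPPINGSTOP")
        # Default behavior: the current state is not re-evaluated here, so a
        # service that reached OK while flapping gets no RECOVERY notification.


# Replay the sequence from the thread.
svc = ServiceModel()
svc.on_state_change("WARNING")
svc.on_state_change("CRITICAL")
svc.on_state_change("WARNING")
svc.on_flapping_start()
svc.on_state_change("OK")   # reached OK while flapping: suppressed
svc.on_flapping_stop()      # service stays OK afterwards: no further alert
print(svc.sent)             # ['WARNING', 'CRITICAL', 'WARNING', 'FLAPPINGSTART', 'FLAPPINGSTOP']
```

In this model, RECOVERY never appears because the only transition to OK happened while notifications were suppressed, and nothing after FLAPPINGSTOP changes the state again.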

Re: Service never enters RECOVERED state - No notification

Posted: Tue Aug 13, 2013 12:08 pm
by dervari
Thanks. I've disabled flapping notification for the services in question. Hopefully that should take care of the issue for the time being.

However, please investigate modifying the behavior (or adding a user-defined option) so that, if a service goes into an OK state while flapping, a recovery notification is sent out when the flapping stops. We have SNMP traps being sent that are parsed by our ticketing system. The system auto-resolves a ticket when it receives a recovery trap, and it would be nice to be able to re-enable flapping notifications.
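A rough sketch of the requested option follows. The setting name `notify_recovery_on_flap_stop` and the function `flap_stop_notifications` are hypothetical; neither is an existing Nagios directive or API. The idea is that when flapping stops, a RECOVERY is also sent if the service is now OK but the last notification it sent was for a problem state. That pending RECOVERY is what the ticketing system would need to auto-resolve its ticket.

```python
from typing import List


def flap_stop_notifications(current_state: str,
                            last_notified_state: str,
                            notify_recovery_on_flap_stop: bool = False) -> List[str]:
    """Return the notifications to send when flapping stops.

    `notify_recovery_on_flap_stop` is a hypothetical option modeling the
    requested behavior; with it off, this matches the default behavior.
    """
    notifications = ["FLAPPINGSTOP"]
    if (notify_recovery_on_flap_stop
            and current_state == "OK"
            and last_notified_state != "OK"):
        # The last problem notification was never followed by a recovery,
        # so send one now (e.g. so a trap-driven ticket can auto-resolve).
        notifications.append("RECOVERY")
    return notifications


print(flap_stop_notifications("OK", "WARNING"))        # ['FLAPPINGSTOP']
print(flap_stop_notifications("OK", "WARNING", True))  # ['FLAPPINGSTOP', 'RECOVERY']
print(flap_stop_notifications("OK", "OK", True))       # ['FLAPPINGSTOP']
```

Comparing against the last *notified* state, rather than the state just before flapping, avoids sending a duplicate RECOVERY when the service had already recovered and notified before flapping began.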

Re: Service never enters RECOVERED state - No notification

Posted: Tue Aug 13, 2013 12:12 pm
by slansing
That is a good use-case for a change. Would you mind submitting it to:

http://tracker.nagios.com/

That way the whole team can see it, including devs. Thanks!