Nagios XI Email Glitch
Posted: Tue May 26, 2020 5:08 am
by Branigan
Hi Community,
I need some advice on an issue we experienced with our Nagios XI monitoring system (Nagios XI 5.6.14) a few days ago.
Alerts for "Flapping Start/Flapping Stop" states were sent out at least 2 hours after an outage had occurred (the service had not been in a flapping state either).
The timestamps generated by Nagios XI in the emails our contacts received did not match any historical data in Nagios XI.
The results shown in the emails have no corresponding records in the Nagios XI "Notification tab" or in the configured item's "service history".
Example: the outage occurred at 04:07 and services recovered at 04:35 (at this stage the outage looks ISP related).
Flapping Start/Flapping Stop emails were still being received up until 08:45.
The service history in Nagios XI has no host/service state records matching the emails sent after services had recovered.
I have since restarted the system (no health check errors, and no system resource issues either). This is the first time I have seen this particular issue, and it has not reoccurred. Has anyone else had the same experience, and what steps were taken to prevent or trace the cause of such an incident?
Thanks

Re: Nagios XI Email Glitch
Posted: Tue May 26, 2020 1:47 pm
by scottwilkerson
Did the Flapping Stop email have an OK state?
If so, this would be expected behavior.
Re: Nagios XI Email Glitch
Posted: Wed May 27, 2020 2:44 am
by Branigan
Yes, the Flapping Stop email had an OK state. See the Nagios timestamp:
tempsnip1.png
However, I do not see corroborating state history here:
tempsnip.png
Is this normal?
I went through the "State History" of other configured items and found no discrepancies like the one above.
For those items, the flapping state info corroborates the email alerts.
Thanks.
Re: Nagios XI Email Glitch
Posted: Wed May 27, 2020 8:20 am
by scottwilkerson
The part of the state history report you are showing contains only OK states. At some point the state changed from non-OK to OK, and if that happened while the service was flapping, you would not get any more notifications until the service came out of flapping.
Here's an overview of what happens when a service is flapping:
https://assets.nagios.com/downloads/nag ... pping.html
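One way to check whether Nagios itself recorded a flapping window is to look at the core log rather than the state history report. The following is a minimal Python sketch, assuming the log uses the usual Nagios Core line layout (`[epoch] SERVICE FLAPPING ALERT: host;service;STARTED|STOPPED; ...` and `[epoch] SERVICE NOTIFICATION: contact;host;service;type;command;output`). The function name `flapping_timeline`, the host `web01`, the service `HTTP`, the contact `admin`, and the sample lines are all hypothetical and are not taken from the system discussed in this thread.

```python
import re
from datetime import datetime, timezone

# Matches the two Nagios Core log entry types relevant to flapping.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE (?P<kind>FLAPPING ALERT|NOTIFICATION): (?P<fields>.*)$"
)

def flapping_timeline(lines, host, service):
    """Return (UTC time, label) pairs for flapping events of one service."""
    timeline = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        fields = match.group("fields").split(";")
        if match.group("kind") == "FLAPPING ALERT":
            # Assumed layout: host;service;STARTED|STOPPED; message
            if len(fields) < 3 or fields[0:2] != [host, service]:
                continue
            label = "flapping " + fields[2].lower()
        else:
            # Assumed layout: contact;host;service;type;command;output
            if len(fields) < 4 or fields[1:3] != [host, service]:
                continue
            if not fields[3].startswith("FLAPPING"):
                continue
            label = "notified " + fields[0] + ": " + fields[3]
        when = datetime.fromtimestamp(int(match.group("ts")), tz=timezone.utc)
        timeline.append((when, label))
    return timeline

# Hypothetical sample lines, for illustration only.
sample = [
    "[1590466020] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
    "[1590466500] SERVICE FLAPPING ALERT: web01;HTTP;STARTED; Service appears to have started flapping (24.2% change >= 20.0% threshold)",
    "[1590466500] SERVICE NOTIFICATION: admin;web01;HTTP;FLAPPINGSTART (OK);notify-service-by-email;HTTP OK",
    "[1590473400] SERVICE FLAPPING ALERT: web01;HTTP;STOPPED; Service appears to have stopped flapping (3.8% change < 5.0% threshold)",
]

for when, label in flapping_timeline(sample, "web01", "HTTP"):
    print(when.isoformat(), label)
```

On a real system the lines would come from the Nagios log file (commonly `/usr/local/nagios/var/nagios.log`, though the path may differ on your install). Comparing the resulting timestamps with the email timestamps should show whether the late emails line up with a recorded flapping start or stop.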
Re: Nagios XI Email Glitch
Posted: Wed May 27, 2020 9:18 am
by Branigan
Completely understand what you are saying, thank you for the info.
In this instance, the service in question last had an issue at around 4:15. From a dashboard and observational perspective, the service was in fact not flapping after 4:15; it was online and accessible. This was confirmed by other internal system logs.
This was detected because support teams were inundated with Nagios alerts for issues that had recovered hours earlier.
The issue has not reoccurred. I logged this post to find out whether other users have experienced the same or similar issues and what preventative measures they took.
Thanks.
Re: Nagios XI Email Glitch
Posted: Wed May 27, 2020 9:45 am
by scottwilkerson
Once a service enters a flapping state based on the criteria in the link above, it remains there, with no notifications going out, until the criteria in the link above for exiting the flapping state are met.
After entering a flapping state, a service can have all OK results, and this stabilization is what allows it to exit the flapping state.
Conversely, it could stabilize in a CRITICAL state, which would also exit the flapping state, but it would then continue to send notifications at the defined notification interval.
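To illustrate why a Flapping Stop notification can arrive long after the service has recovered, here is a simplified Python sketch of the flap detection logic as described in the Nagios Core documentation: the last 21 check states are kept, each state change is weighted (older changes count less than newer ones), and the resulting percent state change is compared against a high and a low threshold. The function names, the 5%/20% thresholds, and the all-OK starting history are assumptions made for the example; the real implementation has more detail, so treat this as an approximation only.

```python
from collections import deque

HISTORY_SLOTS = 21                    # number of check states kept for flap detection
LOW_WEIGHT, HIGH_WEIGHT = 0.75, 1.25  # oldest change counts least, newest counts most

def percent_state_change(history):
    """Weighted percent state change over the stored history window."""
    states = list(history)
    transitions = len(states) - 1     # 20 possible state changes
    total = 0.0
    for i in range(transitions):
        if states[i] != states[i + 1]:
            # Linear weight from LOW_WEIGHT (oldest) to HIGH_WEIGHT (newest).
            total += LOW_WEIGHT + i * (HIGH_WEIGHT - LOW_WEIGHT) / (transitions - 1)
    return total * 100.0 / transitions

def simulate(results, low=5.0, high=20.0):
    """Feed check results in order; return (check number, event, state) tuples."""
    # Assumption: the service starts with a clean, all-OK history.
    history = deque(["OK"] * HISTORY_SLOTS, maxlen=HISTORY_SLOTS)
    flapping = False
    events = []
    for number, state in enumerate(results, start=1):
        history.append(state)
        pct = percent_state_change(history)
        if not flapping and pct >= high:
            flapping = True
            events.append((number, "FLAPPINGSTART", state))
        elif flapping and pct < low:
            flapping = False
            events.append((number, "FLAPPINGSTOP", state))
    return events

# Four CRITICAL/OK bounces, then the service stays OK for 30 checks.
checks = ["CRITICAL", "OK"] * 4 + ["OK"] * 30
for event in simulate(checks):
    print(event)
```

In this sketch flapping starts on check 4 and only stops on check 27, which is 19 stable OK checks after the last state change at check 8. At a hypothetical 5-minute check interval that is about 95 minutes after the service stabilized, and both notifications carry an OK state.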
Re: Nagios XI Email Glitch
Posted: Thu May 28, 2020 7:39 am
by Branigan
Thanks Scott, appreciate the feedback.
Re: Nagios XI Email Glitch
Posted: Thu May 28, 2020 8:11 am
by scottwilkerson
No problem