Abnormal behavior on the Dashboard

Post by **lucas.shelton** » Tue Oct 13, 2015 9:04 am

This morning one of our sources had abnormal behavior between 8:30am and 8:50am. When I click on the red and hit "View Problem" I can't really see what the issue was. Basically it's like clicking on your source from the dashboard, except it only shows about the past 40 minutes instead of the past two hours. It would be nice if it actually made it obvious what the abnormal behavior was, or why it showed up as red for that time frame.

Post by **jolson** » Tue Oct 13, 2015 5:06 pm

Just so we're clear, I'd like you to send us a screenshot of the exact place in the GUI you're talking about. I'd also like to know any details about what you'd like this functionality to show. I'll go ahead and submit a feature request when you get back to me with that information. I would like the request to be as detailed as possible, which is why I'm asking you for this. Thanks!

Jesse

Post by **lmiltchev** » Tue Oct 13, 2015 5:08 pm

NNA uses Holt-Winters forecasting to check whether the latest data falls outside of what the predicted value should be. It is a "default" setting in RRDtool.

RRDtool actually runs it for us and comes up with either an ok or failure, and the failures are what are put in as abnormal behavior.

You can read more on the topic in the RRDtool man pages (the "Aberrant Behavior Detection" section).

Here's a quote from the man pages:

Aberrant Behavior Detection
by Jake Brutlag

RRDtool provides the building blocks for near real-time aberrant behavior detection. These
components include:

· An algorithm for predicting the value of a time series one time step into the future.

· A measure of deviation between predicted and observed values.

· A mechanism to decide if and when an observed value or sequence of observed values is too
deviant from the predicted value(s).

Here is a brief explanation of these components:

The Holt-Winters time series forecasting algorithm is an on-line (or incremental) algorithm
that adaptively predicts future observations in a time series. Its forecast is the sum of
three components: a baseline (or intercept), a linear trend over time (or slope), and a
seasonal coefficient (a periodic effect, such as a daily cycle). There is one seasonal
coefficient for each time point in the period (cycle). After a value is observed, each of
these components is updated via exponential smoothing. This means that the algorithm "learns"
from past values and uses them to predict the future. The rate of adaptation is governed by 3
parameters, alpha (intercept), beta (slope), and gamma (seasonal). The prediction can also be
viewed as a smoothed value for the time series.

The measure of deviation is a seasonal weighted absolute deviation. The term seasonal means
deviation is measured separately for each time point in the seasonal cycle. As with Holt-
Winters forecasting, deviation is predicted using the measure computed from past values (but
only at that point in the seasonal cycle). After the value is observed, the algorithm learns
from the observed value via exponential smoothing. Confidence bands for the observed time
series are generated by scaling the sequence of predicted deviation values (we usually think
of the sequence as a continuous line rather than a set of discrete points).

Aberrant behavior (a potential failure) is reported whenever the number of times the observed
value violates the confidence bands meets or exceeds a specified threshold within a specified
temporal window (e.g. 5 violations during the past 45 minutes with a value observed every 5
minutes).
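To make the quoted description more concrete, here is a minimal Python sketch of the three pieces: the additive Holt-Winters forecast, the seasonal deviation that scales the confidence band, and the sliding-window violation count. This is not RRDtool's or NNA's actual code. The function name `detect_aberrant`, the smoothing parameters, the band scale `delta`, and the warm-up rule (no violations counted during the first cycle) are all assumptions chosen for illustration. The `threshold=5, window=9` defaults mirror the man page example of 5 violations in 45 minutes at one value every 5 minutes.

```python
from collections import deque


def detect_aberrant(values, period, alpha=0.5, beta=0.25, gamma=0.5,
                    delta=2.0, threshold=5, window=9):
    """Return one boolean per value: True where a failure would be flagged.

    Illustrative only; parameter values are assumptions, not RRDtool defaults.
    """
    baseline = values[0]           # intercept, adapted at rate alpha
    slope = 0.0                    # linear trend, adapted at rate beta
    seasonal = [0.0] * period      # one seasonal coefficient per point in the cycle
    deviation = [0.0] * period     # one smoothed absolute deviation per point
    recent = deque(maxlen=window)  # violation flags inside the temporal window
    failures = []

    for t, observed in enumerate(values):
        s = t % period
        # Forecast: baseline + trend + seasonal coefficient for this point.
        predicted = baseline + slope + seasonal[s]
        # Confidence band: predicted deviation for this point, scaled by delta.
        band = delta * deviation[s]
        error = abs(observed - predicted)

        # Assumed warm-up: ignore the first cycle, where nothing is learned yet.
        violated = t >= period and error > band
        recent.append(violated)
        failures.append(sum(recent) >= threshold)

        # Learn from the observed value via exponential smoothing.
        new_baseline = (alpha * (observed - seasonal[s])
                        + (1 - alpha) * (baseline + slope))
        slope = beta * (new_baseline - baseline) + (1 - beta) * slope
        seasonal[s] = gamma * (observed - new_baseline) + (1 - gamma) * seasonal[s]
        deviation[s] = gamma * error + (1 - gamma) * deviation[s]
        baseline = new_baseline

    return failures


# A flat series that suddenly jumps: the jump should eventually be flagged.
series = [100.0] * 30 + [200.0] * 5
flags = detect_aberrant(series, period=3, threshold=3, window=5)
print([i for i, flagged in enumerate(flags) if flagged])
```

Note that a failure in this scheme only says that the observed values left the band often enough within the window. It does not by itself say which metric (flows, packets, bytes) or by how much, which is the information the "View Problem" view would need to surface.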

Hope this helps.

Post by **lucas.shelton** » Wed Oct 14, 2015 7:33 am

Does that help? Yes and no. It helps me understand how the abnormal behavior is computed, but it does me little good unless I know exactly what the previous behavior looked like. This is where the "View Problem" button would really be useful if it clearly indicated what exactly was outside of normal. Is it flows, packets, bytes? Etc. It does little good to have the "View Problem" button there when essentially it just takes me to the source when I click it.

Post by **lmiltchev** » Wed Oct 14, 2015 4:56 pm

I agree that clicking on the "View Problem" link should provide a user with more useful information. This seems like a good candidate for a feature request. I can file an internal feature request for you if you want me to. Thank you!

Post by **lucas.shelton** » Mon Oct 19, 2015 1:35 pm

lmiltchev wrote: I agree that clicking on the "View Problem" link should provide a user with more useful information. This seems like a good candidate for a feature request. I can file an internal feature request for you if you want me to. Thank you!

Yes, please file a feature request for this.

Thanks

Post by **lmiltchev** » Tue Oct 20, 2015 11:25 am

Done. I posted an internal feature request (TASK ID 6682) and referenced this post in it. Thank you!

Post by **eloyd** » Wed Jun 08, 2016 1:00 pm

I love adding to unclosed old threads.

I still see nothing when clicking on "View Problem" that indicates what the problem is. This is especially annoying when tied with NXI's Abnormal Behavior checks, since our NXI is linked to Incident Manager. This means every time NNA sees abnormal behavior, NXI sends a ticket to IM, which sends notifications to everyone because there aren't robust routing capabilities between NXI and NIM. So everyone gets notified about a ticket generated for abnormal behavior, but no one really knows what the behavior was that triggered it. And to top it off, 5/10/15 minutes later when NNA sees "normal" traffic again, it updates XI, which updates IM, which resolves the ticket and sends notifications out. So people are learning to ignore notifications, which is not a good thing.

Please consider exposing more of the abnormal behavior determination if possible, and (as a side note) please consider allowing us to disable IM ticket creation for specific services/servicegroups within NXI. I'll probably drop that last one into a NIM or NXI board.

Post by **lmiltchev** » Wed Jun 08, 2016 3:56 pm

Please consider exposing more of the abnormal behavior determination if possible...

I am going to discuss this with our developers and get back to you.

...please consider allowing us to disable IM ticket creation for specific services/servicegroups within NXI...

Have you tried setting up filtering in the Nagios IM component in XI by hostgroups and servicegroups?

Filtering: If hostgroups OR servicegroups are selected, Nagios XI will only forward events for selected groups.

You could place the services in question in a separate servicegroup (e.g. "IM"), then select every servicegroup except "IM".

Post by **eloyd** » Thu Jun 09, 2016 8:14 am

Honestly, I'd forgotten about the host/service group stuff in the IM component in XI.

Thanks for the reminder!

New request then, related to that: Rather than select everything that should be sent (which means remembering to do so when new servicegroups/hostgroups are added) can we flip it around and have everything OTHER THAN what is selected sent? Meaning, make it an exclusion filter instead of an inclusion filter? Having both options would be even cooler, with the ULTIMATE IN SELECTING POWER!!!!! (You have to read that last part as if you're the announcer from He-Man.)

Nagios Support Forum
