Identifying bursts of alerts?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
davehkent
Posts: 7
Joined: Wed Jul 06, 2016 8:28 am

Identifying bursts of alerts?

Post by davehkent »

Is anyone aware of the best way to identify bursts of alerts?

We are trying to track down an issue where we get a sudden burst of alerts that clear within a minute, which makes most of them soft alerts. It can happen up to 5 times per day, and so far we have not spotted a pattern. We are trying to pin down the cause but finding it difficult to use Nagios to identify the time. What I really want is a report showing something like the number of alerts per minute, flagging anything over a threshold such as 50 or 100.

The best I have found so far is the Alert Stream, which cannot be exported or scheduled and has no vertical scale, so it is hard to tell busy days from quiet days. The other is the Alert Histogram, which is broken down per hour and is too coarse for what we need. The more exact the time, the better chance we have of finding a smoking gun in our logs somewhere.

I'm also happy to consider making our own report from the API if someone can suggest a few attributes that might be a good starting point?

We are a little behind, on Nagios XI 5.5.11, and have 1588 host checks and 5335 service checks.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Identifying bursts of alerts?

Post by benjaminsmith »

Hi Dave,

Have you tried to pull this data using the State History Report? I believe this may work well here. You can download this report in CSV format for analysis in a spreadsheet if desired. The report has the option to select either hard, soft, or both state types and can filter down to specific hosts or services.

This data can also be pulled from the API using the GET objects/statehistory endpoint, and there is the option to build limited queries based on host or time periods. This is documented in the GUI at Help > API Docs.
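As a rough illustration of the per-minute report asked about above, here is a minimal Python sketch that buckets state history records by minute once they have been fetched. It assumes each record is a dict with a `state_time` field formatted as `YYYY-MM-DD HH:MM:SS`; check that against the actual response shown under Help > API Docs for your version. The function names and the sample records are made up for the example.

```python
from collections import Counter
from datetime import datetime


def alerts_per_minute(records):
    """Count state history records per calendar minute.

    Assumption: each record has a "state_time" string such as
    "2019-07-22 10:15:03". Adjust the field name/format to your API output.
    """
    counts = Counter()
    for rec in records:
        ts = datetime.strptime(rec["state_time"], "%Y-%m-%d %H:%M:%S")
        counts[ts.replace(second=0)] += 1
    return counts


def busy_minutes(counts, threshold=50):
    """Return (minute, count) pairs at or above the threshold, oldest first."""
    return sorted((m, n) for m, n in counts.items() if n >= threshold)


# Made-up sample records, only to show the shape of the result.
sample = [
    {"state_time": "2019-07-22 10:15:03"},
    {"state_time": "2019-07-22 10:15:41"},
    {"state_time": "2019-07-22 10:16:02"},
]

for minute, n in busy_minutes(alerts_per_minute(sample), threshold=2):
    print(minute.strftime("%Y-%m-%d %H:%M"), n)
```

With real data you would pass in the list of records decoded from the API's JSON response and raise the threshold to 50 or 100.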

Lastly, if you're comfortable with bash/CLI, the nagios logs can be really helpful for troubleshooting as everything is logged there and never deleted. You'll find the nagios.log at:

Code: Select all

/usr/local/nagios/var/nagios.log
And all the archives, organized by date, in:

Code: Select all

/usr/local/nagios/var/archives
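If you would rather script it than use bash, the following Python sketch counts alerts per minute straight from the log. It relies on the standard log line layout, `[epoch] HOST ALERT: ...` and `[epoch] SERVICE ALERT: ...`; the function names, the threshold, and the sample lines are only illustrative.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines such as:
# [1563790503] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;Connection refused
ALERT_RE = re.compile(r"^\[(\d+)\] (?:HOST|SERVICE) ALERT: ")


def count_alerts_per_minute(lines):
    """Return a Counter keyed by the epoch second at the start of each minute."""
    counts = Counter()
    for line in lines:
        m = ALERT_RE.match(line)
        if m:
            epoch = int(m.group(1))
            counts[epoch - epoch % 60] += 1
    return counts


def report(counts, threshold=50):
    """List (timestamp, count) for every minute at or above the threshold."""
    rows = []
    for minute in sorted(counts):
        if counts[minute] >= threshold:
            stamp = datetime.fromtimestamp(minute, tz=timezone.utc)
            rows.append((stamp.strftime("%Y-%m-%d %H:%M UTC"), counts[minute]))
    return rows


# Made-up sample lines; in practice iterate over the open log file instead.
sample_lines = [
    "[1563790503] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;Connection refused",
    "[1563790541] HOST ALERT: web01;DOWN;SOFT;1;CRITICAL - Host Unreachable",
    "[1563790545] SERVICE NOTIFICATION: admin;web01;HTTP;CRITICAL;notify;down",
    "[1563790562] SERVICE ALERT: db01;MySQL;CRITICAL;SOFT;1;timeout",
]

for stamp, n in report(count_alerts_per_minute(sample_lines), threshold=2):
    print(stamp, n)
```

To run it against real data, open `/usr/local/nagios/var/nagios.log` (or a file from the archives directory) and pass the file object to `count_alerts_per_minute`. Both soft and hard alerts are counted, since the bursts described above are mostly soft.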
Hope that helps answer your question, let me know if you need assistance with anything.

--Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
davehkent
Posts: 7
Joined: Wed Jul 06, 2016 8:28 am

Re: Identifying bursts of alerts?

Post by davehkent »

Hi,

Thanks for those suggestions. Looking at it, I think the best way to achieve what I want is to write a Nagios check that looks at its own log and counts the number of alerts in a particular window. If the count is above a certain threshold, the check could go critical and immediately log a ticket, unlike our standard checks, which try three times, wait an hour, and then log a ticket only if still in an alert state.
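A minimal sketch of what such a check might look like in Python, assuming the standard `[epoch] HOST ALERT:` / `[epoch] SERVICE ALERT:` log line layout and the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the default thresholds, and the `BURST` label are placeholders, not an existing plugin.

```python
import re

ALERT_RE = re.compile(r"^\[(\d+)\] (?:HOST|SERVICE) ALERT: ")

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_burst(lines, now, window=60, warn=50, crit=100):
    """Count alert lines logged in the last `window` seconds before `now`.

    Returns (exit_code, message); the message ends with perfdata.
    """
    cutoff = now - window
    count = 0
    for line in lines:
        m = ALERT_RE.match(line)
        if m and cutoff < int(m.group(1)) <= now:
            count += 1
    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    message = (
        f"BURST {label} - {count} alerts in last {window}s"
        f" | alerts={count};{warn};{crit};0"
    )
    return code, message


def run(log_path, now, **thresholds):
    """Read the log file and evaluate it; report UNKNOWN if it is unreadable."""
    try:
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            return check_burst(fh, now, **thresholds)
    except OSError as exc:
        return UNKNOWN, f"BURST UNKNOWN - cannot read {log_path}: {exc}"
```

A small wrapper would call `run("/usr/local/nagios/var/nagios.log", int(time.time()))`, print the message, and pass the code to `sys.exit()`. Scanning the whole log on every run is wasteful on a busy system, so a real version would probably read only the tail of the file. For the check to alert on the first failure, its service definition would likely need `max_check_attempts` set to 1.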
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Identifying bursts of alerts?

Post by benjaminsmith »

Hi Dave,

That sounds great. Here's a link to the official doc on plugin development to help get you started.

https://nagios-plugins.org/doc/guidelines.html

Let us know if you need anything else on this one.

Thanks,
Benjamin

Reference:
Nagios Exchange