Is anyone aware of the best way to identify bursts of alerts?
We are trying to track down an issue where we get a sudden burst of alerts that clear within a minute, which makes most of them soft alerts. It can happen up to 5 times per day, and so far we have not spotted a pattern. We are trying to pin down the cause but are finding it difficult to use Nagios to identify the exact time. What I really want is a report showing something like the number of alerts per minute, flagging anything over a threshold such as 50 or 100.
The best I have found so far is the Alert Stream, which cannot be exported or scheduled and has no vertical scale, so it is hard to tell busy days from quiet days. The other is the Alert Histogram, which is broken down per hour and is too coarse for what we need. The more exact the time, the better chance we have of finding a smoking gun in our logs somewhere.
I'm also happy to consider making our own report from the API if someone can suggest a few attributes that might be a good starting point?
We are a little behind, on Nagios XI 5.5.11, and have 1588 host checks and 5335 service checks.
Identifying bursts of alerts?
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Identifying bursts of alerts?
Hi Dave,
Have you tried to pull this data using the State History Report? I believe this may work well here. You can download this report in CSV format for analysis in a spreadsheet if desired. The report has the option to select either hard, soft, or both state types and can filter down to specific hosts or services.
The same data can be pulled from the API using the GET objects/statehistory endpoint, and there is the option to build limited queries based on host or time period. This is documented in the GUI at Help > API Docs.
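For the custom report route, here is a minimal Python sketch of counting state changes per minute from that endpoint. Treat the details as assumptions to check against Help > API Docs on your own version: the base URL is a placeholder, the `apikey`, `starttime` and `endtime` query parameters, the `stateentry` key in the JSON response, and the `state_time` field formatted as `YYYY-MM-DD HH:MM:SS` are what I would expect, not something confirmed in this thread.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def fetch_state_history(base_url, api_key, start_ts, end_ts):
    """Fetch state history as parsed JSON.

    base_url is a placeholder such as "https://nagios.example.com/nagiosxi".
    The parameter names are assumptions; confirm them in Help > API Docs.
    """
    query = urllib.parse.urlencode(
        {"apikey": api_key, "starttime": start_ts, "endtime": end_ts}
    )
    url = f"{base_url}/api/v1/objects/statehistory?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def records_from_payload(payload, key="stateentry"):
    """Return the list of records; the key name is an assumption."""
    records = payload.get(key, [])
    # Be tolerant of a single record arriving as a dict instead of a list.
    if isinstance(records, dict):
        records = [records]
    return records


def count_per_minute(records, time_field="state_time"):
    """Bucket records by the first 16 characters: 'YYYY-MM-DD HH:MM'."""
    return Counter(
        str(record[time_field])[:16]
        for record in records
        if record.get(time_field)
    )


def busy_minutes(counts, threshold):
    """Return (minute, count) pairs at or above the threshold, in time order."""
    return sorted((minute, n) for minute, n in counts.items() if n >= threshold)
```

Usage would be along the lines of `busy_minutes(count_per_minute(records_from_payload(fetch_state_history(...))), 50)`. Bucketing on a string prefix only works if the timestamp really has that layout, so check one record by hand first.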
Lastly, if you're comfortable with bash/CLI, the Nagios logs can be really helpful for troubleshooting, as everything is logged there and never deleted. You'll find the current nagios.log at:
/usr/local/nagios/var/nagios.log
And all the archives, organized by date, in:
/usr/local/nagios/var/archives
Hope that helps answer your question. Let me know if you need assistance with anything.
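If you go the log route, a rough Python sketch for an alerts-per-minute count could look like the following. It assumes the usual Nagios log line layout, `[epoch] SERVICE ALERT: ...` and `[epoch] HOST ALERT: ...`; the function names and the default threshold of 50 are only illustrative.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines such as:
# [1565000000] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;Connection refused
ALERT_RE = re.compile(r"^\[(\d+)\] (?:HOST|SERVICE) ALERT: ")


def alerts_per_minute(lines):
    """Count alert lines per minute, keyed by the epoch second the minute starts."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            minute_start = int(match.group(1)) // 60 * 60
            counts[minute_start] += 1
    return counts


def bursts(counts, threshold=50):
    """Return (UTC timestamp string, count) for minutes at or above threshold."""
    result = []
    for minute_start in sorted(counts):
        if counts[minute_start] >= threshold:
            stamp = datetime.fromtimestamp(minute_start, tz=timezone.utc)
            result.append((stamp.strftime("%Y-%m-%d %H:%M"), counts[minute_start]))
    return result
```

To run it against the live log you would open `/usr/local/nagios/var/nagios.log` and pass the file object to `alerts_per_minute`; the same works for any file under the archives directory. Both soft and hard alerts are counted, since the bursts described are mostly soft.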
--Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Identifying bursts of alerts?
Hi,
Thanks for those suggestions. Looking at it, I think the best way to achieve what I want is to write a Nagios check that looks at its own log and counts the number of alerts in a particular window. If the count is above a certain threshold, the check could go critical and immediately log a ticket, unlike our standard checks, which try three times, wait an hour, and log a ticket only if the check is still in an alert state.
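A rough sketch of such a check in Python, in case it is useful to anyone else. It assumes the standard `[epoch] HOST ALERT:` / `[epoch] SERVICE ALERT:` log lines and the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the option names, the `ALERT_BURST` label and the default thresholds are made up for illustration.

```python
import argparse
import re
import time

ALERT_RE = re.compile(r"^\[(\d+)\] (?:HOST|SERVICE) ALERT: ")
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_recent_alerts(lines, now, window):
    """Count alert lines whose timestamp falls within the last `window` seconds."""
    cutoff = now - window
    count = 0
    for line in lines:
        match = ALERT_RE.match(line)
        if match and cutoff <= int(match.group(1)) <= now:
            count += 1
    return count


def evaluate(count, warn, crit, window):
    """Map a count to (exit code, one-line plugin output with perfdata)."""
    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    message = (
        f"ALERT_BURST {label} - {count} alerts in the last {window}s"
        f" | alerts={count};{warn};{crit}"
    )
    return code, message


def main(argv=None):
    parser = argparse.ArgumentParser(description="Count recent alerts in nagios.log")
    parser.add_argument("--log", default="/usr/local/nagios/var/nagios.log")
    parser.add_argument("--window", type=int, default=60)
    parser.add_argument("--warning", type=int, default=50)
    parser.add_argument("--critical", type=int, default=100)
    args = parser.parse_args(argv)
    try:
        with open(args.log, errors="replace") as handle:
            count = count_recent_alerts(handle, time.time(), args.window)
    except OSError as exc:
        print(f"ALERT_BURST UNKNOWN - cannot read {args.log}: {exc}")
        return UNKNOWN
    code, message = evaluate(count, args.warning, args.critical, args.window)
    print(message)
    return code

# When saved as a plugin script, finish the file with: sys.exit(main())
```

Two caveats: this reads the whole current log on every run, and a window that spans a log rotation will miss alerts that have already moved to the archives. For the service to go critical on the first result rather than after retries, it would need `max_check_attempts 1`, and the window should be at least as long as the check interval so no gap goes unexamined.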
-
benjaminsmith
Re: Identifying bursts of alerts?
Hi Dave,
That sounds great. Here's a link to the official doc on plugin development to help get you started.
https://nagios-plugins.org/doc/guidelines.html
Let us know if you need anything else on this one.
Thanks,
Benjamin
Reference:
Nagios Exchange