Identifying bursts of alerts?
Posted: Fri Jan 29, 2021 10:11 am
Is anyone aware of the best way to identify bursts of alerts?
We are trying to track down an issue where we get a sudden burst of alerts that clear within a minute, which makes most of them soft alerts. It can happen up to 5 times per day, and so far we have not spotted a pattern. We are trying to pin down the cause, but we are finding it difficult to use Nagios to identify the time of each burst. What I really want is a report that shows something like the number of alerts per minute and flags any minute over a threshold such as 50 or 100.
The best I have found so far is the Alert Stream, which cannot be exported or scheduled and has no vertical scale, so it is hard to tell busy days from quiet days. The other option is the Alert Histogram, which is broken down per hour and is too coarse for what we need. The more exact the time, the better our chance of finding a smoking gun in our logs somewhere.
I'm also happy to consider building our own report from the API. Can anyone suggest a few attributes that might be a good starting point?
We are a little behind, on Nagios XI 5.5.11, and have 1588 host checks and 5335 service checks.
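To make it clearer what I am after, here is a rough sketch of the per-minute report, working from the Nagios Core log rather than the API. It assumes the usual log line layout, `[epoch] SERVICE ALERT: host;service;state;type;attempt;output` (and the `HOST ALERT` equivalent), and counts both soft and hard alerts. The function names and the default threshold are my own placeholders, and I have not checked this against our 5.5.11 install.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines such as:
#   [1611915060] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout
#   [1611915075] HOST ALERT: web02;DOWN;SOFT;1;unreachable
# Assumption: the log uses the standard "[epoch] TYPE: ..." prefix.
ALERT_RE = re.compile(r"^\[(\d+)\] (?:HOST|SERVICE) ALERT: ")


def alerts_per_minute(lines):
    """Count HOST/SERVICE ALERT lines, keyed by the epoch second that starts each minute."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            epoch = int(match.group(1))
            counts[epoch - epoch % 60] += 1  # round down to the minute
    return counts


def find_bursts(lines, threshold=50):
    """Return (UTC minute, alert count) pairs where the count is at or above the threshold."""
    counts = alerts_per_minute(lines)
    return [
        (datetime.fromtimestamp(minute, tz=timezone.utc), n)
        for minute, n in sorted(counts.items())
        if n >= threshold
    ]


def report(path, threshold=50):
    """Print one line per busy minute for the log file at `path` (path is supplied by the caller)."""
    with open(path, errors="replace") as log:
        for minute, n in find_bursts(log, threshold):
            print(f"{minute:%Y-%m-%d %H:%M} UTC  {n} alerts")
```

Calling `report()` with the path to the current or an archived log file would list only the minutes that crossed the threshold, which is the level of detail I am hoping a built-in report or the API can give me.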