Nagios sending out too many alerts
Hi all,
About our setup: We run a distributed Nagios setup in which our servers submit passive check results via NRDP, with freshness checking enabled.
Problem: Nagios sends far too many alerts when a single host becomes unavailable. For example, if a host goes down I get about 10 service problem notifications (one per service), and then 10 more when the server comes back up. As you can imagine, this is not ideal.
Question: Is there a way to combine these into a single alert? Or at least fewer?
Nagios XI 2012R2.8c Running on Ubuntu 12.04 Using 99% passive checks for monitoring
Monitoring nearly 800 Passive services spread through roughly 40 machines
Running on an 8 core, KVM virtualized VM, with 15 GB of RAM and using RAMDisk
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios sending out too many alerts
You can set up host/service dependencies to reduce this:
Configure -> Core Config Manager -> Advanced -> Host/Service Dependencies
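For a growing environment, the dependency definitions could be generated rather than entered by hand. The following is only a minimal sketch, not an XI feature: it assumes every host has one "master" service (hypothetically named `Heartbeat`) that the other services depend on, and that you keep an inventory of hosts and services somewhere (the `inventory` dict here is made up). It prints standard `servicedependency` object definitions; how those get loaded into XI is left open.

```python
from typing import Dict, List


def service_dependencies(services_by_host: Dict[str, List[str]], master: str) -> str:
    """Render one servicedependency block per dependent service on each host."""
    blocks = []
    for host, services in sorted(services_by_host.items()):
        for service in services:
            if service == master:
                continue  # the master service does not depend on itself
            blocks.append(
                "define servicedependency {\n"
                f"    host_name                      {host}\n"
                f"    service_description            {master}\n"
                f"    dependent_host_name            {host}\n"
                f"    dependent_service_description  {service}\n"
                # Suppress notifications while the master is WARNING/UNKNOWN/CRITICAL.
                "    notification_failure_criteria  w,u,c\n"
                "}\n"
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical inventory; host and service names are placeholders.
    inventory = {"web01": ["Heartbeat", "Disk", "Load"]}
    print(service_dependencies(inventory, master="Heartbeat"))
```

With something like this, adding a new host means adding one inventory entry and regenerating, instead of clicking through each dependency.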
Re: Nagios sending out too many alerts
Thanks for the reply. I had a look at host and service dependencies; however, from what I could understand, I'd have to set them up manually for each existing host and again for every new host we add. I guess that's not viable in a large, expanding environment where machines are being added constantly.
I came across this post whilst searching for a solution for this: http://forums.meulie.net/viewtopic.php? ... 281#p17288
Although I think the poster's solution could suit us, I have a slight problem with it. As previously mentioned, all our checks come in via NRDP, and at the moment HOST checks and SERVICE checks arrive at the same interval, meaning the HOST checks don't necessarily arrive before the SERVICE checks.
I have "sort of" resolved this by ensuring that hosts go "down" before services go "critical". Basically, hosts and services are both set to be checked every 15 seconds, but hosts have max_check_attempts set to 2 and services have this value set to 4. This means that when a host goes down, it generates a host alert 30 seconds before the first service alert, and this suppresses service notifications for hosts that are down.
It's a kludge, but it's better than getting 14 e-mail alerts (one for the server and one for each service on that server) every time a server reboots.
If anyone has any advice about how to get Nagios to check the host automatically when a service goes down before it sends a service alert, that would still be better.
Cheers!
Miles
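The timing in the quoted post can be checked with a bit of arithmetic. This is a simplified sketch that assumes every attempt, including retries, is spaced at the same 15-second interval and that the host and its services start failing at the same moment; the function name is made up for illustration.

```python
def seconds_to_hard_state(check_interval_s: int, max_check_attempts: int) -> int:
    """Seconds from the first failed check until attempt number max_check_attempts."""
    return (max_check_attempts - 1) * check_interval_s


host_delay = seconds_to_hard_state(15, 2)     # host: 2 attempts, 15 s apart
service_delay = seconds_to_hard_state(15, 4)  # service: 4 attempts, 15 s apart
head_start = service_delay - host_delay       # how far the host alert is ahead

print(host_delay, service_delay, head_start)  # prints: 15 45 30
```

The 30-second head start only depends on the difference in attempts (4 - 2) times the interval, so it is the gap between the two values that matters rather than the absolute numbers.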
I could change our check times to adapt to this, as mentioned in the quoted post (making HOST checks more frequent than SERVICE checks). However, I wonder whether this is the right path to follow before I start changing everything, which will be quite a time-consuming task.
It is worth noting that Nagios XI DOES NOT have access to ping the hosts or run any active checks against them; our monitoring infrastructure is based mainly on passive checks and freshness.
scottwilkerson
Re: Nagios sending out too many alerts
This can get into a sort of chicken-and-egg situation. One solution would be to use the BPI addon: set up a business process for each group and then only send notifications for the process as a whole.
Re: Nagios sending out too many alerts
Thanks Scott,
So I've set up a BPI entry per host with all the services inside (needless to say this makes nagios even messier...)
Now here's my current issue: on the services I have "Notification enabled" set to "Skip" and each service uses the template "xiwizard_passive_service" which in turn has "Notification enabled" set to "Off".
I thought this would mean that notifications for that service would be disabled, as it would inherit the "Off" status from the template (and then I would set up the notifications on the newly created BPI group). However, this is not what is happening: the services still send out notifications and don't seem to inherit the "Off" status.
Is my assumption wrong that it should inherit the "Off" status?
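To make the expectation concrete, here is a rough model of how template inheritance is expected to resolve a directive. It is a simplified illustration, not Nagios code: it assumes "Skip" means the directive is simply left out of the service definition, it only follows a single `use` chain (Nagios also allows several templates), and the service shown is a made-up example.

```python
from typing import Dict, Optional

ObjectDef = Dict[str, str]


def resolve(directive: str, obj: ObjectDef, templates: Dict[str, ObjectDef]) -> Optional[str]:
    """Return the first value found walking from the object up its 'use' chain."""
    seen = set()
    current: Optional[ObjectDef] = obj
    while current is not None:
        if directive in current:
            return current[directive]  # a locally set value wins over the template
        parent = current.get("use")
        if parent is None or parent in seen:
            return None  # no further template, or a loop in the chain
        seen.add(parent)
        current = templates.get(parent)
    return None


# Template with notifications switched off, as described in the post.
templates = {"xiwizard_passive_service": {"notifications_enabled": "0"}}

# "Skip": the service does not set notifications_enabled itself.
skipped = {"service_description": "Disk", "use": "xiwizard_passive_service"}
# "On": the service sets the directive and overrides the template.
explicit = {"service_description": "Load", "use": "xiwizard_passive_service",
            "notifications_enabled": "1"}

print(resolve("notifications_enabled", skipped, templates))   # prints: 0
print(resolve("notifications_enabled", explicit, templates))  # prints: 1
```

If the running system behaves differently from this model, the difference is more likely to come from runtime state than from the inheritance rules themselves.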
scottwilkerson
Re: Nagios sending out too many alerts
This is what should happen, and I just tested it on my test box running 2011R2.2 and it appears to be working as expected.
Are all your services that are sending notifications using the xiwizard_passive_service template?
And the obvious, did you Apply Configuration after making the change?
Re: Nagios sending out too many alerts
The configs are indeed synched, have a look at the attachments below
scottwilkerson
Re: Nagios sending out too many alerts
scottwilkerson wrote: Are all your services that are sending notifications using the xiwizard_passive_service template?
Re: Nagios sending out too many alerts
scottwilkerson wrote: Are all your services that are sending notifications using the xiwizard_passive_service template?
Yes, I'm testing this with just one for the time being and forcing it to Critical/OK to "force" a notification.
scottwilkerson
Re: Nagios sending out too many alerts
When you view this in XI, does it have the "no notifications" icon?
If not, it could be because you have submitted an external command to Nagios disabling and re-enabling notifications.
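For reference, the external commands in question look roughly like the lines produced below. This is a hedged sketch that only formats the command text; it does not write to the Nagios command file (whose path varies by install), and the host and service names are placeholders.

```python
import time
from typing import Optional


def svc_notification_command(enable: bool, host: str, service: str,
                             now: Optional[int] = None) -> str:
    """Format an ENABLE/DISABLE_SVC_NOTIFICATIONS external command line."""
    timestamp = int(time.time()) if now is None else now
    name = "ENABLE_SVC_NOTIFICATIONS" if enable else "DISABLE_SVC_NOTIFICATIONS"
    return f"[{timestamp}] {name};{host};{service}\n"


if __name__ == "__main__":
    # Placeholder host and service names.
    print(svc_notification_command(False, "web01", "Disk"), end="")
    print(svc_notification_command(True, "web01", "Disk"), end="")
```

A setting changed this way at runtime may take precedence over what the configuration says for that service, which would explain why the template's "Off" value does not appear to apply.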