Nagios sending out too many alerts

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
johndoe
Posts: 114
Joined: Fri Oct 28, 2011 10:14 am

Nagios sending out too many alerts

Post by johndoe »

Hi all,

About our setup: Nagios is distributed across our servers, which submit passive check results via NRDP, with freshness checking enabled.
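For context, a passive result submitted through NRDP is typically an HTTP POST carrying a token, `cmd=submitcheck` and an `XMLDATA` document made of `<checkresult>` entries. The sketch below only builds that payload and sends nothing. The host name, service names and token are made-up examples, and the exact fields your NRDP version accepts should be checked against its own documentation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_checkresults_xml(results):
    """Build an NRDP-style <checkresults> document.

    Each result is a dict with keys: type ("host" or "service"),
    hostname, state (int), output, plus servicename for services.
    """
    root = ET.Element("checkresults")
    for r in results:
        cr = ET.SubElement(root, "checkresult", type=r["type"])
        ET.SubElement(cr, "hostname").text = r["hostname"]
        if r["type"] == "service":
            ET.SubElement(cr, "servicename").text = r["servicename"]
        ET.SubElement(cr, "state").text = str(r["state"])
        ET.SubElement(cr, "output").text = r["output"]
    return ET.tostring(root, encoding="unicode")


def build_nrdp_form(token, results):
    """URL-encoded form body for a POST to the NRDP endpoint (not sent here)."""
    return urlencode({
        "token": token,
        "cmd": "submitcheck",
        "XMLDATA": build_checkresults_xml(results),
    })


# Hypothetical results: host web01 is UP, one of its services is CRITICAL.
example = [
    {"type": "host", "hostname": "web01", "state": 0, "output": "UP"},
    {"type": "service", "hostname": "web01", "servicename": "Disk /",
     "state": 2, "output": "CRITICAL - 95% used"},
]
print(build_checkresults_xml(example))
```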

Problem: Nagios sends far too many alerts when a single host becomes unavailable. For example, if a host with (say) 10 services goes down, I get 10 alerts (one per service problem), and then 10 more when the server comes back up. As you can imagine, this is not ideal.

Question: Is there a way to combine these into a single alert, or at least fewer?
Nagios XI 2012R2.8c Running on Ubuntu 12.04 Using 99% passive checks for monitoring
Monitoring nearly 800 Passive services spread through roughly 40 machines
Running on an 8 core, KVM virtualized VM, with 15 GB of RAM and using RAMDisk
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios sending out too many alerts

Post by scottwilkerson »

You can set up host/service dependencies to reduce this:
Configure -> Core Config Manager -> Advanced -> Host/Service Dependencies
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: Nagios sending out too many alerts

Post by johndoe »

Thanks for the reply. I had a look at host and service dependencies, but from what I could understand I'd have to set them up manually for each existing host, and again for every new host we add. I don't think that's viable in a large, growing environment where machines are added constantly.
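Since dependency definitions are plain Nagios object configuration, one way around the per-host manual work is to generate them from a host/service inventory whenever hosts are added. A minimal sketch, assuming every host has a hypothetical passive "Heartbeat" service that the other services on that host depend on; the inventory is made up, and how the generated text is loaded into XI (for example via a CCM import) is left open.

```python
def service_dependencies(hosts, master_service="Heartbeat",
                         failure_criteria="u,c"):
    """Build servicedependency definitions for every host.

    hosts: dict mapping host name -> list of service descriptions.
    master_service: hypothetical per-host service the others depend on.
    failure_criteria: master states (u=unknown, c=critical) during which
    notifications for the dependent services are suppressed.
    """
    blocks = []
    for host in sorted(hosts):
        for svc in hosts[host]:
            if svc == master_service:
                continue  # do not make the master depend on itself
            blocks.append(
                "define servicedependency {\n"
                f"    host_name                      {host}\n"
                f"    service_description            {master_service}\n"
                f"    dependent_host_name            {host}\n"
                f"    dependent_service_description  {svc}\n"
                f"    notification_failure_criteria  {failure_criteria}\n"
                "}\n"
            )
    return "\n".join(blocks)


inventory = {  # hypothetical inventory
    "web01": ["Heartbeat", "Disk /", "Load"],
    "db01": ["Heartbeat", "MySQL"],
}
print(service_dependencies(inventory))
```

Because the inventory is the only input, regenerating the file when a machine is added would replace the manual per-host step.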

I came across this post while searching for a solution: http://forums.meulie.net/viewtopic.php? ... 281#p17288
The poster wrote:
I have "sort of" resolved this by ensuring that hosts go "down" before services go "critical". Basically, hosts and services are both set to be checked every 15 seconds, but hosts have max_check_attempts set to 2 while services have it set to 4. This means that when a host goes down, it generates a host alert 30 seconds before the first service alert, and this suppresses service notifications for hosts that are down.

It's a kludge, but it's better than getting 14 e-mail alerts (one for the server and one for each service on that server) every time a server reboots.

If anyone knows how to get Nagios to check the host automatically when a service goes down, before it sends a service alert, that would be better still.

Cheers!

Miles
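The 30-second figure in the quoted post follows from the check settings. A simplified model, assuming the retry interval equals the 15-second check interval and that the host and its services record their first failure at the same moment (real scheduling and check latency will shift the numbers):

```python
def hard_state_time(first_failure_at, max_check_attempts, retry_interval):
    """Seconds at which a problem becomes HARD (and can notify).

    The first failed check counts as attempt 1; each later attempt
    follows retry_interval seconds after the previous one.
    """
    return first_failure_at + (max_check_attempts - 1) * retry_interval


host_hard = hard_state_time(0, max_check_attempts=2, retry_interval=15)
service_hard = hard_state_time(0, max_check_attempts=4, retry_interval=15)
print(f"host HARD at {host_hard}s, service HARD at {service_hard}s, "
      f"gap {service_hard - host_hard}s")
# prints: host HARD at 15s, service HARD at 45s, gap 30s
```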
Although I think the poster's solution could suit us, I have a slight problem with it. As mentioned, all our checks come in via NRDP, and at the moment HOST and SERVICE checks arrive at the same interval, so the HOST checks don't necessarily arrive before the SERVICE checks.

I could change our check intervals as the quoted post suggests (making HOST checks more frequent than SERVICE checks), but I wonder whether this is the right path before I start changing everything, which will be quite time-consuming.

It is worth noting that Nagios XI cannot ping the hosts or run any active checks against them; our monitoring infrastructure is based mainly on passive checks and freshness.
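With purely passive checks, the equivalent lever is probably the freshness threshold rather than the submission interval: if a machine stops sending NRDP results altogether, its host and services go stale together when their thresholds match. A sketch of that ordering, assuming hypothetical thresholds and a host freshness command that reports DOWN, and ignoring Nagios's additional_freshness_latency and the interval at which freshness checks run:

```python
def stale_at(last_result_at, freshness_threshold):
    """Time at which a passive result would be considered stale."""
    return last_result_at + freshness_threshold


def stale_order(last_result_at, thresholds):
    """Names sorted by when they go stale (ties keep input order)."""
    return sorted(thresholds,
                  key=lambda name: stale_at(last_result_at, thresholds[name]))


# Hypothetical thresholds in seconds; the machine's last report was at t=0.
thresholds = {
    "host web01": 60,
    "service web01/Disk /": 120,
    "service web01/Load": 120,
}
for name in stale_order(0, thresholds):
    print(stale_at(0, thresholds[name]), name)
```

Giving the host the shorter threshold mirrors the quoted approach (host DOWN first, service notifications then suppressed) without changing how often clients submit.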

Re: Nagios sending out too many alerts

Post by scottwilkerson »

This can become a sort of chicken-and-egg problem. One solution would be to use the BPI add-on: set up a business process for each group, and then only send notifications for the process as a whole.
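The idea behind notifying on the business process is that many member problems collapse into one state change for the group. A toy sketch of that idea, not of how BPI actually evaluates groups: here the group simply takes its worst member state (a rule chosen for simplicity, with an assumed severity order), and a notification is emitted only when the group state changes.

```python
# Severity order used to pick the "worst" state (assumption for the sketch).
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}


def group_state(member_states):
    """Worst state among the group's members."""
    return max(member_states, key=SEVERITY.__getitem__)


def group_notifications(snapshots, initial="OK"):
    """One notification per change of the group state across snapshots."""
    sent = []
    previous = initial
    for member_states in snapshots:
        current = group_state(member_states)
        if current != previous:
            sent.append(current)
            previous = current
    return sent


# A host with 10 services goes down and comes back up.
snapshots = [
    ["CRITICAL"] * 3 + ["OK"] * 7,  # services start failing
    ["CRITICAL"] * 10,              # all services down
    ["OK"] * 10,                    # host back up
]
print(group_notifications(snapshots))  # ['CRITICAL', 'OK']
```

Per-service notifications for the same sequence would be 10 problem alerts and 10 recovery alerts.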

Re: Nagios sending out too many alerts

Post by johndoe »

Thanks Scott,

So I've set up a BPI entry per host with all of its services inside (needless to say, this makes Nagios even messier...)

Now here's my current issue: on the services I have "Notification enabled" set to "Skip" and each service uses the template "xiwizard_passive_service" which in turn has "Notification enabled" set to "Off".
I thought this would disable notifications for that service, since it would inherit the "Off" status from the template (I would then set up notifications on the newly created BPI group). However, this is not what happens: the services still send out notifications and don't seem to inherit the "Off" status.

Am I wrong to assume that it should inherit the "Off" status?
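The expected behaviour can be modelled as a walk up the template chain: a directive left unset on the service ("Skip" in CCM, i.e. not written out) takes its value from the first template that sets it. A simplified sketch, assuming a single template per `use` (Nagios also accepts a comma-separated list) and made-up template contents; it models configuration only, and runtime state retained by Nagios can still override the result.

```python
def resolve(directive, obj, templates):
    """Return the effective value of `directive` for `obj`, or None.

    obj and templates[...] are dicts; a missing key means "not set here"
    and the lookup continues in the template named by "use".
    """
    current, seen = obj, set()
    while current is not None:
        if directive in current:
            return current[directive]
        parent = current.get("use")
        if parent is None or parent in seen:  # end of chain or a loop
            return None
        seen.add(parent)
        current = templates.get(parent)
    return None


templates = {  # hypothetical template contents
    "generic-service": {"notifications_enabled": 1},
    "xiwizard_passive_service": {"use": "generic-service",
                                 "notifications_enabled": 0},
}
service = {"use": "xiwizard_passive_service"}  # notifications_enabled: Skip
print(resolve("notifications_enabled", service, templates))  # 0
```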

Re: Nagios sending out too many alerts

Post by scottwilkerson »

This is what should happen; I just tested it on my test box running 2011R2.2 and it works as expected.

Are all your services that are sending notifications using the xiwizard_passive_service template?

And the obvious question: did you Apply Configuration after making the change?

Re: Nagios sending out too many alerts

Post by johndoe »

The configs are indeed synced; have a look at the attachments below.

Re: Nagios sending out too many alerts

Post by scottwilkerson »

scottwilkerson wrote: Are all your services that are sending notifications using the xiwizard_passive_service template?

Re: Nagios sending out too many alerts

Post by johndoe »

scottwilkerson wrote: Are all your services that are sending notifications using the xiwizard_passive_service template?
Yes. I'm testing with just one service for the time being, forcing it to Critical/OK to trigger a notification.

Re: Nagios sending out too many alerts

Post by scottwilkerson »

When you view this service in XI, does it show the "no notifications" icon?
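If not, it could be because an external command was submitted to Nagios that disabled and then re-enabled notifications; runtime changes like that are retained by Nagios and can take precedence over the value in the configuration.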
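External commands are plain lines of the form `[timestamp] COMMAND;arg1;arg2` written to the Nagios command file, so turning notifications back off at runtime for one service could look like the sketch below. `DISABLE_SVC_NOTIFICATIONS` is a standard Nagios external command; the command-file path is the usual Nagios XI location but is an assumption here, and the host/service names are made up.

```python
import time


def external_command(name, *args, timestamp=None):
    """Format one Nagios external command line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    body = ";".join((name,) + args)
    return f"[{ts}] {body}\n"


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write a command line to the Nagios command file (a named pipe).

    The default path is an assumption; check command_file in nagios.cfg.
    Opening the pipe blocks if nothing is reading from it.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


line = external_command("DISABLE_SVC_NOTIFICATIONS", "web01", "Disk /",
                        timestamp=1700000000)
print(line, end="")  # [1700000000] DISABLE_SVC_NOTIFICATIONS;web01;Disk /
# submit(line)  # run on the Nagios server itself
```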