Hi guys,
I am very new to Nagios, so my apologies if I am asking an obvious question.
We have an existing Nagios3 installation (3.2.3-3ubuntu1.1) running on a 12.04 server.
We're migrating it to a Nagios3 installation (3.5.1.dfsg-2.1ubuntu1.1) running on a 16.04 server once we have this figured out.
We would like to be alerted once a check for a host's service fails, and then continue to be alerted, about 5 times per host, and then have no more alerts.
Currently, we get bombarded with emails if a service fails until the checks stop failing.
For example, if someone mistypes 5 DNS records at 4:30 PM on Friday and leaves, we get 5 emails for the DNS check failure every 30 minutes from 4:30 PM on Friday until someone fixes it at 8 AM on Monday.
We would like to change the above example to a scenario where the first failed check sends an alert, each of the next 5 failed checks sends another alert for the affected host, and then the alerts stop.
My first thought was to include all hosts in a hostgroup called "all-hosts" and then define an escalation rule for that group that, after the 5th alert, uses a contact with a dummy email address.
The problem is that after 5 alerts, emails are now sent to both the original contact group and to the dummy contact.
Please let me know the best way to get such a configuration, or let me know if I have a fundamental misunderstanding of how Nagios works.
Thanks,
Ben
Limit total number of emails sent per check
-
- Posts: 3
- Joined: Wed Jun 07, 2017 9:39 am
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Limit total number of emails sent per check
This is mostly written for XI, but I think it should give you some clues: https://support.nagios.com/kb/article.php?id=36
Most important is the idea of acknowledging issues.
You may also want to check out timeperiods: https://support.nagios.com/kb/article.php?id=369
There's not an easy way to do exactly what you want, but there are certainly plenty of ways to minimize your number of alerts.
Re: Limit total number of emails sent per check
Thanks for the reply.
Is there a script I could write for Nagios Core that would automate acknowledging the alerts? Some of the hosts failing checks are Windows 2003, others 2008R2. The 2003 boxes use NC_Net and the 2008R2 boxes use NSClient++.
I think what I want to do is define an event handler and a new command as specified in the doc page I found, but I am unclear what commands to insert to tell the Nagios server to acknowledge that specific service on that specific host.
Based on posts like this, I think I need to have the remote (monitored) host execute something via NC_Net or NSClient++. That is, I need to define a command that runs a batch script similar to the example below, echoing the appropriate variables and the appropriate value for the service name.
Code:
/usr/bin/printf "[%lu] ACKNOWLEDGE_HOST_PROBLEM;$hosttoack;1;1;1;nagiosadmin;Down I know\n"
Re: Limit total number of emails sent per check
bencundiff wrote: I am unclear what commands to insert to tell the nagios server to acknowledge that specific service on that specific host.
You might find this helpful, though I'm not sure if the commands are consistent between Core 3 and Core 4:
https://old.nagios.org/developerinfo/ex ... ndlist.php
https://old.nagios.org/developerinfo/ex ... mand_id=39
https://old.nagios.org/developerinfo/ex ... mand_id=40
In a nutshell, there's a file you can write to which allows you to do "nagios things" from the CLI. Having the Windows machine be responsible for that is a whole other deal, though. Having the Windows machines submit passive check results might be a better route:
https://assets.nagios.com/downloads/nag ... hecks.html
You could use something like NRDP/NSCA (which would live on the Nagios box) to act as the "bucket" that NSClient++/NC_Net sends the results to :
https://exchange.nagios.org/directory/A ... or/details
https://exchange.nagios.org/directory/A ... or/details
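To make the "file you can write to" idea a bit more concrete, here is a minimal sketch of acknowledging a single service problem from the Nagios server's shell. The helper name `ack_service`, the default author and comment, and the command file path are assumptions for illustration (the path shown is the usual one for the Ubuntu nagios3 package; check `command_file` in your nagios.cfg). The field order follows the ACKNOWLEDGE_SVC_PROBLEM external command: host, service, sticky, notify, persistent, author, comment.

```shell
#!/bin/sh
# Sketch only: acknowledge one service problem through the external command file.
# CMD_FILE is an assumed default; override it to match command_file in nagios.cfg.
CMD_FILE="${CMD_FILE:-/var/lib/nagios3/rw/nagios.cmd}"

# ack_service HOST SERVICE [AUTHOR] [COMMENT]   (hypothetical helper name)
ack_service() {
    host="$1"
    service="$2"
    author="${3:-nagiosadmin}"
    comment="${4:-Auto-acknowledged}"
    now=$(date +%s)   # external commands are prefixed with a Unix timestamp

    # sticky=2 (ack stays until the service recovers), notify=0, persistent=1
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;2;0;1;%s;%s\n' \
        "$now" "$host" "$service" "$author" "$comment" > "$CMD_FILE"
}
```

Called as, say, `ack_service "$HOSTNAME$" "$SERVICEDESC$"` from an event handler command definition, this would run on the Nagios server itself, since event handlers are executed by the Nagios daemon and not by NC_Net or NSClient++ on the monitored host. Note that the comment field must not contain a semicolon, because semicolons separate the fields.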
Former Nagios employee
https://www.mcapra.com/
Re: Limit total number of emails sent per check
Thanks for the assist @mcapra!
Re: Limit total number of emails sent per check
mcapra wrote:
bencundiff wrote: I am unclear what commands to insert to tell the nagios server to acknowledge that specific service on that specific host.
You might find this helpful, though I'm not sure if the commands are consistent between Core 3 and Core 4:
https://old.nagios.org/developerinfo/ex ... ndlist.php
https://old.nagios.org/developerinfo/ex ... mand_id=39
https://old.nagios.org/developerinfo/ex ... mand_id=40
In a nutshell, there's a file you can write to which allows you to do "nagios things" from the CLI. Having the Windows machine be responsible for that is a whole other deal, though. Having the Windows machines submit passive check results might be a better route:
https://assets.nagios.com/downloads/nag ... hecks.html
You could use something like NRDP/NSCA (which would live on the Nagios box) to act as the "bucket" that NSClient++/NC_Net sends the results to :
https://exchange.nagios.org/directory/A ... or/details
https://exchange.nagios.org/directory/A ... or/details
Ok, that helps a lot!
If NC_Net isn't an active project and I can't figure out how to get passive checks to work on server 2003 hosts, perhaps that's just another indicator it's time for us to stop using Server 2003 in a production environment....
Would a third option be to have notifications sent to a dummy contact, and then escalate to a real contact group for alerts 2 to 6, and then escalate back to the dummy contact group for alerts 7+?
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Limit total number of emails sent per check
bencundiff wrote: Would a third option be to have notifications sent to a dummy contact, and then escalate to a real contact group for alerts 2 to 6, and then escalate back to the dummy contact group for alerts 7+?
Yes, this is what I would do to limit by quantity of notifications.
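For anyone finding this later, a rough sketch of that layout in object config might look like the following. Every name here is an assumption for illustration: the `blackhole` contact and group, the `admins` group, the `DNS` service description, and the `24x7` timeperiod and `notify-*-by-email` commands (which ship with the stock Ubuntu config but may differ on your system). The idea is that the service's own contact_groups points at the dummy group, and a single escalation covers notifications 2 through 6. As I understand the escalation logic, once no escalation matches (notification 7 onward), Nagios falls back to the service's default contacts, which is the dummy again.

```
# Dummy contact: notification options "n" mean it is never actually notified.
define contact {
    contact_name                    blackhole
    alias                           Discards notifications
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       n
    service_notification_options    n
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email
}

define contactgroup {
    contactgroup_name               blackhole-group
    alias                           Dummy group
    members                         blackhole
}

# Notifications 2 to 6 go to the real group; everything else
# falls back to the service's default contact_groups (blackhole-group).
define serviceescalation {
    hostgroup_name                  all-hosts
    service_description             DNS
    first_notification              2
    last_notification               6
    notification_interval           30
    contact_groups                  admins
}
```

The services themselves would need `contact_groups blackhole-group` for this to work as described. Changing first_notification and last_notification to 1 and 5 should shift the window so that the very first alert also reaches the real group, but test that on your version before relying on it.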