Limit total number of emails sent per check
Posted: Wed Jun 07, 2017 12:11 pm
Hi guys,
I am very new to Nagios, so my apologies if I am asking an obvious question.
We have an existing Nagios3 installation (3.2.3-3ubuntu1.1) running on a 12.04 server.
We're migrating it to a Nagios3 installation (3.5.1.dfsg-2.1ubuntu1.1) running on a 16.04 server once we have this figured out.
We would like to be alerted once a check for a host's service fails, and then continue to be alerted, about 5 times per host, and then have no more alerts.
Currently, we get bombarded with emails if a service fails until the checks stop failing.
For example, if someone mistypes 5 DNS records at 4:30 PM on Friday and leaves, we get 5 emails for the DNS check failure every 30 minutes from 4:30 PM on Friday until someone fixes it at 8 AM on Monday.
We would like to change the above scenario so that the first failed check sends an alert, each subsequent failed check sends another alert for every host that is still failing, up to about 5 alerts per host, and then the alerts stop.
My first thought was to include all hosts in a hostgroup called "all-hosts" and then define an escalation rule for that group that, after the 5th alert, uses a contact with a dummy email address.
The problem is that after 5 alerts, emails are now sent to both the original contact group and to the dummy contact.
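Roughly, the escalation idea above corresponds to definitions like the following. This is only a sketch, not the exact config in use: the contact name "dummy-contact" and its address are placeholders, and it assumes the "24x7" timeperiod and the "notify-host-by-email" / "notify-service-by-email" commands that ship with the Ubuntu nagios3 package.

```
# Placeholder contact that acts as a sink for escalated notifications.
# "dummy-contact" and the address below are hypothetical names.
define contact {
    contact_name                    dummy-contact
    alias                           Sink for escalated notifications
    email                           devnull@example.invalid
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,u,r
    service_notification_options    w,u,c,r
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email
}

# Escalation for every service on every host in the "all-hosts" hostgroup.
define serviceescalation {
    hostgroup_name          all-hosts
    service_description     *
    first_notification      6       ; takes effect from the 6th notification
    last_notification       0       ; 0 = no upper bound
    notification_interval   30      ; minutes, matching the current 30-minute emails
    contacts                dummy-contact
}
```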
Please let me know the best way to get such a configuration, or let me know if I have a fundamental misunderstanding of how Nagios works.
Thanks,
Ben