Hi,
We are monitoring an environment with very frequent, totally random, and perfectly legitimate downtimes. Because of these peculiarities, a scheduled downtime approach is difficult to use. We also don't need aggressive notifications, because most of the services are low impact, so lazy notification is OK. One email per day would be sufficient for the vast majority of services.
I'm looking for suggestions to reduce the number of notifications caused by false positives and service check timeouts. Is there any way to be notified only when a service is really in a fault state (i.e. its value is outside the thresholds set with -w and -c), rather than when the check is just timing out for whatever reason?
Disable notifications for service check timeout
-
- Posts: 96
- Joined: Thu Oct 22, 2015 5:26 am
Re: Disable notifications for service check timeout
We have the same sort of thing with some services we monitor.
What we do is set the following in nagios.cfg:
service_check_timeout_state=u
So any timeouts go to an Unknown state and we don't notify on Unknown states.
That way we only get valid notifications when the check hasn't timed out and the thresholds have been broken.
This is a global setting so would affect all services in the same way, but we are ok with this.
Hope this helps.
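For anyone finding this later, here is a minimal sketch of the two pieces involved. The template, host name, service description, and check command below are hypothetical placeholders, not taken from this thread. The lines that matter are service_check_timeout_state and a notification_options list that leaves out u.

```
# --- nagios.cfg (main config file) ---
# State assigned to a service check that Nagios kills for running too long.
# Accepted values: c (critical), u (unknown), w (warning), o (ok).
service_check_timeout_state=u

# --- object config (file name and all names below are assumptions) ---
define service {
    use                     generic-service     ; assumed template
    host_name               example-host        ; hypothetical host
    service_description     Example Check       ; hypothetical service
    check_command           check_example       ; hypothetical command
    notification_options    w,c,r               ; no 'u', so UNKNOWN never notifies
}
```

Keep in mind that the contact's own service_notification_options also filters what gets sent, and that the configuration can be checked with nagios -v followed by the path to nagios.cfg before restarting.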
Re: Disable notifications for service check timeout
Thanks for the input @delboy1966!
Another option is to increase notification_interval to reduce the number of notifications sent out. Increasing retry_interval and max_check_attempts as well could weed out some of the false positives.
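As a rough illustration, a service definition along these lines would re-notify about once a day and require several failed rechecks before alerting. The numbers and names are assumptions chosen for the example, and the intervals are in minutes only if interval_length is left at its default of 60 seconds.

```
define service {
    use                     generic-service     ; assumed template
    host_name               example-host        ; hypothetical host
    service_description     Example Check       ; hypothetical service
    check_command           check_example       ; hypothetical command
    check_interval          10                  ; normal check every 10 minutes
    retry_interval          5                   ; recheck every 5 minutes after a failure
    max_check_attempts      6                   ; 6 consecutive failures before a HARD state
    notification_interval   1440                ; repeat the notification once per 24 hours
}
```

With these values a problem has to persist for roughly 25 minutes before the first notification goes out, so short random outages never reach the mailbox.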
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Disable notifications for service check timeout
Thank you!
That was exactly what I was looking for. Funnily enough, I had this option correctly configured on another instance of Nagios and totally missed it on this one.
Do you happen to have any other useful tips for dealing with such a peculiar environment? I mean, with random device downtimes, etc.?
-
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Disable notifications for service check timeout
@melmoth, all situations are different. In your case, it seems that increasing max_check_attempts and check_interval is the best way to go. Limiting the notification states might also help reduce the number of alerts. Depending on the plugin you're using, you could also add a timeout value to the command, overriding and increasing the plugin's default. You could also change the service_check_timeout and host_check_timeout values globally in the /usr/local/nagios/etc/nagios.cfg file.
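A sketch of the two timeout levels, assuming the standard check_http plugin with its -t option. The command name, thresholds, and timeout values are made up for illustration, so adjust them to the plugin actually in use.

```
# --- nagios.cfg (main config file) ---
# Seconds Nagios waits before killing a check. Values here are examples only.
service_check_timeout=90
host_check_timeout=60

# --- object config (command name and thresholds are assumptions) ---
define command {
    command_name    check_http_slow
    ; -w / -c are response-time thresholds in seconds, -t is the plugin's own timeout
    command_line    $USER1$/check_http -H $HOSTADDRESS$ -w 5 -c 10 -t 60
}
```

The plugin's -t value should stay below service_check_timeout, otherwise Nagios kills the check first. Also note that a plugin hitting its own -t limit reports a state itself (often CRITICAL, as far as I know), whereas service_check_timeout_state only applies when Nagios does the killing.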