Disable notifications for service check timeout

Posted: Wed Feb 14, 2018 4:45 am
by melmoth
Hi,
we are monitoring an environment with very frequent, totally random and perfectly legitimate downtimes. Due to these peculiarities it is difficult to use a scheduled downtime approach. Moreover, we don't need aggressive notifications in place because most of the services are low impact, so lazy notification is OK. Just one email per day would be sufficient for the vast majority of services.
I'm looking for suggestions to reduce the number of notifications in false positive/service check timeout situations: is there any way to be notified only if a service is really in a fault state (i.e. its value is outside the boundaries imposed with -c and -w) and not when it is just timing out for whatever reason?

Re: Disable notifications for service check timeout

Posted: Wed Feb 14, 2018 5:23 am
by delboy1966
We have the same sort of thing with some services we monitor.

So what we do is set in nagios.cfg

service_check_timeout_state=u

So any timeouts go to an Unknown state and we don't notify on Unknown states.
That way we only get valid notifications when the check hasn't timed out and the thresholds have been broken.

This is a global setting so would affect all services in the same way, but we are ok with this.
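
As a rough sketch of how the two pieces fit together (the host name, service description, check command and the generic-service template below are placeholders I made up, not taken from our real config): the global directive turns timeouts into UNKNOWN, and the service's notification_options simply leaves out "u".

```
# nagios.cfg (main configuration file)
# Timed-out service checks are reported as UNKNOWN instead of CRITICAL.
service_check_timeout_state=u

# Object configuration, e.g. a services .cfg file (all names are hypothetical)
define service {
    use                     generic-service     ; assumed template
    host_name               example-host        ; hypothetical host
    service_description     Example Check       ; hypothetical service
    check_command           check_example       ; hypothetical command
    notification_options    w,c,r               ; no "u": UNKNOWN never notifies
}
```

Keep in mind that the contact's service_notification_options also filter what gets sent, so "u" may need to be left out there as well depending on your setup.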

Hope this helps.

Re: Disable notifications for service check timeout

Posted: Wed Feb 14, 2018 5:40 pm
by cdienger
Thanks for the input @delboy1966!

Another option is to increase the notification_interval to reduce the number of notifications sent out. Increasing the retry_interval and max_check_attempts too could weed out some of the false positives.
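
For illustration only, a service definition along these lines would match the "one email per day" goal from the original post. The host, service and command names are placeholders, the numbers are just example values, and the intervals are in minutes only if interval_length is left at its default of 60 seconds.

```
define service {
    use                     generic-service     ; assumed template
    host_name               example-host        ; hypothetical host
    service_description     Example Check       ; hypothetical service
    check_command           check_example       ; hypothetical command
    max_check_attempts      5                   ; 5 failed checks before a HARD state
    retry_interval          5                   ; 5 minutes between retries in a SOFT state
    notification_interval   1440                ; re-notify at most once every 24 hours
}
```

With these example values a problem has to persist for roughly 20 minutes of retries before the first notification goes out.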

Re: Disable notifications for service check timeout

Posted: Thu Feb 15, 2018 5:05 am
by melmoth
Thank you!
That was exactly what I was searching for. Funnily enough, I had this option correctly configured on another instance of Nagios and totally missed it on this one :)

Do you happen to have any other useful tips to share on dealing with such a peculiar environment? I mean, with random device downtimes etc.?

Re: Disable notifications for service check timeout

Posted: Thu Feb 15, 2018 5:14 pm
by npolovenko
@melmoth, every situation is different. In your case, increasing max_check_attempts and check_interval seems like the best way to go. Limiting the notification states might also help reduce the number of alerts. Depending on which check plugin you're using, you could also add a timeout value to the command, overriding and increasing the plugin's default. You could also change the service_check_timeout and host_check_timeout values globally in the /usr/local/nagios/etc/nagios.cfg file.
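
A minimal sketch of both timeout options, assuming the check uses a standard plugin such as check_http (which accepts -t for its own timeout). The command name and the values are made up for the example. As far as I know the global directives in nagios.cfg are called service_check_timeout and host_check_timeout.

```
# Object configuration: give the plugin its own, longer timeout
define command {
    command_name    check_http_long_timeout     ; hypothetical name
    command_line    $USER1$/check_http -H $HOSTADDRESS$ -t 45
}

# /usr/local/nagios/etc/nagios.cfg: global limits, in seconds (example values)
service_check_timeout=90
host_check_timeout=60
```

It usually makes sense to keep the plugin's -t value below service_check_timeout, so the plugin returns its own timeout result before Nagios kills the check.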