
Critical-unknown-critical state change notifications flood

Posted: Tue Mar 31, 2020 6:14 am
by gnikolov
Hello,

I have an issue which, I guess, other people have encountered, but for which I can't seem to find a solution. The situation is this: Nagios XI 5.4.13, with traffic going through a lot of unstable tunnels. A lot of SNMP queries time out (which is expected, given the tunnels' instability). When a device has a service in a critical state (for instance, the "httpd" service is down), a Critical notification arrives. Then the connection degrades and I get an Unknown notification; then the connection recovers and I get a Critical notification again. I can deal with the Unknown notifications by simply configuring the notification settings, but then I still receive a Critical notification every time the tunnel quality is restored. I cannot use flap detection, because the interval between tunnel degradations varies from 5 minutes to 4 hours.
So my question is: does anyone have a suggestion for how to stop notifications from being sent when a service goes from Unknown to Critical (or Warning, for that matter)?

Cheers.

Re: Critical-unknown-critical state change notifications flo

Posted: Tue Mar 31, 2020 12:46 pm
by scottwilkerson
I'll leave this open for others to chime in, but the only solution that makes sense to me is resolving the problem that causes the VPN to degrade.

Re: Critical-unknown-critical state change notifications flo

Posted: Tue Mar 31, 2020 3:36 pm
by gnikolov
Unfortunately, the tunnels run between continents and their stability cannot be fixed. This is why I am looking to eliminate, or at least reduce, the spam.
Is there a way to make Nagios treat Unknown status as OK status as far as notifications are concerned? (I guess expecting one master "switch-case" where I can put OK and Unknown in the same case is naive, but perhaps this can be done by editing a couple of config files.)

Re: Critical-unknown-critical state change notifications flo

Posted: Tue Mar 31, 2020 4:01 pm
by scottwilkerson
In nagios.cfg there is this setting:

Code:

service_check_timeout_state=u
If you set it to

Code:

service_check_timeout_state=o
and restart Nagios, it will mark ALL checks that time out as OK.

This is the only thing I can think of, but it will affect ALL checks.
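
Because the global setting touches every check, a per-service workaround might be worth sketching. The following is only an illustration and was not proposed or tested in this thread: a hypothetical wrapper plugin that runs the real check and, when the check comes back UNKNOWN (exit code 3), repeats the last OK/WARNING/CRITICAL result it saw, so Nagios sees no state change during a tunnel outage. The function name `run_with_hold` and the state file location are assumptions made for the example; only the standard plugin exit codes (0 to 3) come from Nagios itself.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: repeat the last known state while a check is UNKNOWN."""
import subprocess
import sys
from pathlib import Path

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_with_hold(command, state_file):
    """Run `command` (a list of arguments) and return (exit_code, output).

    If the plugin exits UNKNOWN, the last cached OK/WARNING/CRITICAL
    code from `state_file` is returned instead, when one exists.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True)
        code, output = result.returncode, result.stdout.strip()
    except OSError as exc:
        # The plugin itself could not be started.
        code, output = UNKNOWN, f"could not run plugin: {exc}"

    path = Path(state_file)
    if code in (OK, WARNING, CRITICAL):
        # A real answer came back: remember it for the next outage.
        path.write_text(str(code))
        return code, output

    # UNKNOWN or an unexpected exit code: fall back to the cached state.
    try:
        held = int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return UNKNOWN, output  # nothing cached yet, report UNKNOWN as usual
    return held, f"(holding last known state) {output}"


# Assumed invocation: wrapper.py <state_file> <plugin> [plugin args...]
if __name__ == "__main__" and len(sys.argv) > 2:
    exit_code, text = run_with_hold(sys.argv[2:], sys.argv[1])
    print(text)
    sys.exit(exit_code)
```

If something like this were used, each service would need its own state file in a directory the nagios user can write to, and the wrapper would take the place of the plugin in the command definition. The trade-off is that a service which really recovers or fails during an outage keeps showing its old state until the next check that gets through.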

Re: Critical-unknown-critical state change notifications flo

Posted: Wed Apr 08, 2020 5:24 am
by gnikolov
Thanks, not what I was really looking for, but I will see if I can make it work somehow.

Re: Critical-unknown-critical state change notifications flo

Posted: Wed Apr 08, 2020 7:51 am
by scottwilkerson
gnikolov wrote: Thanks, not what I was really looking for, but I will see if I can make it work somehow.
Sounds good!