OK (return to normal) state delay

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
Maxwellb99
Posts: 97
Joined: Tue Jan 26, 2016 5:29 pm

Post by Maxwellb99 »

Hi Nagios,

Use case: we have a bunch of alerts that are going critical or unknown due to timeouts. (I'll start a separate thread for that.) The problem is that they send out an alert after getting a single OK state, then flip back to critical or unknown. This is causing way too many alerts.

Question:
- Is there a way to set a required threshold for OK alerts (i.e., is there a soft OK state)?

Note:
- We'd prefer not to use flapping.

Thanks,
Max
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: OK (return to normal) state delay

Post by ssax »

Unfortunately, flapping is really the only thing you can do for this outside of increasing your max_check_attempts:

https://assets.nagios.com/downloads/nag ... pping.html

There are SOFT RECOVERY states but they only occur if you are in a SOFT PROBLEM state and then go back to an OK/UP. If the status is in a HARD PROBLEM state and an OK/UP is received it will always try to send the notification unless you have flapping set up to stop that from occurring.
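
As a rough sketch of the two options above (raising max_check_attempts and enabling flap detection), a service definition might look something like the following. The host name, service description, check command, template name, and all numeric values are hypothetical placeholders, not values from this thread; in Nagios XI these settings would normally be changed through the Core Config Manager rather than by editing files by hand.

```
# nagios.cfg: flap detection must also be enabled globally
enable_flap_detection=1

# Service object definition (all names and values below are illustrative assumptions)
define service {
    use                     generic-service     ; hypothetical template supplying the other required directives
    host_name               app-server-01       ; hypothetical host
    service_description     Example Check       ; hypothetical service
    check_command           check_example       ; hypothetical command
    max_check_attempts      5                   ; non-OK results needed before a HARD problem state
    check_interval          5                   ; normal check interval (minutes with the default interval_length)
    retry_interval          1                   ; re-check interval while in a SOFT state
    flap_detection_enabled  1                   ; per-service flap detection
    low_flap_threshold      5.0                 ; percent state change below which flapping stops
    high_flap_threshold     20.0                ; percent state change above which flapping starts
}
```

With a higher max_check_attempts, a timeout that clears within the retry window stays in a SOFT state and recovers without a notification. Once the service is already in a HARD problem state, only flap detection suppresses the back-and-forth recovery and problem notifications.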

Do the hosts show as down for them, or is it only the services that are showing an issue? If the hosts show as down, you can set your service check_interval values higher than the host check interval and set host_down_disable_service_checks=1 in your /usr/local/nagios/etc/nagios.cfg (then restart the nagios service). That way, if the host is down, it won't even try to run the service checks. Also, make sure you're selecting the parents on the hosts so that the reachability logic works:

https://assets.nagios.com/downloads/nag ... ility.html
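
For illustration only, the settings described above could be sketched like this. The host names, address, template names, check command, and interval values are assumptions made up for the example; only host_down_disable_service_checks and the nagios.cfg path come from the post itself.

```
# /usr/local/nagios/etc/nagios.cfg
host_down_disable_service_checks=1

# Host object (names, address, and values are hypothetical)
define host {
    use             generic-host        ; hypothetical template
    host_name       app-server-01
    address         192.0.2.10          ; documentation-range address
    parents         core-switch-01      ; hypothetical upstream device used by the reachability logic
    check_interval  5
}

# Service on that host, checked less often than the host itself
define service {
    use                     generic-service     ; hypothetical template
    host_name               app-server-01
    service_description     Example Check
    check_command           check_example       ; hypothetical command
    check_interval          10                  ; higher than the host's check_interval
}
```

The idea is that the host check runs more often than the service check, so a down host tends to be detected first and its service checks are skipped rather than timing out.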

Let us know if you have any questions.

Thank you!
Maxwellb99
Posts: 97
Joined: Tue Jan 26, 2016 5:29 pm

Re: OK (return to normal) state delay

Post by Maxwellb99 »

Hi Nagios,

Thanks for your response. I've got "host_down_disable_service_checks=1" enabled. Unfortunately, the hosts are still pingable. I'll open up another thread, but the two cases we've found are: 1. the check goes unknown when the connection on port 5693 closes (still troubleshooting this); 2. Nagios doesn't get a response in a timely manner. Alright, I'll try to sell flapping to my management.

Thanks, I think you can close this thread.

Cheers,
Max
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: OK (return to normal) state delay

Post by scottwilkerson »

Ok

Closing thread
Former Nagios employee