No recovery alert; "OK;SOFT;1" state.

Posted: Wed Sep 18, 2019 1:02 pm
by yo_marc
Hello Nagios Support,

This morning a critical server that we monitor triggered some alerts when it went down. We got the initial Host Down alert and one Service Problem alert, but when everything recovered we never got an important Service Recovery alert. (We have automation hooked into the alerts, and to make a long story short, it's important that we get ALL alerts.)

Looking through the event log, I see some strange behavior. To summarize succinctly:

1. Some Service checks associated with the critical server were unexpectedly marked as "CRITICAL;HARD;1" - seemingly bypassing the max-check-attempts counter.

2. Upon recovery, all of these Services got marked as "OK;SOFT;1" -- bypassing any notification process.

3. Looking at the XI UI now, I see that these Services are set to HARD states. I don't see any event log entry where/when that took place.
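
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that scans Nagios log lines for an "OK;SOFT" recovery that directly follows a HARD problem state on the same service. It assumes the standard Core log format `[epoch] SERVICE ALERT: host;service;STATE;STATETYPE;attempt;output`; the names `parse_alerts` and `suspicious_recoveries` are made up for this example and are not part of Nagios.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumed log format (standard Nagios Core state-change entry):
# [epoch] SERVICE ALERT: host;service;STATE;STATETYPE;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);"
    r"(?P<state>[A-Z]+);(?P<state_type>SOFT|HARD);(?P<attempt>\d+);"
)


class Alert(NamedTuple):
    ts: int
    host: str
    service: str
    state: str
    state_type: str
    attempt: int


def parse_alerts(lines: Iterable[str]) -> Iterator[Alert]:
    """Yield one Alert per SERVICE ALERT line; all other lines are skipped."""
    for line in lines:
        m = ALERT_RE.match(line)
        if m:
            yield Alert(
                int(m["ts"]), m["host"], m["service"],
                m["state"], m["state_type"], int(m["attempt"]),
            )


def suspicious_recoveries(lines: Iterable[str]) -> List[Alert]:
    """Return OK;SOFT entries whose previous entry for the same
    host/service was a HARD problem state (hypothetical helper)."""
    last = {}  # (host, service) -> most recent Alert seen
    hits = []
    for alert in parse_alerts(lines):
        key = (alert.host, alert.service)
        prev = last.get(key)
        if (
            prev is not None
            and prev.state != "OK"
            and prev.state_type == "HARD"
            and alert.state == "OK"
            and alert.state_type == "SOFT"
        ):
            hits.append(alert)
        last[key] = alert
    return hits
```

To run it against a real log, something like `suspicious_recoveries(open("/usr/local/nagios/var/nagios.log"))` should work, assuming that is where your install keeps the log (the path may differ on your system).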


The one Service that did alert had one expected "CRITICAL;SOFT;1" entry before it logged the abnormal "CRITICAL;HARD;1". It was set to alert after 2 max-attempts, so this makes some degree of sense that it sent a notification - but there is obviously still something wrong here.

Have you seen this problem before, and do you know of a fix?

I am running XI 5.5.11 on a CentOS 7.6 box.

Re: No recovery alert; "OK;SOFT;1" state.

Posted: Wed Sep 18, 2019 3:01 pm
by benjaminsmith
Hello @yo_marc,

Appreciate the detailed description of the issue. It looks like you are hitting this bug in Nagios Core.

https://github.com/NagiosEnterprises/na ... issues/651

Please upgrade to the latest version, as this has been patched in Core 4.4.4.

Re: No recovery alert; "OK;SOFT;1" state.

Posted: Thu Sep 19, 2019 10:11 am
by yo_marc
Thank you! Glad to hear it's been addressed.

Am I missing something, or is Core 4.4.4 not yet included in the latest rev of XI? Looks like the latest bump was 4.4.3 in XI version 5.5.9?

https://assets.nagios.com/downloads/nag ... NGES-5.TXT

Re: No recovery alert; "OK;SOFT;1" state.

Posted: Thu Sep 19, 2019 10:39 am
by benjaminsmith
Hello @yo_marc,

No, you are not missing anything; that was my mistake. Sorry about that. We typically wait some time before pulling the latest Core version into Nagios XI, for stability. We should have this updated soon (likely in 5.6.7).

Re: No recovery alert; "OK;SOFT;1" state.

Posted: Thu Sep 19, 2019 1:45 pm
by yo_marc
Thanks! I'll keep an eye out for that next release. Feel free to close this if/as needed.

Re: No recovery alert; "OK;SOFT;1" state.

Posted: Thu Sep 19, 2019 2:05 pm
by benjaminsmith
Hi,

Sounds good. We'll close this up. If you have any new questions, feel free to open another thread.

Thank you for using the Nagios Support Forum.