Hello Nagios Support,
This morning a critical server that we monitor triggered some alerts when it went down. We got the initial Host Down and one Service Problem alert, but when everything recovered we never got an important Service Recovery alert. (We have automation hooked into the alerts, and to make a long story short, it's important that we get ALL alerts.)
Looking through the event log, I see some strange behavior. To summarize succinctly:
1. We had some Service checks associated with the critical server unexpectedly get marked as "CRITICAL;HARD;1" - seemingly bypassing the max-check-attempt counter.
2. Upon recovery, all of these Services got marked as "OK;SOFT;1" -- bypassing any notification process.
3. Looking at the XI UI now, I see that these Services are set to HARD states. I don't see any event log entry where/when that took place.
The one Service that did alert had one expected "CRITICAL;SOFT;1" entry before it logged the abnormal "CRITICAL;HARD;1". It was set to alert after 2 max-attempts, so it makes some sense that it sent a notification - but there is obviously still something wrong here.
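For anyone who wants to look for the same pattern in their own event log, here is a rough sketch in Python. It assumes the usual Nagios Core log line layout (`[epoch] SERVICE ALERT: host;service;STATE;STATE_TYPE;attempt;output`). The function name `find_suspect_alerts` and the single `max_check_attempts` parameter are made up for illustration; in practice the attempt limit is set per service, and the script knows nothing about host state, so treat whatever it flags as entries worth reviewing rather than confirmed bugs.

```python
import re

# Matches Nagios Core log lines of the form:
# [epoch] SERVICE ALERT: host;service;STATE;STATE_TYPE;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);"
    r"(?P<state>OK|WARNING|CRITICAL|UNKNOWN);"
    r"(?P<state_type>SOFT|HARD);(?P<attempt>\d+);"
)


def find_suspect_alerts(lines, max_check_attempts=2):
    """Return (timestamp, host, service, reason) tuples worth reviewing.

    max_check_attempts is an assumed, global value here; real configs
    define it per service.
    """
    suspects = []
    last = {}  # (host, service) -> (state, state_type) of the previous alert
    for line in lines:
        m = ALERT_RE.match(line)
        if not m:
            continue  # not a service alert line
        key = (m["host"], m["service"])
        state, state_type = m["state"], m["state_type"]
        attempt = int(m["attempt"])
        prev = last.get(key)

        # Pattern 1: a problem went HARD before reaching the attempt limit.
        if state != "OK" and state_type == "HARD" and attempt < max_check_attempts:
            suspects.append((int(m["ts"]), *key, "HARD problem before max attempts"))

        # Pattern 2: a recovery logged as SOFT right after a HARD problem.
        if state == "OK" and state_type == "SOFT" and prev and prev[0] != "OK" and prev[1] == "HARD":
            suspects.append((int(m["ts"]), *key, "SOFT recovery after HARD problem"))

        last[key] = (state, state_type)
    return suspects
```

To use it, pass an open log file (or any iterable of lines), for example `find_suspect_alerts(open(path_to_nagios_log))`, where `path_to_nagios_log` is wherever your install keeps its log.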
Have you seen this problem before, and do you know of a fix?
I am running XI 5.5.11 on a CentOS 7.6 box.
No recovery alert; "OK;SOFT;1" state.
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: No recovery alert; "OK;SOFT;1" state.
Hello @yo_marc,
Appreciate the detailed description of the issue. It looks like you are hitting this bug in Nagios Core.
https://github.com/NagiosEnterprises/na ... issues/651
Please upgrade to the latest version, as this has been patched in Core 4.4.4.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: No recovery alert; "OK;SOFT;1" state.
Thank you! Glad to hear it's been addressed.
Am I missing something, or is Core 4.4.4 not yet included in the latest rev of XI? Looks like the latest bump was 4.4.3 in XI version 5.5.9?
https://assets.nagios.com/downloads/nag ... NGES-5.TXT
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: No recovery alert; "OK;SOFT;1" state.
Hello @yo_marc,
No, you are not missing anything; that was my mistake. Sorry about that. We typically wait some time before pulling the latest Core version into Nagios XI, for stability. We should have this updated soon (likely 5.6.7).
Re: No recovery alert; "OK;SOFT;1" state.
Thanks! I'll keep an eye out for that next release. Feel free to close this if/as needed.
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: No recovery alert; "OK;SOFT;1" state.
Hi,
Sounds good. We'll close this up. If you have any new questions, feel free to open another thread.
Thank you for using the Nagios Support Forum.