Hi!
I have Nagios Core running and have defined parent relations. The relations work okay: when a parent goes down, the child hosts become unreachable. The services on those hosts, however, become critical. That sounds a bit wrong. I'd expect them to either change to undetermined, retain the last known state, or not be checked at all. How do I change that behaviour?
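For reference, a parent relation of this kind is declared with the "parents" directive in the child's host definition. A minimal sketch, where the host names, the addresses and the "generic-host" template are assumptions for illustration and not the actual config:

```
# Hypothetical hosts; names and addresses are placeholders.
define host {
    use         generic-host      ; assumed template supplying the other required directives
    host_name   lab-router
    address     192.0.2.1
}

define host {
    use         generic-host
    host_name   lab-server-1
    address     192.0.2.10
    parents     lab-router        ; reported UNREACHABLE instead of DOWN when its check fails while lab-router is DOWN
}
```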
Thnx, Anton
Service critical on unreachable hosts
Re: Service critical on unreachable hosts
There's a setting in nagios.cfg that should handle this.
Re: Service critical on unreachable hosts
Hi, thanks. I already had "host_down_disable_service_checks=1" set, but the child isn't down, so that has no impact.
I also tried "service_check_timeout_state=u", but no change there either.
Those were the options I had seen and expected to have an impact (but they did not). Is there a specific setting you are referring to?
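For clarity, this is roughly how the two lines look in nagios.cfg, with the values quoted above. As far as I understand it, service_check_timeout_state only applies to checks that run longer than service_check_timeout, so a plugin that returns CRITICAL on its own before that limit would not be affected. That reading is an assumption, but it may be why the option changed nothing here.

```
# nagios.cfg (main configuration file), values as tried in this thread.

# Skip service checks for hosts that are not in an UP state.
host_down_disable_service_checks=1

# State assigned to a service check that times out:
# c = critical (default), u = unknown, w = warning, o = ok.
service_check_timeout_state=u
```

Nagios needs a restart after changing either value.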
Re: Service critical on unreachable hosts
Hello @tonnag,
It looks like this is a known problem with how Nagios Core handles this situation, and an issue has already been filed for it: Nagios Core GitHub issue "Service notifications despite parent host being down".
I will add a note there and update the weight of the issue.
Re: Service critical on unreachable hosts
Thank you, much appreciated
Re: Service critical on unreachable hosts
Well... it's working now as expected. The children show up as unreachable and their services are no longer critical.
I will try to reproduce it later, but I THINK the sequence was:
* Added all hosts (two from two subnets; production and lab)
* Turned off the lab network
* Defined Parent/child relations & restarted nagios
-> lab shows unreachable with critical services
* Several nagios restarts/server reboots; no change
* Turned on lab network
-> Everything shows okay
* Turned off the lab network
-> lab devices show unreachable, and their services are no longer reported under "services problems". If you go to the host, the services are shown as OK (or just the last state, not sure)
So it does work as expected (after all).
Re: Service critical on unreachable hosts
Well, I re-used a test server I had, and found it's still a bit off.
* Stop nagios process
* Delete all status files in var folder
* Copied cfg files with parent/child config
* Started LAB devices
* 5 min. later started nagios
* 10 min. later all shows OK/green
* 15 min. later shut down the lab devices
Five hosts show as unreachable, which is correct. But ...
Four of those five unreachable hosts are connected identically, yet only two of them show up with critical services (one a ping service, the other an application port).
The fifth device shows 3 of its 9 services as critical. After stopping and starting the lab again, which services and how many of them are critical varies, but so far it is always the same hosts that have a critical service.