Service critical on unreachable hosts

tonnag · Post by **tonnag** » Sun Jun 30, 2024 3:18 am

Hi!
I have Nagios Core running and have defined parent relations. The relations work okay: when a parent goes down, the child hosts become unreachable. However, the services on those hosts become critical. That sounds a bit wrong. I'd expect them to either change to unknown, retain the last known state, or not be checked at all. How do I change that behaviour?
Thanks, Anton
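For readers less familiar with the setup described above: parent relations in Nagios Core are defined with the `parents` directive of a host object. The fragment below is a minimal sketch; the host names, the 192.0.2.x addresses and the use of the `linux-server` template (from the sample templates.cfg) are assumptions for illustration, not Anton's actual config.

```
# Hypothetical lab router, reachable directly from the Nagios server
define host {
    use         linux-server        ; assumed template from the sample templates.cfg
    host_name   lab-router
    alias       Lab router
    address     192.0.2.1
}

# Hypothetical lab device behind lab-router. If lab-router is DOWN and the
# check of this host fails, Nagios reports it as UNREACHABLE instead of DOWN.
define host {
    use         linux-server
    host_name   lab-server-01
    alias       Lab server 01
    address     192.0.2.10
    parents     lab-router
}
```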

kg2857 · Post by **kg2857** » Sun Jun 30, 2024 11:49 pm

There's a setting in nagios.cfg that should handle this.

tonnag · Post by **tonnag** » Mon Jul 01, 2024 6:36 am

Hi, thanks. I already had "host_down_disable_service_checks=1", but the child isn't down, so that has no impact.
I also tried "service_check_timeout_state=u", but no change there either.
Those were the options I've seen and considered likely to have an impact (but they did not). Is there a specific setting you are referring to?
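For reference, here are the two nagios.cfg options tried above, with the values from this post. The comments are a short, hedged summary of what each option controls; check the nagios.cfg documentation for your Nagios Core version for the exact semantics.

```
# nagios.cfg (excerpt)

# Disables service checks for hosts that are down. As reported in this
# thread, it made no difference for the UNREACHABLE child hosts.
host_down_disable_service_checks=1

# State assigned to a service check that exceeds service_check_timeout.
# Accepted values: c (CRITICAL, the default), u (UNKNOWN), w (WARNING), o (OK).
service_check_timeout_state=u
```

Note that `service_check_timeout_state` only applies to checks that time out. A plugin that finishes in time and returns CRITICAL on its own (for example a ping that gets no replies) is not affected, which may explain why it made no difference here.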

bbahn · Post by **bbahn** » Mon Jul 01, 2024 10:42 am

Hello @tonnag,

It looks like this is a known issue with how Nagios Core handles this situation, and an issue has already been filed for it: Nagios Core GitHub issue "Service notifications despite parent host being down".

I will add a note there and update the weight of the issue.

tonnag · Post by **tonnag** » Mon Jul 01, 2024 1:38 pm

Thank you, much appreciated

tonnag · Post by **tonnag** » Tue Jul 02, 2024 12:50 pm

Well... it's working as expected now. The child hosts show up as unreachable and their services are no longer critical.
I will try to reproduce it later, but I THINK the sequence was:

* Added all hosts (from two subnets: production and lab)
* Turned off the lab network
* Defined Parent/child relations & restarted nagios
-> the lab hosts show as unreachable, with critical services
* Several nagios restarts/server reboots; no change
* Turned on lab network
-> Everything shows okay
* Turned off the lab network
-> the lab devices show as unreachable; their services are no longer reported under "services problems". If you go to the host, the services are shown as OK (or just their last state, not sure)

So it does work as expected (after all).

tonnag · Post by **tonnag** » Wed Jul 10, 2024 2:47 am

Well, I reused a test server I had and found it's still a bit off:
* Stopped the nagios process
* Deleted all status files in the var folder
* Copied the cfg files with the parent/child config
* Started the lab devices
* 5 min later, started nagios
* 10 min later, everything shows OK/green
* 15 min later, shut down the lab devices

Five hosts show as unreachable, which is correct. But...
Four of those five unreachable hosts are connected identically, but only two of them show critical services (one a ping service, the other an application port).
The fifth device shows 3 of its 9 services as critical. After each start/stop of the lab, which services are critical, and how many, varies, but so far it's always the same hosts that have a critical service.

Nagios Support Forum
