
Service critical on unreachable hosts

Posted: Sun Jun 30, 2024 3:18 am
by tonnag
Hi!
I have Nagios Core running and have defined parent relations. The relations work: when a parent goes down, its child hosts become unreachable. However, the services on those child hosts become critical. That seems wrong. I'd expect them to either change to undetermined, retain their last known state, or not be checked at all. How do I change that behaviour?
Thnx, Anton
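For readers less familiar with parent relations, the host reachability logic described in the Nagios Core documentation can be sketched roughly as follows. This is a simplified Python model, not Nagios's actual code: the names `HostState` and `resolve_state` and the example topology are hypothetical, and real Nagios also takes soft/hard states and check retries into account. Note that it only models the host state; it says nothing about what happens to the services on an unreachable host, which is the question in this thread.

```python
from enum import Enum
from typing import Sequence


class HostState(Enum):
    UP = 0
    DOWN = 1
    UNREACHABLE = 2


def resolve_state(check_ok: bool, parent_states: Sequence[HostState]) -> HostState:
    """Classify a host after its check, given the current states of its parents."""
    if check_ok:
        return HostState.UP
    # No parents: the host is directly reachable, so a failed check means DOWN.
    if not parent_states:
        return HostState.DOWN
    # Every parent is DOWN or UNREACHABLE: the failure may be caused upstream.
    if all(p in (HostState.DOWN, HostState.UNREACHABLE) for p in parent_states):
        return HostState.UNREACHABLE
    # At least one parent is UP, so the host itself is down.
    return HostState.DOWN


# Hypothetical topology: a lab switch (no parents) with one child host behind it.
switch = resolve_state(check_ok=False, parent_states=[])
child = resolve_state(check_ok=False, parent_states=[switch])
print(switch.name, child.name)  # DOWN UNREACHABLE
```
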

Re: Service critical on unreachable hosts

Posted: Sun Jun 30, 2024 11:49 pm
by kg2857
There's a setting in nagios.cfg that should handle this.

Re: Service critical on unreachable hosts

Posted: Mon Jul 01, 2024 6:36 am
by tonnag
Hi, thanks. I already had "host_down_disable_service_checks=1", but the child host isn't down (it's unreachable), so that has no effect.
I also tried "service_check_timeout_state=u", but no change there either.
Those were the options I've seen and considered likely to have an impact (but they did not). Is there a specific setting you are referring to?

Re: Service critical on unreachable hosts

Posted: Mon Jul 01, 2024 10:42 am
by bbahn
Hello @tonnag,

It looks like this is a known issue with how Nagios Core handles this situation, and an issue has already been filed for it: Nagios Core GitHub Issue: Service notifications despite parent host being down.

I will add a note there and update the weight of the issue.

Re: Service critical on unreachable hosts

Posted: Mon Jul 01, 2024 1:38 pm
by tonnag
Thank you, much appreciated

Re: Service critical on unreachable hosts

Posted: Tue Jul 02, 2024 12:50 pm
by tonnag
Well... It's working as expected now. The child hosts show up as unreachable and their services are no longer critical.
I will try to reproduce it later, but I THINK the sequence was:

* Added all hosts (from two subnets: production and lab)
* Turned off the lab network
* Defined Parent/child relations & restarted nagios
-> lab shows unreachable with critical services
* Several nagios restarts/server reboots; no change
* Turned on lab network
-> Everything shows okay
* Turned off the lab network
-> Lab devices show as unreachable, and their services are no longer reported under "services problems". If you go to the host, the services are shown as OK (or just their last state, not sure)

So it does work as expected (after all :D )

Re: Service critical on unreachable hosts

Posted: Wed Jul 10, 2024 2:47 am
by tonnag
Well, I reused a test server I had and found it's still a bit off:
* Stopped the nagios process
* Deleted all status files in the var folder
* Copied the cfg files with the parent/child config
* Started the LAB devices
* 5 min later, started nagios
* 10 min later, everything showed OK/green
* 15 min later, shut down the lab devices

Five hosts show as unreachable, which is correct. But...
Four of those five unreachable hosts are connected identically, yet only two of them show critical services (one a ping service, the other an application port).
The fifth device shows 3 of its 9 services as critical. After each stop/start of the lab, which services are critical, and how many, varies, but so far it is always the same hosts that have a critical service.
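To make the comparison between lab stop/start cycles easier, one option is to list the non-OK services on unreachable hosts straight from Nagios's status file. The sketch below is a hypothetical helper (the function names and sample data are illustrative), assuming the usual status.dat layout of `hoststatus {` / `servicestatus {` blocks of `key=value` lines, and the standard state codes: host `current_state=2` is UNREACHABLE, service states 0 to 3 are OK/WARNING/CRITICAL/UNKNOWN.

```python
from typing import Dict, Iterator, List, Tuple

HOST_UNREACHABLE = 2
SERVICE_STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_status(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (block_type, fields) for each 'name {' ... '}' block in status.dat text."""
    block_type = None
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Only open a block when not already inside one (values may end in '{').
        if block_type is None and line.endswith("{"):
            block_type = line[:-1].strip()
            fields = {}
        elif block_type is not None and line == "}":
            yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def non_ok_on_unreachable(text: str) -> List[Tuple[str, str, str]]:
    """Return (host, service, state) for every non-OK service on an UNREACHABLE host."""
    unreachable = set()
    services = []
    for block_type, fields in parse_status(text):
        if block_type == "hoststatus" and int(fields.get("current_state", "0")) == HOST_UNREACHABLE:
            unreachable.add(fields["host_name"])
        elif block_type == "servicestatus":
            services.append(fields)
    result = []
    for svc in services:
        state = int(svc.get("current_state", "0"))
        if svc.get("host_name") in unreachable and state != 0:
            name = SERVICE_STATE_NAMES.get(state, str(state))
            result.append((svc["host_name"], svc["service_description"], name))
    return result


# Hypothetical excerpt of a status.dat file (real files contain many more fields).
SAMPLE = """\
hoststatus {
\thost_name=lab-host-01
\tcurrent_state=2
\t}

servicestatus {
\thost_name=lab-host-01
\tservice_description=PING
\tcurrent_state=2
\t}

servicestatus {
\thost_name=lab-host-01
\tservice_description=App Port
\tcurrent_state=0
\t}
"""

print(non_ok_on_unreachable(SAMPLE))  # [('lab-host-01', 'PING', 'CRITICAL')]
```

On a live system, pass in the contents of the file named by `status_file` in nagios.cfg (on a source install this is often `/usr/local/nagios/var/status.dat`). Running it after each lab shutdown gives a list that can be diffed between runs.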