[Nagios-devel] BUG: Service Reaper does not reschedule


Hello Ethan and others,

we are running a redundant Nagios system that uses keepalived for
IP transition. The problem we are seeing is that service checks get
"lost" and are never scheduled again.
I've traced the problem to schedule_service_check(). On a keepalived
transition, Nagios receives STOP_EXECUTING_SVC_CHECKS and
DISABLE_NOTIFICATIONS on the one side, or ENABLE_NOTIFICATIONS and
START_EXECUTING_SVC_CHECKS on the other. If Nagios still has
outstanding checks when it receives the disabling commands, it sets the
global status accordingly. reap_service_checks() then collects the
results of the outstanding, properly scheduled service checks and tries
to reschedule each service check via schedule_service_check(). That
function exits immediately without rescheduling, because active checks
are globally disabled. In the end the service is lost and is never
rescheduled. check_for_orphaned_services() cannot rescue it, because
reap_service_checks() has already marked the check as "not
executing/running".
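
To make the sequence easier to follow, here is a minimal sketch of it in
C. It is a simplified model and not the Nagios source: the service
struct, its two flags and simulate_failover() are assumptions made up
for illustration; only the function names schedule_service_check() and
check_for_orphaned_services() come from the description above, and
reap_service_check() stands in for the per-service part of
reap_service_checks().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down model; NOT the real Nagios structures. */
typedef struct {
    bool is_executing;  /* a check is currently running */
    bool is_scheduled;  /* a future check event is queued */
} service;

/* Global flag toggled by STOP_/START_EXECUTING_SVC_CHECKS. */
static bool execute_service_checks = true;

/* Returns early when active checks are globally disabled. */
static void schedule_service_check(service *svc)
{
    if (!execute_service_checks)
        return;               /* nothing is queued: the check is dropped */
    svc->is_scheduled = true;
}

/* Handles one finished check and asks for the next run. */
static void reap_service_check(service *svc)
{
    assert(svc != NULL);
    svc->is_executing = false;    /* marked "not executing/running" */
    schedule_service_check(svc);  /* silently does nothing if disabled */
}

/* Only services still flagged as executing count as orphans. */
static void check_for_orphaned_services(service *svc)
{
    if (svc->is_executing) {
        svc->is_executing = false;
        schedule_service_check(svc);
    }
}

/* Replays the failover sequence; returns true if the service is lost. */
bool simulate_failover(void)
{
    service svc = { .is_executing = true, .is_scheduled = false };

    execute_service_checks = false;     /* STOP_EXECUTING_SVC_CHECKS */
    reap_service_check(&svc);           /* outstanding result arrives */
    execute_service_checks = true;      /* START_EXECUTING_SVC_CHECKS */
    check_for_orphaned_services(&svc);  /* sees nothing to rescue */

    return !svc.is_executing && !svc.is_scheduled;
}
```

In this model simulate_failover() returns true: after the transition the
service is neither executing nor queued, so nothing will ever trigger
its next check.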

My first idea for a fix is to change schedule_service_check() so that
it schedules all services (including the inactive ones), but I believe
this could break other things. Ethan, could you please take a closer
look at this?
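
One possible reading of that idea, again as a hedged sketch in C and not
a patch against Nagios: always queue the next check, and apply the
global "checks disabled" flag only when the queued event fires. The
struct, the runs counter, run_scheduled_check() and
simulate_failover_with_fix() are hypothetical names introduced here, and
the sketch says nothing about the side effects worried about above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; NOT the real Nagios structures. */
typedef struct {
    bool is_executing;  /* a check is currently running */
    bool is_scheduled;  /* a future check event is queued */
    int  runs;          /* how many checks were actually started */
} service;

/* Global flag toggled by STOP_/START_EXECUTING_SVC_CHECKS. */
static bool execute_service_checks = true;

/* Variant: always queue the next check, whatever the global flag says. */
static void schedule_service_check(service *svc)
{
    assert(svc != NULL);
    svc->is_scheduled = true;
}

/* Handles one finished check and queues the next run. */
static void reap_service_check(service *svc)
{
    svc->is_executing = false;
    schedule_service_check(svc);
}

/* Fires a queued event; the global flag is honoured here instead. */
static void run_scheduled_check(service *svc)
{
    if (!svc->is_scheduled)
        return;
    svc->is_scheduled = false;
    if (!execute_service_checks) {
        schedule_service_check(svc);  /* skip execution, stay queued */
        return;
    }
    svc->is_executing = true;
    svc->runs++;
}

/* Same failover sequence; returns true if the check runs again. */
bool simulate_failover_with_fix(void)
{
    service svc = { .is_executing = true, .is_scheduled = false, .runs = 0 };

    execute_service_checks = false;  /* STOP_EXECUTING_SVC_CHECKS */
    reap_service_check(&svc);        /* outstanding result arrives */
    run_scheduled_check(&svc);       /* event fires, execution skipped */
    execute_service_checks = true;   /* START_EXECUTING_SVC_CHECKS */
    run_scheduled_check(&svc);       /* event fires, check starts */

    return svc.is_executing && svc.runs == 1;
}
```

Here the service stays in the queue while checks are disabled and starts
running again once they are re-enabled, at the cost of queue events that
fire without doing any work in the meantime.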

I'm using Nagios version 2.6 and have checked the Changelog, but it
mentions nothing related to my problem. In the meantime I have worked
around the problem in my setup by sending Nagios a SIGHUP on each
transition.

best regards
Percy Jahn






This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]