
[Nagios-devel] BUG: Service Reaper does not reschedule

Posted: Thu Aug 30, 2007 5:53 am
by Guest
Hello Ethan and others,

we are using a redundant Nagios system with keepalived for IP
transition. The problem we are seeing is that service checks get "lost"
and are never scheduled again.
I've located the problem in schedule_service_check(). On a keepalived
transition, nagios receives either STOP_EXECUTING_SVC_CHECKS and
DISABLE_NOTIFICATIONS or, in the other direction, ENABLE_NOTIFICATIONS
and START_EXECUTING_SVC_CHECKS. If nagios has outstanding checks when it
receives "disable notifications", it sets the global status accordingly.
reap_service_checks() then collects the results of the outstanding,
properly scheduled service checks and tries to reschedule each service
check via schedule_service_check(). That function immediately exits
without rescheduling, because active checks are disabled globally.
In the end, the service is lost and is never rescheduled.
check_for_orphaned_services() cannot solve this problem, because the
check has already been marked as "not executing/running" by
reap_service_checks().
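
To make the sequence above easier to follow, here is a minimal model of
it in C. It is a sketch under assumptions, not the Nagios 2.6 source:
the struct, its fields, the *_model function names and the global flag
are all hypothetical stand-ins, and it assumes the early exit is driven
by the global "execute service checks" switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model, NOT Nagios code: all names here are hypothetical. */
typedef struct {
    bool is_executing;  /* a check result is still outstanding */
    bool is_queued;     /* a future check event exists for this service */
} service_model;

/* Stand-in for the global switch toggled by external commands. */
bool execute_checks_model = true;

/* Models the early exit: nothing is queued while checks are disabled. */
void schedule_service_check_model(service_model *svc) {
    assert(svc != NULL);
    if (!execute_checks_model)
        return;
    svc->is_queued = true;
}

/* Models reaping: the result is consumed, then a reschedule is tried. */
void reap_service_check_model(service_model *svc) {
    assert(svc != NULL);
    svc->is_executing = false;
    svc->is_queued = false;
    schedule_service_check_model(svc);
}

/* Models the orphan check: only services still marked as executing
 * are candidates for recovery. */
bool orphan_check_would_recover(const service_model *svc) {
    assert(svc != NULL);
    return svc->is_executing;
}

/* Plays the failover sequence; returns true if the service ends up
 * neither queued nor recoverable by the orphan check. */
bool simulate_failover(void) {
    service_model svc = { .is_executing = true, .is_queued = false };

    execute_checks_model = false;    /* "stop executing checks" arrives */
    reap_service_check_model(&svc);  /* outstanding result comes back */
    bool lost = !svc.is_queued && !orphan_check_would_recover(&svc);

    execute_checks_model = true;     /* "start executing checks" arrives */
    /* In this model, re-enabling alone does not queue anything. */
    return lost && !svc.is_queued;
}
```

In this model simulate_failover() returns true: once the result is
reaped while the switch is off, nothing puts the service back in the
queue.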

My first idea for a fix is to adapt schedule_service_check() to schedule
all services (including those that are not active), but I believe this
could break some other stuff. Ethan, could you please take a closer look
at this?
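
One possible reading of that idea, again only as a hypothetical C
sketch and not a patch against Nagios: always queue the service, and
decide whether to actually execute the check only when the queued event
comes due. The type, the flag and the function names are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the proposed change, not a Nagios patch. */
typedef struct {
    bool is_queued;  /* a future check event exists */
    int  runs;       /* how many times the check actually executed */
} svc_sketch;

/* Stand-in for the global "execute checks" switch. */
bool checks_enabled = true;

/* Proposed behaviour: always queue, regardless of the global switch. */
void schedule_always(svc_sketch *svc) {
    assert(svc != NULL);
    svc->is_queued = true;
}

/* When a queued event comes due, decide at that point whether to run.
 * Either way the service is queued again, so it cannot drop out. */
void run_due_check(svc_sketch *svc) {
    assert(svc != NULL);
    if (!svc->is_queued)
        return;
    svc->is_queued = false;
    if (checks_enabled)
        svc->runs++;
    schedule_always(svc);
}

/* Returns the number of executions across a disable/enable cycle. */
int simulate_fixed_failover(void) {
    svc_sketch svc = { .is_queued = true, .runs = 0 };

    checks_enabled = false;
    run_due_check(&svc);  /* skipped, but queued again */

    checks_enabled = true;
    run_due_check(&svc);  /* executes */

    return svc.runs;
}
```

The sketch keeps the service in the queue through the disabled period.
Whether other callers rely on schedule_service_check() refusing to
queue while checks are disabled is exactly the open question.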

I'm using Nagios version 2.6 and checked the Changelog, but nothing
concerning my problem is mentioned. In the meantime I have worked around
the problem in my setup by sending nagios a SIGHUP on each transition.

best regards
Percy Jahn






This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]