Re: [Nagios-devel] BUG: Service Reaper does not
Posted: Sun Oct 21, 2007 6:59 am
Percy Jahn wrote:
> Hello Ethan and others,
>
> We are running a redundant Nagios system that uses keepalived for
> IP failover. The problem we are seeing is that service checks get
> "lost" and are never scheduled again.
> I've located the problem in schedule_service_check(). On a
> keepalived transition, Nagios receives either
> STOP_EXECUTING_SVC_CHECKS and DISABLE_NOTIFICATIONS or, in the other
> direction, ENABLE_NOTIFICATIONS and START_EXECUTING_SVC_CHECKS. If
> Nagios has outstanding checks when it receives "disable notifications",
> it sets the global status accordingly. reap_service_checks() then
> collects the results of the outstanding, properly scheduled service
> checks and tries to reschedule each service check via
> schedule_service_check(). That function exits immediately without
> rescheduling, because active checks are disabled globally.
> In the end, the service is lost and is never rescheduled.
> check_for_orphaned_services() cannot solve this problem, because the
> check is marked as "not executing/running" by reap_service_checks().
>
> My first solution is to adapt schedule_service_check() to schedule all
> services (including the non-active ones), but I believe this could break
> some other things. Ethan, could you please take a closer look at this?
>
> I'm using Nagios version 2.6 and checked the Changelog, but nothing
> concerning my problem is mentioned. In the meantime I have worked
> around the problem in my setup by sending Nagios a SIGHUP on every
> transition.
>
> best regards
> Percy Jahn
>
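The sequence described in the quoted report can be modelled with a small standalone C sketch. This is not the Nagios source: every name prefixed with model_ is hypothetical, the structures are reduced to two flags, and the ignore_global_switch parameter only stands in for the reporter's proposed "schedule all services" workaround, not for whatever change went into the 2.10 release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: none of these names are real Nagios
 * structures or functions. */
typedef struct {
    bool is_executing; /* a check for this service is currently running */
    bool is_queued;    /* a future check is waiting in the event queue  */
} model_service;

/* Global switch, toggled by STOP_/START_EXECUTING_SVC_CHECKS in this model. */
static bool model_execute_service_checks = true;

/* Scheduling as described in the report: return early while active checks
 * are globally disabled, unless the caller asks to ignore the switch. */
void model_schedule_check(model_service *svc, bool ignore_global_switch)
{
    assert(svc != NULL);
    if (!model_execute_service_checks && !ignore_global_switch)
        return; /* nothing is queued, so the check is "lost" */
    svc->is_queued = true;
}

/* Reaping: the result is processed, the service is marked as no longer
 * executing, and a reschedule is attempted. */
void model_reap_check(model_service *svc, bool ignore_global_switch)
{
    assert(svc != NULL);
    svc->is_executing = false;
    model_schedule_check(svc, ignore_global_switch);
}

/* Orphan detection in this model only rescues services that are still
 * marked as executing. */
bool model_is_orphan(const model_service *svc)
{
    assert(svc != NULL);
    return svc->is_executing;
}

/* Plays the failover sequence and reports whether the service will ever
 * be checked again (either queued or picked up as an orphan). */
bool model_failover_keeps_check(bool ignore_global_switch)
{
    model_service svc = { .is_executing = true, .is_queued = false };

    model_execute_service_checks = false;          /* checks stopped globally */
    model_reap_check(&svc, ignore_global_switch);  /* outstanding result reaped */
    model_execute_service_checks = true;           /* checks started again */

    return svc.is_queued || model_is_orphan(&svc);
}
```

In this model, model_failover_keeps_check(false) returns false: the service is neither queued nor eligible for orphan recovery, which matches the "lost" checks in the report. Calling it with true returns true, at the cost of queueing checks while execution is disabled, which is the side effect the reporter was worried about.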
Thanks for reporting this. I have included a fix for this problem in
the latest 2.10 release.
Ethan Galstad
Nagios Developer
___
Email: [email protected]
Web: www.nagios.org
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]