Re: [Nagios-devel] Race condition in freshness checking


Ton Voon wrote:
> Hi!
>
> We found a bug in the calculation of the latency for a passive check.
> This has highlighted a possible race condition re: freshness checking.
> We wanted to get some ideas on what is the best approach to fix this.
>
> Background:
>
> We have a master/slave arrangement, with freshness checking
> (freshness_threshold=0) of slave services on the master.
>
> Looking in the NDO db, we realised that the latency values for passive
> results were incorrectly calculated: sometimes latency values could be
> out by a factor of 1000. The patch is attached. However, since using
> this patch, we've seen occasional race conditions.
>
> Problem:
>
> Within checks.c:check_service_result_freshness, if a service has passed
> its expiration_time, it is marked as is_being_freshened and a forced
> service check is scheduled. However, if a passive result for this
> service is processed before this forced check is run, then the service
> is marked as stale and the state is inconsistent between master and slave.
>
> Possible solutions:
>
> - If a check result is processed with is_being_freshened set for the
> service, then remove the forced check from the schedule if it exists.
> - Change is_being_freshened to stale_time (0 if not stale). On running
> the forced check, if stale_time is less than last_check_time (+
> latency?), break out of running the forced check.
>
> Neither of these sounds particularly appealing to us. Are there other
> possible solutions? Any opinions?
>
> Ton
>
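
For reference, the second option quoted above (replace is_being_freshened
with a stale_time and skip the forced check if a result has arrived since)
could be sketched roughly as follows. This is only an illustration under
stated assumptions, not the actual Nagios code: svc_t and the functions
check_freshness, process_passive_result and run_forced_check are
hypothetical, simplified stand-ins, and the open "+ latency?" question is
left out.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a Nagios service record. */
typedef struct {
    time_t last_check;           /* when the latest result was processed */
    time_t stale_time;           /* when the service was found stale; 0 if not stale */
    bool   forced_check_pending; /* a forced check is sitting in the schedule */
} svc_t;

/* Freshness pass: if the result has expired, record when we noticed
 * and queue a forced check (instead of setting is_being_freshened). */
void check_freshness(svc_t *s, time_t now, time_t expiration_time) {
    if (s->stale_time == 0 && expiration_time < now) {
        s->stale_time = now;
        s->forced_check_pending = true;
    }
}

/* A passive result arrives and is processed. */
void process_passive_result(svc_t *s, time_t now) {
    s->last_check = now;
}

/* Called when the scheduled forced check comes due. Returns true if the
 * check should really run, false if a newer result made it unnecessary. */
bool run_forced_check(svc_t *s) {
    bool should_run = false;
    if (s->forced_check_pending) {
        /* Break out if a result was processed after the service went stale. */
        should_run = !(s->stale_time < s->last_check);
    }
    s->forced_check_pending = false;
    s->stale_time = 0;
    return should_run;
}
```

With this guard, a passive result that lands between the freshness pass
and the forced check causes the forced check to be skipped, so the fresh
passive state is not overwritten. A result processed in the same second as
stale_time would still trigger the forced check here, which is where the
latency allowance would matter.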

I think this race condition was brought up once before on the list, so
I'll take a look at what can be done. I think a reasonable solution can
be found to work for Nagios 3, but backporting it to Nagios 2 will be
more challenging due to the different check result IPC.


Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]