Re: [Nagios-devel] [PATCH] Re: alternative scheduler


On 12/01/2010 08:55 PM, Adam Augustine wrote:
> While DNX and mod_gearman do implement that specific functionality,
> they are still subject to the scheduler/reaper bottlenecks. We (the
> institution that started the DNX project) have played around with the
> check scheduling parameters quite a bit over the years and even with
> our best scheduling parameters and DNX actually executing the plugins,
> we still see checks scheduled such that we have a large number of
> checks scheduled to execute in a single second with several seconds
> (3-5) of nothing scheduled to execute between.

Agreed. That's also the reason why I don't use either so far; I don't
have a problem (yet ...) with the short-term scheduling (scheduling "due
now" checks onto executors), but I see unnecessary churn in the mid-term
scheduling (schedule next due time of checks just completed).

Unless I *really* need new glasses, there are only three different kinds
of such rescheduling code in the 3.2.x Nagios core:

1. Reschedule *exactly* check_interval / retry_interval from last due
time (iff check_period allows this) - e.g., base/checks.c::1301ff :

    if(reschedule_check==TRUE)
        next_service_check=(time_t)(temp_service->last_check
            +(temp_service->check_interval*interval_length));
    }

2. Reschedule to the *very first second* permitted by check_period -
e.g., base/checks.c::278ff :

    /* make sure we rescheduled the next service check at a valid time */
    get_next_valid_time(preferred_time,
        &next_valid_time,svc->check_period_ptr);
    [...]
    svc->next_check=next_valid_time;

3. Special (error) cases falling back to some hardcoded "check interval"
(five minutes, one week, ...).

None of these cases even *looks* at the list of already-scheduled check
executions around the target time, much less does any smoothing.
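
To make the clustering effect concrete, here is a minimal sketch of rules 1 and 2 in isolation. The names next_check_fixed(), struct daily_window and next_valid_time_simple() are hypothetical and not part of Nagios; the window is a deliberately simplified stand-in for a check_period (one daily range, times treated as plain seconds with day boundaries at multiples of 86400), whereas the real get_next_valid_time() evaluates full timeperiod definitions.

```c
#include <assert.h>
#include <time.h>

#define SECONDS_PER_DAY 86400

/* Rule 1: the next due time is exactly one interval after the last check.
 * interval_length is the number of seconds per interval unit. */
time_t next_check_fixed(time_t last_check, double check_interval,
                        int interval_length)
{
    assert(check_interval >= 0.0 && interval_length > 0);
    return (time_t)(last_check + check_interval * interval_length);
}

/* Hypothetical stand-in for a check_period: one daily window
 * [start, end), both given in seconds after midnight. */
struct daily_window {
    long start;
    long end;
};

/* Rule 2: snap a preferred time to the very first permitted second.
 * Times inside the window are returned unchanged. */
time_t next_valid_time_simple(time_t preferred, struct daily_window w)
{
    assert(preferred >= 0);
    assert(w.start >= 0 && w.start < w.end && w.end <= SECONDS_PER_DAY);

    time_t midnight = preferred - preferred % SECONDS_PER_DAY;
    long sec = (long)(preferred - midnight);

    if (sec < w.start)   /* too early: first second of today's window */
        return midnight + w.start;
    if (sec >= w.end)    /* too late: first second of tomorrow's window */
        return midnight + SECONDS_PER_DAY + w.start;
    return preferred;
}
```

Under these assumptions, two checks whose preferred times are 20:00 and 22:00 against an 08:00-18:00 window both come back as 08:00:00 of the following day, and rule 1 then keeps any checks that share a last_check and interval in the same second on every later cycle.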

(For the sake of completeness: A smoothing algorithm IMHO should:
Case 1: *Decrease* next_check by at most a certain percentage of
check_interval/retry_interval, so as to avoid consecutive faults in
freshness checks and performance data processing (in the case of RRDs,
violation of xff);
Case 2: *Increase* next_check so as to stay within the check_period.
However, determining a max increment which simultaneously smooths out
the (potentially MANY) affected checks and avoids pushing the chain of
subsequent processing (retry_interval / max_check_attempts if found
non-OK, running event handlers, ...) *beyond* the valid timeframe is
definitely nontrivial.)
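
The two smoothing cases could be sketched roughly as follows. This is only an illustration under stated assumptions, not Nagios code: struct sched_load (a per-second count of already-scheduled checks), smooth_earlier(), max_increment() and spread_later() are all hypothetical names. The Case 2 bound accounts only for the retry chain and ignores plugin run time and event handlers, which is exactly the part described above as nontrivial.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical load table: load[i] is the number of checks already
 * scheduled for second (base + i). */
struct sched_load {
    time_t base;
    size_t len;
    const unsigned *load;
};

/* Seconds outside the table are unknown, so report them as maximally
 * loaded to keep the smoother from moving checks there. */
unsigned load_at(const struct sched_load *s, time_t t)
{
    if (t < s->base || (size_t)(t - s->base) >= s->len)
        return UINT_MAX;
    return s->load[t - s->base];
}

/* Case 1: pull target earlier by at most max_pct percent of the interval,
 * never before `earliest`, choosing the least-loaded second. On equal
 * load the later second (smaller shift) wins. */
time_t smooth_earlier(const struct sched_load *s, time_t target,
                      time_t earliest, long interval_secs, int max_pct)
{
    assert(interval_secs > 0 && max_pct >= 0 && max_pct <= 100);
    long max_shift = interval_secs * max_pct / 100;
    time_t best = target;
    unsigned best_load = load_at(s, target);

    for (long shift = 1; shift <= max_shift && best_load > 0; shift++) {
        time_t candidate = target - shift;
        if (candidate < earliest)
            break;
        unsigned l = load_at(s, candidate);
        if (l < best_load) {
            best = candidate;
            best_load = l;
        }
    }
    return best;
}

/* Case 2, rough bound only: the largest delay after window_start that
 * still leaves room for (max_check_attempts - 1) retries before
 * window_end. Plugin run time and event handlers are NOT considered. */
long max_increment(time_t window_start, time_t window_end,
                   long retry_secs, int max_check_attempts)
{
    assert(window_end >= window_start);
    assert(retry_secs >= 0 && max_check_attempts >= 1);
    long room = (long)(window_end - window_start)
                - retry_secs * (max_check_attempts - 1);
    return room > 0 ? room : 0;
}

/* Case 2: place the i-th of n checks that would all land on window_start
 * at evenly spaced offsets below max_inc. */
time_t spread_later(time_t window_start, long max_inc, size_t i, size_t n)
{
    assert(n > 0 && i < n && max_inc >= 0);
    return window_start + (time_t)((max_inc * (long)i) / (long)n);
}
```

A caller would apply smooth_earlier() to the result of the fixed-interval calculation, and spread_later() to the group of checks that were all snapped to the first second of a check_period. Capping the Case 1 shift as a percentage of the interval is what keeps the result within freshness thresholds and RRD heartbeat tolerances.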

Kind regards,
J. Bern
-- 
Jochen Bern, Systemingenieur --- LINworks GmbH
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]