Page 1 of 1

Re: [Nagios-devel] [PATCH] circular dependency check speedup

Posted: Mon Dec 20, 2010 8:02 pm
by Guest
>> I've done some organoleptic analysis of the patch (looked at it).

The trouble with organoleptic analysis is that it's axiomatic in the
benchmarking business that your guess (or mine) as to where time is
being spent is simply wrong. The software+hardware computational
model is much too complex for people to have accurate intuition
about how all the elements interact. Actual measurements are needed.

> I'll respond in more detail in a few days.

GroundWork developed the circular dependency patch by profiling
the code while running preflight operations on a large customer
configuration. A pre-flight operation lists the following counts:

Checked 35471 services.
Checked 3362 hosts.
Checked 116 host groups.
Checked 0 service groups.
Checked 290 contacts.
Checked 225 contact groups.
Checked 0 service escalations.
Checked 51031 service dependencies.
Checked 0 host escalations.
Checked 0 host dependencies.
Checked 475 commands.
Checked 256 time periods.

Total size of the set of Nagios config files is under 18 MB.

To skip immediately to the final results, here are the timings from
test runs while successive stages of this patch were being developed:

baseline: pre-flight: 859.113 seconds
stage 1: pre-flight: 185.139 seconds
stage 2: pre-flight: 42.856 seconds
stage 3: pre-flight: 3.107 seconds

The baseline run was taken using a stock Nagios 3.2.0 release.
The remaining runs were taken using patched versions of Nagios 3.2.1,
which does not differ from 3.2.0 in its calculations for circular
dependency analysis.

The timings shown above are wall-clock measurements for end-to-end
pre-flight operations. We also took detailed call-graph profile
timings (using gprof) during this development effort to guide
the code changes, but those reports were not retained afterward,
so I don't have corresponding function-level timings available
now to document each stage of patch development. The claim of a
speedup of around 20,000x for the circular dependency analysis is
based on improvements seen in those function-level timings during
the development.

========================================================

Stage 1:

Initial profiling showed that most of the time was being spent
in pre_flight_circular_check(). There's no i/o in that routine
or the ones it calls, so right away we know the problem we face is
computational, not due to any system file-cacheing effects. Also,
there's not much code in that routine that could be the source of the
problem, so the nature of the bad performance was fairly obvious:
an O(n^^2) clearing of the circular_path_checked values within
independent data structures in a linked list. To address that,
we made the following changes for the first stage of patching:

* make a contiguous array for
servicedependency_circular_path_checked, effectively removing the
circular_path_checked flags from the individual data structures
and also changing the boolean flags from 4 bytes down to 1 byte
* use a call to memset() to clear the entire array in one operation,
rather than walking a linked list

Extra time is now taken to prepare the new contiguous array, but
the timing results clearly show the benefit far outweighs the cost.

The revised code is still O(n^^2) (as we still have to clear just
as many flags, and just as often), but one dimension of this square
has been radically shrunk. Modern processors are highly optimized
for clearing a continuous swath of memory to a fixed value, since
clearing large areas to zero is such a commonly needed action.
So we use that mechanism instead of jumping randomly around in
memory. (Aside from benefits of using probably-microcoded processor
capabilities for the contiguous clearing, such as highly-efficient
multimedia instructions, we also avoid a tremendous amount of
potential cache-line aliasing and attendant inefficiency in memory
accesses, and possibly extra page faults.) With contiguous boolean
flags, we were also able to change their type from int to unsigned
char, reducing the space and thus further reducing the amount of
time needed for the clearing act

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]