Re: [Nagios-devel] [PATCH] circular dependency check speedup
Posted: Mon Dec 20, 2010 10:58 pm
On 12/20/2010 09:02 PM, Glenn Herteg wrote:
> >> I've done some organoleptic analysis of the patch (looked at it).
>
> The trouble with organoleptic analysis is that it's axiomatic in the
> benchmarking business that your guess (or mine) as to where time is
> being spent is simply wrong. The software+hardware computational
> model is much too complex for people to have accurate intuition
> about how all the elements interact. Actual measurements are needed.
>
Agreed. But experienced people are rarely surprised when the bottleneck
happens to be in a particular place. Thanks for the writeup btw. I'll
respond more in-depth below.
> > I'll respond in more detail in a few days.
>
> GroundWork developed the circular dependency patch by profiling
> the code while running pre-flight operations on a large customer
> configuration. A pre-flight operation lists the following counts:
>
> Checked 35471 services.
> Checked 3362 hosts.
> Checked 116 host groups.
> Checked 0 service groups.
> Checked 290 contacts.
> Checked 225 contact groups.
> Checked 0 service escalations.
> Checked 51031 service dependencies.
> Checked 0 host escalations.
> Checked 0 host dependencies.
> Checked 475 commands.
> Checked 256 time periods.
>
> Total size of the set of Nagios config files is under 18 MB.
>
> To skip immediately to the final results, here are the timings from
> test runs while successive stages of this patch were being developed:
>
> baseline: pre-flight: 859.113 seconds
> stage 1: pre-flight: 185.139 seconds
> stage 2: pre-flight: 42.856 seconds
> stage 3: pre-flight: 3.107 seconds
>
> The baseline run was taken using a stock Nagios 3.2.0 release.
> The remaining runs were taken using patched versions of Nagios 3.2.1,
> which does not differ from 3.2.0 in its calculations for circular
> dependency analysis.
>
> The timings shown above are wall-clock measurements for end-to-end
> pre-flight operations. We also took detailed call-graph profile
> timings (using gprof) during this development effort to guide
> the code changes, but those reports were not retained afterward,
> so I don't have corresponding function-level timings available
> now to document each stage of patch development. The claim of a
> speedup of around 20,000x for the circular dependency analysis is
> based on improvements seen in those function-level timings during
> the development.
>
> ========================================================
>
> Stage 1:
>
> Initial profiling showed that most of the time was being spent
> in pre_flight_circular_check(). There's no I/O in that routine
> or the ones it calls, so right away we know the problem we face is
> computational, not due to any system file-caching effects. Also,
> there's not much code in that routine that could be the source of the
> problem, so the nature of the bad performance was fairly obvious:
> an O(n^2) clearing of the circular_path_checked values within
> independent data structures in a linked list. To address that,
> we made the following changes for the first stage of patching:
>
> * make a contiguous array for
> servicedependency_circular_path_checked, effectively removing the
> circular_path_checked flags from the individual data structures
> and also changing the boolean flags from 4 bytes down to 1 byte
> * use a call to memset() to clear the entire array in one operation,
> rather than walking a linked list
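The stage-1 change described above could be sketched roughly like this (the
function and variable names here are hypothetical, not the actual identifiers
from the patch):

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the stage-1 idea: instead of a 4-byte circular_path_checked
 * flag embedded in each servicedependency struct -- which forces a full
 * linked-list walk to reset them, done once per dependency for O(n^2)
 * total work -- keep all flags in one contiguous 1-byte-per-entry array
 * that a single memset() can clear. */

static char *servicedependency_circular_path_checked = NULL;
static size_t num_servicedependencies = 0;

/* Allocate once, after the dependency list has been counted. */
int init_circular_path_flags(size_t count) {
    num_servicedependencies = count;
    servicedependency_circular_path_checked = malloc(count);
    return servicedependency_circular_path_checked ? 0 : -1;
}

/* Reset every flag in one pass instead of walking the linked list. */
void clear_circular_path_flags(void) {
    memset(servicedependency_circular_path_checked, 0,
           num_servicedependencies);
}
```

The win comes from locality and from moving the reset out of the per-node
traversal: one memset() over a contiguous byte array replaces n pointer-chasing
walks over the list.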
>
Why not calloc() it from the beginning? Since it's created when
Nagios is started one can often avoid even the memset(), since
memory gotten from the kernel is already guaranteed to be zero'd
out. This also avoids libc going through fortification routines
and setting the allocated memory to 0xb5b5b5b5 (or whatever magic
sequence it uses to denote allocated-but-not-written-to memory).
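Concretely, the calloc() variant might look like this (a minimal sketch; the
function name is hypothetical):

```c
#include <stdlib.h>

/* calloc() returns zero-filled memory, so the initial memset() pass can
 * be dropped entirely. For large allocations, freshly mmap()ed pages
 * arrive from the kernel already zeroed, making that first "clear"
 * effectively free; libc only has to zero memory it recycles itself. */
char *alloc_circular_path_flags(size_t count) {
    return calloc(count, sizeof(char));  /* every flag starts at 0 */
}
```

Subsequent resets between check passes would still need a memset(), but the
allocation itself never pays for an explicit clearing loop.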
> Extra time is now taken to prepare the new contiguous array, but
> the timing results clearly show the benefit far outweighs the cost.
>
> The revised code is still O
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]