Re: [Nagios-devel] Multithreaded Macro Support wrapper proposal

Posted: Sat Aug 22, 2009 5:48 am
by Guest
Hi Steven,

On 21 Aug 2009, at 18:35, Steven D. Morrey wrote:

> To that end I have decided to robustify the macro system by creating
> a handful of wrapper functions that will make the macros thread safe
> (as long as all macro calls are passed through them).
> These functions are

Taking a different approach: which part of the macro setting routines
is taking the most time? My guess is that the summary macros take the
most time, because they have to walk through the entire list of hosts
and services (see http://nagios.sourceforge.net/docs/3_0/macrolist.html).

You could disable summary macro processing with the large installation
tweaks
(http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html)
and see if the timings still show the macro portion to be the
bottleneck. I think you are on Nagios 2 though, so this option is not
available. You could try just commenting out the entire summary macro
block and see how that affects the profiling.

At Opsview, we had a customer whose CPU was spinning at 100%. Using
strace, we found the time was going into the notifications logic,
which sets all the macro environment variables. But we knew that the
customer **didn't have notifications enabled for any contacts**. It
turns out that when Nagios got an alert event, it would set the macros
first, and only then work out whether the contact should be notified.
We changed the loop so that it first checks whether the contact should
be notified and only then calculates the macros. This brought their
CPU usage down to 10%.
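
As a rough illustration of that reordering (this is not the patch
itself, and every name below is a hypothetical stand-in rather than a
real Nagios function or struct), the two loop orderings could be
sketched like this, with a counter standing in for the expensive macro
setup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a Nagios contact. */
typedef struct contact {
    const char *name;
    int notifications_enabled;
} contact;

/* Counts how often the (assumed expensive) macro setup runs. */
static int macro_setups = 0;

/* Stand-in for computing and exporting the notification macros. */
void set_notification_macros(const contact *c) {
    (void)c;
    macro_setups++;
}

/* Stand-in for the cheap "should this contact be notified?" check. */
int should_notify(const contact *c) {
    return c->notifications_enabled;
}

/* Old ordering: macros are set for every contact, then the check runs. */
int notify_macros_first(const contact *contacts, size_t n) {
    int sent = 0;
    assert(contacts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        set_notification_macros(&contacts[i]);
        if (should_notify(&contacts[i]))
            sent++;
    }
    return sent;
}

/* New ordering: the check runs first, so skipped contacts cost nothing. */
int notify_check_first(const contact *contacts, size_t n) {
    int sent = 0;
    assert(contacts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!should_notify(&contacts[i]))
            continue;
        set_notification_macros(&contacts[i]);
        sent++;
    }
    return sent;
}
```

With no contacts enabled, the first version still runs the macro setup
once per contact, while the second never runs it. Comparing
macro_setups after each call is also roughly the property a regression
test would need to assert.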

Patch for Nagios 2.10: https://secure.opsera.com/svn/opsview/b ... load.patch

Patch for Nagios 3: https://secure.opsera.com/svn/opsview/b ... load.patch

I haven't put this into core code yet because I'm trying to work out a
way to test this. Even though I know this works for the thousands of
users using Opsview, I set myself a different standard when it comes
to the hundreds of thousands of users of Nagios :)

I'd be grateful if anyone wants to write a libtap test that
demonstrates this problem, so that I can get the fix applied to core
code.

Ton






This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]