[PATCH] Don't idle when there is more work to do.




On Wed, 2010-11-24 at 12:52 +0100, Andreas Ericsson wrote:
> On 11/23/2010 01:59 PM, Fredrik Thulin wrote:
> > In case that was too long for y'all, the short story is that I got lousy
> > performance from the service check scheduler in Nagios (3.2.0, sorry -
> > forgot to mention that).
> >
> > I was able to write a brand new scheduler that works MUCH better - 1160
> > checks per minute, compared to ~60. Any plans to do something drastic
> > about the Nagios service check scheduler?
> >
>
> I have no idea what you did to Nagios to make it run only 60 checks per
> minute.

I figured it out by reading the scheduler source code, which showed me
what actually mattered in the debug logs.

I was running a distributed check slave without active host checks,
based on my reading of docs/3_0/distributed.html. Crazy, eh? ;)

Why was this a problem?

Host checks were still being scheduled, and every time a host check was
found at the front of event_list_low, Nagios would log "We're not
executing host checks right now, so we'll skip this event." and then
sleep for sleep_time seconds (0.25 in my configuration, taken from the
Ubuntu defaults) (!!!).

I made the attached minimal patch, which skips the sleep if the next
event in the event list is already due.
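
To make the effect concrete, here is a minimal sketch in C of the idling
decision described above. It is a simplified model, not the Nagios source:
the names sim_event, event_kind and count_idle_sleeps are hypothetical, the
queue is a plain array instead of event_list_low, and current_time is held
fixed so that only sleeps taken while work is already waiting are counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Nagios' timed events. */
enum event_kind { EVENT_SERVICE_CHECK, EVENT_HOST_CHECK };

struct sim_event {
    enum event_kind kind;
    long run_time;              /* time at which the event is due */
};

/*
 * Walk the due events at the front of a queue sorted by run_time and
 * count how often the loop would idle for sleep_time.
 *
 * patched == 0: idle after every skipped event (behaviour described
 *               in the email for disabled host checks).
 * patched != 0: idle after a skipped event only if the next event is
 *               not yet due (the idea behind the attached patch).
 */
size_t count_idle_sleeps(const struct sim_event *queue, size_t n,
                         long current_time, int execute_host_checks,
                         int patched)
{
    size_t sleeps = 0;

    assert(queue != NULL || n == 0);

    for (size_t i = 0; i < n && queue[i].run_time <= current_time; i++) {
        int runnable = queue[i].kind != EVENT_HOST_CHECK
                       || execute_host_checks;
        if (runnable)
            continue;           /* event would be executed, no idling */

        /* The event was skipped; decide whether to idle. */
        int next_is_due = (i + 1 < n)
                          && queue[i + 1].run_time <= current_time;
        if (patched && next_is_due)
            continue;           /* more work is waiting, do not sleep */

        sleeps++;
    }
    return sleeps;
}
```

In this model, a queue that alternates skipped host checks with due service
checks costs one sleep_time per host check without the patch, and none with
it as long as another due event follows.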

This fixed the near-total lack of performance in my installation, but
service reaping is still killing me slowly on my virtual development
server. The dedicated production server is actually fast enough to
execute my ~6k service checks AND endure the painful reaping pause every
5 minutes.

The scheduler really needs much more work (like sub-second precision for
when to start checks - that gave me roughly 25% additional performance
in my Erlang based scheduler), but this at least makes Nagios usable
again for me. Also, why not fork a separate thread to do the reaping?
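
As an illustration of what sub-second precision could look like in C, the
sketch below computes how long to wait until an event is due using
struct timespec. This is an assumption about one possible approach, not
code from Nagios or from the Erlang scheduler mentioned above; the helper
name time_until_due is hypothetical.

```c
#include <time.h>

/*
 * Return the time remaining from `now` until `due` with nanosecond
 * resolution, clamped to zero when the event is already due.
 * Both arguments are assumed to be normalized (0 <= tv_nsec < 1e9).
 */
struct timespec time_until_due(struct timespec now, struct timespec due)
{
    struct timespec wait = { 0, 0 };

    /* Already due: do not wait at all. */
    if (due.tv_sec < now.tv_sec ||
        (due.tv_sec == now.tv_sec && due.tv_nsec <= now.tv_nsec))
        return wait;

    wait.tv_sec = due.tv_sec - now.tv_sec;
    wait.tv_nsec = due.tv_nsec - now.tv_nsec;
    if (wait.tv_nsec < 0) {     /* borrow one second */
        wait.tv_sec -= 1;
        wait.tv_nsec += 1000000000L;
    }
    return wait;
}
```

On a POSIX system the result could be passed to nanosleep(), so that the
loop wakes up when the next event is due rather than after a fixed
sleep_time interval.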

For more evidence, see the graphs at

http://people.su.se/~ft/test/mrtg_nagio ... ios-e.html

additional graphs at

http://people.su.se/~ft/test/mrtg_nagio ... index.html

Yesterday between 10:00 and 12:30 I was running with
execute_host_checks=1, followed by some iterations of testing and
patching. Since about 17:00 I have been running a patched version of
Nagios (3.2.0) with execute_host_checks disabled again. Notice the slow
decrease in performance overnight - I'm pretty sure it is reaping etc.
that makes my latency creep upwards. At 10:00 today Nagios was restarted
again, resulting in a small increase.

/Fredrik


Attachment: 0001-Don-t-idle-when-there-is-more-work-to-do.patch

From 08a565c3584aa2322cf2a928e52906ea069263c8 Mon Sep 17 00:00:00 2001
From: Fredrik Thulin
Date: Wed, 1 Dec 2010 09:32:16 +0100
Subject: [PATCH] Don't idle when there is more work to do.

Even if active host checks are disabled, they still get
scheduled.

When the scheduler finds a host check in the job list,
and then decides not to run it because active host
checks are disabled, it should not sleep sleep_time seconds
(default: 0.25 seconds) in case the next job in the queue
also is supposed to be started right away.
---
base/events.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/base/events.c b/base/events.c
index 0348e3e..fabe830 100644
--- a/base/events.c
+++ b/base/events.c
@@ -1153,8 +1153,12 @@ int event_execution_loop(void){

/* wait a while so we don't hog the CPU... */
else{
+ if (event_list_low!=NULL && current_time>=event_list_low->run_time) {
+ log_debug_info(DEBUGL_EVENTS,2,"Did not execute scheduled event. Go to next event immediately.\n");
+ continue;
+ }
+ log_debug_info(DEBUGL_EVENTS,2,"Did not execute scheduled event, and next check is not ready to run. Idling for a bit...\n");

- log_debug_info(DEBUGL_EVENTS,2,"Did not execute scheduled e

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives