[Nagios-devel] Skiplists (was: Re: Logging API revamp)

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
Guest

[Nagios-devel] Skiplists (was: Re: Logging API revamp)

Post by Guest »

John P. Rouillard wrote:
> In message ,
> Andreas Ericsson writes:
>
>> Andreas Ericsson wrote:
>>> Ethan Galstad wrote:
>>>> For the event queue, I was thinking that a skip list structure might be
>>>> best for efficiency (http://en.wikipedia.org/wiki/Skip_list). The event
>>>> queue is used in primarily two situations:
>>>>
>>>> 1. Popping events from the head of the list to be executed
>>>> 2. Inserting events into the list (mid- or endpoint).
>>>>
>>>> #1 is very efficient with a linked list, but performance with #2 can be
>>>> quite bad in large lists. Since a check event usually appears for each
>>>> host/service that is defined, this can lead to bad performance - O(n^2)
>>>> I believe - with large installations. A skip list would bring the
>>>> performance closer to O(log n).
>>>>
>>>> Anyone have comments/experiences they care to share about the
>>>> performance of skip lists and/or better alternatives?
>>>>
>>> A skiplist would probably do wonders. I've been experimenting with one
>>> now, actually using the timestamp for when next to execute
>> Scratch that. I only glossed the skiplist page you linked, and
>> misunderstood completely. I've been experimenting with a hash, except
>> the time used for input is the key. Implementation is trivial and
>> performance improvement should be fairly close to or better than a
>> skip-list.
>
> Hmm, maybe I am missing something, but if I have an element with a
> known key, I see how a hash table pointing into a conventional linked
> list will speed up access. But doesn't case 2 take the form:
>
> insert node X that should run at time t into the linked list
>
> mean a sequential scan of the linked list? How do you know what time
> to hash to find the node before time t?

It's a sequential scan of a small part of the linked list, just like
any non-perfect hash. Since perfect hashes aren't available for this,
we'll have to use a non-perfect one.


>
> E.G. I want node X inserted at 10:24:33, how do I know if the prior
> node is at 10:24:32, or 10:14:32?
>

With a 60 buckets in the hash-array, you wouldn't be able to tell the
difference without comparing the timestamp of the other scheduled items
in the list. With more than 600 buckets, they'd end up in separate lists.

However, we won't ever have to walk through *all* the lists, we just have
to find the right "end" of things.

Assume for a moment that the highest normal_check_interval you have for
your services is 15 minutes. 1024 buckets will hold your 15 minutes, with
2.067 minutes to spare. Now let's say you want to schedule a service-check
x seconds into the future.

slot = time(NULL) + x;
if (x > num_buckets) {
walk_from_tail(event_array[slot]);
}
else {
walk_from_head(event_array[slot]);
}


Since we're not likely to schedule checks several hours away, and we don't
have to walk *past* all items with the same scheduled time as our own event
(since the list is going to be sorted based on a non-unique index), the most
common case will be that we do only two operations to find our slot. The
above being one, and the below being the second

if (walk_from_head)
if head->when >= our->when /* bingo, most common case */
else
if tail->when when /* bingo again, second most common case */

Only when we have items scheduled more than (num_buckets*2)+1 seconds into
the future and we want to schedule an event where

num_buckets < (when - now) < (num_buckets*2)+1

will we end up actually walking *anything*, and since we're more or less
guaranteed to start in the right end when searching (as the scheduling
queue usually gets more and more saturated the closer to "now" we come),
we shouldn't, in practice, ever walk any large amount of objects. Only
scheduled downtime entries should really ever end up in the scheduling
queue with a (when - now) time larger than (num_buckets*2)+1, and those
are much rarer than your average service checks.

One good thing about this is that it's easy to check and get figures
from, and since the implementation is so trivial, we should be able
to have it u

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]
Locked