Re: [Nagios-devel] [Nagios-users] [PATCH] reduce notification load;

Posted: Tue Nov 22, 2011 12:02 am
by Guest
On 21.11.2011 20:56, Andreas Ericsson wrote:
> On 11/01/2011 02:05 PM, Michael Friedrich wrote:
>> hi,
>>
>> recently we've been debugging on team icinga in the middle of
>> notifications and macros, and while investigating a user's problem,
>> we've dug a bit deeper into the notification viability checks,
>> resulting in deeper analysis of an Opsview patch to reduce the
>> notification load significantly by moving the viability checks from
>> the actual notification into the creation of the contacts notified,
>> passing only a list of 'qualified' contacts to the actual
>> notification logic. the only thing to remark over here is that the
>> checks against the valid notification_period now happen sooner, and
>> not actually when the notification is sent to each contact.
>>
>> while implementing that patch into current code (needs some macro
>> passing with current code), we remembered nagios bug #98, which asks
>> for the $NOTIFICATIONRECIPIENTS$ macro to be populated only with
>> the actual contacts to be notified, not all of those assigned to
>> the host/service. while this is considered to be a real bug, further
>> investigation showed that thanks to the viability checks before
>> calling add_notification(), non-viable contacts won't be added to
>> that macro, as the macro logic happens within that function too.
>>
>> so by applying the attached git patch, you will a. reduce
>> notification load and b. fix the $NOTIFICATIONRECIPIENTS$ macro
>> holding all contacts rather than only the viable contacts.
>>
>> since the code remains actually the same on icinga and nagios in this
>> stage, the tests can be found at the icinga dev tracker as usual.
>> https://dev.icinga.org/issues/1744
>> https://dev.icinga.org/issues/2023
>>
>
> I've started looking into this patch right now. It's good to get that
> issue (#98) fixed, but I fail to see any noticeable performance
> improvement. All contacts potentially viable for being contacted are
> still looked at, but the difference with this patch is that it checks
> the viability before shipping it off to add_notification(), which does
> fix issue 98 but at the expense of quite a lot of code duplication.

without the patch, all contacts are added to the notification_list in
memory, even those that will not pass the viability checks. at this
stage of the code nothing knows about viability yet, so the list gets
populated either way by calling add_notification().

/* add all individual contacts for this host */
^^^

once created, that notification_list stays fully linked in memory.
let's say you have some 1k contacts for that service, and the alarm
should actually hit only those in the nonworkhours or workhours
timeperiod and only on critical, e.g. the ops team. when looping
through the notification_list, you will still encounter *all* contacts
for that host; only the duplicates have been removed.

/* notify each contact (duplicates have been removed) */

then you fire off the actual notification by calling
notify_contact_of_host(), and only in there does the current core check
whether the contact is viable for notification.

you are right: if each contact gets notified 24x7 on all
notification_options, the algorithm stays the same. but if you have a
lot of different contacts assigned to hosts and services, and not all
of them get notified each time a notification is triggered, the
notification_list will be shorter and looping through it will save
some cpu cycles. on larger systems it is probably more than just a few,
because contacts that fail the viability checks are no longer looped
over at the actual end-of-the-line.
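
to make the argument above concrete, here is a minimal sketch of the
two strategies. it is not the actual patch and not the real core code:
the contact and notification structs, is_viable(), build_list() and
notify_all() are hypothetical, heavily simplified stand-ins, and the
two flags on contact only imitate the timeperiod and
notification_options checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real Nagios/Icinga structs. */
typedef struct contact {
    const char *name;
    int in_notification_period; /* 1 if "now" is inside the contact's timeperiod */
    int wants_critical;         /* 1 if the contact is notified on CRITICAL */
} contact;

typedef struct notification {
    const contact *cntct;
    struct notification *next;
} notification;

/* Hypothetical viability check for a CRITICAL alert. */
static int is_viable(const contact *c) {
    return c->in_notification_period && c->wants_critical;
}

/* Prepend one contact to the list; returns 1 on success, 0 if malloc fails. */
static int add_to_list(notification **list, const contact *c) {
    notification *n;
    assert(list != NULL && c != NULL);
    n = malloc(sizeof(*n));
    if (n == NULL)
        return 0;
    n->cntct = c;
    n->next = *list;
    *list = n;
    return 1;
}

/* Build the list and return its length. With filter_early set, contacts
 * that fail the viability check never enter the list at all. */
size_t build_list(notification **list, const contact *contacts,
                  size_t count, int filter_early) {
    size_t i, len = 0;
    for (i = 0; i < count; i++) {
        if (filter_early && !is_viable(&contacts[i]))
            continue;
        len += (size_t)add_to_list(list, &contacts[i]);
    }
    return len;
}

/* Walk the list and count the contacts that would really be notified.
 * The check is repeated here, as the unpatched flow does at send time. */
size_t notify_all(const notification *list) {
    size_t sent = 0;
    for (; list != NULL; list = list->next)
        if (is_viable(list->cntct))
            sent++;
    return sent;
}

/* Release every node of the list. */
void free_list(notification *list) {
    while (list != NULL) {
        notification *next = list->next;
        free(list);
        list = next;
    }
}
```

in this sketch both strategies notify the same contacts; the
difference is only how many nodes are allocated and walked. with
filter_early set, the list (and anything derived from it, such as a
recipients macro) contains only the viable contacts.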

furthermore, where do you get the idea of code duplication from? the
only change made by this patch is moving the viability checks and
therefore passing an additional function parameter, which makes the
diff a bit more bloated than it should be.


>
> I'll see if I can improve on that a bit.
>
>> kudos to Opsview Team for their initial patch as well as Icinga
>> Development Team for the further analysis on the macro bug.

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]