On 11/22/2011 01:02 AM, Michael Friedrich wrote:
> On 21.11.2011 20:56, Andreas Ericsson wrote:
>> On 11/01/2011 02:05 PM, Michael Friedrich wrote:
>>> hi,
>>>
>>> recently we've been debugging on team icinga in the middle of
>>> notifications and macros, and while investigating on a users problem,
>>> we've digged a bit deeper into the notification viability checks,
>>> resulting in deeper analysis of an Opsview patch to reduce the
>>> notification load significantly by moving the viability checks from
>>> the actual notification into the creation of the contacts notified,
>>> passing only a list of 'qualified' contacts to the actual
>>> notification logic. the only thing to remark over here is that the
>>> checks against the valid notification_period now happen sooner, and
>>> not actually when the notification is sent to each contact.
>>>
>>> while implementing that patch into current code (needs some macro
>>> passing with current code), we did remember nagios bug #98 where the
>>> $NOTIFICATIONRECEIPIENTS$ macro is demanded to be only populated with
>>> the actual contacts to be notified, but not all of those assigned to
>>> the host/service. while this is considered to be a real bug, further
>>> investigation showed that thanks to the viability checks before
>>> calling add_notification(), contacts won't be added to that macro as
>>> the macro logic happens within that function too.
>>>
>>> so by applying the attached git patch, you will a. reduce
>>> notification load and b. fix the $NOTIFICATIONRECEIPIENTS$ macro
>>> holding all contacts, but not the viable contacts.
>>>
>>> since the code remains actually the same on icinga and nagios in this
>>> stage, the tests can be found at the icinga dev tracker as usual.
>>> https://dev.icinga.org/issues/1744
>>> https://dev.icinga.org/issues/2023
>>>
>>
>> I've started looking into this patch right now. It's good to get that
>> issue (#98) fixed, but I fail to see any noticeable performance
>> improvement. All contacts potentionally viable for being contacted are
>> still looked at, but the difference with this patch is that it checks
>> the viability before shipping it off to add_notification(), which does
>> fix issue 98 but at the expense of quite a lot of code duplication.
>
> normally, all contacts would have been added to the notification_list in
> memory, even those not actually passing the viability checks. but at
> this stage of the code, nobody is aware of that so the list gets
> populated either way by calling add_notification().
>
> /* add all individual contacts for this host */
> ^^^
>
> having that notification_list created, this remains fully linked in
> memory. let's say, you have a bunch of some 1k contacts for that
> service, and actually the alarm would hit only those in the nonworkhours
> or workhours timeperiod and only on critical, for the ops team e.g.
> so by looping through the notification_list, you will encounter *all*
> contacts for that host, only the duplicates have been removed.
>
> /* notify each contact (duplicates have been removed) */
>
> then you'll fire up the actual notification with calling
> notify_contact_of_host - and actually in there, the current core checks
> the viability for the contact to be notified.
>
> you are right, if each contact gets notified 24x7 on all
> notification_options, the algorithm stays the same. but if you happen to
> have a lot of different contacts assigned to hosts and services, not
> getting notified each time a notification is triggered, the overall
> amount of looping through notification_list will be shorter and save
> some cpu cycles, and probably on larger systems, a bit more than just
> some as this means a reduction of the looping for each contact to be
> checked to be notified on the actual end-of-the-line.
>
Right, but all contacts are still checked for viability, so the amount
of looping is reduced once for all those who aren't viable, while the
number of viability checks (which I presume is the expensive part) will
remain the same.
> furthermore, where do you get the idea of cod
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]