Re: [Nagios-devel] [PATCH] notifications: Fix


On 12/10/2012 09:50 AM, Robin Sonefors wrote:
> Great summary, thanks!
>
> On 2012-12-09 05:04, eponymous alias wrote:
>> Here is full background on this bug.
>>
>> These two sections of code in base/notifications.c are broken:
> [snip]
>> If you look at it closely, you'll see that most of the central if()'s
>> are really just instances of:
>>
>> if (B A) ...
>>
>> which of course will never be satisfied, given the original value of
>> first_problem_time. And the second similar if() in the second section
>> is really:
>>
>> if (B B) ...
>>
>> which is certainly non-functional.
>
> So I noticed :)
>
>> Bug history:
>>
>> A problem reported by Pawel Malachowski, May 20, 2010:
>> http://comments.gmane.org/gmane.network ... devel/7402
>>
>> Ethan Galstad's code change trying to address the reported issue,
>> which introduced the bad code above, 2 Jun 2010:
>> http://git.op5.org/git/?p=nagios.git;a= ... 4cf41db425
>>
>> Jochen Bern noticing the problem with the patch, mentioning it publicly,
>> and proposing some in-depth thinking about what is really desired,
>> September 22, 2010:
>> http://permalink.gmane.org/gmane.networ ... devel/7521
>> There was apparently no follow-up by anyone.
>
> My two cents:
>
> max_check_attempts and retry_interval already make it very easy to set a delay for the first notification. In fact, I'd say they're much better than this mechanism, because they make sure a check is triggered when you want the notification, which first_notification_delay does not do (and seemingly isn't supposed to).
>
> Thus, as far as I can see, the value of first_notification_delay is to set a delay that works regardless of state changes. Therefore, my patch implements point two in Jochen's mail, but none of the others.
>
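To put a rough number on the delay Robin describes, assume the usual Nagios Core semantics. A non-OK result puts the object in a SOFT state, which is rechecked every retry_interval (in units of interval_length seconds) until max_check_attempts consecutive non-OK results make it HARD. Under those semantics, the earliest HARD state arrives roughly (max_check_attempts - 1) * retry_interval * interval_length seconds after the first failure. The helper below is hypothetical, and it ignores check latency, execution time and scheduling jitter.

```c
/* Hypothetical helper: approximate seconds from the first non-OK check
 * result until the object reaches a HARD state, assuming rechecks happen
 * exactly every retry_interval * interval_length seconds. Check latency,
 * execution time and scheduler jitter are ignored. */
static long soft_retry_delay(int max_check_attempts, double retry_interval,
                             int interval_length)
{
    /* With max_check_attempts <= 1 the first failure is already HARD. */
    if (max_check_attempts <= 1)
        return 0;

    /* Attempt 1 happens at t = 0; attempt N (the HARD one) happens after
     * N - 1 retry intervals. */
    return (long)((max_check_attempts - 1) * retry_interval * interval_length);
}
```

For example, max_check_attempts = 4 with retry_interval = 1 and interval_length = 60 gives about 180 seconds before the problem turns HARD and a notification can go out.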

AFAIR, the original use-case was to allow operators to react to HARD
alerts and acknowledge or fix them before notifications were sent out.
As such, the logic that makes "first_notification_delay" only trigger
notifications after a new check makes perfect sense.

It's unfortunate that the original algorithm didn't schedule a check
to run at the exact time when a notification was due and then, if the
check still returned a non-OK status, send the notification. However,
ignoring checks between the hard failure and the notification is not
a viable solution either, and adding support for having multiple
checks scheduled at the same time would make this more complex than
necessary. It would also go against all the online documentation
regarding this feature (blog posts and the like).

Based on that reasoning, it seems the following rules would make the
most sense:
* first_notification_delay should delay notifications since the most
recent HARD problem state but await the result of a check before it
actually sends a notification.
* If the delay pushes the notification into a period when
notifications should be sent, it should be sent even if the alert
itself happened during a period when no notifications would have
been sent.
* If the object switches, during the delay, to a state which should
not result in a notification, no notification should be sent out.
* Delaying a notification should not increase its notification_number,
and will, as such, affect both regular and escalated notifications.
* Custom, downtime, acknowledgement and flapping notifications will
never be delayed (flapping is arguable, but matches current code).
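The proposed rules can be read as a single gate that is evaluated whenever a notification is considered. The sketch below is one interpretation, not the patch that was eventually applied. The struct, enum and function names are hypothetical. "Await the result of a check" is taken to mean a check result that arrives at or after the moment the delay elapses, and the notification-period and state tests are assumed to be computed by the caller at send time.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical notification kinds; names are illustrative, not Nagios's. */
enum notif_kind { NOTIF_PROBLEM, NOTIF_CUSTOM, NOTIF_DOWNTIME,
                  NOTIF_ACK, NOTIF_FLAPPING };

/* Hypothetical snapshot of the fields the proposed rules look at. */
struct notif_ctx {
    enum notif_kind kind;
    time_t last_hard_state_change;   /* start of the most recent HARD problem */
    double first_notification_delay; /* in units of interval_length */
    int interval_length;             /* seconds per interval unit */
    time_t last_check;               /* time of the latest check result */
    bool state_should_notify;        /* current state still warrants notifying */
    bool in_notification_period;     /* evaluated at send time, not alert time */
};

/* Returns true if the proposed first_notification_delay rules do not hold
 * the notification back at time `now`. */
static bool proposed_delay_allows(const struct notif_ctx *c, time_t now)
{
    time_t due;

    /* Custom, downtime, acknowledgement and flapping notifications are
     * never delayed; other filters (not modelled here) still apply. */
    if (c->kind != NOTIF_PROBLEM)
        return true;

    /* The object moved to a state that should not notify while the
     * notification was being delayed: send nothing. */
    if (!c->state_should_notify)
        return false;

    /* The delay counts from the most recent HARD problem state. */
    due = c->last_hard_state_change +
          (time_t)(c->first_notification_delay * c->interval_length);
    if (now < due)
        return false;

    /* Await a check result that arrived after the delay elapsed. */
    if (c->last_check < due)
        return false;

    /* The notification period is judged now, so an alert raised outside
     * the period can still notify once the delay ends inside it. */
    return c->in_notification_period;
}
```

Keeping this as a pure yes/no gate means a "not yet" answer leaves notification_number untouched, which is what the fourth rule asks for; the caller would only increment it when a notification actually goes out.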

Comments on that? I'm busy writing documentation for a while longer, so
feel free to chip in. I'll apply something on Wednesday if I haven't
heard any arguments for or against before then.

And yes, this is now officially the thinking session for what to do
with it, so we'll make a decision here and get rid of the wretched
issue once and for all.

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terr

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: ae@op5.se