Re: [Nagios-devel] [PATCH] notifications: Fix

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
Guest

Re: [Nagios-devel] [PATCH] notifications: Fix

Post by Guest »

Great summary, thanks!

On 2012-12-09 05:04, eponymous alias wrote:
> Here is full background on this bug.
>
> These two sections of code in base/notifications.c are broken:
[snip]
> If you look at it closely, you'll see that most of the central if()'s
> are really just instances of:
>
> if (B A) ...
>
> which of course will never be satisfied, given the original value of
> first_problem_time. And the second similar if() in the second section
> is really:
>
> if (B B) ...
>
> which is certainly non-functional.

So I noticed :)

> Bug history:
>
> A problem reported by Pawel Malachowski, May 20, 2010:
> http://comments.gmane.org/gmane.network ... devel/7402
>
> Ethan Galstad's code change trying to address the reported issue,
> which introduced the bad code above, 2 Jun 2010:
> http://git.op5.org/git/?p=nagios.git;a= ... 4cf41db425
>
> Jochen Bern noticing the problem with the patch, mentioning it publicly,
> and proposing some in-depth thinking about what is really desired,
> September 22, 2010:
> http://permalink.gmane.org/gmane.networ ... devel/7521
> There was apparently no follow-up by anyone.

My two cents:

max_check_attempts and retry_interval already makes it very easy to set
a delay for the first notification - in fact, I'd say they're way better
than this mechanism, because they'll make sure a check is triggered when
you want the notification, which first_notification_delay does not (and
seemingly isn't supposed to).

Thus, as far as I can see, the value of having first_notification_delay
is to set a delay that works regardless of state changes. Therefore, my
patch implements point two, but none of the other, in Jochen's mail.

> Rogerio F Cunha noticing that the code wasn't working as desired, and
> proposing a patch, September 2, 2011. Also, Andreas Ericsson thinking
> the patch was reasonable, but being busy that week, he apparently forgot
> about it.
> http://comments.gmane.org/gmane.network ... devel/8176
>
> The code fragments above are the state of the code across all of the
> Nagios 3.2.3, 3.3.1, and 3.4.1 releases. Perhaps it's time to take
> another look at Jochen Bern's proposal for thinking about the problem,
> and then seeing whether Rogerio F Cunha's patch actually moves the code
> in the desired direction.

Based on my reasoning above, I think the behavior he implements is less
helpful than what I did.

However, I'd be more than happy to see my patch reverted and his
applied, if it means that we stop basing the value on the last OK check,
like we do currently :)

> ================================================================
> Related but different fix:
> http://blogs.op5.org/git/?p=nagios.git; ... 797d151203
>
> See also:
> http://tracker.nagios.org/view.php?id=297
> for the official bug report, and
> https://dev.icinga.org/issues/1145
> for a possible fix.

Those two are the same, and that's in nagios SVN already. It doesn't
help the fact that we're still testing if ((A B)) over and
over.

> --- On Mon, 12/3/12, Robin Sonefors wrote:
>
>> From: Robin Sonefors
>> Subject: [Nagios-devel] [PATCH] notifications: Fix first_notification_delay
>> To: nagios-devel@lists.sourceforge.net
>> Cc: ae@op5.com
>> Date: Monday, December 3, 2012, 5:24 AM
>> Nagios bug #297, while being light on
>> details, points out a problem with
>> first_notification_delay that I think this patch clears up.
>>
>> The problem is two-fold: first, the logic for finding out
>> which state to
>> use is incorrect - a comment rightly points out that
>> last_time_{ok,up}
>> is the wrong value to use, and then proceeds to use any
>> value that is
>> both more recent and less recent than last_time_{ok,up} -
>> this condition
>> is likely to fail, meaning we always use the incorrect
>> value.
>>
>> Second, the logic is impossible to correct, though, because
>> all the
>> other values are incorrect, too - we always set
>> last_time_$state
>> wh

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: robin.sonefors@op5.com
Locked