Re: [Nagios-devel] Fix for host dependency checks


* Ethan Galstad [2006-03-21 19:17]:
> On 22 Mar 2006 at 1:48, Holger Weiss wrote:
> > * Ethan Galstad [2006-03-21 12:50]:
> > > I'll keep this on the TODO list for Nagios 3.x, but I think it might
> > > require some more thought. The last hard state of the host should
> > > only be used in the dependency logic if a state change occurred
> > > relatively recently. If, for example, the last hard state change
> > > occurred two days ago, you don't want that value used in the logic.
> >
> > Okay, but the current Nagios code uses _only_ the last hard state (no
> > matter how "old" it is), which is the reason why I've encountered the
> > problem in the first place. I thought about checking the freshness of
> > the last hard state myself (the information is available in the host
> > struct already, so this would be easy), but then I omitted that since
> > letting the dependency fail if either the current or the last hard
> > state matches the criteria seemed sufficiently safe to me. This way,
> > "false alarms" for the (dependent) host B should reliably be
> > prevented, while the risk of suppressing legitimate notifications for
> > B because the dependency fails due to an outdated last hard state of A
> > is the same as with the current Nagios code. I believe that in
> > practice, this risk is very low: I suppose that in almost all cases,
> > the configured dependency criteria will be a down and/or unreachable
> > state. So the risk would be that an outdated down or unreachable
> > state lets the dependency fail, but down and unreachable states should
> > normally be more or less up-to-date.
>
> Aha - I think we're using different terms. :-) The Nagios 2.x code
> uses host->current_state in the dependency logic, but that's not
> necessarily "current" in terms of time.

Yes, that's what I meant. The 2.x code simply uses host->current_state.
My patch forces a new check of host A during the dependency check for B.
After this new host check has been performed, the host->current_state value
used by the 2.x code is available as host->last_hard_state. My patch
then checks this host->last_hard_state value just as the 2.x code does
and additionally checks the now updated host->current_state.
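
A minimal sketch of that combined test, for illustration only. This is not
the patch itself: the struct, the bit-flag criteria and the function names
are hypothetical stand-ins for the real host and dependency structures, and
the state values are only meant to mirror the usual up/down/unreachable
ordering.

```c
#include <stdbool.h>

/* Host states; the numeric values are illustrative. */
enum host_state { HOST_UP = 0, HOST_DOWN = 1, HOST_UNREACHABLE = 2 };

/* Hypothetical bit flags describing a dependency's failure criteria. */
#define FAIL_ON_UP          (1u << HOST_UP)
#define FAIL_ON_DOWN        (1u << HOST_DOWN)
#define FAIL_ON_UNREACHABLE (1u << HOST_UNREACHABLE)

/* Hypothetical, stripped-down view of the master host (host A). */
struct host_sketch {
    int current_state;   /* result of the freshly forced check       */
    int last_hard_state; /* the value the 2.x logic would have used  */
};

/* True if the given state is one of the configured failure criteria. */
static bool state_matches(unsigned criteria, int state)
{
    return (criteria & (1u << state)) != 0;
}

/*
 * The dependency fails if EITHER the last hard state OR the updated
 * current state of the master host matches the failure criteria.
 */
bool dependency_fails(const struct host_sketch *master, unsigned criteria)
{
    return state_matches(criteria, master->last_hard_state)
        || state_matches(criteria, master->current_state);
}
```

With criteria of FAIL_ON_DOWN | FAIL_ON_UNREACHABLE, a host A that was last
seen hard-UP but is found DOWN by the forced check makes the dependency
fail, so no notification for B goes out.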

> I made some major overhauls to the host check logic in the Nagios 3.x
> CVS code.

Ah, sorry, I must admit that I didn't find the time to look at the new
code yet---I'll do that really soon now[tm]! :-) Okay, forget about my
patch (apart from maybe as a bugfix for the 2.x branch) ;-)

> Those changes include parallel host checks and "predictive dependency
> checks". The predictive checks idea came from your earlier suggestion
> that all hosts that are depended upon for notification be checked
> before the notification gets sent out.
>
> Here's how the Nagios 3.x code does this... On the second-to-last
> host check attempt, Nagios will execute a parallel check of all
> hosts that are being depended upon. In Nagios 3.x, host checks are
> no longer performed immediately after each other, but at a
> retry_interval, just as services are re-checked. That means that
> theoretically all hosts that are being depended upon will have been
> checked before the dependency logic is tested and a decision to
> notify is made.

Having a retry_interval and parallel host checks sounds very, very nice!
I'm looking forward to testing the new code.
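
The predictive check idea described above could be sketched roughly as
follows. This is an assumption-laden illustration, not the Nagios 3.x
code: the struct, its fields and both function names are hypothetical,
and "queueing" a check is reduced to setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified host record. */
struct host_sketch {
    const char *name;
    int current_attempt;          /* attempts made so far               */
    int max_attempts;             /* attempts before a hard state       */
    struct host_sketch **masters; /* hosts this host depends upon       */
    size_t master_count;
    bool check_pending;           /* stands in for a queued async check */
};

/* True on the second-to-last check attempt. */
bool is_predictive_attempt(const struct host_sketch *h)
{
    return h->max_attempts > 1 && h->current_attempt == h->max_attempts - 1;
}

/*
 * On the second-to-last attempt, queue a check of every master host
 * that does not already have one pending. Returns the number queued.
 */
size_t schedule_predictive_checks(struct host_sketch *h)
{
    size_t queued = 0;

    if (!is_predictive_attempt(h))
        return 0;

    for (size_t i = 0; i < h->master_count; i++) {
        if (!h->masters[i]->check_pending) {
            h->masters[i]->check_pending = true;
            queued++;
        }
    }
    return queued;
}
```

Because the final attempt only happens one retry_interval later, the
queued checks of the master hosts have time to finish before the
dependency logic is evaluated.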

Thanks a lot, Holger

--
PGP fingerprint: F1F0 9071 8084 A426 DD59 9839 59D3 F3A1 B8B5 D3DE

This post was automatically imported from historical nagios-devel mailing list archives