Re: [Nagios-devel] RFC: Downtime and flapping


On 02/04/2011 11:30 AM, Jochen Bern wrote:
> On 02/03/2011 11:59 PM, Andreas Ericsson wrote:
>> On 02/03/2011 07:53 PM, Ton Voon wrote:
>>> From the code, I can see that Nagios does not record any soft
>>> non-OK states in this state history. Any objections if I add "host
>>> or service in downtime" to that exception?
>> None at all. In fact, +1 on doing so. This way, downtime makes all
>> effects of state changes null and void.
>
> Umh, not quite, I'm afraid. It means that hosts/services will emerge
> from downtime with the history they had when they entered downtime
> way-back-when - which may well be the non-OK or FLAPPING which prompted
> you to schedule urgent repairs in the first place.
>

True, but urgent repairs often cause flapping.
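
To make the trade-off concrete, here is a minimal Python sketch of the proposed exception. It is not the Nagios source: the class name, the 21-slot history length, and the unweighted percentage are assumptions made for illustration. It only shows that skipping results during downtime leaves the object with exactly the history it had when downtime began.

```python
from collections import deque

HISTORY_SLOTS = 21  # assumed history length, for illustration only


class StateHistory:
    """Hypothetical model of the flap-detection state history."""

    def __init__(self):
        self.states = deque(maxlen=HISTORY_SLOTS)

    def record(self, state, soft=False, in_downtime=False):
        """Store a check result; return True if it entered the history."""
        if soft and state != "OK":
            return False  # existing exception: soft non-OK states are skipped
        if in_downtime:
            return False  # proposed exception: results during downtime are skipped
        self.states.append(state)
        return True

    def percent_change(self):
        """Unweighted share of adjacent entries that differ (a simplification)."""
        items = list(self.states)
        if len(items) < 2:
            return 0.0
        changes = sum(1 for a, b in zip(items, items[1:]) if a != b)
        return 100.0 * changes / (len(items) - 1)


history = StateHistory()
for s in ["OK", "CRITICAL", "OK", "CRITICAL"]:  # unstable before downtime
    history.record(s)
before = history.percent_change()

for s in ["CRITICAL", "OK", "OK", "OK"]:  # repairs happen during downtime
    history.record(s, in_downtime=True)
after = history.percent_change()
```

In this sketch `before` and `after` are equal, which is the behaviour Jochen describes: the pre-downtime instability is carried over, while any churn caused by the repairs themselves never enters the history.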

> IIUC, it also means that during the downtime, the CGI-bins will keep
> displaying the *historic* flapping state, along with the *current*
> host/service state.
>

Perhaps, but it should clear up fairly rapidly, and if a FLAPPING_START
notification was sent out, I'd expect to get a FLAPPING_STOP one when
repairs are done, assuming that happens after downtime has ended.

If flapping starts during downtime, no flapping start notifications
will be sent out, so no flapping stop notifications will go out either.
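
The pairing described above can be sketched as follows. This is a hypothetical Python model, not the Nagios notification code; `FlapNotifier` and its method names are invented for the example, and it assumes a stop notification is only owed when a start notification actually went out.

```python
class FlapNotifier:
    """Hypothetical model of flapping start/stop notification pairing."""

    def __init__(self):
        self.start_sent = False

    def on_flap_start(self, in_downtime):
        """Return the notification to send, or None if it is suppressed."""
        if in_downtime:
            return None  # downtime disables notifications
        self.start_sent = True
        return "FLAPPING_START"

    def on_flap_stop(self, in_downtime):
        """Send a stop only if a start went out and downtime has ended."""
        owed = self.start_sent
        self.start_sent = False
        if owed and not in_downtime:
            return "FLAPPING_STOP"
        return None


# Flapping began before downtime and ended after it: both notifications go out.
a = FlapNotifier()
a_start = a.on_flap_start(in_downtime=False)
a_stop = a.on_flap_stop(in_downtime=False)

# Flapping began during downtime: nothing is sent at either end.
b = FlapNotifier()
b_start = b.on_flap_start(in_downtime=True)
b_stop = b.on_flap_stop(in_downtime=False)
```

What happens to a stop that falls inside the downtime window is not settled in this thread; the sketch simply suppresses it.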

> Downtime disables notifications anyway, and there already is logic to
> trigger actions when downtime ends (*). IMHO, the proper way to provide
> a clean slate after a downtime would be to flush (**) the entire history
> at that point.
>

Effectively lying about state history? No thanks.

> (*) Notification type "s" - BTW,
> http://nagios.sourceforge.net/docs/3_0/ ... ml#contact
> lists services-"s" in the Definition Format but not in the Directive
> Descriptions.
>
> (**) Whether the bins should be reset to OK, PENDING,
> last-before-downtime or the current post-downtime $*STATE$ (if one is
> already available) is up for discussion ...
>

Current state will always be current state. I'm not going to change
that, ever. Most of our customers regularly check the UI during repairs
to see if the service is up and running as expected. Showing anything
but the *real* current state there would be counterproductive for all
Nagios users.

--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]