Re: [Nagios-devel] fun with (silent) change from HARD to SOFT state
Posted: Fri Jan 23, 2009 4:22 pm
Michal Svoboda wrote:
> Hello,
>
> I've discovered a weird behavior, which can be replicated thus:
>
> 1. Let a service be configured for max attempts N before going to HARD
> non-ok state
>
> 2. Make the service fail and wait for N checks to pass (i.e. until the
> service enters N/N HARD non-ok state); at this point notifications
> are sent, etc.
>
> 3. Change the configuration of the service to have M > N max attempts
> and restart nagios
>
> 4. Now the state of the service is N/M _HARD_ non-ok
>
> 5. If the N+1th check results in non-ok, then the service state goes to
> N+1/M _SOFT_
>
> 6. If some future check results in ok, then the service performs a SOFT
> recovery; this results at least in no recovery notifications
>
> 6a. If the condition in (5) does not occur, i.e. the N+1th check results
> immediately in ok, the service still performs a SOFT recovery from
> an apparently HARD state (even according to the logs)
>
> Now, one way to look at this behavior is that it is logical, because
> I've fiddled with the config, and I can expect anomalies and blah blah.
>
> Another way to look at it is that there have been notifications sent in
> step (2), yet there are no recovery notifications; in other words, once
> the sirens have been sounded (and the fire brigade is on the way, and
> the president is being woken up), they should also be properly shut off.
>
> So the question is whether or not to introduce a patch that prevents
> entering a SOFT state once a service (or a host) is already in a HARD
> non-ok state?
>
>
> With regards,
> Michal Svoboda
Nice catch. I just added some code that will readjust the current check
attempt at startup if the host/service was in a hard problem state.
That will accommodate config changes related to max check attempts that
are made before (re)start.
- Ethan Galstad
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]
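
For readers following along, here is a minimal sketch of the startup readjustment described in the reply. It is a toy model in Python, not the Nagios source. The class `RetainedService`, both function names, and the rule that a failing check is HARD only once the attempt counter reaches the maximum are all assumptions made for illustration. The reply also does not say what value the counter is readjusted to; pinning it to the new maximum is a guess that happens to avoid step (5).

```python
from dataclasses import dataclass

STATE_OK = 0        # plugin return code for OK
STATE_CRITICAL = 2  # plugin return code for CRITICAL


@dataclass
class RetainedService:
    """Hypothetical stand-in for a service's retained status."""
    current_state: int
    state_type: str       # "HARD" or "SOFT"
    current_attempt: int  # retained from before the restart
    max_attempts: int     # read from the (possibly edited) config


def readjust_attempt_at_startup(svc: RetainedService) -> None:
    """Assumed fix: a retained HARD problem gets its counter pinned to the max."""
    if svc.current_state != STATE_OK and svc.state_type == "HARD":
        svc.current_attempt = svc.max_attempts


def state_type_after_failure(svc: RetainedService) -> str:
    """Assumed rule: the next non-ok result is HARD only at max attempts."""
    attempt = min(svc.current_attempt + 1, svc.max_attempts)
    return "HARD" if attempt >= svc.max_attempts else "SOFT"


if __name__ == "__main__":
    # N = 3 attempts reached before the restart, M = 5 configured afterwards.
    svc = RetainedService(STATE_CRITICAL, "HARD", current_attempt=3, max_attempts=5)
    print("without readjustment:", state_type_after_failure(svc))
    readjust_attempt_at_startup(svc)
    print("with readjustment:", state_type_after_failure(svc))
```

In this model the retained 3/5 HARD service drops to SOFT on its next failed check, which matches step (5) of the report, while the readjusted service stays HARD.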