Re: [Nagios-devel] Nagios retries checks too soon.

Guest · Post by **Guest** » Fri Jun 10, 2011 6:15 pm

On 06/10/2011 07:48 PM, Paul M Dubuc wrote:
> Jochen Bern wrote:
>> IIRC, the actual
>> code adds check_interval/retry_interval to the variable that holds the
>> (previous) scheduled check time - i.e., the time when the previous check
>> supposedly was *started* (assuming negligible check latency).
>
> I was under the impression that the retry interval
> was only counted from the time the previous check completes and the
> status (which is needed to determine if a retry is necessary) is known.
> Why is the retry time determined before it's known that one is needed?

Hmmmmmm. It seems that I misremembered ... partially.

> # egrep -n 'current_time.*(check|retry)_interval' nagios-3.2.3/base/checks.c
> 276: preferred_time=current_time+((svc->check_interval<=0)?300:(svc->check_interval*interval_length));
> 1825: preferred_time=current_time+check_interval;
> 1843: preferred_time=current_time+check_interval;
> 2814: preferred_time=current_time+((hst->check_interval<=0)?300:(hst->check_interval*interval_length));
> 3446: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3482: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3555: next_check=(unsigned long)(current_time+(hst->retry_interval*interval_length));
> 3559: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3585: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3603: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3705: next_check=(unsigned long)(current_time+(hst->retry_interval*interval_length));
> 3709: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3879: preferred_time=current_time+check_interval;
> 3893: preferred_time=current_time+check_interval;

> # egrep -n 'last_check.*(check|retry)_interval' nagios-3.2.3/base/checks.c
> 1304: next_service_check=(time_t)(temp_service->last_check+(temp_service->check_interval*interval_length));
> 1450: next_service_check=(time_t)(temp_service->last_check+(temp_service->check_interval*interval_length));
> 1478: next_service_check=(time_t)(temp_service->last_check+(temp_service->retry_interval*interval_length));
> 1545: next_service_check=(time_t)(temp_service->last_check+(temp_service->check_interval*interval_length));

Lemme have a closer look at the latter matches ...

They cover handle_async_service_check_result(). (Since there also is a
handle_async_host_check_result_3x() *elsewhere*, we clearly have
different behaviour between host and service checks.)

1304 is the catchall for STATE_OK results.
1450 is the special case for SOFT non-OK services on non-UP hosts.
1478 is its counterpart for UP hosts.
1545 covers HARD non-OK services.
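
To make the effect of those four expressions concrete, here is a minimal sketch of the arithmetic. It is not code from checks.c: the helper names are made up for illustration, and it assumes (as discussed earlier in the thread) that last_check holds the time the previous check was *started*.

```c
#include <time.h>

/* Hypothetical helper, not a Nagios function: same shape as the
 * last_check-based expressions at lines 1304, 1450, 1478 and 1545.
 * "interval" stands for either check_interval or retry_interval. */
static time_t next_check_from_last_check(time_t last_check,
                                         double interval,
                                         int interval_length)
{
    return (time_t)(last_check + (interval * interval_length));
}

/* Hypothetical helper: seconds left between the moment the previous
 * check finished and the moment the next one is scheduled to run. */
static long gap_after_completion(time_t next_check, time_t finished_at)
{
    return (long)(next_check - finished_at);
}
```

With last_check=1000, retry_interval=1 and interval_length=60, the retry lands at 1060. If the previous check ran for 50 seconds and finished at 1050, that leaves only 10 seconds between the result and the retry, which would look like a retry that comes "too soon".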

Verification (looking at the *other* matches) ...

2814 through 3893 deal with *host* checks, 276 with *synchronous*
service checks (why is there no retry_interval??), 1825 and 1843 only
check viability, not results.
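
For contrast, a minimal sketch of the current_time-based pattern that the host check matches use. Again, the helper name is hypothetical, and I am assuming that current_time is taken when the result is being processed, i.e. after the check has finished.

```c
#include <time.h>

/* Hypothetical helper, not a Nagios function: same shape as the
 * current_time-based expressions at lines 3446 through 3709.
 * "interval" stands for either check_interval or retry_interval. */
static unsigned long next_check_from_now(time_t current_time,
                                         double interval,
                                         int interval_length)
{
    return (unsigned long)(current_time + (interval * interval_length));
}
```

With current_time=1050, retry_interval=1 and interval_length=60, the next check lands at 1110, a full interval after the result came in, no matter how long the check itself took.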

All in all, I'd say that async service checks, and *only* those, behave
the way I described. Not sure whether there may or may not be a *reason*
to ... anyone?

Kind regards,
J. Bern
-- 
Jochen Bern, Systemingenieur --- LINworks GmbH
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]