Re: [Nagios-devel] Nagios retries checks too soon.

On 06/10/2011 07:48 PM, Paul M Dubuc wrote:
> Jochen Bern wrote:
>> IIRC, the actual
>> code adds check_interval/retry_interval to the variable that holds the
>> (previous) scheduled check time - i.e., the time when the previous check
>> supposedly was *started* (assuming negligible check latency).
>
> I was under the impression that the retry interval
> was only counted from the time the previous check completes and the
> status (which is needed to determine if a retry is necessary) is known.
> Why is the retry time determined before it's known that one is needed?

Hmmmmmm. It seems that I misremembered ... partially.

> # egrep -n 'current_time.*(check|retry)_interval' nagios-3.2.3/base/checks.c
> 276: preferred_time=current_time+((svc->check_intervalcheck_interval*interval_length));
> 1825: preferred_time=current_time+check_interval;
> 1843: preferred_time=current_time+check_interval;
> 2814: preferred_time=current_time+((hst->check_intervalcheck_interval*interval_length));
> 3446: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3482: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3555: next_check=(unsigned long)(current_time+(hst->retry_interval*interval_length));
> 3559: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3585: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3603: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3705: next_check=(unsigned long)(current_time+(hst->retry_interval*interval_length));
> 3709: next_check=(unsigned long)(current_time+(hst->check_interval*interval_length));
> 3879: preferred_time=current_time+check_interval;
> 3893: preferred_time=current_time+check_interval;


> # egrep -n 'last_check.*(check|retry)_interval' nagios-3.2.3/base/checks.c
> 1304: next_service_check=(time_t)(temp_service->last_check+(temp_service->check_interval*interval_length));
> 1450: next_service_check=(time_t)(temp_service->last_check+(temp_service->check_interval*interval_length));
> 1478: next_service_check=(time_t)(temp_service->last_check+(temp_service->retry_interval*interval_length));
> 1545: next_service_check=(time_t)(temp_service->last_check+(temp_service->check_interval*interval_length));

Lemme have a closer look at the latter matches ...

They cover handle_async_service_check_result(). (Since there also is a
handle_async_host_check_result_3x() *elsewhere*, we clearly have
different behaviour between host and service checks.)

1304 is the catchall for STATE_OK results.
1450 is the special case for SOFT non-OK services on non-UP hosts.
1478 is its counterpart for UP hosts.
1545 covers HARD non-OK services.
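
As a rough illustration (not the actual Nagios source), the four branches above can be condensed into one function. The struct, the enum constants and the function name are hypothetical stand-ins made up for this sketch; only the arithmetic follows the grep matches.

```c
#include <time.h>

/* Hypothetical, simplified stand-ins; NOT the real Nagios types. */
enum { SVC_OK = 0, SVC_NOT_OK = 1 };
enum { SOFT_STATE = 0, HARD_STATE = 1 };

struct svc_sketch {
    time_t last_check;      /* start time of the previous check */
    double check_interval;  /* in units of interval_length */
    double retry_interval;  /* in units of interval_length */
};

/* Choose the next check time the way the four matches appear to. */
time_t next_service_check_sketch(const struct svc_sketch *s,
                                 int state, int state_type,
                                 int host_is_up, int interval_length)
{
    /* Lines 1304, 1450 and 1545 all use check_interval. */
    double interval = s->check_interval;

    /* Line 1478: a SOFT non-OK service on an UP host uses retry_interval. */
    if (state != SVC_OK && state_type == SOFT_STATE && host_is_up)
        interval = s->retry_interval;

    /* The base is last_check, not the current time. */
    return (time_t)(s->last_check + interval * interval_length);
}
```

With last_check=1000, check_interval=5, retry_interval=1 and interval_length=60, only the SOFT non-OK / host UP case lands at 1060; the other three land at 1300.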

Verification (looking at the *other* matches) ...

2814 through 3893 deal with *host* checks, 276 with *synchronous*
service checks (why is there no retry_interval??), 1825 and 1843 only
check viability, not results.

All in all, I'd say that async service checks, and *only* those, behave
the way I described. Not sure whether there may or may not be a *reason*
to ... anyone?
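
To make the practical difference concrete, here is a small sketch comparing the two bases seen in the grep output. The function names and the sample numbers are assumptions for illustration only, and it assumes that last_check holds the time the previous check was started, as described earlier in this thread.

```c
#include <time.h>

/* Interval counted from the previous check's start (last_check),
 * as in the async service check matches. */
time_t next_from_last_check(time_t last_check, double interval,
                            int interval_length)
{
    return (time_t)(last_check + interval * interval_length);
}

/* Interval counted from the moment the result is handled (current_time),
 * as in the host check matches. */
time_t next_from_current_time(time_t current_time, double interval,
                              int interval_length)
{
    return (time_t)(current_time + interval * interval_length);
}

/* Seconds between the result being handled and the next scheduled check. */
long gap_after_result(time_t next_check, time_t current_time)
{
    return (long)(next_check - current_time);
}
```

For example, a check started at t=1000 whose result is handled at t=1050, with retry_interval=1 and interval_length=60, is rescheduled for t=1060 when counting from last_check (a gap of only 10 seconds) but for t=1110 when counting from current_time (the full 60 seconds).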

Kind regards,
J. Bern
-- 
Jochen Bern, Systemingenieur --- LINworks GmbH
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]