Hi Bernd,
hi Andreas,
> To alleviate your issue, you should be running an ntp daemon
> on the Nagios server which slews the clock into its right
> time rather than sets it (slew = make it go slightly faster
> or slower until it matches the correct time). Are you running
> ntpdate via a cronjob or something?
>
> I'm not sure how one would go about debugging this, as the
> time required to run a single test is prohibitive for rapid
> repeated testing.
I ran into this problem before and started debugging it, so I'll
just share what I know so far. Sadly I haven't had the time yet to
really pinpoint a solution and produce a patch.
I'm not that big a fan of C.
How to reproduce it:
- define a check "freaky_check" with a limited check_period (let's
call it 7to11) and a check_interval of 3
- produce steady backward time shifts (nagios running in a VM, anyone?)
What happens:
1. it's 11pm, nagios schedules freaky_check for 7am according to its check_period
2. every X minutes the time shifts by -1 sec (jittering time source)
3. nagios tries to compensate and adjusts _all_ checks by the timeshift (next_check = next_check - timeshift)
4. time goes by from 11pm to 6am, with the clock shifted back by, let's say, 8 minutes in total
5. freaky_check is now scheduled for 6:52am because of the timeshifts
6. it's 6:52am and nagios tries to run freaky_check according to the schedule
7. sanity check says: ERROR: check outside check_period
8. nagios tries to compensate with a strange logic: next_check = next_check + check_interval, and just hopes it will fit
9. nagios reruns the sanity check: FATAL ERROR: check still outside check_period - I have no clue what to do: rescheduling freaky_check: next_check = next_check + 1year
10. user puzzled and nagios thinks it's all cool
Conclusion:
This behaviour turns up when the following criteria are met:
- check has a reduced check_period
- time is shifting back
- the timeshift outside the check_period is greater than 2 times the
check_interval
You can look it up in base/checks.c, for example within the function
run_scheduled_service_check(service *svc, int check_options, double latency)
After some basic checks this will be run:
/* attempt to run the check */
result=run_async_service_check(svc,check_options,latency,TRUE,TRUE,&time_is_valid,&preferred_time);
which in turn ends up with:
/* is the service check viable at this time? */
if(check_service_check_viability(svc,check_options,time_is_valid,preferred_time)==ERROR)
    return ERROR;
No, since nagios shifted it outside its check_period, the time is NOT valid.
Back in run_scheduled_service_check we now enter the if(result==ERROR) branch:
/* get current time */
time(&current_time);
/* determine next time we should check the service if needed */
/* if service has no check interval, schedule it again for 5 minutes from now */
if(current_time>=preferred_time)
    preferred_time=current_time+((svc->check_interval<=0)?300:(svc->check_interval*interval_length));
COMMENT: nagios added the check_interval to preferred_time
/* make sure we rescheduled the next service check at a valid time */
get_next_valid_time(preferred_time,&next_valid_time,svc->check_period_ptr);
COMMENT: No, it didn't, as adding check_interval was not enough to compensate for the backshift in time
/* the service could not be rescheduled properly - set the next check time for next year, but don't
actually reschedule it */
if(time_is_valid==FALSE && next_valid_time==preferred_time){
COMMENT: nagios is bailing out here and just adding 1 year to preferred_time to get the scheduler running again
    svc->next_check=(time_t)(next_valid_time+(60*60*24*365));
    svc->should_be_scheduled=FALSE;
    log_debug_info(DEBUGL_CHECKS,1,"Unable to find any valid times to reschedule the next service check!\n");
}
/* this service could be
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]