[Nagios-devel] Problem with initial service scheduling (2.0b3)
Posted: Thu Jun 09, 2005 4:58 am
Hi all,
I currently have a configuration with 4800 services: 600 active and 4200
passive. As the number grew, I noticed a problem in the way Nagios
schedules their initial check times. With the original 2.0b3 code and
max_service_check_spread=30, when I look at the scheduling queue just
after startup, I see that the last service checks are scheduled to run in
4 hours!
This delay corresponds to:
max_service_check_spread * (total_services / total_scheduled_services)
when it should be equal to max_service_check_spread.
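As a quick check of the arithmetic, here is a minimal sketch using the figures above. It assumes max_service_check_spread is expressed in minutes; the names observed_spread and expected_spread are mine, for illustration only.

```python
# Figures quoted in this message; max_service_check_spread is assumed
# to be in minutes.
max_service_check_spread = 30
total_services = 4800            # active + passive
total_scheduled_services = 600   # active checks only

# Spread actually observed in the scheduling queue, per the formula above.
observed_spread = max_service_check_spread * (
    total_services / total_scheduled_services
)
# Spread the configuration asks for.
expected_spread = max_service_check_spread

print(f"observed: {observed_spread:.0f} min ({observed_spread / 60:.0f} h)")
print(f"expected: {expected_spread} min")
```

With these numbers the observed spread comes out to 240 minutes, which matches the 4 hours seen in the queue.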
I found the reason in event.c/init_timing_loop() and I am including a
change which appears to correct the problem. However, as I am not sure I
fully understand the 'interleave_block' logic, this change should be
treated with care.
The reason: in the 'schedule service checks' section of init_timing_loop(),
the next check time is incremented for each service, not for each SCHEDULED
service. So, in my case, it is incremented 'total_services' times and the
last check time is equal to:
current_time + total_services * service_inter_check_delay
when it should be:
current_time + total_scheduled_services * service_inter_check_delay
which is consistent with the way service_inter_check_delay is computed.
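The effect can be reproduced with a simplified simulation. This is only a sketch and not the code from event.c: it ignores interleaving entirely, the function last_check_offset is hypothetical, and it assumes service_inter_check_delay is the spread in seconds divided by the number of scheduled services, with active services spread evenly among passive ones.

```python
def last_check_offset(services, inter_check_delay, count_only_scheduled):
    """Return the offset (seconds from start) of the last scheduled check.

    services: list of booleans, True when the service should be scheduled.
    count_only_scheduled: False advances the time slot for every service
    (the behaviour described above); True advances it only for
    scheduled services.
    """
    slot = 0
    last = 0.0
    for should_be_scheduled in services:
        if should_be_scheduled:
            last = slot * inter_check_delay
            slot += 1
        elif not count_only_scheduled:
            # A passive service still consumes a time slot.
            slot += 1
    return last


# 4800 services, every 8th one active: 600 active, 4200 passive.
services = [i % 8 == 7 for i in range(4800)]
# Assumed: 30 minute spread divided over the 600 scheduled services.
delay = 30 * 60 / sum(services)

every_service = last_check_offset(services, delay, count_only_scheduled=False)
scheduled_only = last_check_offset(services, delay, count_only_scheduled=True)
print(every_service / 3600, "hours vs", scheduled_only / 60, "minutes")
```

Under these assumptions the last check lands just under 4 hours out when every service consumes a slot, and just under 30 minutes out when only scheduled services do.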
My change consists of taking the 'should_be_scheduled' check out of the
inner loop and adding a line so that the code enters the inner
'interleave_block' loop only for active checks. This way,
current_interleave_block goes from 0 to total_scheduled_services instead
of going up to total_services.
Once again, the patch I am submitting seems to correct the problem in MY
case, but I don't know whether it is correct when the interleave variables
have different values.
Regards
François
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]