[Nagios-devel] FW: Problem with initial service scheduling (2.0b3)
Posted: Mon Jul 04, 2005 1:04 am
Sorry for posting this message again, but I cannot modify my production
environment before getting an opinion from somebody who understands the
'interleave_block' stuff.
Thanks in advance
> -----Original Message-----
> From: Laupretre, François (CALYON)
> Sent: Thursday, June 09, 2005 2:56 PM
> To: [email protected]
> Subject: Problem with initial service scheduling (2.0b3)
>
> Hi all,
>
> I currently have a configuration with 4800 services: 600
> active and 4200 passive. As the number was growing, I
> noticed a problem in the way Nagios schedules their initial
> check times. With the original 2.0b3 code and
> max_service_check_spread=30, when I look at the scheduling
> queue just after startup, I see that the last service checks
> are scheduled to run in 4 hours!
>
> This delay corresponds to:
>
> max_service_check_spread * (total_services / total_scheduled_services)
>
> when it should be equal to max_service_check_spread.
>
> I found the cause in event.c/init_timing_loop(), and I am
> including a change which appears to correct the problem. However,
> as I am not sure I fully understand the 'interleave_block'
> logic, this change should be treated with care.
>
> The cause: in the 'schedule service checks' section of
> init_timing_loop(), the next check time is incremented for each
> service, not for each SCHEDULED service. So, in my case,
> it is incremented 'total_services' times and the last check
> time is equal to:
>
> current_time + total_services * service_inter_check_delay
>
> whereas it should be:
>
> current_time + total_scheduled_services * service_inter_check_delay
>
> which is consistent with the way service_inter_check_delay is computed.
>
> My change consists of moving the 'should_be_scheduled' check
> out of the inner loop and adding a line so that the
> code enters the inner 'interleave_block' loop only for active
> checks. This way current_interleave_block goes from 0 to
> total_scheduled_services instead of going up to total_services.
>
> Once again, the patch I am submitting seems to correct the
> problem in MY case, but I don't know whether it is correct when
> the interleave variables have different values.
>
> Regards
>
> François
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]