Re: [Nagios-devel] FW: Problem with initial service scheduling (2.0b3)
Posted: Mon Jul 04, 2005 1:43 am
François Laupretre wrote:
> Sorry for posting this message again, but I cannot modify my production
> environment before getting an opinion from somebody who understands the
> 'interleave_block' stuff.
>
I think Ethan's the only one who really does.
Here's the documentation for it, though.
http://nagios.sourceforge.net/docs/2_0/ ... terleaving
As for the algorithm, I believe
max_service_check_spread * (total_active_services / total_scheduled_services)
is more appropriate.
> Thanks in advance
>
>
>>-----Original Message-----
>>From: Laupretre, François (CALYON)
>>Sent: Thursday, June 09, 2005 2:56 PM
>>To: [email protected]
>>Subject: Problem with initial service scheduling (2.0b3)
>>
>>
>>Hi all,
>>
>>I currently have a configuration with 4800 services: 600
>>active and 4200 passive. As the number was growing, I
>>noticed a problem in the way Nagios schedules their initial
>>check times: with the original 2.0b3 code and
>>max_service_check_spread=30, when I look at the scheduling
>>queue just after startup, I see that the last service checks
>>are scheduled to run in 4 hours!
>>
>>This delay corresponds to:
>>
>>max_service_check_spread * (total_services / total_scheduled_services)
>>
>>It should instead be equal to max_service_check_spread.
>>
>>I found the cause in event.c/init_timing_loop(), and I am
>>including a change which appears to correct the problem. However,
>>as I am not sure I fully understand the 'interleave_block'
>>logic, this change should be treated with care:
>>
>>The cause: in the 'schedule service checks' section of
>>init_timing_loop(), the next check time is incremented for each
>>service, not for each SCHEDULED service. So, in my case,
>>it is incremented 'total_services' times and the last check
>>time is equal to:
>>
>>Current_time + total_services * service_inter_check_delay
>>
>>whereas it should be:
>>
>>Current_time + total_scheduled_services * service_inter_check_delay
>>
>>which is consistent with the way service_inter_check_delay is computed.
>>
>>My change consists of moving the 'should_be_scheduled' check
>>out of the inner loop and adding a line so that the
>>code enters the inner 'interleave_block' loop only for active
>>checks. This way, current_interleave_block goes from 0 to
>>total_scheduled_services instead of going up to total_services.
>>
>>Once again, the patch I am submitting seems to correct the
>>problem in MY case, but I don't know whether it is correct when
>>the interleave variables have different values.
>>
>>Regards
>>
>>François
>>
>
>
>
> ------------------------------------------------------------------------
>
> This message and/or any attachments (the "message") is intended for
> the sole use of its addressee.
> If you are not the addressee, please immediately notify the sender and
> then destroy the message. As this message and/or any attachments may
> have been altered without our knowledge, its content is not legally
> binding on CALYON Corporate and Investment Bank. All rights reserved.
--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Lead Developer
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]