Re: [Nagios-devel] Nagios 3.1.1 eats cpu like mad





On 6/23/2009 2:52 PM, Ethan Galstad wrote:
> Patch is in CVS now. Can someone who was experiencing scheduling problems
> with the 3.0.6 release test the latest 3.1.2 release? If the problem
> still persists, it's likely in one of the following functions in
> base/utils.c:
>
> check_time_against_period()
> get_next_valid_time()
>

This solved the bug where services were randomly scheduled into 2010,
so now that bug will happen again. Of course, 100% CPU usage is not an
acceptable trade-off for solving it.


> These functions are more complicated now with the new timeperiod
> exceptions and date formats, so a bug could likely exist here.
>
> - Ethan Galstad
>
>
> Andreas Ericsson wrote:
>
>> There's a bug in Nagios 3.1.1, making it eat all available CPU even
>> with a very small configuration (5 hosts, 12 service checks).
>>
>> I sort of introduced it, as I didn't fully test the impact of a patch
>> sent in before accepting it. Mea culpa, so I'll make sure to fix it.
>>
>> For some reason, the patch shown inline below makes Nagios consume
>> 100% CPU on my system. I don't know the reason for this, but I'll
>> investigate it and see how it can be fixed. I *think* it happens
>> because Nagios sees that "current_time" is valid and therefore
>> returns precisely that from get_next_valid_time(), which means it
>> pushes all the scheduled checks in front of it until enough time
>> has passed since the check was last *run* before actually executing
>> it. Obviously, that sucks major donkeyballs, so we really shouldn't
>> do that. I'll need to check that up a bit more closely before I can
>> say with 100% certainty that that's what's happening though.
>>
>> -8<--
>> commit 523e8c516df323a0bafe98ecb9222384fde62d6e
>> Author: Andreas Ericsson
>> Date: Fri May 22 01:38:28 2009 +0000
>>
>> Fix service rescheduling on clock skew/timeperiod change
>>
>> This patch ensures that services and hosts are never scheduled one
>> year into the future and set to never be rescheduled again.
>>
>> Previously, this could happen if the next preferred time happened
>> to already be valid, but stops being so because of clock skew or
>> someone changing the timeperiod definition between two Nagios
>> restarts while retaining scheduling info.
>>
>> Patch-sent-by: Ricardo Maraschini
>> Signed-off-by: Andreas Ericsson
>>
>> diff --git a/base/checks.c b/base/checks.c
>> index 9d5c497..ef50a20 100644
>> --- a/base/checks.c
>> +++ b/base/checks.c
>> @@ -277,7 +277,7 @@ int run_scheduled_service_check(service *svc, int check_options, double latency)
>> preferred_time=current_time+((svc->check_interval<=0)?300:(svc->check_interval*interval_length));
>>
>> /* make sure we rescheduled the next service check at a valid time */
>> - get_next_valid_time(preferred_time,&next_valid_time,svc->check_period_ptr);
>> + get_next_valid_time(current_time,&next_valid_time,svc->check_period_ptr);
>>
>> /* the service could not be rescheduled properly - set the next check time for next year, but don't actually reschedule it */
>> if(time_is_valid==FALSE&& next_valid_time==preferred_time){
>> @@ -2792,7 +2792,7 @@ int run_scheduled_host_check_3x(host *hst, int check_options, double latency){
>> preferred_time=current_time+((hst->check_interval<=0)?300:(hst->check_interval*interval_length));
>>
>> /* make sure we rescheduled the next host check at a valid time */
>> - get_next_valid_time(preferred_time,&next_valid_time,hst->check_period_ptr);
>> + get_next_valid_time(current_time,&next_valid_time,hst->check_period_ptr);
>>
>> /* the host could not be rescheduled properly - set the next check time for next year, but don't actually reschedule it */
>> if(time_is_valid==FALSE&& next_valid_time==preferred_time){
>> -8<--
>>
>>

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]