[Nagios-devel] Nagios 3.1.1 eats cpu like mad

Posted: Tue Jun 23, 2009 3:29 pm
by Guest
There's a bug in Nagios 3.1.1, making it eat all available CPU even
with a very small configuration (5 hosts, 12 service checks).

I sort of introduced it, as I didn't fully test the impact of a patch
sent in before accepting it. Mea culpa, so I'll make sure to fix it.

For some reason, the patch shown inline below makes Nagios consume
100% CPU on my system. I don't know exactly why yet, but I'll
investigate and see how it can be fixed. I *think* it happens
because Nagios sees that "current_time" is valid and therefore
returns precisely that from get_next_valid_time(), which means it
keeps pushing each scheduled check just ahead of itself, rescheduling
it for "now" again and again, until enough time has passed since the
check was last *run* for it to actually be executed. Obviously, that
sucks major donkeyballs, so we really shouldn't do that. I'll need to
look into it a bit more closely before I can say with 100% certainty
that that's what's happening, though.
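
A minimal sketch of that hypothesis in C may help make the difference concrete. It is not the Nagios implementation: toy_period, toy_time_is_valid(), toy_next_valid_time() and the two next_check_* helpers are hypothetical stand-ins, and the timeperiod is assumed to be a single contiguous window. The only behaviour modelled is the one described above, namely that a lookup anchored at an already valid time hands that same time straight back.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a timeperiod: one contiguous valid window. */
typedef struct {
    time_t start; /* first valid second (inclusive) */
    time_t end;   /* last valid second (inclusive) */
} toy_period;

static bool toy_time_is_valid(time_t t, const toy_period *p)
{
    return t >= p->start && t <= p->end;
}

/* Toy model of a "next valid time" lookup (an assumption, not Nagios code):
 * return the earliest valid time >= wanted, or wanted itself if there is none. */
static void toy_next_valid_time(time_t wanted, time_t *next, const toy_period *p)
{
    assert(next != NULL && p != NULL);
    if (toy_time_is_valid(wanted, p))
        *next = wanted;   /* already valid: handed back unchanged */
    else if (wanted < p->start)
        *next = p->start; /* wait for the window to open */
    else
        *next = wanted;   /* no later valid time in this toy model */
}

/* Pre-patch anchoring: look for a valid time starting at now + interval. */
static time_t next_check_from_preferred(time_t now, time_t interval, const toy_period *p)
{
    time_t next;
    toy_next_valid_time(now + interval, &next, p);
    return next;
}

/* Post-patch anchoring: look for a valid time starting at now. */
static time_t next_check_from_current(time_t now, const toy_period *p)
{
    time_t next;
    toy_next_valid_time(now, &next, p);
    return next;
}
```

With a window that is open at the current time, next_check_from_preferred() yields a time one interval ahead, while next_check_from_current() yields the current time itself, so in this model a check rescheduled that way is due again immediately.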

-8<--8<--8<-
Date: Fri May 22 01:38:28 2009 +0000

Fix service rescheduling on clock skew/timeperiod change

This patch ensures that services and hosts are never scheduled one
year into the future and set to never be rescheduled again.

Previously, this could happen if the next preferred time happened
to already be valid, but stops being so because of clock skew or
someone changing the timeperiod definition between two Nagios
restarts while retaining scheduling info.

Patch-sent-by: Ricardo Maraschini
Signed-off-by: Andreas Ericsson

diff --git a/base/checks.c b/base/checks.c
index 9d5c497..ef50a20 100644
--- a/base/checks.c
+++ b/base/checks.c
@@ -277,7 +277,7 @@ int run_scheduled_service_check(service *svc, int check_options, double latency)
preferred_time=current_time+((svc->check_interval<=0)?300:(svc->check_interval*interval_length));

/* make sure we rescheduled the next service check at a valid time */
- get_next_valid_time(preferred_time,&next_valid_time,svc->check_period_ptr);
+ get_next_valid_time(current_time,&next_valid_time,svc->check_period_ptr);

/* the service could not be rescheduled properly - set the next check time for next year, but don't actually reschedule it */
if(time_is_valid==FALSE && next_valid_time==preferred_time){
@@ -2792,7 +2792,7 @@ int run_scheduled_host_check_3x(host *hst, int check_options, double latency){
preferred_time=current_time+((hst->check_interval<=0)?300:(hst->check_interval*interval_length));

/* make sure we rescheduled the next host check at a valid time */
- get_next_valid_time(preferred_time,&next_valid_time,hst->check_period_ptr);
+ get_next_valid_time(current_time,&next_valid_time,hst->check_period_ptr);

/* the host could not be rescheduled properly - set the next check time for next year, but don't actually reschedule it */
if(time_is_valid==FALSE && next_valid_time==preferred_time){
-8<--8<--8<-


--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]