[Nagios-devel] Bug with rescheduling a check after a failed fork


I noticed a problem with the behavior of Nagios when the first fork
for a check fails. Although the code above this point appears to decide
on a 'preferred_time' for the next check, there are cases where none
has been picked by the time the fork failure is acted on. As a result
the check is rescheduled with a preferred_time of '0' and is therefore
put at the head of the queue, so the very next pass through the queue
(milliseconds later) tries it again. That adds to the load problem
that is already causing trouble for the host.
I added this check and log message and have seen it do its bit a number
of times since. (Note to Josh Larsen - this may be why you are achieving
such Herculean load averages. You may want to look for a large number
of 'could not be performed due to a fork() error' messages in your
nagios.log matching the offending times).



Dave !


---------------------------------------------------------------------------
*** 472,481 ****
--- 474,489 ----
snprintf(temp_buffer,sizeof(temp_buffer),"Warning: The check of service '%s' on host '%s' could not be performed due to a fork() error. The check will be rescheduled.\n",svc_msg.description,svc_msg.host_name);
temp_buffer[sizeof(temp_buffer)-1]='\x0';
write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);

/* make sure we rescheduled the next service check at a valid time */
+ if (preferred_time == 0L) {
+     snprintf(temp_buffer,sizeof(temp_buffer),"Rescheduling failed fork() but preferred_time == 0L - fixing\n");
+     temp_buffer[sizeof(temp_buffer)-1]='\x0';
+     write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+     preferred_time=current_time + 10;
+ }
get_next_valid_time(preferred_time,&next_valid_time,svc->check_period);

/* the service could not be rescheduled properly - set the next check time for next year, but don't actually reschedule it */
if(time_is_valid==FALSE && next_valid_time==preferred_time){

***************
---------------------------------------------------------------------------
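
For anyone who wants to try the idea outside the Nagios source, the guard in the patch above can be sketched as a small standalone helper. This is only an illustration: the function name fix_preferred_time and the FORK_RETRY_DELAY constant are made up for this example and are not part of Nagios. The only thing taken from the patch is the 10 second delay.

```c
#include <time.h>

/* Hypothetical retry delay in seconds; the patch above hard-codes 10. */
#define FORK_RETRY_DELAY 10

/*
 * Return a usable time for rescheduling a check after a failed fork().
 * If preferred_time was never set (still 0), fall back to
 * current_time + FORK_RETRY_DELAY so the check is not queued at time 0,
 * which would put it at the head of the queue for an immediate retry.
 * Any non-zero preferred_time is returned unchanged.
 */
time_t fix_preferred_time(time_t preferred_time, time_t current_time)
{
    if (preferred_time == (time_t)0)
        return current_time + FORK_RETRY_DELAY;
    return preferred_time;
}
```

A caller would pass the result to whatever picks the next valid time, for example fix_preferred_time(preferred_time, time(NULL)). Whether 10 seconds is the right delay on a heavily loaded host is an open question; it is simply the value used in the patch.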






This post was automatically imported from historical nagios-devel mailing list archives
Original poster: David@pobox.net.au