Re: [Nagios-devel] eventhandler timeout 3.0.4
Hi Sven,
On 30 Oct 2008, at 15:49, Sven Nierlein wrote:
> Ah, i had a look, but didn't found your post before.
> Solution #1 doesn't work, if i start my eventhandler
> in background, nagios still waits for the eventhandler to
> finish.
You probably still have an open filehandle. A backgrounded process
inherits the script's stdout and stderr, so the parent keeps waiting
until every copy of those handles is closed. Look at daemonising
examples.
> So, in my opinion, there are 2 problems with eventhandler/
> notifications.
>
> 1. They are executed sequentially and blocking nagios while executed
> 2. nagios runs amok when the eventhandler gets into an early_timeout
> because the main process
> wants to read from the killed pipe in a never ending loop.
>
> I wrote a small patch for the second issue, maybe someone with more
> c skills wants to have a look...
We saw a similar issue for notifications - thanks for your patch. I've
made a few modifications, attached. I don't know how you got it to
work because in your if statement, bytes_read would never be -1.
I've put the whole "reading output" section in an else branch, so we
skip processing of the output when a timeout is detected.
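Stripped of the Nagios specifics, the ordering looks roughly like the sketch below. command_timed_out() and collect_output() are hypothetical names used only for illustration, and the sketch leaves out the part that kills the child process group. The point is that the timeout decision is made first and the pipe is not touched at all when it fires.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define STATE_CRITICAL 2 /* Nagios return code for CRITICAL */

/* Hypothetical helper: same test as the patched if statement, without
 * any reference to bytes_read. */
int command_timed_out(int result, const struct timeval *start,
                      const struct timeval *end, int timeout)
{
    return result == STATE_CRITICAL &&
           (end->tv_sec - start->tv_sec) >= timeout;
}

/* Hypothetical helper: read the command output from fd unless a timeout
 * was detected. Returns the number of bytes stored in buf (0 when the
 * read was skipped), or -1 on a read error or a zero-length buffer. */
ssize_t collect_output(int fd, int timed_out, char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    /* Timeout branch: leave the pipe alone. */
    if (timed_out)
        return 0;

    /* Else branch: read until EOF, retrying if a signal interrupts us. */
    while (used < buflen - 1) {
        ssize_t n = read(fd, buf + used, buflen - 1 - used);
        if (n > 0) {
            used += (size_t)n;
            continue;
        }
        if (n == 0)
            break;
        if (errno == EINTR)
            continue;
        return -1;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}
```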
I've also updated the debug logging so that it prints the output
whether or not the calling function asked for it, which should help
with troubleshooting.
BTW, if a SIGHUP or a SIGINT signal is sent while Nagios has a set of
notification commands for a list of contacts because of a host/service
alert, the signal is not processed until all those notification
commands have been completed. There probably should be a break
somewhere in that list. Another reason to make sure your notification
commands are quick!
Ton
[Attachment: nagios_stop_notification_timeouts_spinning.patch]
diff -ur nagios-3.0.4/base/utils.c nagios-3.0.4.with_notifications_timeout/base/utils.c
--- nagios-3.0.4/base/utils.c 2008-10-15 18:43:55.000000000 +0100
+++ nagios-3.0.4.with_notifications_timeout/base/utils.c 2008-10-31 07:40:30.000000000 +0000
@@ -575,12 +575,30 @@
if(result3)
result=STATE_UNKNOWN;
- /* initialize output */
- strcpy(buffer,"");
-
/* initialize dynamic buffer */
dbuf_init(&output_dbuf,dbuf_chunk);
+ /* Opsera patch to check timeout before attempting to read output via pipe. Originally by Sven Nierlein */
+ /* Removed bytes_read from if statement below as not applicable */
+ /* if there was a critical return code AND the command time exceeded the timeout thresholds, assume a timeout */
+ if(result==STATE_CRITICAL && (end_time.tv_sec-start_time.tv_sec)>=timeout){
+
+ /* set the early timeout flag */
+ *early_timeout=TRUE;
+
+ /* try to kill the command that timed out by sending termination signal to child process group */
+ kill((pid_t)(-pid),SIGTERM);
+ sleep(1);
+ kill((pid_t)(-pid),SIGKILL);
+ }
+
+ /* Create if ! timeout block here, so read output if timeout has not occurred */
+ /* Indentation not fixed to allow easier patching */
+ else {
+
+ /* initialize output */
+ strcpy(buffer,"");
+
/* try and read the results from the command output (retry if we encountered a signal) */
do{
bytes_read=read(fd[0],buffer,sizeof(buffer)-1);
@@ -613,28 +631,18 @@
if(output!=NULL && output_dbuf.buf)
*output=(char *)strdup(output_dbuf.buf);
- /* free memory */
- dbuf_free(&output_dbuf);
-
- /* if there was a critical return code and no output AND the command time exceeded the timeout thresholds, assume a timeout */
- if(result==STATE_CRITICAL && bytes_read==-1 && (end_time.tv_sec-start_time.tv_sec)>=timeout){
-
- /* set the early timeout flag */
- *early_timeout=TRUE;
-
- /* try to kill the command that timed out by sending termination signal to child process group */
- kill((pid_t)(-pid),SIGTERM);
- sleep(1);
- kill((pid_t)(-pid),SIGKILL);
- }
+ } /
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]