Re: [Nagios-devel] eventhandler timeout 3.0.4




Hi Sven,

On 30 Oct 2008, at 15:49, Sven Nierlein wrote:

> Ah, I had a look, but didn't find your post before.
> Solution #1 doesn't work: if I start my eventhandler
> in the background, Nagios still waits for the eventhandler to
> finish.

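You probably have an open filehandle, which causes the parent to still
wait for the script to finish. Typically the backgrounded process has
inherited the handler's stdout or stderr, so the pipe stays open and the
reader never sees end-of-file. Look at daemonising examples.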
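
A minimal sketch of the daemonising idea, assuming the event handler's
output is collected through a pipe attached to its standard streams. The
helper name detach_and_run() is hypothetical and not part of Nagios. It
double-forks, starts a new session, and points stdin/stdout/stderr at
/dev/null so that no inherited pipe end stays open in the long-running job.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of Nagios): start `cmd` fully detached and
 * return as soon as the short-lived intermediate child has exited.
 * Returns 0 on success, -1 on error. */
int detach_and_run(const char *cmd)
{
    assert(cmd != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid > 0) {
        /* Caller: reap the intermediate child, which exits almost at once. */
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    }

    /* Intermediate child: leave the caller's session, then fork again so
     * the real job is re-parented and nobody waits for it. */
    if (setsid() < 0)
        _exit(1);
    pid = fork();
    if (pid < 0)
        _exit(1);
    if (pid > 0)
        _exit(0);

    /* Grandchild: replace the inherited standard streams with /dev/null.
     * This is the step that releases the parent's pipe. */
    int devnull = open("/dev/null", O_RDWR);
    if (devnull < 0)
        _exit(1);
    dup2(devnull, STDIN_FILENO);
    dup2(devnull, STDOUT_FILENO);
    dup2(devnull, STDERR_FILENO);
    if (devnull > STDERR_FILENO)
        close(devnull);

    /* Run the long job through the shell. */
    execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
    _exit(127); /* only reached if exec failed */
}
```

A handler would call something like detach_and_run("sleep 30") and exit
straight away. A fuller version would also close any other descriptors
inherited above stderr.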

> So, in my opinion, there are 2 problems with eventhandlers/
> notifications.
>
> 1. They are executed sequentially and block Nagios while they run
> 2. Nagios runs amok when the eventhandler hits an early_timeout
> because the main process
> wants to read from the killed pipe in a never-ending loop.
>
> I wrote a small patch for the second issue, maybe someone with more
> C skills wants to have a look...

We saw a similar issue for notifications - thanks for your patch. I've
made a few modifications, attached. I don't know how you got it to
work because in your if statement, bytes_read would never be -1.

I've moved the whole "reading output" section into an else branch, so
the output is not read or processed at all once a timeout has been
detected.
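
The control flow can be sketched in a self-contained form. This is not
the Nagios my_system() code: run_with_timeout() and its parameters are
hypothetical, and the sketch assumes the child is stopped by alarm()
rather than by a Nagios signal handler. The point it illustrates is the
ordering: decide whether a timeout happened first, and only touch the
pipe in the else branch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define STATE_CRITICAL 2 /* conventional plugin return code for CRITICAL */

/* Hypothetical helper: run `cmd`, give up after timeout_secs, and read the
 * command's stdout only if no timeout was detected. Returns the command's
 * exit code (STATE_CRITICAL if it was killed), or -1 on setup errors. */
int run_with_timeout(const char *cmd, int timeout_secs,
                     char *out, size_t outlen, int *early_timeout)
{
    assert(cmd && out && outlen > 0 && early_timeout && timeout_secs > 0);

    int fd[2];
    out[0] = '\0';
    *early_timeout = 0;
    if (pipe(fd) < 0)
        return -1;

    time_t start = time(NULL);
    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: own process group, so the parent can signal the group. */
        setpgid(0, 0);
        close(fd[0]);
        dup2(fd[1], STDOUT_FILENO);
        close(fd[1]);
        /* The alarm survives exec; SIGALRM's default action kills us. */
        alarm((unsigned)timeout_secs);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }

    /* Parent: wait first. Output larger than the pipe buffer would stall
     * the child until the alarm fires; acceptable for a sketch. */
    close(fd[1]);
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            close(fd[0]);
            return -1;
        }
    }
    time_t end = time(NULL);
    int result = WIFEXITED(status) ? WEXITSTATUS(status) : STATE_CRITICAL;

    if (result == STATE_CRITICAL && (end - start) >= timeout_secs) {
        /* Timeout: flag it, clean up the process group, skip the read. */
        *early_timeout = 1;
        kill(-pid, SIGTERM);
        sleep(1);
        kill(-pid, SIGKILL);
    } else {
        /* No timeout: read the output, retrying if a signal interrupts. */
        size_t used = 0;
        ssize_t n;
        do {
            n = read(fd[0], out + used, outlen - 1 - used);
            if (n > 0)
                used += (size_t)n;
        } while ((n > 0 && used < outlen - 1) || (n < 0 && errno == EINTR));
        out[used] = '\0';
    }

    close(fd[0]);
    return result;
}
```

With this ordering a leftover process that still holds the write end of
the pipe cannot keep the parent stuck in read() after a timeout.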

I've also updated the debug logging so that it prints the output
regardless of whether the calling function wants it or not, which
should help with troubleshooting.

BTW, if a SIGHUP or a SIGINT signal is sent while Nagios has a set of
notification commands for a list of contacts because of a host/service
alert, the signal is not processed until all those notification
commands have been completed. There probably should be a break
somewhere in that list. Another reason to make sure your notification
commands are quick!
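
A rough sketch of what such a break could look like, assuming the signal
handler only sets a flag that the notification loop checks between
commands. Nagios's real handler and loop are not shown in this thread, so
stop_requested, install_stop_handlers() and run_notifications() are all
hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Set from the signal handler, polled by the loop below. */
static volatile sig_atomic_t stop_requested = 0;

static void on_signal(int sig)
{
    (void)sig;
    stop_requested = 1; /* only async-signal-safe work in a handler */
}

/* Hypothetical: route SIGHUP and SIGINT to the flag-setting handler. */
int install_stop_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        return -1;
    if (sigaction(SIGINT, &sa, NULL) < 0)
        return -1;
    return 0;
}

/* Hypothetical: run each notification command in turn, but stop as soon
 * as a signal has been seen. Returns how many commands were started. */
size_t run_notifications(const char *const cmds[], size_t n)
{
    assert(cmds != NULL || n == 0);

    size_t done = 0;
    for (size_t i = 0; i < n; i++) {
        if (stop_requested)
            break; /* let the main loop handle the restart or shutdown */
        (void)system(cmds[i]);
        done++;
    }
    return done;
}
```

A signal that arrives in the middle of one command is still only noticed
once that command returns, so short notification commands remain
important.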

Ton


Attachment: nagios_stop_notification_timeouts_spinning.patch

diff -ur nagios-3.0.4/base/utils.c nagios-3.0.4.with_notifications_timeout/base/utils.c
--- nagios-3.0.4/base/utils.c 2008-10-15 18:43:55.000000000 +0100
+++ nagios-3.0.4.with_notifications_timeout/base/utils.c 2008-10-31 07:40:30.000000000 +0000
@@ -575,12 +575,30 @@
if(result3)
result=STATE_UNKNOWN;

- /* initialize output */
- strcpy(buffer,"");
-
/* initialize dynamic buffer */
dbuf_init(&output_dbuf,dbuf_chunk);

+ /* Opsera patch to check timeout before attempting to read output via pipe. Originally by Sven Nierlein */
+ /* Removed bytes_read from if statement below as not applicable */
+ /* if there was a critical return code AND the command time exceeded the timeout thresholds, assume a timeout */
+ if(result==STATE_CRITICAL && (end_time.tv_sec-start_time.tv_sec)>=timeout){
+
+ /* set the early timeout flag */
+ *early_timeout=TRUE;
+
+ /* try to kill the command that timed out by sending termination signal to child process group */
+ kill((pid_t)(-pid),SIGTERM);
+ sleep(1);
+ kill((pid_t)(-pid),SIGKILL);
+ }
+
+ /* Create if ! timeout block here, so read output if timeout has not occurred */
+ /* Indentation not fixed to allow easier patching */
+ else {
+
+ /* initialize output */
+ strcpy(buffer,"");
+
/* try and read the results from the command output (retry if we encountered a signal) */
do{
bytes_read=read(fd[0],buffer,sizeof(buffer)-1);
@@ -613,28 +631,18 @@
if(output!=NULL && output_dbuf.buf)
*output=(char *)strdup(output_dbuf.buf);

- /* free memory */
- dbuf_free(&output_dbuf);
-
- /* if there was a critical return code and no output AND the command time exceeded the timeout thresholds, assume a timeout */
- if(result==STATE_CRITICAL && bytes_read==-1 && (end_time.tv_sec-start_time.tv_sec)>=timeout){
-
- /* set the early timeout flag */
- *early_timeout=TRUE;
-
- /* try to kill the command that timed out by sending termination signal to child process group */
- kill((pid_t)(-pid),SIGTERM);
- sleep(1);
- kill((pid_t)(-pid),SIGKILL);
- }
+ } /

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]