
[Nagios-devel] SIGSEGV on reload - incorrect freeing of string?

Posted: Tue Feb 20, 2007 1:25 pm
by Guest
Hi!

I wanted to check what people thought about this, because my
knowledge of string handling in C is poor and this problem is hard to
recreate.

We had a situation where Nagios segfaulted. Here are the
pertinent entries in nagios.log:

[1171976003] Caught SIGHUP, restarting...
[1171976010] HOST ALERT: hostB;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
[1171976010] SERVICE ALERT: hostB;TCP 2226;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1171976010] SERVICE ALERT: hostA;TCP 2222;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1171976011] Caught SIGSEGV, shutting down...
[1171976073] Nagios 2.5 starting... (PID=19542)

Note that there is a 7-second delay between the SIGHUP being caught
and the host alerts, which is reasonable given that the check_ping
plugin was timing out after 10 seconds (there were actual network/host
problems). So it is fair to assume that Nagios was in the host
reachability logic when the signal arrived.

Ethan made a change in Nagios 2.5 so that Nagios exits the host
check logic early when a restart or shutdown signal is pending (the
checks were slowing the restart). The change includes the following
code in checks.c:

-----

*** 2086,2089 ****
--- 2094,2107 ----
for(hst->current_attempt=1;hst->current_attempt<=hst->max_attempts;hst->current_attempt++){

+ /* ADDED 06/20/2006 EG */
+ /* bail out if signal encountered - use old state */
+ if(sigrestart==TRUE || sigshutdown==TRUE){
+ hst->current_attempt=1;
+ hst->current_state=old_state;
+ free(hst->plugin_output);
+ hst->plugin_output=(char *)old_plugin_output;
+ return hst->current_state;
+ }
+
/* check the host */
result=run_host_check(hst,check_options);
***************
*** 2172,2175 ****
--- 2190,2203 ----
for(hst->current_attempt=1;hst->current_attempt<=hst->max_attempts;hst->current_attempt++){

+ /* ADDED 06/20/2006 EG */
+ /* bail out if signal encountered - use old state */
+ if(sigrestart==TRUE || sigshutdown==TRUE){
+ hst->current_attempt=1;
+ hst->current_state=old_state;
+ free(hst->plugin_output);
+ hst->plugin_output=(char *)old_plugin_output;
+ return hst->current_state;
+ }
+
/* run the host check */
result=run_host_check(hst,check_options);


-----

I'm wondering if the "free(hst->plugin_output)" is the problem.
Shouldn't this be a:

strcpy(hst->plugin_output, old_plugin_output);

instead?
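
To make the question concrete, here is a minimal sketch of the
copy-based alternative. It rests on assumptions the diff does not
confirm: that old_plugin_output is a local buffer in the calling
function, and that hst->plugin_output is a heap buffer of a fixed
size. host_sketch, SKETCH_OUTPUT_LEN and both function names are
made up for illustration. Note that the copy only makes sense if the
free() is dropped as well, since copying into freed memory is
undefined behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size; the real limit in Nagios is not shown in this post. */
#define SKETCH_OUTPUT_LEN 64

/* Hypothetical stand-in for the host structure: only the field we need. */
typedef struct {
    char *plugin_output;   /* heap buffer of SKETCH_OUTPUT_LEN bytes, owned by the host */
} host_sketch;

/* Restore the saved output by copying into the buffer the host already owns.
 * The pointer itself never changes, so a later free(hst->plugin_output)
 * still receives the address that malloc() returned. */
void restore_plugin_output(host_sketch *hst, const char *old_plugin_output)
{
    assert(hst != NULL && hst->plugin_output != NULL && old_plugin_output != NULL);
    strncpy(hst->plugin_output, old_plugin_output, SKETCH_OUTPUT_LEN - 1);
    hst->plugin_output[SKETCH_OUTPUT_LEN - 1] = '\0';   /* strncpy may not terminate */
}

/* Simulates the bail-out path. The saved copy lives on this function's
 * stack, which is an ASSUMPTION about the original code. */
void bail_out_sketch(host_sketch *hst)
{
    char old_plugin_output[SKETCH_OUTPUT_LEN] = "PING OK - saved before the check";

    restore_plugin_output(hst, old_plugin_output);
    /* old_plugin_output goes away on return, so hst->plugin_output
     * must not be left pointing at it. */
}
```

If the assumption about a local buffer holds, assigning the pointer
(as the patch does) would leave hst->plugin_output pointing at stack
memory after the function returns, and any later free() of it would
be invalid. The bounded copy above avoids both problems.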

When trying to recreate this, I created a host with an IP address
that is not pingable, using check_ping as the host check command. I
then submitted a passive OK for this host and scheduled an immediate
check of all its services. After a few seconds (to allow Nagios to
get into the host check logic), I sent a HUP signal to Nagios.
The logs showed a Caught SIGHUP and a delay before the HOST ALERT.
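
For anyone who wants to script those steps, here is a rough sketch
that builds the two external command lines involved. It assumes the
steps are driven through the external command file using
PROCESS_HOST_CHECK_RESULT and SCHEDULE_HOST_SVC_CHECKS; the same
actions can be done from the web interface. The function names and
the output text "OK - passive reset" are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Build a passive host check result reporting UP (status code 0).
 * Returns the number of characters written, or -1 if buf is too small. */
int format_passive_host_ok(char *buf, size_t size, time_t now, const char *host)
{
    int n;

    assert(buf != NULL && host != NULL);
    n = snprintf(buf, size,
                 "[%ld] PROCESS_HOST_CHECK_RESULT;%s;0;OK - passive reset\n",
                 (long)now, host);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Build a request to check all services on the host at time `now`,
 * that is, as soon as possible.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_schedule_svc_checks(char *buf, size_t size, time_t now, const char *host)
{
    int n;

    assert(buf != NULL && host != NULL);
    n = snprintf(buf, size,
                 "[%ld] SCHEDULE_HOST_SVC_CHECKS;%s;%ld\n",
                 (long)now, host, (long)now);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Each line would be written to the file named by command_file in
nagios.cfg. A few seconds later, the HUP can be sent with
kill(pid, SIGHUP), where pid is the running Nagios process.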

I can't get nagios to segfault, but I get the plugin output set to
funny characters, which suggests that plugin_output is not being
correctly set in those routines. These corrupt characters do not
appear if I change to a strcpy call. So it is possible that the
segfault is happening somewhere else.

Any thoughts?

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon







This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]