Re: [Nagios-devel] SIGSEGV on reload - incorrect freeing of string?

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
Guest

Re: [Nagios-devel] SIGSEGV on reload - incorrect freeing of string?

Post by Guest »

Ton Voon wrote:
> Hi!
>
> I wanted to check what people thought about this, because my
> knowledge of string handling in C is poor and this problem is hard to
> recreate.
>
> We had a situation where nagios had a segfault. Here are the
> pertinent entries in nagios.log:
>
> [1171976003] Caught SIGHUP, restarting...
> [1171976010] HOST ALERT: hostB;DOWN;SOFT;1;CRITICAL - Plugin timed
> out after 10 seconds
> [1171976010] SERVICE ALERT: hostB;TCP 2226;CRITICAL;SOFT;1;CRITICAL -
> Socket timeout after 10 seconds
> [1171976010] SERVICE ALERT: hostA;TCP 2222;CRITICAL;SOFT;1;CRITICAL -
> Socket timeout after 10 seconds
> [1171976011] Caught SIGSEGV, shutting down...
> [1171976073] Nagios 2.5 starting... (PID=19542)
>
> Note that there is a 7 second delay between the signal being caught
> and the host alerts, which is reasonable given the check_ping plugin
> was timing out after 10 seconds (there were actual network/host
> problems). So it is fair to assume that Nagios was in the host
> reachability logic.
>
> Ethan made a change to Nagios in 2.5 where nagios exited out of the
> host check logic earlier (as this was slowing the restart). This
> includes the following code in checks.c:
>
> -----
>
> *** 2086,2089 ****
> --- 2094,2107 ----
> for(hst->current_attempt=1;hst-
> >current_attemptcurrent_attempt++){
>
> + /* ADDED 06/20/2006 EG */
> + /* bail out if signal encountered - use old state */
> + if(sigrestart==TRUE || sigshutdown==TRUE){
> + hst->current_attempt=1;
> + hst->current_state=old_state;
> + free(hst->plugin_output);
> + hst->plugin_output=(char *)old_plugin_output;
> + return hst->current_state;
> + }
> +
> /* check the host */
> result=run_host_check(hst,check_options);
> ***************
> *** 2172,2175 ****
> --- 2190,2203 ----
> for(hst->current_attempt=1;hst->current_attempt >max_attempts;hst->current_attempt++){
>
> + /* ADDED 06/20/2006 EG */
> + /* bail out if signal encountered - use old state */
> + if(sigrestart==TRUE || sigshutdown==TRUE){
> + hst->current_attempt=1;
> + hst->current_state=old_state;
> + free(hst->plugin_output);
> + hst->plugin_output=(char *)old_plugin_output;
> + return hst->current_state;
> + }
> +
> /* run the host check */
> result=run_host_check(hst,check_options);
>
>
> -----
>
> I'm wondering if the "free(hst->plugin_output)" is the problem.
> Shouldn't this be a:
>
> strcpy(hst->plugin_output, old_plugin_output);
>
> instead?
>
> When trying to recreate this, I created a host with an IP address
> that is not pingable, using check_ping as the host check command. I
> then submit a passive OK to this host and then schedule a check for
> all services immediately. After a few seconds (to allow nagios to get
> into the host check logic portion), I send a HUP signal to nagios.
> The logs show a Caught SIGHUP and a delay before the HOST ALERT.
>
> I can't get nagios to segfault, but I get the plugin output set to
> funny characters, which suggests that plugin_output is not being
> correctly set in those routines. These corrupt characters do not
> appear if I change to a strcpy call. So it is possible that the
> segfault is happening somewhere else.
>
> Any thoughts?
>
> Ton
>

Good catch on this - I think the free() might be the problem. I just
added a NULL check before calling free() to fix that potential problem.
Also, I should have been using strdup() instead of referencing a
pointer to the old_plugin_output[] buffer. Patch is now in CVS.



Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]
Locked