[Nagios-devel] SIGSEGV on reload - incorrect freeing of string?

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
Guest

[Nagios-devel] SIGSEGV on reload - incorrect freeing of string?

Post by Guest »

Hi!

I wanted to check what people thought about this, because my
knowledge of string handling in C is poor and this problem is hard to
recreate.

We had a situation where nagios had a segfault. Here are the
pertinent entries in nagios.log:

[1171976003] Caught SIGHUP, restarting...
[1171976010] HOST ALERT: hostB;DOWN;SOFT;1;CRITICAL - Plugin timed
out after 10 seconds
[1171976010] SERVICE ALERT: hostB;TCP 2226;CRITICAL;SOFT;1;CRITICAL -
Socket timeout after 10 seconds
[1171976010] SERVICE ALERT: hostA;TCP 2222;CRITICAL;SOFT;1;CRITICAL -
Socket timeout after 10 seconds
[1171976011] Caught SIGSEGV, shutting down...
[1171976073] Nagios 2.5 starting... (PID=19542)

Note that there is a 7 second delay between the signal being caught
and the host alerts, which is reasonable given the check_ping plugin
was timing out after 10 seconds (there were actual network/host
problems). So it is fair to assume that Nagios was in the host
reachability logic.

Ethan made a change to Nagios in 2.5 where nagios exited out of the
host check logic earlier (as this was slowing the restart). This
includes the following code in checks.c:

-----

*** 2086,2089 ****
--- 2094,2107 ----
for(hst->current_attempt=1;hst-
>current_attemptcurrent_attempt++){

+ /* ADDED 06/20/2006 EG */
+ /* bail out if signal encountered - use old state */
+ if(sigrestart==TRUE || sigshutdown==TRUE){
+ hst->current_attempt=1;
+ hst->current_state=old_state;
+ free(hst->plugin_output);
+ hst->plugin_output=(char *)old_plugin_output;
+ return hst->current_state;
+ }
+
/* check the host */
result=run_host_check(hst,check_options);
***************
*** 2172,2175 ****
--- 2190,2203 ----
for(hst->current_attempt=1;hst->current_attemptmax_attempts;hst->current_attempt++){

+ /* ADDED 06/20/2006 EG */
+ /* bail out if signal encountered - use old state */
+ if(sigrestart==TRUE || sigshutdown==TRUE){
+ hst->current_attempt=1;
+ hst->current_state=old_state;
+ free(hst->plugin_output);
+ hst->plugin_output=(char *)old_plugin_output;
+ return hst->current_state;
+ }
+
/* run the host check */
result=run_host_check(hst,check_options);


-----

I'm wondering if the "free(hst->plugin_output)" is the problem.
Shouldn't this be a:

strcpy(hst->plugin_output, old_plugin_output);

instead?

When trying to recreate this, I created a host with an IP address
that is not pingable, using check_ping as the host check command. I
then submit a passive OK to this host and then schedule a check for
all services immediately. After a few seconds (to allow nagios to get
into the host check logic portion), I send a HUP signal to nagios.
The logs show a Caught SIGHUP and a delay before the HOST ALERT.

I can't get nagios to segfault, but I get the plugin output set to
funny characters, which suggests that plugin_output is not being
correctly set in those routines. These corrupt characters do not
appear if I change to a strcpy call. So it is possible that the
segfault is happening somewhere else.

Any thoughts?

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon







This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]
Locked