[Nagios-devel] Failsafeness and -Wall fixes


I've been trying to locate the SIGSEGV crash I reported earlier for
quite some time now, and have finally resorted to an interim "solution".

Attached is a patch to make nagios failsafe, insofar as such a thing goes,
by forking a grandmaster instance that just waits for the child to exit
in any way. SIGTERM and SIGHUP are forwarded from the grandmaster to the
child, so things should work normally on the inside. Incidentally,
SIGTERM also causes the grandmaster to die after kill(2)'ing the child,
so it's possible to stop the daemon as well. ;)

On the downside, the failsafeness is implemented after the config is
read, so it's now impossible to reload the configuration by sending
SIGHUP. This was necessary to reduce overhead and minimize .

Included in the patch are some minor fixes for the stricter of the
compiler warnings.

The crash is possibly reproducible using the plugin check_rand, which
will basically wreak havoc on nagios' exception handling in a fairly
disorderly manner. If nothing else it can be used to stress-test the core
code, or to get funny email (fortune was kind enough to provide one-liners).

check_rand will display random exit messages roughly half the time it
exits, and will produce truly pseudo-random exit codes. Don't generate
any cryptographic keys on the system you're using it on though, as it
leeches its chaos from /dev/urandom.
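
As a rough illustration of that behaviour, and not the actual check_rand source, a plugin of this kind could look something like the sketch below. The function names (read_urandom, pick_outcome, run_check), the message texts and the way the two random bytes are used are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical messages; the real plugin took its one-liners from fortune. */
static const char *const messages[] = {
    "OK: everything is probably fine",
    "WARNING: entropy levels rising",
    "CRITICAL: the dice have spoken",
};
#define MESSAGE_COUNT (sizeof messages / sizeof messages[0])

/* Fill buf with len bytes from /dev/urandom. Returns 0 on success, -1 on failure. */
int read_urandom(unsigned char *buf, size_t len)
{
    FILE *fp = fopen("/dev/urandom", "rb");
    size_t got;

    if (fp == NULL)
        return -1;
    got = fread(buf, 1, len, fp);
    fclose(fp);
    return got == len ? 0 : -1;
}

/*
 * Turn two random bytes into an outcome: rnd[0] becomes the exit code
 * (0..255, deliberately not limited to the usual plugin codes), and the
 * low bit of rnd[1] decides whether a message is shown, so a message
 * appears about half the time. *msg is set to NULL when nothing is shown.
 */
int pick_outcome(const unsigned char rnd[2], const char **msg)
{
    assert(rnd != NULL && msg != NULL);

    if (rnd[1] & 1)
        *msg = messages[(rnd[1] >> 1) % MESSAGE_COUNT];
    else
        *msg = NULL;
    return rnd[0];
}

/* One complete check; the return value is meant to be the process exit code. */
int run_check(void)
{
    unsigned char rnd[2];
    const char *msg = NULL;
    int code;

    if (read_urandom(rnd, sizeof rnd) != 0) {
        puts("UNKNOWN: cannot read /dev/urandom");
        return 3;   /* 3 is the conventional UNKNOWN plugin status */
    }
    code = pick_outcome(rnd, &msg);
    if (msg != NULL)
        puts(msg);
    return code;
}
```

A main() that simply returns run_check() would turn this into a standalone plugin. Keeping pick_outcome() free of I/O makes the mapping from random bytes to exit code easy to check on its own.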

You can download check_rand from
http://oss.op5.se/nagios/check_rand.tar.gz. Read the code if you're
interested in how it works, but don't ask me to port it.

--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Lead Developer

Attachment: nagios-failsafe-and-fixes.diff

diff -urN ../nagios.orig/base/Makefile.in ./base/Makefile.in
--- ../nagios.orig/base/Makefile.in Fri Dec 10 14:52:36 2004
+++ ./base/Makefile.in Mon Apr 4 20:29:18 2005
@@ -115,7 +115,7 @@
 DDATADEPS=$(DDATALIBS)


-OBJS=$(BROKER_O) checks.o config.o commands.o events.o flapping.o logging.o notifications.o sehandlers.o utils.o $(RDATALIBS) $(CDATALIBS) $(ODATALIBS) $(SDATALIBS) $(PDATALIBS) $(DDATALIBS) $(BASEEXTRALIBS) $(SNPRINTF_O) $(PERLXSI_O)
+OBJS=$(BROKER_O) nagios.o failsafe.o checks.o config.o commands.o events.o flapping.o logging.o notifications.o sehandlers.o utils.o $(RDATALIBS) $(CDATALIBS) $(ODATALIBS) $(SDATALIBS) $(PDATALIBS) $(DDATALIBS) $(BASEEXTRALIBS) $(SNPRINTF_O) $(PERLXSI_O)
 OBJDEPS=$(ODATADEPS) $(ODATADEPS) $(RDATADEPS) $(CDATADEPS) $(SDATADEPS) $(PDATADEPS) $(DDATADEPS) $(BROKER_H)

 all: nagios nagiostats
@@ -162,13 +162,13 @@

 ########## CGIS ##########

-nagios: nagios.c $(OBJS) $(OBJDEPS) $(SRC_INCLUDE)/nagios.h $(SRC_INCLUDE)/locations.h
-	$(CC) $(CFLAGS) -o $@ nagios.c $(OBJS) $(BROKER_LDFLAGS) $(LDFLAGS) $(PERLLIBS) $(MATHLIBS) $(SOCKETLIBS) $(THREADLIBS) $(BROKERLIBS) $(LIBS)
+nagios: $(OBJS) $(OBJDEPS) $(SRC_INCLUDE)/nagios.h $(SRC_INCLUDE)/locations.h
+	$(CC) $(CFLAGS) -o $@ $(OBJS) $(BROKER_LDFLAGS) $(LDFLAGS) $(PERLLIBS) $(MATHLIBS) $(SOCKETLIBS) $(THREADLIBS) $(BROKERLIBS) $(LIBS)

 nagiostats: nagiostats.c $(SRC_INCLUDE)/locations.h
 	$(CC) $(CFLAGS) -o $@ nagiostats.c $(LDFLAGS) $(MATHLIBS) $(LIBS)

-$(OBJS): $(SRC_INCLUDE)/locations.h
+$(OBJS): $(SRC_INCLUDE)/locations.h $(SRC_INCLUDE)/config.h

 clean:
 	rm -f nagios nagiostats core *.o gmon.out
diff -urN ../nagios.orig/base/failsafe.c ./base/failsafe.c
--- ../nagios.orig/base/failsafe.c Thu Jan 1 01:00:00 1970
+++ ./base/failsafe.c Mon Apr 4 21:09:01 2005
@@ -0,0 +1,175 @@
+/*
+ * Copyleft: GNU GPL 2.0 or later
+ *
+ * This is a rubbish'ish workaround meant to force Nagios to keep running
+ * even when it's hit by a SIGSEGV, designed to be in place until
+ * the problem causing the crashes is resolved.
+ *
+ */
+
+#include "../

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]