[Nagios-devel] PATCH: Fix (make it less likely to happen) race in
Posted: Wed Mar 28, 2007 12:26 am
Hi,
there is apparently a race in the stop/restart functionality of the
nagios init script.
When doing stop or restart, the "$NagiosRunFile" can disappear between
the check for existence and the "head" command in the status_nagios
function. NagiosPID then presumably ends up empty, and as a result one
gets a nasty usage message from "ps".
Observed on RHEL4/x86_64 with Nagios 2.6 and 2.7. The following patch
seems to scare away the problem. It does not fix the race itself, but it
makes it unlikely enough that the problem no longer occurs here.
For me the additional 2 sec delay in stop/restart is acceptable.
# diff -pu daemon-init daemon-init-new
--- daemon-init 2007-03-08 11:11:50.000000000 +0100
+++ daemon-init-new 2007-03-28 10:23:22.000000000 +0200
@@ -37,6 +37,10 @@ status_nagios ()
return 1
fi
+#
+# There is a race between here and the check above !!!
+# Fixed by a "sleep 2" in the "stop" case
+#
NagiosPID=`head -n 1 $NagiosRunFile`
if test -x $NagiosCGI/daemonchk.cgi; then
if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile; then
@@ -139,6 +143,10 @@ case "$1" in
# new NagiosRunFile, allowing multiple nagios daemons
# to (sooner or later) run - John Sellens
echo -n 'Waiting for nagios to exit .'
+#
+# Sleep first to minimize risk of race in status_nagios
+#
+ sleep 2
for i in 1 2 3 4 5 6 7 8 9 10 ; do
if status_nagios > /dev/null; then
echo -n ' .'
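
The patch above only narrows the window. As a rough sketch of how the race itself could be avoided (this is not part of the posted patch), the existence test and the read could be folded into one step, treating a failed or empty read as "not running". The helper name read_nagios_pid is hypothetical; only $NagiosRunFile comes from the init script.

```shell
#!/bin/sh
# Hypothetical helper, not from the original daemon-init script:
# read the PID in a single step instead of "test -f" followed by "head",
# so a run file that vanishes in between simply counts as a failure.
read_nagios_pid () {
    # $1: path to the run file (the init script's $NagiosRunFile)
    # If the file is already gone, head fails and we return 1.
    pid=`head -n 1 "$1" 2>/dev/null` || return 1

    # Reject an empty or non-numeric first line, so that nothing
    # unusable is ever handed on to "ps".
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    echo "$pid"
}
```

Under these assumptions, status_nagios could start with something like NagiosPID=`read_nagios_pid "$NagiosRunFile"` || return 1, and the "sleep 2" would no longer be needed to hide the usage message.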
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]