[Nagios-devel] Bug report: nagios shutdown removing lock file too
Posted: Tue Jun 13, 2006 9:06 am
Ethan,
I think I've found a problem with the Nagios shutdown routine. If
Nagios is running a host check when an INT signal is sent, the daemon
takes a long time to die. It looks like the child nagios process is
trying to complete all the retries for the host check before going
back into the main loop.
Also, it appears that the lockfile is being removed before the main
process dies. Below is the output, during a kill 728, of:
while true; do ps -p 728; ls -l /usr/local/nagios/var/nagios.lock; sleep 1; done
[snipped]
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
-rw-r--r-- 1 nagios nagios 4 Jun 13 17:20 /usr/local/nagios/var/nagios.lock
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
-rw-r--r-- 1 nagios nagios 4 Jun 13 17:20 /usr/local/nagios/var/nagios.lock
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ls: /usr/local/nagios/var/nagios.lock: No such file or directory
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ls: /usr/local/nagios/var/nagios.lock: No such file or directory
This shows the lockfile gets removed before the main daemon dies.
(This is from a kill 728, not using any init scripts.) Eventually the
daemon dies.
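For anyone who wants to reproduce this, the polling loop above could be wrapped into a small script that flags the window in which the lockfile is already gone while the process is still running. This is only a rough sketch: the function names check_state and watch_shutdown are made up for illustration, and it assumes you run it as a user allowed to signal the daemon (kill -0 reports "dead" if you lack permission).

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): print the state of a process
# and its lock file once. Both arguments are supplied by the caller.
check_state() {
    pid=$1
    lockfile=$2
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then proc=alive; else proc=dead; fi
    if [ -e "$lockfile" ]; then lock=present; else lock=missing; fi
    echo "process=$proc lock=$lock"
}

# Hypothetical helper: poll once per second until the process has exited,
# marking every sample where the lock file is missing but the process lives.
watch_shutdown() {
    pid=$1
    lockfile=$2
    while :; do
        state=$(check_state "$pid" "$lockfile")
        echo "$(date '+%H:%M:%S') $state"
        case $state in
            "process=alive lock=missing")
                echo "  lock file removed before process exit" ;;
            process=dead*)
                break ;;
        esac
        sleep 1
    done
}
```

With the functions loaded, running watch_shutdown 728 /usr/local/nagios/var/nagios.lock in one terminal and kill 728 in another should show how long the daemon outlives its lockfile.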
I've tested this on Nagios 2.2 on MacOSX 10.4, Nagios 2.0 on Debian
and Nagios 2.4 on Debian.
Sorry, I have not had time to delve into the source code.
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]