Hello:
We ran into a problem where nagios silently failed to fork. Since it's
part of our startup scripts, this delayed the rest of the daemons
until nagios exited. For this reason, nagios should die if it
doesn't daemonize correctly. We are using nagios
3.0.5, but I see the bug still exists in the trunk. Here is our
nagios.log showing the problem:
[1242115575] Nagios 3.0.5 starting... (PID=773)
[1242115575] Local time is Tue May 12 11:06:15 EEST 2009
[1242115575] LOG VERSION: 2.0
[1242115575] Finished daemonizing... (New PID=773)
Notice the "New PID" equals the old PID. I think this is the problem
code, from base/utils.c:
    /* check for SIGHUP */
    if(val==1 && (pid=(pid_t)pidno)==getpid()){
        close(lockfile);
        return OK;
    }
If nagios is started at boot time, it's somewhat likely it will get
exactly the same PID on the next boot. So if the lockfile isn't cleaned
up for some reason, nagios will start, find its own PID in the stale
lockfile, treat that as the SIGHUP case, and not fork.
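One possible way to avoid the false match is to skip the fork only when the caller knows it is re-initializing after a SIGHUP, rather than inferring that from the lockfile alone. This is only a minimal sketch: may_skip_fork() and its is_restart flag are hypothetical names used for illustration, not existing nagios code.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of nagios: decide whether the fork may be
 * skipped. `is_restart` is an assumed flag that the caller sets only when
 * it is re-initializing after SIGHUP. A stale lockfile that happens to
 * hold our PID on a fresh start no longer suppresses the fork. */
static int may_skip_fork(int is_restart, pid_t lockfile_pid, pid_t my_pid)
{
    return is_restart && lockfile_pid == my_pid;
}
```

With a check like this, the boot-time case above (stale lockfile containing 773, new process also 773, no restart in progress) would go on to fork as usual.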
    /* exit on errors... */
    if((pid=fork())<0)
        return(ERROR);
This looks like another bug. If fork fails, we return ERROR, but the
return code is just silently ignored.
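A rough sketch of what checking the return code could look like on the caller's side. The helper name startup_status() is hypothetical, and the OK and ERROR values are defined here only so the example is self-contained; they are assumptions, not copied from the nagios headers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed values for illustration only. */
#define OK 0
#define ERROR -2

/* Hypothetical helper: map the result of the daemonizing routine to a
 * process exit status, so a failed fork stops startup instead of being
 * silently ignored. */
static int startup_status(int daemonize_result)
{
    if (daemonize_result == ERROR) {
        fprintf(stderr, "Failed to daemonize, bailing out\n");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

The caller would then exit with EXIT_FAILURE when the daemonizing routine returns ERROR, so the startup scripts can move on to the other daemons.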
Thanks,
Devin
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]