Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 11:45 am
by rferebee
Hello,

I'm seeing an issue this morning with my backup XI server. The Monitoring Engine stopped unexpectedly, and when I try to start it I get the following error: No lock file found in /var/run/nagios.lock

I found this support thread, but the steps listed to fix the problem did not work:

https://support.nagios.com/forum/viewto ... 16&t=51647

I tried repairing the SQL database and rebooting the server, but no joy. Any help is appreciated.

Thank you.

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 12:37 pm
by npolovenko
Hello, @rferebee. What version of XI are you using? Did you verify that the lock file location matches in all of the following files?

grep NagiosRunFile= /etc/rc.d/init.d/nagios
grep lockfile= /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
grep lockfile= /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
grep lock /usr/local/nagios/etc/nagios.cfg

Can you run all of the following commands in order and let me know if that fixes your issue?

service crond stop
service npcd stop
service nagios stop
service ndo2db stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
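
For reference, the lock-file comparison at the top of this post could be automated roughly as follows. This is only a sketch: `show_lock_paths` and `lock_paths_agree` are hypothetical helper names, not part of Nagios XI, and the code assumes each file sets the path with a plain `NagiosRunFile=`, `lockfile=`, or `lock_file=` assignment at the start of a line, as in the grep patterns above.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Nagios XI).

# Print "file: path" for the first lock-file assignment found in each file.
show_lock_paths() {
    for f in "$@"; do
        [ -r "$f" ] || { echo "$f: (not readable)"; continue; }
        # Strip the variable name, then any surrounding quotes.
        p=$(sed -n -E 's/^[[:space:]]*(NagiosRunFile|lockfile|lock_file)=//p' "$f" \
            | head -n 1 | tr -d "\"'")
        echo "$f: ${p:-(none found)}"
    done
}

# Succeed only if every file reports the same lock-file path.
lock_paths_agree() {
    n=$(show_lock_paths "$@" | sed 's/^.*: //' | sort -u | wc -l | tr -d ' ')
    [ "$n" -eq 1 ]
}
```

Calling `show_lock_paths` with the four files listed above prints one line per file; `lock_paths_agree` with the same arguments returns non-zero if any of them differ or cannot be read.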

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 12:47 pm
by rferebee
We are running version 5.5.8.

As stated previously, I tried all the steps you provided already and they did not resolve the issue.

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 1:30 pm
by ssax
Please send the following information.

Is this RHEL/CentOS 6.x or 7.x?

cat /etc/redhat-release
uname -a

Please include the output of both of these commands:

service nagios status
systemctl status nagios

Are you running mod_gearman?

grep gearman /usr/local/nagios/etc/nagios.cfg

What version of Core are you running?

/usr/local/nagios/bin/nagios -V

Let's check the lock/pid file paths:

grep -i "lock\|pid" /etc/init.d/nagios /usr/local/nagios/etc/nagios.cfg /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh

Once I get that information I can craft a solution for you.

Thank you!

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 3:06 pm
by rferebee
root@nagiosxi-lv> cat /etc/redhat-release
CentOS release 6.10 (Final)
root@nagiosxi-lv> uname -a
Linux nagiosxi-lv 2.6.32-754.12.1.el6.x86_64 #1 SMP Tue Apr 9 14:52:26 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


root@nagiosxi-lv> service nagios status
No lock file found in /var/run/nagios.lock
root@nagiosxi-lv> systemctl status nagios
-ksh: systemctl: not found [No such file or directory]


No output returned for the following command:

grep gearman /usr/local/nagios/etc/nagios.cfg


root@nagiosxi-lv> /usr/local/nagios/bin/nagios -V

Nagios Core 4.4.2
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-08-16
License: GPL

Website: https://www.nagios.org
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


root@nagiosxi-lv> grep -i "lock\|pid" /etc/init.d/nagios /usr/local/nagios/etc/nagios.cfg
/etc/init.d/nagios:NagiosRunFile=/var/run/nagios.lock
/etc/init.d/nagios: if ps -p $NagiosPID > /dev/null 2>&1; then
/etc/init.d/nagios: echo "nagios (pid $NagiosPID) is running..."
/etc/init.d/nagios: kill -s "$1" $NagiosPID
/etc/init.d/nagios:pid_nagios ()
/etc/init.d/nagios: echo "No lock file found in $NagiosRunFile"
/etc/init.d/nagios: NagiosPID=`head -n 1 $NagiosRunFile`
/etc/init.d/nagios: NagiosPID=`head -n 1 $NagiosRunFile`
/etc/init.d/nagios: pid_nagios
/etc/init.d/nagios: pid_nagios
/etc/init.d/nagios: pid_nagios
/usr/local/nagios/etc/nagios.cfg:lock_file=/var/run/nagios.lock

root@nagiosxi-lv> /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
LATEST NOM SNAPSHOT: /usr/local/nagiosxi/nom/checkpoints/nagioscore/1554844189.tar.gz
/ ~
RESTORING NOM SNAPSHOT : /usr/local/nagiosxi/nom/checkpoints/nagioscore/1554844189.tar.gz
~

--- reset_config_perms.sh ------------
> Setting CCM script permissions
> Setting script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
> Setting NOM checkpoint user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------

root@nagiosxi-lv> /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
NOM SNAPSHOT /usr/local/nagiosxi/nom/checkpoints/nagioscore/.tar.gz NOT FOUND!

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 3:32 pm
by ssax
Please send the output of these as well, including fresh output from the commands I've already requested.

grep -i "lock" /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
ls -ld /var
ls -ld /var/run
ls -l /var/run/nag*
service nagios start
tail -n100 /usr/local/nagios/var/nagios.log
service nagios status

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 3:44 pm
by rferebee
root@nagiosxi-lv> grep -i "lock" /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
lockfile="/var/run/nagios.lock"
# Verify the nagios.cfg lock file is set to proper location
sed -i "s|lock_file=.*|lock_file=$lockfile|" /usr/local/nagios/etc/nagios.cfg


root@nagiosxi-lv> /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
NOM SNAPSHOT /usr/local/nagiosxi/nom/checkpoints/nagioscore/.tar.gz NOT FOUND!


root@nagiosxi-lv> ls -ld /var
drwxr-xr-x 24 root root 4096 Apr 10 2017 /var


root@nagiosxi-lv> ls -ld /var/run
drwxr-xr-x. 43 root root 4096 Apr 10 09:38 /var/run


root@nagiosxi-lv> ls -l /var/run/nag*
ls: cannot access /var/run/nag*: No such file or directory


root@nagiosxi-lv> service nagios start
Starting nagios: done.


root@nagiosxi-lv> tail -n100 /usr/local/nagios/var/nagios.log
[1554929037] Warning: Service 'Disk Check /var' on host 'nagiosxi' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Ping' on host 'nagiosxi' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'SSH' on host 'nagiosxi' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Disk Check /apps' on host 'nagiosxi-lv' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Disk Check /home' on host 'nagiosxi-lv' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Disk Check /var' on host 'nagiosxi-lv' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'HTTP' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'I/O Wait' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Load' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'NRPE' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Nagios XI Daemons' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Nagios XI Jobs' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Ping' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns1out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Free Space on C Drive' on host 'ns1out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns1out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns2out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Free Space on C Drive' on host 'ns2out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns2out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns4' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns4' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns5' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns5' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Blacklist Status' on host 'peb' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'SMTP' on host 'peb' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Ping' on host 'silverflume' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'vCenter-SQL' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'DHHSISD045' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'DHHSISD046' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'DHHSISD101' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'OOM-GC1.adjgencc-ad' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'adfs.state.nv.us' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'caucc' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'cvcs' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'doe_esainods01' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'doe_esainrpt01' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'doe_esainsql03' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'edbop07cc.nvdps' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'edbstsqllab' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ent-sdcslab' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'gfcreports' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'gs1.web' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'localhost' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvcvcs' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma01' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma02' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma03' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma04' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma05' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma02' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma03' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma04' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma05' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'mail_mail' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'nagiostest_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'peb' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'prodexweb' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'silverflume' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'vCenter-SQL' has no default contacts or contactgroups defined!
[1554929037] Successfully launched command file worker with pid 15280
[1554929037] HOST DOWNTIME ALERT: ent-smplab;STARTED; Host has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;NRPE;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Jobs;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;HTTP;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Jobs;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Daemons;STARTED; Service has entered a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiosxi_monitor;STARTED; Host has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;HTTP;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Ping;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Ping;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Daemons;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Daemons;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;PEBPAD Host Cluster: pebdc01 pebdc02;STARTED; Service has entered a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiostest_monitor;STARTED; Host has entered a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiostest_monitor;STOPPED; Host has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP3 BIP4;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP3 BIP4;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Ping;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Load;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP1 BIP2;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;NRPE;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;I/O Wait;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;I/O Wait;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Load;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Ping;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Load;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP1 BIP2;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;NRPE;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;I/O Wait;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;I/O Wait;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Load;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;NRPE;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Jobs;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;HTTP;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Jobs;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Daemons;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiosxi_monitor;STOPPED; Host has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;PEBPAD Host Cluster: pebdc01 pebdc02;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;HTTP;STOPPED; Service has exited from a period of scheduled downtime


root@nagiosxi-lv> service nagios status
nagios (pid 15269) is running...
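
A side note on reading the log excerpt above: the bracketed numbers in nagios.log are Unix epoch seconds. A minimal sketch for making them readable, assuming GNU `date` (for `-d @N`); `epoch_to_utc` and `humanize_nagios_log` are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU date, which accepts "-d @SECONDS".

# Convert epoch seconds to a UTC timestamp.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Read log lines on stdin and rewrite a leading "[epoch] " prefix.
# Lines without a purely numeric prefix are passed through unchanged.
humanize_nagios_log() {
    while IFS= read -r line; do
        ts=${line#\[}          # drop the leading "["
        ts=${ts%%\]*}          # keep everything before the first "]"
        case $line in
            \[*\]\ *) ;;       # line has a "[...] " prefix
            *) ts= ;;          # no prefix: leave the line alone
        esac
        case $ts in
            ''|*[!0-9]*) printf '%s\n' "$line" ;;
            *) printf '[%s] %s\n' "$(epoch_to_utc "$ts")" "${line#*\] }" ;;
        esac
    done
}
```

For example, `tail -n100 /usr/local/nagios/var/nagios.log | humanize_nagios_log` would show `[1554929037]` as `[2019-04-10 20:43:57]` (UTC).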

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 4:04 pm
by ssax
Looks good now. Please validate that it's working.

What is the output of these commands now?

ls -l /var/run/nag*
service nagios status

Re: Unable to start Nagios service - nagios.lock missing

Posted: Wed Apr 10, 2019 4:09 pm
by rferebee
Looks good. Any insight as to what we did that fixed it?

root@nagiosxi-lv> ls -l /var/run/nag*
-rw-r--r-- 1 root root 6 Apr 10 13:43 /var/run/nagios.lock
root@nagiosxi-lv> service nagios status
nagios (pid 15269) is running...

For posterity of course! :)

Re: Unable to start Nagios service - nagios.lock missing

Posted: Thu Apr 11, 2019 2:40 pm
by ssax
The nagios service was not running, so when you checked its status it showed:

root@nagiosxi-lv> service nagios status
No lock file found in /var/run/nagios.lock

Then we started it:

root@nagiosxi-lv> service nagios start
Starting nagios: done.

And the status check now shows:

root@nagiosxi-lv> service nagios status
nagios (pid 15269) is running...

So for whatever reason your nagios process had been stopped, and simply starting it got it working again. Since no lock file was found, it likely didn't crash; a crash would typically leave a stale lock file behind.
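
To illustrate why the status command printed that message, here is a rough sketch of the check, pieced together from the init-script lines grepped earlier in the thread. It is not the actual /etc/init.d/nagios code: `nagios_status` is a hypothetical name, the stale-lock message is invented for illustration, and `kill -0` stands in for the script's `ps -p` test (it needs permission to signal the process, so run it as root).

```shell
#!/bin/sh
# Hypothetical sketch of the init script's status logic; the real script differs.
nagios_status() {
    run_file=${1:-/var/run/nagios.lock}

    # No lock file: the daemon is not running (or was shut down cleanly).
    if [ ! -f "$run_file" ]; then
        echo "No lock file found in $run_file"
        return 1
    fi

    # The first line of the lock file holds the daemon's PID.
    pid=$(head -n 1 "$run_file")

    # kill -0 sends no signal; it only tests that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "nagios (pid $pid) is running..."
        return 0
    fi

    # Illustrative wording only: lock file present but the PID is gone.
    echo "Lock file $run_file exists, but pid $pid is not running (stale lock)"
    return 1
}
```

With no argument the function checks /var/run/nagios.lock, the path configured on this server.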