Unable to start Nagios service - nagios.lock missing

rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Unable to start Nagios service - nagios.lock missing

Post by rferebee »

Hello,

I'm seeing an issue this morning with my backup XI server. The Monitoring Engine stopped unexpectedly, and when I try to start it I get the following error: No lock file found in /var/run/nagios.lock

I found this support thread, but the steps listed to fix the problem did not work:

https://support.nagios.com/forum/viewto ... 16&t=51647

I tried repairing the SQL database and rebooting the server, but no joy. Any help is appreciated.

Thank you.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Unable to start Nagios service - nagios.lock missing

Post by npolovenko »

Hello, @rferebee. What version of XI are you using? Did you verify that the lock file location matches in all of the following configurations?

grep NagiosRunFile= /etc/rc.d/init.d/nagios
grep lockfile= /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
grep lockfile= /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
grep lock /usr/local/nagios/etc/nagios.cfg

Can you run all of the following commands in order and let me know if that fixes your issue?

service crond stop
service npcd stop
service nagios stop
service ndo2db stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
rm -f /usr/local/nagiosxi/var/dbmaint.lock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -f /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
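
Editor's note: the four lock-file checks at the top of this post can be folded into one comparison. The following is a minimal sketch, not part of Nagios XI; the function names `lock_paths` and `lock_paths_agree` are hypothetical, and it assumes each file sets the path with a plain `NagiosRunFile=`, `lockfile=` or `lock_file=` assignment at the start of a line.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Nagios XI).

# Print "<file>: <lock path>" for every file given as an argument.
lock_paths() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "$f: <not readable>"
            continue
        fi
        # Take the first matching assignment and strip any quotes around the value.
        path=$(awk -F= '/^[ \t]*(NagiosRunFile|lockfile|lock_file)=/ { print $2; exit }' "$f" | tr -d "\"'")
        echo "$f: ${path:-<not set>}"
    done
}

# Succeed only when every file reports the same lock path.
lock_paths_agree() {
    lock_paths "$@" | awk -F': ' '{ print $2 }' | sort -u |
        awk 'END { exit (NR == 1 ? 0 : 1) }'
}
```

For example, `lock_paths /etc/rc.d/init.d/nagios /usr/local/nagios/etc/nagios.cfg` prints one line per file, and `lock_paths_agree` with the same arguments returns non-zero if the paths differ or a file cannot be read.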
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Unable to start Nagios service - nagios.lock missing

Post by rferebee »

We are running version 5.5.8.

As stated previously, I tried all the steps you provided already and they did not resolve the issue.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Unable to start Nagios service - nagios.lock missing

Post by ssax »

Please send the output of the following.

Is this RHEL/CentOS 6.x or 7.x?

cat /etc/redhat-release
uname -a

Please include the output of both of these commands:

service nagios status
systemctl status nagios

Are you running mod_gearman?

grep gearman /usr/local/nagios/etc/nagios.cfg

What version of Core are you running?

/usr/local/nagios/bin/nagios -V

Let's check the lock/pid file paths:

grep -i "lock\|pid" /etc/init.d/nagios /usr/local/nagios/etc/nagios.cfg /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh

Once I get that information I can craft a solution for you.

Thank you!
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Unable to start Nagios service - nagios.lock missing

Post by rferebee »

root@nagiosxi-lv> cat /etc/redhat-release
CentOS release 6.10 (Final)
root@nagiosxi-lv> uname -a
Linux nagiosxi-lv 2.6.32-754.12.1.el6.x86_64 #1 SMP Tue Apr 9 14:52:26 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


root@nagiosxi-lv> service nagios status
No lock file found in /var/run/nagios.lock
root@nagiosxi-lv> systemctl status nagios
-ksh: systemctl: not found [No such file or directory]


No output returned for the following command:

grep gearman /usr/local/nagios/etc/nagios.cfg


root@nagiosxi-lv> /usr/local/nagios/bin/nagios -V

Nagios Core 4.4.2
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-08-16
License: GPL

Website: https://www.nagios.org
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


root@nagiosxi-lv> grep -i "lock\|pid" /etc/init.d/nagios /usr/local/nagios/etc/nagios.cfg
/etc/init.d/nagios:NagiosRunFile=/var/run/nagios.lock
/etc/init.d/nagios: if ps -p $NagiosPID > /dev/null 2>&1; then
/etc/init.d/nagios: echo "nagios (pid $NagiosPID) is running..."
/etc/init.d/nagios: kill -s "$1" $NagiosPID
/etc/init.d/nagios:pid_nagios ()
/etc/init.d/nagios: echo "No lock file found in $NagiosRunFile"
/etc/init.d/nagios: NagiosPID=`head -n 1 $NagiosRunFile`
/etc/init.d/nagios: NagiosPID=`head -n 1 $NagiosRunFile`
/etc/init.d/nagios: pid_nagios
/etc/init.d/nagios: pid_nagios
/etc/init.d/nagios: pid_nagios
/usr/local/nagios/etc/nagios.cfg:lock_file=/var/run/nagios.lock

root@nagiosxi-lv> /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
LATEST NOM SNAPSHOT: /usr/local/nagiosxi/nom/checkpoints/nagioscore/1554844189.tar.gz
/ ~
RESTORING NOM SNAPSHOT : /usr/local/nagiosxi/nom/checkpoints/nagioscore/1554844189.tar.gz
~

--- reset_config_perms.sh ------------
> Setting CCM script permissions
> Setting script permissions
> Setting special component script permissions
> Setting configuration file/directory permissions
> Setting perfdata directory and RRD permissions
> Setting NOM checkpoint user:group permissions
> + Setting CCM configuration file user:group permissions
> + Setting Recurring Downtime file user:group permissions
> + Setting BPI configuration file user:group permissions
--------------------------------------

root@nagiosxi-lv> /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
NOM SNAPSHOT /usr/local/nagiosxi/nom/checkpoints/nagioscore/.tar.gz NOT FOUND!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Unable to start Nagios service - nagios.lock missing

Post by ssax »

Please send the output of these as well, including fresh output from any of the commands below that I've already requested:

grep -i "lock" /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
ls -ld /var
ls -ld /var/run
ls -l /var/run/nag*
service nagios start
tail -n100 /usr/local/nagios/var/nagios.log
service nagios status
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Unable to start Nagios service - nagios.lock missing

Post by rferebee »

root@nagiosxi-lv> grep -i "lock" /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
lockfile="/var/run/nagios.lock"
# Verify the nagios.cfg lock file is set to proper location
sed -i "s|lock_file=.*|lock_file=$lockfile|" /usr/local/nagios/etc/nagios.cfg


root@nagiosxi-lv> /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
NOM SNAPSHOT /usr/local/nagiosxi/nom/checkpoints/nagioscore/.tar.gz NOT FOUND!


root@nagiosxi-lv> ls -ld /var
drwxr-xr-x 24 root root 4096 Apr 10 2017 /var


root@nagiosxi-lv> ls -ld /var/run
drwxr-xr-x. 43 root root 4096 Apr 10 09:38 /var/run


root@nagiosxi-lv> ls -l /var/run/nag*
ls: cannot access /var/run/nag*: No such file or directory


root@nagiosxi-lv> service nagios start
Starting nagios: done.


root@nagiosxi-lv> tail -n100 /usr/local/nagios/var/nagios.log
[1554929037] Warning: Service 'Disk Check /var' on host 'nagiosxi' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Ping' on host 'nagiosxi' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'SSH' on host 'nagiosxi' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Disk Check /apps' on host 'nagiosxi-lv' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Disk Check /home' on host 'nagiosxi-lv' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Disk Check /var' on host 'nagiosxi-lv' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'HTTP' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'I/O Wait' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Load' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'NRPE' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Nagios XI Daemons' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Nagios XI Jobs' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Ping' on host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns1out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Free Space on C Drive' on host 'ns1out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns1out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns2out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Free Space on C Drive' on host 'ns2out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns2out' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns4' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns4' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'ns5' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Memory Check' on host 'ns5' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Blacklist Status' on host 'peb' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'SMTP' on host 'peb' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'Ping' on host 'silverflume' has no default contacts or contactgroups defined!
[1554929037] Warning: Service 'CPU Load NT' on host 'vCenter-SQL' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'DHHSISD045' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'DHHSISD046' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'DHHSISD101' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'OOM-GC1.adjgencc-ad' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'adfs.state.nv.us' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'caucc' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'cvcs' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'doe_esainods01' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'doe_esainrpt01' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'doe_esainsql03' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'edbop07cc.nvdps' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'edbstsqllab' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ent-sdcslab' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'gfcreports' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'gs1.web' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'localhost' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvcvcs' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma01' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma02' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma03' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma04' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'lvma05' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma02' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma03' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma04' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'ma05' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'mail_mail' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'nagiostest_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'nagiosxi-lv_monitor' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'peb' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'prodexweb' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'silverflume' has no default contacts or contactgroups defined!
[1554929037] Warning: Host 'vCenter-SQL' has no default contacts or contactgroups defined!
[1554929037] Successfully launched command file worker with pid 15280
[1554929037] HOST DOWNTIME ALERT: ent-smplab;STARTED; Host has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;NRPE;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Jobs;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;HTTP;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Jobs;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Daemons;STARTED; Service has entered a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiosxi_monitor;STARTED; Host has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;HTTP;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Ping;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Ping;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Daemons;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Daemons;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;PEBPAD Host Cluster: pebdc01 pebdc02;STARTED; Service has entered a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiostest_monitor;STARTED; Host has entered a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiostest_monitor;STOPPED; Host has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP3 BIP4;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP3 BIP4;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Ping;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Load;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP1 BIP2;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;NRPE;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;I/O Wait;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;I/O Wait;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Load;STARTED; Service has entered a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Ping;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Load;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Host Cluster: BIP1 BIP2;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;NRPE;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;I/O Wait;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;I/O Wait;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Load;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;NRPE;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;Nagios XI Jobs;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiosxi_monitor;HTTP;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Jobs;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;Nagios XI Daemons;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] HOST DOWNTIME ALERT: nagiosxi_monitor;STOPPED; Host has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;PEBPAD Host Cluster: pebdc01 pebdc02;STOPPED; Service has exited from a period of scheduled downtime
[1554929037] SERVICE DOWNTIME ALERT: nagiostest_monitor;HTTP;STOPPED; Service has exited from a period of scheduled downtime


root@nagiosxi-lv> service nagios status
nagios (pid 15269) is running...
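
Editor's note: the bracketed numbers at the start of each nagios.log line above are Unix epoch timestamps. A small filter can make them readable when comparing against file times such as the lock file's. This is a sketch only; the function name `humanize_nagios_log` is hypothetical, and it assumes a `date` that accepts `-d @SECONDS` (GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical filter: rewrite the leading [epoch] of each log line read
# from stdin as a UTC timestamp. Lines without one pass through unchanged.
humanize_nagios_log() {
    while IFS= read -r line; do
        case $line in
            \[[0-9]*\]*)
                ts=${line#\[}       # drop the leading "["
                ts=${ts%%\]*}       # keep everything before the first "]"
                rest=${line#*\]}    # text after the first "]"
                case $ts in
                    *[!0-9]*)
                        # Not purely numeric: leave the line untouched.
                        printf '%s\n' "$line"
                        ;;
                    *)
                        printf '[%s]%s\n' \
                            "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                        ;;
                esac
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

Usage: `tail -n100 /usr/local/nagios/var/nagios.log | humanize_nagios_log`. With this filter, 1554929037 becomes 2019-04-10 20:43:57 UTC.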
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Unable to start Nagios service - nagios.lock missing

Post by ssax »

Looks good now. Please validate that it's working.

What is the output of these commands now?

ls -l /var/run/nag*
service nagios status
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Unable to start Nagios service - nagios.lock missing

Post by rferebee »

Looks good, any insight as to what we did that fixed it?

root@nagiosxi-lv> ls -l /var/run/nag*
-rw-r--r-- 1 root root 6 Apr 10 13:43 /var/run/nagios.lock
root@nagiosxi-lv> service nagios status
nagios (pid 15269) is running...

For posterity of course! :)
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Unable to start Nagios service - nagios.lock missing

Post by ssax »

The nagios service was not running, so the status check showed:

root@nagiosxi-lv> service nagios status
No lock file found in /var/run/nagios.lock

But then we started it:

root@nagiosxi-lv> service nagios start
Starting nagios: done.

And status now shows:

root@nagiosxi-lv> service nagios status
nagios (pid 15269) is running...

So for whatever reason your nagios process had stopped, and simply starting it got it working again. Because no lock file was left behind, it likely shut down cleanly rather than crashing: Nagios normally removes its lock file on a clean shutdown, whereas a crash would typically leave a stale one.
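
Editor's note: the status logic described here can be sketched from the init script lines rferebee posted earlier (`NagiosRunFile`, `head -n 1 $NagiosRunFile`, and the two status messages). The version below is an illustration, not the actual /etc/init.d/nagios. The function name `nagios_status`, the "stale lock file" message, and the return codes are assumptions, and it uses `kill -0` in place of the init script's `ps -p`, so run it as root to test processes owned by other users.

```shell
#!/bin/sh
# Hypothetical re-creation of the status check: read the PID from the lock
# file, then test whether a process with that PID exists.
nagios_status() {
    run_file=${1:-/var/run/nagios.lock}

    # No lock file: the daemon is not running (or exited cleanly).
    if [ ! -f "$run_file" ]; then
        echo "No lock file found in $run_file"
        return 1
    fi

    pid=$(head -n 1 "$run_file")
    case $pid in
        ''|*[!0-9]*)
            # Assumed message: the lock file exists but holds no usable PID.
            echo "stale lock file: no valid pid in $run_file"
            return 2
            ;;
    esac

    # kill -0 sends no signal; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "nagios (pid $pid) is running..."
        return 0
    fi

    # Assumed message: a lock file left behind by a process that is gone.
    echo "stale lock file: pid $pid is not running"
    return 2
}
```

In this thread the first branch fired before the start (no lock file), and the "is running" branch fired afterwards, once /var/run/nagios.lock had been recreated with the new PID.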