nagios.lock does not exist or is a zombie

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jchaima
Posts: 13
Joined: Fri Jun 08, 2018 10:55 am

nagios.lock does not exist or is a zombie

Post by jchaima »

1.- Linux Distribution and version?
CentOS 7 64-bit on VMware
2.- VMware Image or Manual Install of XI?
Manual installation
3.- Are there special configurations on your system?
We use a RAM disk, plus Mod_Gearman for load distribution
============

Hi.
For some time now we have been running into problems every time a change is applied in XI.

When a change is applied, the nagios service does not start. Checking the service status (systemctl status nagios) shows the error "nagios.lock does not exist or is a zombie". After several attempts to apply the changes, the nagios service eventually starts.
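
For anyone landing here with the same message: it suggests that the PID recorded in nagios.lock either has no live process behind it or belongs to a defunct one. Below is a minimal sketch for checking that by hand. It is not the logic of the stock init script, it assumes a Linux /proc filesystem, and the names lock_state and NAGIOS_LOCK are made up for illustration.

```shell
#!/bin/sh
# Sketch: classify the PID recorded in a lock file (Linux /proc assumed).
# lock_state is a hypothetical helper, not part of Nagios.
# Prints one of: missing, invalid, dead, zombie, running.
lock_state() {
    lockfile=$1
    [ -f "$lockfile" ] || { echo "missing"; return 0; }

    # The lock file is expected to hold just a PID; strip whitespace.
    pid=$(tr -d '[:space:]' < "$lockfile")
    case $pid in
        ''|*[!0-9]*) echo "invalid"; return 0 ;;
    esac

    # No /proc entry means no such process.
    [ -d "/proc/$pid" ] || { echo "dead"; return 0; }

    # The State: line in /proc/PID/status carries a one-letter code; Z is a zombie.
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    if [ "$state" = "Z" ]; then
        echo "zombie"
    else
        echo "running"
    fi
}

# NAGIOS_LOCK is an assumed override; the default is the path shown in this thread.
lock_state "${NAGIOS_LOCK:-/usr/local/nagios/var/nagios.lock}"
```

Running it right after a failed apply should show whether the lock file is absent, stale, or pointing at a process that is still around.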

We have reviewed the known issues related to this error and applied the recommended fixes, but they have not helped.

On other occasions, when applying changes from CCM, the interface reports that the changes were applied successfully, but checking over SSH shows that the nagios service has not started.

I have attached the system profile.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: nagios.lock does not exist or is a zombie

Post by scottwilkerson »

You have database errors that should be fixed first and foremost:
https://assets.nagios.com/downloads/nag ... tabase.pdf

Then can you post the output of the following?

Code:

ls -dl /usr/local/nagios
ls -al /usr/local/nagios/var/
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
jchaima
Posts: 13
Joined: Fri Jun 08, 2018 10:55 am

Re: nagios.lock does not exist or is a zombie

Post by jchaima »

Thanks. We have noticed that the "nagios external commands" table gets corrupted frequently.

The output of the commands is as follows:

[root@NagiosXI nagios]# ls -dl /usr/local/nagios
drwxr-xr-x 9 root root 94 Jun 1 2017 /usr/local/nagios

[root@NagiosXI nagios]# ls -al /usr/local/nagios/var/
total 10040208
drwxrwxr-x 6 nagios nagios 4096 Jun 12 09:05 .
drwxr-xr-x 9 root root 94 Jun 1 2017 ..
drwxrwxr-x 2 nagios nagios 8192 Dec 13 09:46 archives
-rw-r--r-- 1 nagios nagios 18230 Apr 18 15:14 host-perfdata
-rw-r--r-- 1 nagios nagios 34 Jun 12 09:03 nagios.configtest
-rw-r--r-- 1 nagios nagios 6 Jun 12 09:03 nagios.lock
-rw-r----- 1 nagios nagios 10171685382 Jun 12 09:05 nagios.log
-rw------- 1 nagios nagios 0 Jun 5 09:09 nagios.tmp5whzIR
-rw------- 1 nagios nagios 0 Jun 6 23:24 nagios.tmp6PntDZ
-rw------- 1 nagios nagios 12288 Jun 11 17:07 nagios.tmpASE0HJ
-rw------- 1 nagios nagios 1101824 Apr 13 00:00 nagios.tmpeFF4tg
-rw------- 1 nagios nagios 0 Jun 8 09:46 nagios.tmpjwEk9O
-rw-rw-r-- 1 nagios nagios 7185497 Aug 18 2017 nagios.tmpOKA3a8
-rw------- 1 nagios nagios 10549 May 6 00:30 nagios.tmpVpTIX6
-rw------- 1 nagios nagios 8736768 Sep 25 2017 nagios.tmpXUg6Ru
-rw-r--r-- 1 nagios nagios 7 Jun 11 17:07 ndo2db.lock
-rw-r--r-- 1 nagios nagios 0 Jun 12 00:00 ndomod.tmp
srwxr-xr-x 1 nagios nagios 0 Jun 11 17:07 ndo.sock
-rw-r--r-- 1 nagios nagios 2688963 Jun 12 09:05 npcd.log
-rw-r--r-- 1 nagios nagios 10485769 May 30 18:11 npcd.log.old
-rw-r--r-- 1 nagios nagios 17947854 Apr 18 14:47 objects.cache
-rw-r--r-- 1 nagios nagios 25793615 Jun 12 09:03 objects.precache
-rw-rw-r-- 1 nagios nagios 3657069 Jun 12 09:04 perfdata.log
-rw------- 1 nagios nagios 24490534 Jun 12 09:03 retention.dat
drwxrwsr-x 2 nagios nagcmd 41 Jun 12 09:03 rw
-rw-r--r-- 1 nagios nagios 233737 Apr 18 15:14 service-perfdata
drwxr-xr-x 5 nagios nagios 55 Jun 1 2017 spool
drwxr-xr-x 2 nagios nagios 8192 Jun 12 00:00 stats
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: nagios.lock does not exist or is a zombie

Post by scottwilkerson »

A couple of permissions are incorrect. Let's run the following:

Code:

chown nagios.nagios /usr/local/nagios
chmod g+w /usr/local/nagios/var
But even with that I do see the nagios.lock file present.
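
To confirm that the two commands above took effect, something along these lines could be used. This is only a sketch: it assumes GNU stat (as shipped with CentOS 7), and has_owner and group_writable are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: verify ownership and the group write bit (GNU stat assumed).

# has_owner PATH USER GROUP: succeeds if PATH is owned by USER:GROUP.
has_owner() {
    actual=$(stat -c '%U:%G' "$1") || return 2
    [ "$actual" = "$2:$3" ]
}

# group_writable PATH: succeeds if the group write bit is set.
group_writable() {
    mode=$(stat -c '%A' "$1") || return 2
    # %A looks like drwxrwxr-x; the sixth character is the group write bit.
    case $mode in
        ?????w*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with the paths from this thread:
#   has_owner /usr/local/nagios nagios nagios && echo "owner ok"
#   group_writable /usr/local/nagios/var && echo "group-writable"
```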
jchaima
Posts: 13
Joined: Fri Jun 08, 2018 10:55 am

Re: nagios.lock does not exist or is a zombie

Post by jchaima »

I applied the change, but it did not help.

The Nagios service starts but then goes down again. After repeated restart attempts it eventually stays up.

#####
[root@NagiosXI nagios]# /etc/rc.d/init.d/nagios start
Starting nagios (via systemctl): [ OK ]

[root@NagiosXI nagios]# systemctl status nagios
● nagios.service - LSB: Starts and stops the Nagios monitoring server
Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
Active: active (running) since Tue 2018-06-12 17:00:05 CLT; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 111413 ExecStop=/etc/rc.d/init.d/nagios stop (code=exited, status=0/SUCCESS)
Process: 111669 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
Main PID: 111698 (nagios)

Jun 12 17:00:04 NagiosXI systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Jun 12 17:00:05 NagiosXI nagios[111669]: Starting nagios: done.
Jun 12 17:00:05 NagiosXI systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.

[root@NagiosXI nagios]# systemctl status nagios
● nagios.service - LSB: Starts and stops the Nagios monitoring server
Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2018-06-12 17:00:09 CLT; 1s ago
Docs: man:systemd-sysv-generator(8)
Process: 111753 ExecStop=/etc/rc.d/init.d/nagios stop (code=exited, status=0/SUCCESS)
Process: 111669 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
Main PID: 111698 (code=exited, status=254)

Jun 12 17:00:04 NagiosXI systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Jun 12 17:00:05 NagiosXI nagios[111669]: Starting nagios: done.
Jun 12 17:00:05 NagiosXI systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.
Jun 12 17:00:09 NagiosXI systemd[1]: nagios.service: main process exited, code=exited, status=254/n/a
Jun 12 17:00:09 NagiosXI nagios[111753]: Stopping nagios:/etc/rc.d/init.d/nagios: line 143: kill: (111698) - No such process
Jun 12 17:00:09 NagiosXI nagios[111753]: done.
Jun 12 17:00:09 NagiosXI systemd[1]: Unit nagios.service entered failed state.
Jun 12 17:00:09 NagiosXI systemd[1]: nagios.service failed.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: nagios.lock does not exist or is a zombie

Post by scottwilkerson »

Please attach your /etc/rc.d/init.d/nagios file.

Has this ever worked? If so, what changes were made before it stopped working?
jchaima
Posts: 13
Joined: Fri Jun 08, 2018 10:55 am

Re: nagios.lock does not exist or is a zombie

Post by jchaima »

Yes, it has worked well.

We are not sure that it could be the cause, since the beginning it has been config
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: nagios.lock does not exist or is a zombie

Post by scottwilkerson »

Well, that looks normal. The next time it fails to start repeatedly, once you have it running again, can you send a copy of /usr/local/nagios/var/nagios.log so we can see what is causing it to fail?

It might also be a good time to grab a snapshot of /var/log/messages, as the error may show up there as well.
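
Since the listing earlier in the thread shows nagios.log at roughly 10 GB (10171685382 bytes), attaching the whole file may not be practical. A rough sketch for capturing only the tail of each log follows. The helper name snapshot_logs, the output directory, and the line count are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: save the last LINES lines of each log file into OUTDIR.
# snapshot_logs is a hypothetical helper, not a Nagios tool.
snapshot_logs() {
    outdir=$1
    lines=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for file in "$@"; do
        # Skip files that are missing or not readable by the current user.
        [ -r "$file" ] || { echo "skipping unreadable $file" >&2; continue; }
        tail -n "$lines" "$file" > "$outdir/$(basename "$file").tail"
    done
}

# Example (log paths from this thread; directory and line count are assumptions):
#   snapshot_logs /tmp/nagios-snap 5000 /usr/local/nagios/var/nagios.log /var/log/messages
```

Taking the snapshot right after a failed start keeps the relevant entries near the end of each file.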
jchaima
Posts: 13
Joined: Fri Jun 08, 2018 10:55 am

Re: nagios.lock does not exist or is a zombie

Post by jchaima »

Thanks Scott.

I have attached the logs. They show that the nagios service had to be started several times before it stayed up.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: nagios.lock does not exist or is a zombie

Post by scottwilkerson »

Based on these lines, it appears that systemd thinks nagios is already running when a start is attempted:

Code:

Jun 13 09:48:59 NagiosXI systemd: cgroup /system.slice/nagios.service exists already: File exists
Jun 13 09:48:59 NagiosXI systemd: Failed to realize cgroups for queued unit nagios.service: File exists
Next time this happens, can we verify that the nagios process is fully stopped before running any start commands?

Code:

ps -ef|grep nagios.cfg
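
If it helps, the same check can be turned into a wait loop so that a start is only issued once nothing matching nagios.cfg is left. This is a sketch under assumptions: it scans the Linux /proc filesystem instead of calling ps, and matching_pids and wait_until_stopped are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: wait until no process command line contains a given string.

# matching_pids PATTERN: print PIDs whose command line contains PATTERN.
matching_pids() {
    pattern=$1
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Never report the shell running this script.
        [ "$pid" = "$$" ] && continue
        # cmdline is NUL-separated; turn NULs into spaces before matching.
        if cat "$dir/cmdline" 2>/dev/null | tr '\0' ' ' | grep -F -q -- "$pattern"; then
            echo "$pid"
        fi
    done
}

# wait_until_stopped PATTERN [SECONDS]: poll once per second until nothing
# matches; returns 1 if processes are still present after SECONDS (default 30).
wait_until_stopped() {
    pattern=$1
    remaining=${2:-30}
    while [ -n "$(matching_pids "$pattern")" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example, using the pattern from this thread:
#   systemctl stop nagios
#   wait_until_stopped nagios.cfg 30 && systemctl start nagios
```

If wait_until_stopped times out, the PIDs printed by matching_pids nagios.cfg are the ones to look at before trying another start.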
Locked