nagios.lock does not exist or is a zombie
1.- Linux Distribution and version?
Centos 7 64bit in VmWare
2.- VMware Image or Manual Install of XI?
Manual installation
3.- Are there special configurations on your system?
We use a ramdisk, in addition to Mod_Gearman for load distribution.
============
Hi.
For some time now we have been having problems every time a change is applied in XI.
When a change is applied, the nagios service does not start. Checking the service status (systemctl status nagios) shows the error "nagios.lock does not exist or is a zombie". After several attempts at applying the changes, the nagios service starts.
We have reviewed other reports of this error and applied the recommendations, but they have not helped.
On other occasions, when applying changes from the CCM, it reports that the changes were applied successfully, but checking over SSH shows that the nagios service has not started.
I have attached the system profile.
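The thread does not show how the init script decides that the lock is a "zombie". A plausible reading, stated here only as an assumption, is that nagios.lock exists but the PID written in it (Nagios Core stores the daemon's PID in the file named by `lock_file` in nagios.cfg) no longer belongs to a live process. The hypothetical `lock_state` function below is a minimal Linux sketch of that check, not the logic of the XI init script. Note that `running` only means some process currently holds that PID; after PID reuse it might not be Nagios.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): classify a Nagios lock file.
# Assumption: the file contains only the daemon's PID, as Nagios Core
# writes it for the lock_file directive. Linux only, since it reads /proc.

# lock_state [LOCKFILE] -> prints missing, empty, running, zombie or stale
lock_state() {
    local lockfile="${1:-/usr/local/nagios/var/nagios.lock}"
    local pid state

    if [ ! -f "$lockfile" ]; then
        echo missing
        return 0
    fi

    # Keep only the digits, which also drops the trailing newline.
    pid=$(tr -dc '0-9' < "$lockfile")
    if [ -z "$pid" ]; then
        echo empty
        return 0
    fi

    if [ ! -d "/proc/$pid" ]; then
        echo stale            # no process with that PID exists any more
        return 0
    fi

    # The state letter follows "PID (comm) " in /proc/PID/stat; strip up to
    # the last ") " first because comm itself may contain spaces.
    state=$(sed 's/^.*) //' "/proc/$pid/stat" 2>/dev/null | cut -c1)
    case $state in
        Z) echo zombie ;;     # exited but not yet reaped by its parent
        '') echo stale ;;     # exited between the two checks
        *) echo running ;;
    esac
}
```

Run as root with no argument to check /usr/local/nagios/var/nagios.lock (the default path here is an assumption taken from the paths in this thread), or pass another lock file path as the first argument.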
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: nagios.lock does not exist or is a zombie
You have database errors that should be fixed first and foremost:
https://assets.nagios.com/downloads/nag ... tabase.pdf
Then can you post the output of the following:
ls -dl /usr/local/nagios
ls -al /usr/local/nagios/var/
Re: nagios.lock does not exist or is a zombie
Thanks. We have noticed that the "nagios external commands" table gets corrupted frequently.
The output of the commands is as follows:
[root@NagiosXI nagios]# ls -dl /usr/local/nagios
drwxr-xr-x 9 root root 94 Jun 1 2017 /usr/local/nagios
[root@NagiosXI nagios]# ls -al /usr/local/nagios/var/
total 10040208
drwxrwxr-x 6 nagios nagios 4096 Jun 12 09:05 .
drwxr-xr-x 9 root root 94 Jun 1 2017 ..
drwxrwxr-x 2 nagios nagios 8192 Dec 13 09:46 archives
-rw-r--r-- 1 nagios nagios 18230 Apr 18 15:14 host-perfdata
-rw-r--r-- 1 nagios nagios 34 Jun 12 09:03 nagios.configtest
-rw-r--r-- 1 nagios nagios 6 Jun 12 09:03 nagios.lock
-rw-r----- 1 nagios nagios 10171685382 Jun 12 09:05 nagios.log
-rw------- 1 nagios nagios 0 Jun 5 09:09 nagios.tmp5whzIR
-rw------- 1 nagios nagios 0 Jun 6 23:24 nagios.tmp6PntDZ
-rw------- 1 nagios nagios 12288 Jun 11 17:07 nagios.tmpASE0HJ
-rw------- 1 nagios nagios 1101824 Apr 13 00:00 nagios.tmpeFF4tg
-rw------- 1 nagios nagios 0 Jun 8 09:46 nagios.tmpjwEk9O
-rw-rw-r-- 1 nagios nagios 7185497 Aug 18 2017 nagios.tmpOKA3a8
-rw------- 1 nagios nagios 10549 May 6 00:30 nagios.tmpVpTIX6
-rw------- 1 nagios nagios 8736768 Sep 25 2017 nagios.tmpXUg6Ru
-rw-r--r-- 1 nagios nagios 7 Jun 11 17:07 ndo2db.lock
-rw-r--r-- 1 nagios nagios 0 Jun 12 00:00 ndomod.tmp
srwxr-xr-x 1 nagios nagios 0 Jun 11 17:07 ndo.sock
-rw-r--r-- 1 nagios nagios 2688963 Jun 12 09:05 npcd.log
-rw-r--r-- 1 nagios nagios 10485769 May 30 18:11 npcd.log.old
-rw-r--r-- 1 nagios nagios 17947854 Apr 18 14:47 objects.cache
-rw-r--r-- 1 nagios nagios 25793615 Jun 12 09:03 objects.precache
-rw-rw-r-- 1 nagios nagios 3657069 Jun 12 09:04 perfdata.log
-rw------- 1 nagios nagios 24490534 Jun 12 09:03 retention.dat
drwxrwsr-x 2 nagios nagcmd 41 Jun 12 09:03 rw
-rw-r--r-- 1 nagios nagios 233737 Apr 18 15:14 service-perfdata
drwxr-xr-x 5 nagios nagios 55 Jun 1 2017 spool
drwxr-xr-x 2 nagios nagios 8192 Jun 12 00:00 stats
scottwilkerson
Re: nagios.lock does not exist or is a zombie
A couple of permissions are incorrect, although even with that I do see the nagios.lock file present. Let's run the following:
chown nagios.nagios /usr/local/nagios
chmod g+w /usr/local/nagios/var
Re: nagios.lock does not exist or is a zombie
I applied the changes, but it did not help.
The Nagios service starts but then goes down; after retrying the restart several times, it eventually stays up.
#####
[root@NagiosXI nagios]# /etc/rc.d/init.d/nagios start
Starting nagios (via systemctl): [ OK ]
[root@NagiosXI nagios]# systemctl status nagios
● nagios.service - LSB: Starts and stops the Nagios monitoring server
Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
Active: active (running) since Tue 2018-06-12 17:00:05 CLT; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 111413 ExecStop=/etc/rc.d/init.d/nagios stop (code=exited, status=0/SUCCESS)
Process: 111669 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
Main PID: 111698 (nagios)
Jun 12 17:00:04 NagiosXI systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Jun 12 17:00:05 NagiosXI nagios[111669]: Starting nagios: done.
Jun 12 17:00:05 NagiosXI systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.
[root@NagiosXI nagios]# systemctl status nagios
● nagios.service - LSB: Starts and stops the Nagios monitoring server
Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2018-06-12 17:00:09 CLT; 1s ago
Docs: man:systemd-sysv-generator(8)
Process: 111753 ExecStop=/etc/rc.d/init.d/nagios stop (code=exited, status=0/SUCCESS)
Process: 111669 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
Main PID: 111698 (code=exited, status=254)
Jun 12 17:00:04 NagiosXI systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Jun 12 17:00:05 NagiosXI nagios[111669]: Starting nagios: done.
Jun 12 17:00:05 NagiosXI systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.
Jun 12 17:00:09 NagiosXI systemd[1]: nagios.service: main process exited, code=exited, status=254/n/a
Jun 12 17:00:09 NagiosXI nagios[111753]: Stopping nagios:/etc/rc.d/init.d/nagios: line 143: kill: (111698) - No such process
Jun 12 17:00:09 NagiosXI nagios[111753]: done.
Jun 12 17:00:09 NagiosXI systemd[1]: Unit nagios.service entered failed state.
Jun 12 17:00:09 NagiosXI systemd[1]: nagios.service failed.
scottwilkerson
Re: nagios.lock does not exist or is a zombie
Please attach your /etc/rc.d/init.d/nagios file.
Has this ever worked? If so, what changes were made before it stopped working?
Re: nagios.lock does not exist or is a zombie
Yes, it has worked well.
We are not sure whether that could be the cause; from the beginning it has been config
scottwilkerson
Re: nagios.lock does not exist or is a zombie
Well, that looks normal. The next time it keeps failing to start, once you get it going, can you send a copy of /usr/local/nagios/var/nagios.log so we can see what is causing it to fail?
It might also be a good time to get a snapshot of /var/log/messages, as the error may show up there as well.
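The directory listing earlier in this thread shows nagios.log at 10171685382 bytes (about 10 GB), which is far too large to attach. A minimal sketch for preparing an excerpt instead, assuming the usual Nagios Core log format in which each line starts with a bracketed Unix timestamp (`[seconds since epoch] message`). `log_excerpt` is a hypothetical helper, the 2000-line default is arbitrary, and `date -d @SECONDS` assumes GNU date (the CentOS 7 default).

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): print the last lines of a
# Nagios log with the leading epoch timestamp shown as local date/time.
# Assumes lines look like "[<seconds since epoch>] message" and that
# date(1) is GNU date, which accepts "-d @SECONDS".

# log_excerpt FILE [LINES]
log_excerpt() {
    local file="$1" lines="${2:-2000}"
    tail -n "$lines" "$file" | while IFS= read -r line; do
        ts=
        case $line in
            \[*\]\ *)
                ts=${line%%\]*}     # "[<epoch>", cut at the first "]"
                ts=${ts#\[}         # "<epoch>"
                rest=${line#*\] }   # message after the first "] "
                ;;
        esac
        case $ts in
            '' | *[!0-9]*)
                # No numeric timestamp: pass the line through unchanged.
                printf '%s\n' "$line"
                ;;
            *)
                printf '[%s] %s\n' \
                    "$(date -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
        esac
    done
}
```

For example: `log_excerpt /usr/local/nagios/var/nagios.log 5000 > /tmp/nagios-excerpt.log`. Lines in /var/log/messages already carry readable timestamps, so a plain `tail -n 5000 /var/log/messages` is enough there.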
Re: nagios.lock does not exist or is a zombie
Thanks, Scott.
I have attached the log records, which show that the nagios service had to be started several times before it finally came up.
scottwilkerson
Re: nagios.lock does not exist or is a zombie
Based on these lines, it appears that systemd thinks nagios is already running when it tries to start it:
Jun 13 09:48:59 NagiosXI systemd: cgroup /system.slice/nagios.service exists already: File exists
Jun 13 09:48:59 NagiosXI systemd: Failed to realize cgroups for queued unit nagios.service: File exists
The next time this happens, can we verify that the nagios process is fully stopped before running any start commands?
ps -ef|grep nagios.cfg
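The check suggested here (confirm with `ps -ef|grep nagios.cfg` that nothing is left running before starting again) could be scripted roughly as follows. This is a hypothetical sketch, not something shipped with Nagios XI. It assumes the daemon's command line contains nagios.cfg, which is what that grep relies on, and it reads /proc directly instead of parsing `ps`, so it never reports its own search the way `grep` does.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nagios XI) for checking that Nagios is
# fully stopped before it is started again. Linux only: they read /proc.

# list_procs PATTERN: print the PID of every process whose command line
# contains PATTERN (plain substring match); succeed if at least one matched.
list_procs() {
    local found=1 dir cmd
    for dir in /proc/[0-9]*; do
        [ "${dir#/proc/}" = "$$" ] && continue      # skip this shell
        # cmdline is NUL-separated. stderr is redirected before the input,
        # so a process that exited meanwhile is skipped without a message.
        cmd=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
        case $cmd in
            *"$1"*)
                echo "${dir#/proc/}"
                found=0
                ;;
        esac
    done
    return "$found"
}

# wait_until_gone PATTERN [TIMEOUT]: poll once per second; return 0 as soon
# as no matching process is left, 1 if some remain after TIMEOUT seconds.
wait_until_gone() {
    local pattern="$1" timeout="${2:-60}" waited=0
    while list_procs "$pattern" > /dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, as root: `systemctl stop nagios && wait_until_gone nagios.cfg 60 && systemctl start nagios`. If it times out, `list_procs nagios.cfg` prints the PIDs still running so they can be inspected before anything is killed. The 60-second timeout is an arbitrary choice.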