Re: nagios.lock does not exist or is a zombie
Posted: Thu Jun 14, 2018 12:51 pm
by jchaima
Hi Scott.
Your comment gave us the idea to check whether some process was restarting the service, and we found an "event-handler" that restarted ndo. We have disabled it and will observe Nagios for a week.
We will report back here on whether this was the cause of the problem, and then either continue or close the ticket.
Thank you.
Re: nagios.lock does not exist or is a zombie
Posted: Thu Jun 14, 2018 1:38 pm
by scottwilkerson
Thanks for reaching back out!
Re: nagios.lock does not exist or is a zombie
Posted: Mon Jun 18, 2018 12:18 pm
by jchaima
Hi Scott.
After a few days of observation, the problem still occurs.
When we apply changes in Nagios, the service goes down again, although no duplicate services are seen. The service restart triggered by applying changes looks fine, but for some reason Nagios stops after starting.
Re: nagios.lock does not exist or is a zombie
Posted: Mon Jun 18, 2018 1:00 pm
by scottwilkerson
Could you attach another copy of /var/log/messages after it happens again?
Also, before restarting, please send the output of
Re: nagios.lock does not exist or is a zombie
Posted: Mon Jun 18, 2018 3:28 pm
by jchaima
I have attached the log and the output of the commands.
Before restarting Nagios:
[root@NagiosXI ~]# ps -ef|grep nagios.cfg
root 51377 77351 0 16:22 pts/4 00:00:00 grep --color=auto nagios.cfg
nagios 112207 1 15 16:20 ? 00:00:11 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 112290 112207 0 16:21 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
After restarting Nagios (there are no errors in the object configuration):
[root@NagiosXI ~]#
[root@NagiosXI ~]# ps -ef|grep nagios.cfg
root 111920 23678 0 16:20 pts/0 00:00:00 grep --color=auto nagios.cfg
[root@NagiosXI ~]# ps -ef|grep nagios.cfg
root 128679 23678 0 16:23 pts/0 00:00:00 grep --color=auto nagios.cfg
root 128815 23678 0 16:23 pts/0 00:00:00 grep --color=auto nagios.cfg
[root@NagiosXI ~]# systemctl status nagios
● nagios.service - LSB: Starts and stops the Nagios monitoring server
Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2018-06-18 16:23:32 CLT; 27s ago
Docs: man:systemd-sysv-generator(8)
Process: 128555 ExecStop=/etc/rc.d/init.d/nagios stop (code=exited, status=0/SUCCESS)
Process: 128370 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
Main PID: 128440 (code=exited, status=254)
Jun 18 16:23:22 NagiosXI systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Jun 18 16:23:24 NagiosXI nagios[128370]: Starting nagios: done.
Jun 18 16:23:24 NagiosXI systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.
Jun 18 16:23:32 NagiosXI systemd[1]: nagios.service: main process exited, code=exited, status=254/n/a
Jun 18 16:23:32 NagiosXI nagios[128555]: Stopping nagios:/etc/rc.d/init.d/nagios: line 143: kill: (128440) - No such process
Jun 18 16:23:32 NagiosXI nagios[128555]: done.
Jun 18 16:23:32 NagiosXI systemd[1]: Unit nagios.service entered failed state.
Jun 18 16:23:32 NagiosXI systemd[1]: nagios.service failed.
After executing "systemctl start nagios" several times:
[root@NagiosXI ~]# ps -ef|grep nagios.cfg
root 90448 23678 0 16:26 pts/0 00:00:00 grep --color=auto nagios.cfg
nagios 129243 1 11 16:24 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 129286 129243 0 16:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Re: nagios.lock does not exist or is a zombie
Posted: Mon Jun 18, 2018 3:56 pm
by scottwilkerson
Did someone in your org make custom systemd configs for nagios?
It is still erroring on this:
Code:
Jun 18 16:24:18 NagiosXI systemd: cgroup /system.slice/nagios.service exists already: File exists
I've never seen this in a Nagios XI installation and cannot replicate it.
Also, when you say this:
jchaima wrote: After executing "systemctl start nagios" several times:
Are you verifying that it is not running between attempts? I ask because the system thinks the slice file is still there.
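One way to verify this between attempts is sketched below. It is only an illustration, assuming the daemon's command line looks like the `ps -ef` output earlier in this thread. It uses `pgrep -f` instead of `ps -ef|grep nagios.cfg`, because pgrep never reports itself (the `grep --color=auto nagios.cfg` lines above are grep matching its own command line). The function names and the 30-second default are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: make sure no Nagios daemon is left before running "systemctl start nagios" again.
# Assumption: the daemon runs as "/usr/local/nagios/bin/nagios -d <path to main config>".

# Hypothetical helper: succeeds (exit 0) if any Nagios daemon process exists.
nagios_running() {
    # pgrep -f matches the full command line and excludes its own process.
    pgrep -f 'bin/nagios .*nagios\.cfg' >/dev/null
}

# Hypothetical helper: poll once per second until no daemon is left,
# giving up after $1 seconds (default 30, an arbitrary choice).
wait_for_nagios_exit() {
    local timeout=${1:-30}
    local waited=0
    while nagios_running; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Nagios still running after ${timeout}s:" >&2
            # -a prints the full command line of each match (procps-ng).
            pgrep -af 'bin/nagios .*nagios\.cfg' >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (as root):
#   systemctl stop nagios
#   wait_for_nagios_exit 30 && systemctl start nagios
```

If the wait succeeds and systemd still logs "cgroup /system.slice/nagios.service exists already", that would suggest the leftover state is on the systemd side rather than a surviving Nagios process.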
Re: nagios.lock does not exist or is a zombie
Posted: Tue Jun 19, 2018 11:44 am
by jchaima
Hi Scott.
Regarding both comments: we checked whether the nagios process was running before restarting.
I have reviewed the systemd configuration and see no sign that it has been customized.
What stands out is that Nagios works fine as long as no changes are applied.
Re: nagios.lock does not exist or is a zombie
Posted: Tue Jun 19, 2018 11:58 am
by scottwilkerson
Could you share the systemd files? Also, what version of Nagios Core are you running?
Code:
/usr/local/nagios/bin/nagios --help |head -5
It is also possible that something isn't being cleaned up correctly when mod_gearman is loaded. Would it be possible to check whether you have the same issue with the mod_gearman line commented out in nagios.cfg?
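For reference, commenting out that line could be scripted roughly as follows. This is a sketch under assumptions: it assumes mod_gearman is loaded through a `broker_module=` line whose value contains "mod_gearman", and the function name `disable_mod_gearman` is hypothetical. It keeps a backup and leaves every other line, including other broker modules such as ndomod, untouched.

```shell
#!/usr/bin/env bash
# Sketch: comment out the mod_gearman broker line in a Nagios main config file.
# Assumption: the line looks like "broker_module=<path containing mod_gearman> ...".

# Hypothetical helper; $1 is the config file (defaults to the path used in this thread).
disable_mod_gearman() {
    local cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    # Timestamped backup so the change is easy to revert.
    cp -p "$cfg" "$cfg.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # Prefix active mod_gearman broker lines with "#"; lines already commented are not matched.
    sed -i 's/^\(broker_module=.*mod_gearman.*\)$/#\1/' "$cfg"
}

# Usage (as root), then verify the config before restarting:
#   disable_mod_gearman
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```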
Re: nagios.lock does not exist or is a zombie
Posted: Tue Jun 19, 2018 3:45 pm
by jchaima
I have commented out the line; we will keep it under observation.
Would mod_gearman malfunction if this line is commented out?
The Core version is as follows:
[root@NagiosXI /]# /usr/local/nagios/bin/nagios --help |head -5
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
[root@NagiosXI /]#
Re: nagios.lock does not exist or is a zombie
Posted: Tue Jun 19, 2018 4:30 pm
by scottwilkerson
With the line commented out, Nagios would not use mod_gearman at all, just the workers built into Core itself.
If it works with the line commented out, mod_gearman could be what is malfunctioning.