systemctl does not do anything other than start the service

DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: systemctl does not do anything other than start the serv

Post by DMKatIBM »

Getting closer. The lock file in init.d was the wrong spot:

[root@sjc04-build-Nagios multi-user.target.wants]# grep nagios.lock /etc/rc.d/init.d/nagios
NagiosRunFile=/run/nagios.lock
[root@sjc04-build-Nagios multi-user.target.wants]#
Now it stops correctly:

[root@sjc04-build-Nagios multi-user.target.wants]# service nagios stop
Stopping nagios (via systemctl):  Warning: nagios.service changed on disk. Run 'systemctl daemon-reload' to reload units.
                                                           [  OK  ]
[root@sjc04-build-Nagios multi-user.target.wants]# ps -ef | grep nagios
root     16911  3379  0 11:09 pts/0    00:00:00 grep --color=auto nagios
[root@sjc04-build-Nagios multi-user.target.wants]#
But it doesn't start:

[root@sjc04-build-Nagios multi-user.target.wants]# service nagios start
Starting nagios (via systemctl):                           [  OK  ]
You have new mail in /var/spool/mail/root
[root@sjc04-build-Nagios multi-user.target.wants]# ps -ef| grep nagios
root     17074  3379  0 11:11 pts/0    00:00:00 grep --color=auto nagios
[root@sjc04-build-Nagios multi-user.target.wants]#
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: systemctl does not do anything other than start the serv

Post by tgriep »

Do you see any messages in the nagios.log file or the /var/log/messages file?
Be sure to check out our Knowledgebase for helpful articles and solutions!
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: systemctl does not do anything other than start the serv

Post by DMKatIBM »

It's complaining about the permissions of the lock file:

Starting nagios (via systemctl):  Jan 19 07:19:02 sjc04-build-Nagios systemd: Starting LSB: Starts and stops the Nagios monitoring server...
Jan 19 07:19:02 sjc04-build-Nagios su: (to nagios) root on none
Jan 19 07:19:02 sjc04-build-Nagios su: (to nagios) root on none
Jan 19 07:19:02 sjc04-build-Nagios nagios: Failed to obtain lock on file /usr/local/nagios/var/nagios.lock: Permission denied
Jan 19 07:19:02 sjc04-build-Nagios nagios: Bailing out due to errors encountered while attempting to daemonize... (PID=9223)
Jan 19 07:19:02 sjc04-build-Nagios nagios: Starting nagios: done.
Jan 19 07:19:02 sjc04-build-Nagios systemd: Started LSB: Starts and stops the Nagios monitoring server.
                                                           [  OK  ]
[root@sjc04-build-Nagios nagios]# ls -l /usr/local/nagios/var/nagios.lock
-rw-r--r--. 1 root root 0 Jan 19 07:19 /usr/local/nagios/var/nagios.lock
[root@sjc04-build-Nagios nagios]#
When the service starts, the file is created as root instead of as the nagios user.

I also notice that the /etc/rc.d/init.d/nagios file contains both a run line and a lock line:

NagiosRunFile=/usr/local/nagios/var/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios

I'm not particularly familiar with these startup files, so I don't know what the purpose of each is.

One thing I do recall: when the run file pointed to /run/nagios.lock, that file was created with 0 bytes when the service started, while the nagios.lock file in the /usr/local/nagios/var directory was created with the correct permissions and contained the PID. The service seemed to ignore that file when the stop command was used, though. Should I change the run file back to /run?
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: systemctl does not do anything other than start the serv

Post by DMKatIBM »

Interesting. So if I change the run file back to /run in init.d/nagios, it starts fine, and creates the lock file with the correct permissions (and has the PID in it):

[root@sjc04-build-Nagios nagios]# !ls
ls -l /usr/local/nagios/var
total 12060
drwxrwxr-x. 2 nagios nagios   32768 Jan 17 23:59 archives
-rw-r--r--. 1 root   root        56 Nov  3 06:52 mss-console-list
-rw-r--r--. 1 nagios nagios      34 Jan 19 08:04 nagios.configtest
-rw-r--r--. 1 nagios nagios   50076 Jan  2 11:42 nagios.debug
-rw-r--r--. 1 nagios nagios       6 Jan 19 08:04 nagios.lock       <------- permissions are correct   
-rw-r--r--. 1 nagios nagios 1671119 Jan 19 08:04 nagios.log
-rw-rw-r--. 1 nagios nagios   49278 Aug  6  2016 nagios.tmpRdcNCL
-rw-r--r--. 1 nagios nagios       5 Nov 15  2016 nrpe.pid
-rw-r--r--. 1 nagios nagios 1995840 Jan 19 08:04 objects.cache
-rw-r--r--. 1 nagios nagios 1995840 Jan 19 08:04 objects.precache
-rw-------. 1 nagios nagios 3223163 Jan 19 08:04 retention.dat
drwxrwsrwx. 2 nagios apache    4096 Jan 19 08:04 rw
drwxr-xr-x. 3 nagios nagios    4096 Jul 20  2016 spool
-rw-rw-r--. 1 nagios nagios 3261363 Jan 19 08:04 status.dat
[root@sjc04-build-Nagios nagios]# 
But now I'm back to the point where it doesn't stop (basically back where I started):

[root@sjc04-build-Nagios nagios]# service nagios stop
Stopping nagios (via systemctl):  Jan 19 08:05:07 sjc04-build-Nagios systemd: Stopping LSB: Starts and stops the Nagios monitoring server...
Jan 19 08:05:07 sjc04-build-Nagios systemd: Stopped LSB: Starts and stops the Nagios monitoring server.
                                                           [  OK  ]
[root@sjc04-build-Nagios nagios]#
[root@sjc04-build-Nagios nagios]# ps -ef | grep Jan 19 08:05:14 sjc04-build-Nagios journal: Suppressed 788 messages from /system.slice/nagios.service
Jan 19 08:05:14 sjc04-build-Nagios check_nrpe: Remote 10.165.57.120 accepted a Version 3 Packet
nagiosJan 19 08:05:15 sjc04-build-Nagios check_nrpe: Remote 10.134.155.155 accepted a Version 3 Packet

nagios   11573     1  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   11575 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11576 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11577 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11578 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11579 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11580 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11581 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11582 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11583 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11584 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11585 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11586 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   11587 11573  0 08:04 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
[...]
I have a backgrounded tail -f /var/log/messages running there, so it shows the logs as I attempt the start/stop on the service.

It also looks like the lock file is correctly being put and removed from /var/lock/subsys:

[root@sjc04-build-Nagios nagios]# service nagios start
Starting nagios (via systemctl):                           [  OK  ]
[root@sjc04-build-Nagios nagios]# ls -l /var/lock/subsys
total 0
-rw-------. 1 root root 0 Jan 18 09:10 BESClient
-rw-r--r--. 1 root root 0 Jan 19 08:15 nagios
-rw-r--r--. 1 root root 0 Jan  9 12:36 network
-rw-r--r--. 1 root root 0 Jan  9 12:36 nimbus
-rw-r-----. 1 root root 0 Jan  9 12:36 rhsmcertd
[root@sjc04-build-Nagios nagios]# service nagios stop
Stopping nagios (via systemctl):                           [  OK  ]
[root@sjc04-build-Nagios nagios]# ls -l /var/lock/subsys
total 0
-rw-------. 1 root root 0 Jan 18 09:10 BESClient
-rw-r--r--. 1 root root 0 Jan  9 12:36 network
-rw-r--r--. 1 root root 0 Jan  9 12:36 nimbus
-rw-r-----. 1 root root 0 Jan  9 12:36 rhsmcertd
[root@sjc04-build-Nagios nagios]#
So the lock file itself gets removed correctly, and the file with the PID in /usr/local/nagios/var is getting correctly placed there, but the stop function seems to ignore it.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: systemctl does not do anything other than start the serv

Post by tgriep »

I think the issue is that the path to the nagios.lock file in the /etc/rc.d/init.d/nagios file is different from the lock_file path in the nagios.cfg file.
If they don't match, that could cause the issue.
The permissions for the nagios user and group could also be the cause, if the nagios user account can't write the lock file to that directory.
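
A quick way to check whether the two paths agree is to pull both settings out and compare them. The following is a minimal sketch, not part of the Nagios init script: get_setting and check_lock_paths are hypothetical helper names, the default paths are the ones quoted in this thread, and it assumes both files use plain KEY=value lines with literal paths.

```shell
#!/bin/sh
# Print the value of the first "KEY=value" line for a key in a file.
# $1 = key, $2 = file
get_setting() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Compare the run file named in the init script with the lock_file
# named in nagios.cfg. Defaults are the paths from this thread;
# pass other paths as arguments if your install differs.
check_lock_paths() {
    init_script=${1:-/etc/rc.d/init.d/nagios}
    nagios_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
    run_file=$(get_setting NagiosRunFile "$init_script")
    lock_file=$(get_setting lock_file "$nagios_cfg")
    if [ "$run_file" = "$lock_file" ]; then
        echo "match: $run_file"
    else
        echo "mismatch: init script uses '$run_file', nagios.cfg uses '$lock_file'"
        return 1
    fi
}
```

Calling check_lock_paths with no arguments uses the default paths; a non-zero exit status means the init script and nagios.cfg point at different files.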
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: systemctl does not do anything other than start the serv

Post by DMKatIBM »

So the nagios.lock file is supposed to contain the pid?

If I change /etc/rc.d/init.d/nagios to point to the /usr/local/nagios/var directory, then the nagios process gets a permission denied error, because the file gets created with root permissions.

The problem seems to be that there are 3 lock files. One of them contains the PID, and I don't know what the others do.

In /etc/rc.d/init.d/nagios, it points to:

NagiosRunFile=/run/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios

And in /usr/local/nagios/etc/nagios.cfg it lists:

lock_file=/usr/local/nagios/var/nagios.lock

So there is now a lock file in /run, one in /var/lock/subsys, and one in /usr/local/nagios/var. Here are the permissions:

[root@sjc04-build-Nagios nagios]# service nagios start
Starting nagios (via systemctl):                           [  OK  ]
[root@sjc04-build-Nagios nagios]# ls -l /usr/local/nagios/var/nagios.lock
-rw-r--r--. 1 nagios nagios 6 Jan 19 08:57 /usr/local/nagios/var/nagios.lock
[root@sjc04-build-Nagios nagios]# ls -l /run/nagios.lock
-rw-r--r--. 1 root root 0 Jan 19 08:57 /run/nagios.lock
[root@sjc04-build-Nagios nagios]# ls -l /var/lock/subsys/nagios
-rw-r--r--. 1 root root 0 Jan 19 08:57 /var/lock/subsys/nagios
[root@sjc04-build-Nagios nagios]#
The one in subsys is likely there to note that the actual process is running, and I'm (possibly incorrectly) assuming that the ones in /run and /usr/local/nagios/var are supposed to point to the same thing and contain the PID?

If so, then there's the permission issue. If I change either of the config files so that both use the same path, it won't run, because the file gets created with root ownership and the process running as the nagios user can't write to it. Either that, or the file is created by root first and the nagios process then tries to create it as well, but can't overwrite it because it's owned by root.
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: systemctl does not do anything other than start the serv

Post by DMKatIBM »

Okay, I don't understand why this worked, so hopefully somebody with more understanding of service files than me can help explain it (because I'd love to know why this happens). Changing the lock file in /usr/local/nagios/etc/nagios.cfg got it working:

lock_file=/run/nagios.lock

Now service nagios start/stop works fine.
In fact I even went one step further, did a "make install-init" and now systemctl start/stop nagios.service also works.
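
For anyone who wants to script the same change, here is a minimal sketch. set_lock_file is a hypothetical helper, not something shipped with Nagios; it assumes the config has a single uncommented lock_file= line and that the new path contains no "|" or "&" characters. It keeps a .bak copy and writes the result back into the original file, so the file's ownership stays as it was.

```shell
#!/bin/sh
# Rewrite the lock_file= line in a nagios.cfg-style file.
# $1 = config file, $2 = new lock file path
set_lock_file() {
    cfg=$1
    new_path=$2
    # Keep a backup next to the original before touching it.
    cp -p "$cfg" "$cfg.bak" || return 1
    # Read from the backup and overwrite the original in place.
    sed "s|^lock_file=.*|lock_file=$new_path|" "$cfg.bak" > "$cfg"
}

# Example call with the paths from this thread (run as root):
#   set_lock_file /usr/local/nagios/etc/nagios.cfg /run/nagios.lock
```

The service has to be restarted afterwards for the new path to take effect.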
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: systemctl does not do anything other than start the serv

Post by tgriep »

The init script uses the PID from the lock file to stop the service, but the PID is written there by the nagios process itself.
The two file paths did not line up, so the init script had no valid PID and could not stop it.
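
To illustrate why an empty run file makes "stop" a no-op, here is a minimal sketch of a stop routine driven by a PID file. It is not the actual code from /etc/rc.d/init.d/nagios; stop_from_pidfile is a hypothetical name, and it assumes the PID sits on the first line of the file.

```shell
#!/bin/sh
# Stop a daemon using the PID recorded in a PID file.
# $1 = PID file path
stop_from_pidfile() {
    pidfile=$1
    # A missing or zero-byte file (like the empty /run/nagios.lock
    # seen in this thread) gives the script nothing to signal.
    if [ ! -s "$pidfile" ]; then
        echo "no PID in $pidfile; nothing to signal" >&2
        return 1
    fi
    pid=$(head -n 1 "$pidfile")
    # Send the default TERM signal to the recorded process.
    kill "$pid"
}
```

If the daemon writes its PID to one path and the stop routine reads another, the routine takes the first branch and the process keeps running, which matches the behaviour described above.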
inakiuy
Posts: 1
Joined: Mon Apr 02, 2018 8:46 am

Re: systemctl does not do anything other than start the serv

Post by inakiuy »

Hi, this solved my problem:
Stop nagios with "systemctl stop nagios.service"
Kill the nagios process

Changing the lock file in /usr/local/nagios/etc/nagios.cfg got it working:
lock_file=/run/nagios.lock
Now service nagios start/stop works fine.
I had the same problem. Looking for lock files, I found that I had two: one located at "/run/nagios.lock" and another at "/usr/local/nagios/var/nagios.lock". When I looked into those files, "/usr/local/nagios/var/nagios.lock" had the PID but "/run/nagios.lock" was empty. Just out of curiosity I did:

cat /usr/local/nagios/var/nagios.lock > /run/nagios.lock

Now nagios stopped correctly. So on startup nagios stores its PID in "/usr/local/nagios/var/nagios.lock", but when stopping, the init script tries to read "/run/nagios.lock".
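
To see at a glance which lock file actually holds a PID, a small helper along these lines can be used. This is only a sketch: report_pidfile is a hypothetical name, and the paths in the usage comment are the ones mentioned in this thread.

```shell
#!/bin/sh
# Report whether a lock/PID file is missing, empty, or holds a PID,
# and whether that PID belongs to a live process.
# $1 = file path
report_pidfile() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "$f: missing"
    elif [ ! -s "$f" ]; then
        echo "$f: empty"
    else
        pid=$(head -n 1 "$f")
        # kill -0 sends no signal; it only tests whether we could.
        # Run as root (or as the nagios user), otherwise a live
        # process owned by someone else is reported as not running.
        if kill -0 "$pid" 2>/dev/null; then
            echo "$f: PID $pid (running)"
        else
            echo "$f: PID $pid (not running)"
        fi
    fi
}

# Example usage with the paths from this thread:
#   for f in /run/nagios.lock /usr/local/nagios/var/nagios.lock; do
#       report_pidfile "$f"
#   done
```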

I did a default nagios install. Anyway... now it's working.

Thx for posting and helping!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: systemctl does not do anything other than start the serv

Post by scottwilkerson »

@inakiuy thanks for sharing on this old thread
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com