systemctl does not do anything other than start the service
Posted: Fri Jan 12, 2018 8:45 am
by DMKatIBM
New install of Nagios Core (4.3.4) on Red Hat EL 7.4.
The "systemctl start nagios.service" command works fine, and Nagios runs fine when it starts.
The problem is that "systemctl stop nagios.service" and "systemctl restart nagios.service" don't do anything.
It appears to correctly execute the command, but the service doesn't stop or restart.
I've checked through a lot of posts that I could find on issues with systemctl (because this is not a native service in CentOS/RedHat 7.x), but none of the suggestions have solved my problem. I was hoping someone here may be able to provide me with a solution.
Based on what I've found so far, here is the configuration for the service:
Code: Select all
[root@dal10-build-Nagios system]# find / -name nagios\*service -print
/etc/systemd/system/multi-user.target.wants/nagios.service
/etc/systemd/system/nagios.service
/sys/fs/cgroup/systemd/system.slice/nagios.service
Code: Select all
[root@dal10-build-Nagios system]# ls -l /etc/systemd/system/multi-user.target.wants/nagios.service /etc/systemd/system/nagios.service /sys/fs/cgroup/systemd/system.slice/nagios.service
lrwxrwxrwx 1 root root 34 Jan 9 09:01 /etc/systemd/system/multi-user.target.wants/nagios.service -> /etc/systemd/system/nagios.service
-rw-r--r-- 1 root root 729 Jan 9 09:01 /etc/systemd/system/nagios.service
/sys/fs/cgroup/systemd/system.slice/nagios.service:
total 0
-rw-r--r-- 1 root root 0 Jan 9 10:34 cgroup.clone_children
--w--w--w- 1 root root 0 Jan 9 10:34 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan 9 10:34 cgroup.procs
-rw-r--r-- 1 root root 0 Jan 9 10:34 notify_on_release
-rw-r--r-- 1 root root 0 Jan 9 10:34 tasks
Code: Select all
[root@dal10-build-Nagios system]# cat /etc/systemd/system/nagios.service
# Automatically generated by systemd-sysv-generator
[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/rc.d/init.d/nagios
Description=LSB: Starts and stops the Nagios monitoring server
Before=runlevel2.target
Before=runlevel3.target
Before=runlevel4.target
Before=runlevel5.target
Before=shutdown.target
After=network-online.target
After=network-online.target
After=nrpe.service
Wants=network-online.target
Conflicts=shutdown.target
[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
ExecStart=/etc/rc.d/init.d/nagios start
ExecStop=/etc/rc.d/init.d/nagios stop
ExecReload=/etc/rc.d/init.d/nagios reload
[Install]
WantedBy=multi-user.target
[root@dal10-build-Nagios system]# systemctl list-unit-files | grep nagios
nagios.service enabled
[root@dal10-build-Nagios system]# ls -l /etc/rc.d/init.d/nagios
-rwxr-xr-x 1 root root 8243 Jan 11 13:41 /etc/rc.d/init.d/nagios
[root@dal10-build-Nagios system]#
Re: systemctl does not do anything other than start the serv
Posted: Fri Jan 12, 2018 2:02 pm
by dwhitfield
What's the output of ps -aef | grep nagios.cfg? It could be there are multiple processes running and thus the stop and restart appear to do nothing. Please put the output in a code block. The "Code" button is the fifth from the left on the post input screen (between Quote and List).
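For anyone following along, here is a rough sketch of one way to spot stray daemons in that kind of ps output. It assumes the main Nagios daemon is the process whose parent is PID 1 and whose command line mentions nagios.cfg; the function name top_level_daemons is made up for this example.

```shell
#!/bin/bash
# Sketch only: top_level_daemons is a hypothetical helper, not part of Nagios.
# Reads "pid ppid args" lines on stdin and prints the PID of every nagios
# process that was started with nagios.cfg and whose parent is PID 1.
top_level_daemons() {
    awk '$2 == 1 && $0 ~ /nagios\.cfg/ && $3 ~ /\/nagios$/ { print $1 }'
}
```

It could be fed with something like `ps -e -o pid= -o ppid= -o args= | top_level_daemons`. More than one PID in the output would suggest that several independent daemons are running.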
Re: systemctl does not do anything other than start the serv
Posted: Fri Jan 12, 2018 2:16 pm
by DMKatIBM
Code: Select all
[root@dal10-build-Nagios etc]# ps -aef | grep nagios.cfg
nagios 18189 1 0 Jan11 ? 00:04:31 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 18197 18189 0 Jan11 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 26015 13217 0 13:15 pts/0 00:00:00 grep --color=auto nagios.cfg
[root@dal10-build-Nagios etc]#
It's only the single instance (18197 spawned from 18189).
Re: systemctl does not do anything other than start the serv
Posted: Fri Jan 12, 2018 4:33 pm
by dwhitfield
Did you compile this? If so, what instructions did you use? If not, from what repo did you install Nagios Core?
systemctl stop nagios works just fine on my CentOS 7 box. 4.3.4 was released in August, so I'm pretty sure we'd know if our compile instructions caused this.
What's the output of sestatus?
Re: systemctl does not do anything other than start the serv
Posted: Sat Jan 13, 2018 10:26 am
by DMKatIBM
SELinux is disabled.
So here's the thing... I compiled this when the system was CentOS 7, and everything ran fine there. I put the entire /usr/local/nagios directory into a tarball, then flattened the VM and reinstalled it as Red Hat 7.4. I did recompile it, ran all the make commands (including make install-init), and installed it the same way I did on CentOS. Then I untarred everything into /usr/local/nagios. Nothing in /usr/local/nagios should have impacted the systemctl files, since those all live in /usr/lib/systemd and/or /etc/systemd.
The symlinks are all correct (that I can see), so I don't get why the service won't stop or restart with it.
Re: systemctl does not do anything other than start the serv
Posted: Sat Jan 13, 2018 8:19 pm
by dwhitfield
What do these two commands do... anything?
Code: Select all
/etc/rc.d/init.d/nagios stop
/etc/rc.d/init.d/nagios reload
Re: systemctl does not do anything other than start the serv
Posted: Mon Jan 15, 2018 7:38 am
by DMKatIBM
The stop command says it will stop it, but it actually doesn't do anything.
Code: Select all
[root@dal10-build-Nagios etc]# /etc/rc.d/init.d/nagios stop
Stopping nagios (via systemctl): [ OK ]
In /var/log/messages I get the following:
Code: Select all
Jan 15 06:35:07 dal10-build-Nagios systemd: Stopping LSB: Starts and stops the Nagios monitoring server...
Jan 15 06:35:07 dal10-build-Nagios nagios: Stopping nagios:kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Jan 15 06:35:07 dal10-build-Nagios nagios: done.
For the reload, it just flat-out fails (nothing gets logged in /var/log/messages).
Code: Select all
[root@dal10-build-Nagios etc]# /etc/rc.d/init.d/nagios reload
Reloading nagios configuration (via systemctl): Job for nagios.service canceled.
[FAILED]
Re: systemctl does not do anything other than start the serv
Posted: Mon Jan 15, 2018 4:29 pm
by tgriep
When nagios is running, did it create a nagios.lock file in this location?
If that file doesn't exist, the /etc/rc.d/init.d/nagios script cannot get the PID number for the running nagios process and it cannot stop it.
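To illustrate the failure mode: the "kill: usage" message in the log posted above is what bash prints when kill is called without a PID argument. Below is a minimal sketch of a stop routine that guards against that. It is not the code from the real init script; read_lock_pid and safe_stop are hypothetical names, and the lock file path is passed in as an argument.

```shell
#!/bin/bash
# Sketch only: read_lock_pid and safe_stop are hypothetical helpers,
# not code taken from the Nagios init script.

# Print the PID stored in a lock file.
# Fails if the file is unreadable or does not contain a plain number.
read_lock_pid() {
    local lock_file="$1" pid
    [ -r "$lock_file" ] || return 1
    pid=$(head -n 1 "$lock_file" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$pid"
}

# Send SIGTERM only when a PID was actually found,
# instead of calling kill with an empty argument.
safe_stop() {
    local lock_file="$1" pid
    if ! pid=$(read_lock_pid "$lock_file"); then
        echo "no usable PID in $lock_file; nothing to kill" >&2
        return 1
    fi
    kill "$pid"
}
```

With a guard like this, a missing or empty lock file produces an explicit error rather than a usage message followed by "done."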
The easiest way to fix it is to remove the nagios.service files so the system will go back to just using the init script.
So, delete these files
Code: Select all
/etc/systemd/system/nagios.service
/etc/systemd/system/multi-user.target.wants/nagios.service
Run this to reload the daemon configuration files
Then kill the nagios process by running
Then start / stop / restart nagios by running the following to see if they work.
Code: Select all
service nagios start
service nagios stop
service nagios restart
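A dry-run sketch of the sequence above, which only prints the commands so they can be reviewed before anything is run. Two lines are assumptions rather than text from this post: systemctl daemon-reload is the usual way to make systemd re-read its unit files, and the kill line is left as a placeholder because the PID has to come from ps. The function name plan_revert_to_init is made up.

```shell
#!/bin/bash
# Sketch only: plan_revert_to_init is a hypothetical helper.
# It prints the steps instead of executing them.
plan_revert_to_init() {
    # Directory holding the unit files; defaults to the path used in this thread.
    local unit_dir="${1:-/etc/systemd/system}"
    echo "rm -f $unit_dir/nagios.service"
    echo "rm -f $unit_dir/multi-user.target.wants/nagios.service"
    # Assumption: the standard command for reloading systemd's configuration.
    echo "systemctl daemon-reload"
    # Placeholder: the PID must be looked up first, for example with ps.
    echo "kill <PID of the nagios daemon>"
    echo "service nagios start"
    echo "service nagios stop"
    echo "service nagios restart"
}
```

Running `plan_revert_to_init` with no argument prints the plan for /etc/systemd/system.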
Re: systemctl does not do anything other than start the serv
Posted: Thu Jan 18, 2018 8:01 am
by DMKatIBM
The lock file does exist:
Code: Select all
[root@sjc04-build-Nagios multi-user.target.wants]# ls -l /usr/local/nagios/var/nagios.*
-rw-r--r--. 1 nagios nagios 34 Jan 18 06:55 /usr/local/nagios/var/nagios.configtest
-rw-r--r--. 1 nagios nagios 50076 Jan 2 11:42 /usr/local/nagios/var/nagios.debug
-rw-r--r--. 1 nagios nagios 5 Jan 18 06:55 /usr/local/nagios/var/nagios.lock
-rw-r--r--. 1 nagios nagios 968023 Jan 18 06:55 /usr/local/nagios/var/nagios.log
-rw-rw-r--. 1 nagios nagios 49278 Aug 6 2016 /usr/local/nagios/var/nagios.tmpRdcNCL
[root@sjc04-build-Nagios multi-user.target.wants]#
But it was worth trying what you suggested. So I deleted the files, did a systemctl daemon-reload, and tried again.
Code: Select all
[root@sjc04-build-Nagios multi-user.target.wants]# service nagios stop
Stopping nagios (via systemctl): [ OK ]
[root@sjc04-build-Nagios multi-user.target.wants]#
It says it is stopping it, but it doesn't actually stop. Nothing is logged in /var/log/messages (at all) about the process change, and it remains running:
Code: Select all
[root@sjc04-build-Nagios multi-user.target.wants]# ls -l /usr/local/nagios/var/nagios.*
-rw-r--r--. 1 nagios nagios 34 Jan 18 06:55 /usr/local/nagios/var/nagios.configtest
-rw-r--r--. 1 nagios nagios 50076 Jan 2 11:42 /usr/local/nagios/var/nagios.debug
-rw-r--r--. 1 nagios nagios 5 Jan 18 06:55 /usr/local/nagios/var/nagios.lock
-rw-r--r--. 1 nagios nagios 968023 Jan 18 06:55 /usr/local/nagios/var/nagios.log
-rw-rw-r--. 1 nagios nagios 49278 Aug 6 2016 /usr/local/nagios/var/nagios.tmpRdcNCL
[root@sjc04-build-Nagios multi-user.target.wants]# ps -ef | grep nagios
nagios 4473 1 1 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 4475 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4476 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4477 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4478 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4479 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4480 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4481 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4482 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4483 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4484 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4485 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4486 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 4488 4473 0 06:55 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 4789 4477 0 06:56 ? 00:00:00 /usr/local/nagios/libexec/check_http -S -H 10.164.14.204 -w 10 -c 20 -p 443
root 4791 3379 0 06:56 pts/0 00:00:00 grep --color=auto nagios
[root@sjc04-build-Nagios multi-user.target.wants]#
If I kill it manually, the lock file does get cleaned up:
Code: Select all
[root@sjc04-build-Nagios multi-user.target.wants]# kill 4473
[root@sjc04-build-Nagios multi-user.target.wants]# ls -l /usr/local/nagios/var/nagios.*
-rw-r--r--. 1 nagios nagios 34 Jan 18 06:55 /usr/local/nagios/var/nagios.configtest
-rw-r--r--. 1 nagios nagios 50076 Jan 2 11:42 /usr/local/nagios/var/nagios.debug
-rw-r--r--. 1 nagios nagios 968341 Jan 18 06:56 /usr/local/nagios/var/nagios.log
-rw-rw-r--. 1 nagios nagios 49278 Aug 6 2016 /usr/local/nagios/var/nagios.tmpRdcNCL
[root@sjc04-build-Nagios multi-user.target.wants]#
I greatly appreciate your help so far.
Any further thoughts as to what could be going on here?
Re: systemctl does not do anything other than start the serv
Posted: Thu Jan 18, 2018 10:36 am
by tgriep
If you look in the /usr/local/nagios/var/nagios.lock file when Nagios is running, does the PID number match the PID number of the Nagios Daemon?
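One possible way to script that comparison, as a sketch. lock_pid_matches is a hypothetical helper; it takes the lock file path and the daemon PID as seen in ps, and it assumes it is run as root so that kill -0 is allowed to probe the process.

```shell
#!/bin/bash
# Sketch only: lock_pid_matches is a hypothetical helper.
# Succeeds when the PID recorded in the lock file equals the expected
# daemon PID and that process is still alive.
lock_pid_matches() {
    local lock_file="$1" expected_pid="$2" lock_pid
    [ -r "$lock_file" ] || return 1
    lock_pid=$(tr -d '[:space:]' < "$lock_file")
    [ -n "$lock_pid" ] || return 1
    [ "$lock_pid" = "$expected_pid" ] || return 1
    # kill -0 sends no signal; it only checks that the process exists
    # and that the caller is permitted to signal it.
    kill -0 "$lock_pid" 2>/dev/null
}
```

For example, `lock_pid_matches /usr/local/nagios/var/nagios.lock 4473 && echo match`, where 4473 is whatever PID ps shows for the daemon whose parent is 1.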
When you run the following commands as root
Code: Select all
/etc/rc.d/init.d/nagios stop
/etc/rc.d/init.d/nagios start
Does the Nagios daemon stop and start?
If you look in the init script, does the path to the nagios.lock file match where the lock file is created?
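A sketch of how that check might be automated. It assumes the daemon writes its lock file to the path given by the lock_file directive in nagios.cfg, and it simply searches the init script for that literal path. Both function names are made up for this example.

```shell
#!/bin/bash
# Sketch only: cfg_lock_file and init_script_uses_lock are hypothetical helpers.

# Print the value of the lock_file directive from a Nagios main config file.
cfg_lock_file() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Succeed when the init script contains the same lock file path as the config.
init_script_uses_lock() {
    local cfg="$1" init_script="$2" lock
    lock=$(cfg_lock_file "$cfg")
    [ -n "$lock" ] || return 1
    # -F treats the path as a fixed string rather than a regular expression.
    grep -qF -- "$lock" "$init_script"
}
```

For example, `init_script_uses_lock /usr/local/nagios/etc/nagios.cfg /etc/rc.d/init.d/nagios || echo "paths differ, check by hand"`. A literal match will miss a path that the init script assembles from variables, so a failure here only means the script needs a manual look.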