nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 8:01 am
by jenglish
Hello,
We recently updated our nrpe agents to 3.2.1 and it caused some issues with starting/restarting the service. We were able to fix this by using the old service file configurations from 3.2.0. Is this a bug? Please advise.
nrpe 3.2.0 service file:
Code: Select all
[Unit]
Description=Nagios Remote Program Executor
Documentation=http://www.nagios.org/documentation
Conflicts=nrpe.socket
Requires=network.target
[Install]
WantedBy=multi-user.target
[Service]
Type=forking
User=nrpe
Group=nrpe
EnvironmentFile=/etc/sysconfig/nrpe
ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d $NRPE_SSL_OPT
nrpe 3.2.1 service file:
Code: Select all
[Unit]
Description=Nagios Remote Program Executor
Documentation=http://www.nagios.org/documentation
Conflicts=nrpe.socket
Requires=network-online.target
After=var-run.mount nss-lookup.target network.target local-fs.target time-sync.target
Before=getty@tty1.service xdm.service
[Install]
WantedBy=multi-user.target
[Service]
Type=forking
User=nrpe
Group=nrpe
EnvironmentFile=/etc/sysconfig/nrpe
ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d $NRPE_SSL_OPT
ExecReload=/bin/kill -HUP $MAINPID
ExecStopPost=/bin/rm -f /var/run/nrpe/nrpe.pid
PIDFile=/var/run/nrpe/nrpe.pid
Log file snippet from nrpe 3.2.1:
Code: Select all
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: Starting Nagios Remote Program Executor...
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: PID file /var/run/nrpe/nrpe.pid not readable (yet?) after start.
Oct 16 07:15:48 boy-oraem01.opm.gov nrpe[64219]: Starting up daemon
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: nrpe.service never wrote its PID file. Failing.
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: Failed to start Nagios Remote Program Executor.
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: Unit nrpe.service entered failed state.
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: nrpe.service failed.
Job for nrpe.service failed because a configured resource limit was exceeded. See "systemctl status nrpe.service" and "journalctl -xe" for details.
OS version:
Code: Select all
Red Hat Enterprise Linux Server release 7.5 (Maipo)
3.10.0-862.6.3.el7.x86_64
Thank you!
Jordan
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 9:45 am
by lmiltchev
jenglish wrote: "We recently updated our nrpe agents to 3.2.1 and it caused some issues with starting/restarting the service."
Can you describe in detail the steps you took to upgrade NRPE? What document/guide/tutorial did you follow?
Can you show the output of the following commands (when nrpe fails to start)?
Code: Select all
systemctl status nrpe.service
journalctl -xe
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 10:09 am
by jenglish
@lmiltchev
We used the EPEL repository to update nrpe using yum, e.g. "yum update nrpe".
Code: Select all
(PRO-BOY|jenglish@boy-oraem01 ~)$ systemctl status nrpe
● nrpe.service - Nagios Remote Program Executor
Loaded: loaded (/usr/lib/systemd/system/nrpe.service; enabled; vendor preset: disabled)
Active: failed (Result: resources) since Tue 2018-10-16 10:45:48 EDT; 21min ago
Docs: http://www.nagios.org/documentation
Process: 54990 ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d $NRPE_SSL_OPT (code=exited, status=0/SUCCESS)
Main PID: 37390 (code=exited, status=0/SUCCESS)
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: Starting Nagios Remote Program Executor...
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54990]: Added command[check_users]=/usr/lib64/nagios/...2$
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54990]: Added command[check_load]=/usr/lib64/nagios/p...2$
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54990]: Added command[check_disk]=/usr/lib64/nagios/p...3$
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: PID file /var/run/nrpe/nrpe.pid not readable ...rt.
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54995]: Starting up daemon
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: nrpe.service never wrote its PID file. Failing.
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: Failed to start Nagios Remote Program Executor.
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: Unit nrpe.service entered failed state.
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: nrpe.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
I checked for logs since 11:00 (none), restarted the service (which failed), then checked the logs again:
Code: Select all
(PRO-BOY|jenglish@boy-oraem01 ~)$ journalctl -u nrpe -S 11:00
-- No entries --
(PRO-BOY|jenglish@boy-oraem01 ~)$ sudo systemctl restart nrpe
Job for nrpe.service failed because a configured resource limit was exceeded. See "systemctl status nrpe.service" and "journalctl -xe" for details.
(PRO-BOY|jenglish@boy-oraem01 ~)$ journalctl -u nrpe -S 11:00
-- Logs begin at Sat 2018-10-13 04:00:00 EDT, end at Tue 2018-10-16 11:08:33 EDT. --
Oct 16 11:08:33 boy-oraem01.opm.gov systemd[1]: Starting Nagios Remote Program Executor...
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_users]=/usr/lib64/nagios/plug
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_load]=/usr/lib64/nagios/plugi
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_disk]=/usr/lib64/nagios/plugi
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_temp]=/usr/lib64/nagios/plugi
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_procs]=/usr/lib64/nagios/plug
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_lock_age]=/usr/lib64/nagios/p
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_ntp_time]=/usr/lib64/nagios/p
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_file_age]=sudo /usr/lib64/nag
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_init]=/usr/lib64/nagios/plugi
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_swap]=/usr/lib64/nagios/plugi
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_generic]=/usr/lib64/nagios/pl
Oct 16 11:08:33 boy-oraem01.opm.gov nrpe[123230]: Added command[check_tcp]=/usr/lib64/nagios/plugin
Oct 16 11:08:33 boy-oraem01.opm.gov systemd[1]: PID file /var/run/nrpe/nrpe.pid not readable (yet?)
Oct 16 11:08:33 boy-oraem01.opm.gov systemd[1]: nrpe.service never wrote its PID file. Failing.
Oct 16 11:08:33 boy-oraem01.opm.gov systemd[1]: Failed to start Nagios Remote Program Executor.
Oct 16 11:08:33 boy-oraem01.opm.gov systemd[1]: Unit nrpe.service entered failed state.
Oct 16 11:08:33 boy-oraem01.opm.gov systemd[1]: nrpe.service failed.
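The "Added command" lines drown out the two systemd messages that matter here. As a rough sketch (the function name pidfile_errors is made up for illustration and is not part of NRPE or systemd), a small filter can keep only the PID file complaints from any log text on stdin:

```shell
#!/bin/sh
# Sketch: keep only the systemd messages about the missing PID file.
# Reads log lines on stdin, so it works with journalctl output
# as well as with lines grepped out of /var/log/messages.
pidfile_errors() {
    grep -E 'PID file .* not readable|never wrote its PID file'
}
```

It could be used as "journalctl -u nrpe -S 11:00 --no-pager | pidfile_errors". Piping the output also stops the pager from cutting lines off at the terminal width.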
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 10:30 am
by lmiltchev
jenglish wrote: "We were able to fix this by using the old service file configurations from 3.2.0. Is this a bug? Please advise."
It's possible that this is a bug. Our developers will be looking into this. Please report the issue here:
https://bugzilla.redhat.com/
jenglish wrote: "Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: nrpe.service never wrote its PID file. Failing."
Where is nrpe.pid located on your system? Is it in the "/var/run/nrpe" directory?
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 11:01 am
by jenglish
No PID created:
Code: Select all
(PRO-BOY|jenglish@boy-oraem01 ~)$ sudo find / -name nrpe.pid
(PRO-BOY|jenglish@boy-oraem01 ~)$
I've never entered a bug before. Which would be most applicable? RedHat or Other?
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 11:11 am
by lmiltchev
jenglish wrote: "No PID created:"
You were not able to start NRPE (with the new config), which is why the PID file was not created. Try switching to the old config so that you can start NRPE successfully. After this, try finding the PID file again. We need to see the location of the PID file and the permissions of the directory it is in.
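The two checks asked for above can be sketched as a pair of small shell helpers. This is only an illustration: the function names dir_perms and find_pid_files are made up, and the search is the same "find / -name nrpe.pid" that was already run in this thread.

```shell
#!/bin/sh
# Sketch: locate nrpe.pid and show who may write to its directory.

# Print the permission string, owner and group of a directory
# (fields 1, 3 and 4 of "ls -ld").
dir_perms() {
    ls -ld "$1" | awk '{ print $1, $3, $4 }'
}

# List every file named nrpe.pid below the given directory (default: /).
# Permission errors from unreadable directories are discarded.
find_pid_files() {
    find "${1:-/}" -name nrpe.pid 2>/dev/null
}
```

For example, "sudo sh -c '. ./pidcheck.sh; find_pid_files /'" followed by "dir_perms /var/run/nrpe" (pidcheck.sh being whatever file the functions were saved in). If the search prints nothing while nrpe is running, the daemon is most likely not writing a PID file at all.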
jenglish wrote: "I've never entered a bug before. Which would be most applicable? RedHat or Other?"
You need to select:
Product: Fedora EPEL
Version: epel7
Hardware: x86_64 Linux
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 11:31 am
by jenglish
No PID file here either:
Code: Select all
(PRO-BOY|jenglish@boy-adams2 ~)$ uname -r ; cat /etc/redhat-release ; sudo rpm -qa | grep nrpe
3.10.0-862.6.3.el7.x86_64
Red Hat Enterprise Linux Server release 7.5 (Maipo)
nagios-plugins-nrpe-3.2.1-6.el7.x86_64
nrpe-3.2.1-6.el7.x86_64
(PRO-BOY|jenglish@boy-adams2 ~)$ sudo systemctl cat nrpe
# /usr/lib/systemd/system/nrpe.service
[Unit]
Description=Nagios Remote Program Executor
Documentation=http://www.nagios.org/documentation
Conflicts=nrpe.socket
Requires=network.target
[Install]
WantedBy=multi-user.target
[Service]
Type=forking
User=nrpe
Group=nrpe
EnvironmentFile=/etc/sysconfig/nrpe
ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d $NRPE_SSL_OPT
(PRO-BOY|jenglish@boy-adams2 ~)$ sudo systemctl restart nrpe
(PRO-BOY|jenglish@boy-adams2 ~)$ sudo find / -name nrpe.pid
(PRO-BOY|jenglish@boy-adams2 ~)$
(PRO-BOY|jenglish@boy-adams2 ~)$ sudo systemctl is-active nrpe
active
(PRO-BOY|jenglish@boy-adams2 ~)$ ps aux | grep nrpe
nrpe 1227 0.0 0.0 44884 1440 ? Ss 12:28 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
jenglish 1310 0.0 0.0 112704 980 pts/0 S+ 12:29 0:00 grep --color=auto nrpe
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 11:57 am
by lmiltchev
Do you see any NRPE-related errors in /var/log/messages?
Can you show the output of the following command?
Code: Select all
ls -lad /var/run/nrpe/
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 12:04 pm
by jenglish
I see more info in journald, but here is the output from messages:
Code: Select all
(PRO-BOY|jenglish@boy-oraem01 ~)$ sudo grep -i nrpe /var/log/messages | grep 12:
Oct 16 12:15:51 boy-oraem01 systemd: PID file /var/run/nrpe/nrpe.pid not readable (yet?) after start.
Oct 16 12:15:51 boy-oraem01 systemd: nrpe.service never wrote its PID file. Failing.
Oct 16 12:15:51 boy-oraem01 systemd: Unit nrpe.service entered failed state.
Oct 16 12:15:51 boy-oraem01 systemd: nrpe.service failed.
Oct 16 12:45:48 boy-oraem01 systemd: PID file /var/run/nrpe/nrpe.pid not readable (yet?) after start.
Oct 16 12:45:48 boy-oraem01 systemd: nrpe.service never wrote its PID file. Failing.
Oct 16 12:45:48 boy-oraem01 systemd: Unit nrpe.service entered failed state.
Oct 16 12:45:48 boy-oraem01 systemd: nrpe.service failed.
output:
Code: Select all
(PRO-BOY|jenglish@boy-oraem01 ~)$ ls -lad /var/run/nrpe/
drwxrwxr-x. 2 nrpe nrpe 40 Jul 24 18:37 /var/run/nrpe/
Re: nrpe - 3.2.1 service file issues
Posted: Tue Oct 16, 2018 1:36 pm
by lmiltchev
I believe I know what happened. The nrpe.cfg file usually isn't replaced when NRPE is upgraded, so if your nrpe.cfg specifies a different path for nrpe.pid than the one systemd expects, starting NRPE will fail.
Can you double check the pid_file setting in your nrpe.cfg file and make sure it matches the paths in the service file:
Code: Select all
ExecStopPost=/bin/rm -f /var/run/nrpe/nrpe.pid
PIDFile=/var/run/nrpe/nrpe.pid
Once you make these identical and reload the config, NRPE should start fine (with the "new" config).
Let us know if this resolved your issue.
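To make the comparison above less error-prone, here is a minimal sketch in shell. It assumes the PID path is set with a "pid_file=" line in nrpe.cfg and a "PIDFile=" line in the unit file. The function names get_value and pid_paths_match are hypothetical helpers, not part of NRPE.

```shell
#!/bin/sh
# Sketch: compare the PID file path NRPE is told to write (pid_file= in
# nrpe.cfg) with the one systemd waits for (PIDFile= in the unit file).

# Print the value of the last "key=value" line for the given key in a file.
# Commented-out lines never match because the pattern is anchored to the key.
get_value() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Print both paths; return 0 only if pid_file is set and equals PIDFile.
pid_paths_match() {
    cfg=$1
    unit=$2
    cfg_pid=$(get_value pid_file "$cfg")
    unit_pid=$(get_value PIDFile "$unit")
    echo "nrpe.cfg pid_file: ${cfg_pid:-<not set>}"
    echo "unit PIDFile:      ${unit_pid:-<not set>}"
    [ -n "$cfg_pid" ] && [ "$cfg_pid" = "$unit_pid" ]
}
```

With the paths from this thread, the call would be "pid_paths_match /etc/nagios/nrpe.cfg /usr/lib/systemd/system/nrpe.service". A non-zero exit status, or "<not set>" on the first line, would point to the mismatch described above.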