We recently updated our NRPE agents to 3.2.1, and since then the service fails to start or restart. We were able to work around this by going back to the service file configuration from 3.2.0. Is this a bug? Please advise.
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: Starting Nagios Remote Program Executor...
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: PID file /var/run/nrpe/nrpe.pid not readable (yet?) after start.
Oct 16 07:15:48 boy-oraem01.opm.gov nrpe[64219]: Starting up daemon
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: nrpe.service never wrote its PID file. Failing.
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: Failed to start Nagios Remote Program Executor.
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: Unit nrpe.service entered failed state.
Oct 16 07:15:48 boy-oraem01.opm.gov systemd[1]: nrpe.service failed.
Job for nrpe.service failed because a configured resource limit was exceeded. See "systemctl status nrpe.service" and "journalctl -xe" for details.
(PRO-BOY|jenglish@boy-oraem01 ~)$ systemctl status nrpe
● nrpe.service - Nagios Remote Program Executor
Loaded: loaded (/usr/lib/systemd/system/nrpe.service; enabled; vendor preset: disabled)
Active: failed (Result: resources) since Tue 2018-10-16 10:45:48 EDT; 21min ago
Docs: http://www.nagios.org/documentation
Process: 54990 ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d $NRPE_SSL_OPT (code=exited, status=0/SUCCESS)
Main PID: 37390 (code=exited, status=0/SUCCESS)
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: Starting Nagios Remote Program Executor...
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54990]: Added command[check_users]=/usr/lib64/nagios/...2$
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54990]: Added command[check_load]=/usr/lib64/nagios/p...2$
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54990]: Added command[check_disk]=/usr/lib64/nagios/p...3$
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: PID file /var/run/nrpe/nrpe.pid not readable ...rt.
Oct 16 10:45:48 boy-oraem01.opm.gov nrpe[54995]: Starting up daemon
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: nrpe.service never wrote its PID file. Failing.
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: Failed to start Nagios Remote Program Executor.
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: Unit nrpe.service entered failed state.
Oct 16 10:45:48 boy-oraem01.opm.gov systemd[1]: nrpe.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
Checked for logs since 11:00, restarted the service (which fails), then checked the logs again:
You were not able to start NRPE with the new config, which is why the PID file was not created. Try switching back to the old config so that NRPE starts successfully, then look for the PID file again. We need to see the location and permissions of the directory where the PID file is located.
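For the directory check requested above, something along these lines might help. It is only a sketch: the function name show_pid_location is made up for illustration, and the path is the one systemd reports in the messages quoted in this thread, so substitute your own if it differs.

```shell
#!/bin/sh
# Show owner and permissions of the directory that holds a PID file,
# and of the PID file itself if it exists.
# (show_pid_location is a hypothetical helper name, not part of NRPE.)
show_pid_location() {
    pid_file=$1
    pid_dir=$(dirname "$pid_file")

    if [ -d "$pid_dir" ]; then
        ls -ld "$pid_dir"
    else
        echo "missing directory: $pid_dir"
    fi

    if [ -e "$pid_file" ]; then
        ls -l "$pid_file"
    else
        echo "missing PID file: $pid_file"
    fi
}

# Path taken from the systemd messages quoted in this thread.
show_pid_location /var/run/nrpe/nrpe.pid
```

Running it once while NRPE is up on the old config and once after a failed start on the new one should show whether the directory exists in both cases and who owns it.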
I've never filed a bug before. Which would be most applicable: RedHat or Other?
You need to select:
Product: Fedora EPEL
Version: epel7
Hardware: x86_64 Linux
(PRO-BOY|jenglish@boy-oraem01 ~)$ sudo grep -i nrpe /var/log/messages | grep 12:
Oct 16 12:15:51 boy-oraem01 systemd: PID file /var/run/nrpe/nrpe.pid not readable (yet?) after start.
Oct 16 12:15:51 boy-oraem01 systemd: nrpe.service never wrote its PID file. Failing.
Oct 16 12:15:51 boy-oraem01 systemd: Unit nrpe.service entered failed state.
Oct 16 12:15:51 boy-oraem01 systemd: nrpe.service failed.
Oct 16 12:45:48 boy-oraem01 systemd: PID file /var/run/nrpe/nrpe.pid not readable (yet?) after start.
Oct 16 12:45:48 boy-oraem01 systemd: nrpe.service never wrote its PID file. Failing.
Oct 16 12:45:48 boy-oraem01 systemd: Unit nrpe.service entered failed state.
Oct 16 12:45:48 boy-oraem01 systemd: nrpe.service failed.
I believe I know what happened. The nrpe.cfg file usually doesn't get replaced when NRPE is upgraded, so if your nrpe.cfg specifies a different path for nrpe.pid than the one the new service file points systemd at, starting NRPE would fail.
Can you double-check what you have in the nrpe.cfg file?
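A quick way to compare the two settings is sketched below. It assumes the config uses the pid_file= directive and the unit file uses PIDFile=; the function names are made up for illustration, and the file paths in the usage note come from the systemctl output earlier in this thread.

```shell
#!/bin/sh
# Print the value of the pid_file= directive from an NRPE config file.
# Commented-out lines are ignored; if several lines match, the last is printed.
cfg_pid_file() {
    sed -n 's/^[[:space:]]*pid_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Print the value of PIDFile= from a systemd unit file.
unit_pid_file() {
    sed -n 's/^[[:space:]]*PIDFile[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Compare the two paths; returns non-zero when they differ.
# Arguments: <nrpe.cfg path> <unit file path>
compare_pid_paths() {
    cfg_path=$(cfg_pid_file "$1")
    unit_path=$(unit_pid_file "$2")
    if [ "$cfg_path" = "$unit_path" ]; then
        echo "match: $cfg_path"
    else
        echo "mismatch: nrpe.cfg='$cfg_path' unit='$unit_path'"
        return 1
    fi
}
```

On the host in this thread it would be called as compare_pid_paths /etc/nagios/nrpe.cfg /usr/lib/systemd/system/nrpe.service. A mismatch would be consistent with the "never wrote its PID file" message, since systemd only watches the path named in the unit file.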