[Reload] Job for nagios.service invalid
-
- Posts: 11
- Joined: Fri Oct 19, 2018 10:16 am
[Reload] Job for nagios.service invalid
Hi,
The VM that runs this Nagios instance recently ran out of disk space. Around the same time, the "service nagios reload" command started to fail with the message:
"Reloading nagios configuration (via systemctl): Job for nagios.service invalid.
[FAILED]"
The nagios.cmd file under "/usr/local/nagios/var/rw/" also disappeared, and would not come back on "service nagios restart". There are no config errors popping up, and the disk has been expanded. I followed instructions on this post to get the nagios.cmd file to come back, but the reload command still fails.
Any idea what's going on?
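For anyone following along, the two symptoms described above (a disk that filled up and a missing command file) can be checked with a short sketch like the one below. It assumes the paths quoted in the post; the function names are made up for illustration and are not part of Nagios.

```shell
#!/bin/sh
# Sketch only: the path comes from the post above; function names are hypothetical.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Print the "capacity used" column (without the % sign) for the filesystem holding $1.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed only if $1 exists and is a named pipe (FIFO).
# The Nagios external command file is a FIFO, not a regular file.
is_command_pipe() {
    [ -p "$1" ]
}

if is_command_pipe "$CMD_FILE"; then
    echo "command file present: $CMD_FILE"
else
    echo "command file missing or not a pipe: $CMD_FILE"
fi
```

Calling `disk_used_percent /usr/local/nagios/var` on the server would show how full that filesystem still is after the disk was expanded.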
Re: [Reload] Job for nagios.service invalid
Let's run a verification on the Nagios configuration files to see if there are any errors.
Run this as root:
Code: Select all
/usr/local/nagios/bin/nagios -v /usr/local/nagios/nagios.cfg
Also, check the /usr/local/nagios/var/nagios.log file for any errors when you try to restart it.
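As a rough illustration of the log check, the sketch below pulls out only the lines that look like errors or warnings. The helper name `log_problems` is hypothetical, and the pattern assumes the usual "[timestamp] Error:" / "[timestamp] Warning:" line prefix, which may not cover every message.

```shell
#!/bin/sh
# Sketch only: the log path is the one named above; log_problems is a hypothetical helper.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

# Print the last N (default 20) lines of log file $1 that start with
# "[timestamp] Error:" or "[timestamp] Warning:".
log_problems() {
    grep -E '\] (Error|Warning):' "$1" | tail -n "${2:-20}"
}

# Only read the log if it exists and is readable on this machine.
if [ -r "$NAGIOS_LOG" ]; then
    log_problems "$NAGIOS_LOG"
fi
```

Matching on the prefix rather than on the bare word "warning" keeps notification lines that merely report a WARNING state out of the result.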
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: [Reload] Job for nagios.service invalid
The nagios.cfg file is under a different location. This instance was working fine before, but does the nagios.cfg file need to be under "/usr/local/nagios/" instead? My command looks like:
Code: Select all
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
And has the output:
Code: Select all
Nagios Core 4.3.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-08-24
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 3284 services.
Checked 347 hosts.
Checked 84 host groups.
Checked 85 service groups.
Checked 11 contacts.
Checked 0 contact groups.
Checked 119 commands.
Checked 181 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 347 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 181 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Also, I've checked the log file; the only thing I see that might be relevant is this:
[1540305250] wproc: Core Worker 61148: job 355 (pid=64529) timed out. Killing it
[1540305250] wproc: CHECK job 355 from worker Core Worker 61148 timed out after 30.01s
[1540305250] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1540305250] wproc: Core Worker 61148: job 355 (pid=64529): Dormant child reaped
Everything else has "SERVICE NOTIFICATION" prefixing the log.
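Since the log is mostly notification lines with raw epoch timestamps, a small filter can make the remaining entries easier to read. This is only a sketch: `readable_log` is a hypothetical helper, and it relies on `date -d @SECONDS`, which GNU date supports.

```shell
#!/bin/sh
# Sketch only: readable_log is a hypothetical helper, not a Nagios tool.
# It drops SERVICE NOTIFICATION lines and rewrites the leading [epoch]
# of each remaining line as a UTC timestamp.
readable_log() {
    grep -v 'SERVICE NOTIFICATION' "$1" | while IFS= read -r line; do
        epoch=${line#\[}        # strip the leading "["
        epoch=${epoch%%\]*}     # keep everything before the first "]"
        rest=${line#*\] }       # message text after the first "] "
        # Fall back to the raw value if date cannot parse it.
        stamp=$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S') || stamp=$epoch
        printf '%s %s\n' "$stamp" "$rest"
    done
}
```

It could be run as `readable_log /usr/local/nagios/var/nagios.log | tail -n 50` to look at the most recent non-notification entries.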
Last edited by jfarnsworth on Fri Oct 26, 2018 10:33 am, edited 1 time in total.
Re: [Reload] Job for nagios.service invalid
The path to the nagios.cfg file I gave is the default location, but if it has been changed on your server, that is OK as long as all of the init scripts were changed to match.
Search for the nagios.service file under the /etc folder and see if its settings match where the nagios binary and configuration files are located on your server.
If they do, that is OK.
Those messages from the nagios.log file do not show why it is not starting.
You may want to check the /var/log/messages file or the /var/log/syslog file to see if there are any errors, but we would need to know how Nagios was installed on the server, along with the server's operating system and release version.
You can try to start the nagios daemon manually to see if there are any errors.
Run this as root:
Code: Select all
/usr/local/nagios/bin/nagios -d /usr/local/nagios/nagios.cfg
Let us know what you find out.
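One way to compare the service definition against the real file locations is sketched below. The helper name `show_nagios_paths` and the search pattern are assumptions made for illustration; a given init script or unit file may reference the binary and config through variables that this pattern only partly catches.

```shell
#!/bin/sh
# Sketch only: show_nagios_paths is a hypothetical helper, and the pattern is a
# guess at which lines matter in a given init script or unit file.
# Print, with line numbers, every line of file $1 that mentions the nagios
# binary or nagios.cfg.
show_nagios_paths() {
    grep -nE 'bin/nagios|nagios\.cfg' "$1"
}

# Look under /etc for a unit file or init script named nagios.service or nagios
# and show which paths each one references.
find /etc \( -name 'nagios.service' -o -name 'nagios' \) -type f 2>/dev/null |
while IFS= read -r f; do
    echo "== $f"
    show_nagios_paths "$f"
done
```

The printed paths can then be compared by eye with the locations that actually exist on the server, such as /usr/local/nagios/etc/nagios.cfg in this thread.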
Re: [Reload] Job for nagios.service invalid
From /var/log/messages after trying to reload following a stop/start:
Code: Select all
Oct 26 11:17:49 nagios-dca-45 nagios: Running configuration check...
Oct 26 11:17:49 nagios-xx su: (to nagios) root on none
Oct 26 11:17:49 nagios-xx systemd: Created slice User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Starting User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Started Session c71 of user nagios.
Oct 26 11:17:49 nagios-xx systemd: Starting Session c71 of user nagios.
Oct 26 11:17:49 nagios-xx systemd: Removed slice User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Stopping User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Stopped LSB: Starts and stops the Nagios monitoring server.
Oct 26 11:17:49 nagios-xx nagios: Stopping nagios (via systemctl):
From /var/log/messages after attempting to reload once:
Code: Select all
Oct 26 11:22:12 nagios-dca-45 systemd: Unit nagios.service cannot be reloaded because it is inactive.
The nagios service is under /etc/rc.d/init.d/nagios, but it points to the right files.
The service was installed following this guide for a CentOS 7 VM.
Release: CentOS Linux release 7.5.1804 (Core)
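The "cannot be reloaded because it is inactive" message suggests checking the unit state before asking for a reload. A minimal sketch of that idea follows; `reload_or_start` is a hypothetical wrapper, while `is-active --quiet`, `reload` and `start` are standard systemctl subcommands.

```shell
#!/bin/sh
# Sketch only: reload_or_start is a hypothetical wrapper around real
# systemctl subcommands.
# Reload the unit if systemd considers it active; otherwise start it,
# because systemd refuses to reload an inactive unit.
reload_or_start() {
    unit=$1
    if systemctl is-active --quiet "$unit"; then
        systemctl reload "$unit"
    else
        systemctl start "$unit"
    fi
}
```

As root this would be called as `reload_or_start nagios.service`. It does not explain why systemd thinks the unit is inactive while Nagios processes may still be running, so it is a workaround for the symptom rather than a fix.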
Re: [Reload] Job for nagios.service invalid
Can you get the following files from the server and post them here?
Code: Select all
/usr/local/nagios/etc/nagios.cfg
/etc/rc.d/init.d/nagios
Run this as root and post the output here.
Code: Select all
ps -ef --cols=300
systemctl |grep -i nagios
The entry from the messages file suggests that the server is trying to use the nagios.service file and it is not enabled on the server.
Run this to enable it
Code: Select all
systemctl enable nagios.service
Then run this to stop nagios if it is running
Code: Select all
systemctl stop nagios.service
Run this to start it
Code: Select all
systemctl start nagios.service
Run this to check the status of nagios and post the output.
Code: Select all
systemctl status nagios.service
If the status output says that it is not running, get this file and post it so we can view it.
Code: Select all
/usr/local/nagios/var/nagios.log
Re: [Reload] Job for nagios.service invalid
Files attached; the ps command output is attached as a file.
The systemctl output below was taken after enabling and resetting the service.
The process doesn't bring back the nagios.cmd file under "/usr/local/nagios/var/rw/", and the reload command still fails. I followed the process for restoring the nagios.cmd file and repeated the steps you provided, but unfortunately still nothing.
Code: Select all
systemctl |grep -i nagios
nagios.service loaded active running LSB: Starts and stops the Nagios monitoring server
Code: Select all
systemctl status nagios.service
● nagios.service - LSB: Starts and stops the Nagios monitoring server
Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
Active: active (running) since Tue 2018-10-30 10:40:12 EDT; 4s ago
Docs: man:systemd-sysv-generator(8)
Process: 91893 ExecReload=/etc/rc.d/init.d/nagios reload (code=killed, signal=TERM)
Process: 92262 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/nagios.service
├─ 91997 /usr/local/nagios/libexec/check_http -H 10.3.4.76 -p 8080 -t 30 -u /oms/monitor/appheartbeat.jsp -r PLT is responding. -w 5 -c 25
├─ 92182 /usr/local/nagios/libexec/check_ping -H 10.1.1.70 -w 3000.0,80% -c 5000.0,100% -p 5
├─ 92183 /bin/ping -n -U -W 30 -c 5 10.1.1.70
├─ 92220 /bin/sh /usr/local/nagios/custom-plugins/check_uptime.sh 10.1.1.70 not4public 0 0
├─ 92223 /bin/sh /usr/local/nagios/custom-plugins/check_uptime.sh 10.1.1.70 not4public 0 0
├─ 92224 /usr/local/nagios/libexec/check_snmp -H 10.1.1.70 -C not4public -t 20 -o .1.3.6.1.2.1.25.1.1.0 -w 0 -c 0
├─ 92225 cut -d -f 1-55555
├─ 92226 /bin/snmpget -Le -t 5 -r 5 -m -v 1 -c 10.1.1.70:161 .1.3.6.1.2.1.25.1.1.0
├─ 92314 /usr/local/nagios/libexec/check_ping -H 10.3.6.65 -w 3000.0,80% -c 5000.0,100% -p 5
├─ 92315 /bin/ping -n -U -W 30 -c 5 10.3.6.65
├─ 92321 /usr/bin/perl -w /usr/local/nagios/custom-plugins/check_snmp_load.pl -H 10.3.9.49 -t 60 -C not4public -w 95 -c 99
├─ 92322 /usr/local/nagios/libexec/check_ping -H 10.3.6.54 -w 3000.0,80% -c 5000.0,100% -p 5
├─ 92323 /bin/ping -n -U -W 30 -c 5 10.3.6.54
├─ 92327 /usr/bin/perl -w /usr/local/nagios/custom-plugins/check_snmp_mem.pl -H 10.1.1.70 -C not4public -2 -w 95,0 -c 98,0
├─ 92373 /usr/local/nagios/libexec/check_ping -H 10.3.5.79 -w 3000.0,80% -c 5000.0,100% -p 5
├─ 92374 /bin/ping -n -U -W 30 -c 5 10.3.5.79
├─ 92377 /usr/local/nagios/libexec/check_http -H 10.3.5.2 -p 80 -t 30 -u /oms/monitor/appheartbeat.jsp -r PLT is responding. -w 5 -c 25
├─125570 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
├─125572 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─125573 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─125574 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─125575 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
└─125579 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Oct 30 10:40:11 nagios-dca-45.elogex.com systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Oct 30 10:40:12 nagios-dca-45.elogex.com su[92266]: (to nagios) root on none
Oct 30 10:40:12 nagios-dca-45.elogex.com su[92283]: (to nagios) root on none
Oct 30 10:40:12 nagios-dca-45.elogex.com nagios[92262]: Starting nagios: done.
Oct 30 10:40:12 nagios-dca-45.elogex.com systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.
Nothing obvious in the logs, just a bunch of SERVICE NOTIFICATIONS
- Attachments
- ps.txt (12.16 KiB)
- nagios.cfg (44.16 KiB)
Re: [Reload] Job for nagios.service invalid
The ps -ef command shows that the Nagios process is running on the server and I see checks running.
When you run the "reload" command, what is failing on the server?
Have you tried just rebooting the server to see if it starts to function?
Re: [Reload] Job for nagios.service invalid
I'm not sure what's failing; all I get is the message:
Code: Select all
Reloading nagios configuration (via systemctl): Job for nagios.service invalid.
[FAILED]
I have tried rebooting the VM, but it doesn't seem to change anything.
Re: [Reload] Job for nagios.service invalid
Can you get this file from the Nagios server and post it here so we can view it?
Code: Select all
/etc/rc.d/init.d/nagios