[Reload] Job for nagios.service invalid

jfarnsworth
Posts: 11
Joined: Fri Oct 19, 2018 10:16 am

[Reload] Job for nagios.service invalid

Post by jfarnsworth »

Hi,

The VM that runs this Nagios instance recently ran out of disk space. Around the same time, the "service nagios reload" command started to fail with this message:

"Reloading nagios configuration (via systemctl): Job for nagios.service invalid.
[FAILED]"

The nagios.cmd file under "/usr/local/nagios/var/rw/" also disappeared and did not come back after "service nagios restart". The configuration check reports no errors, and the disk has since been expanded. I followed the instructions in this post to get the nagios.cmd file back, but the reload command still fails.
Any idea what's going on?
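
For anyone checking the same symptoms, here is a minimal sketch rather than a confirmed fix. It assumes the paths quoted above, and `check_cmd_file` is a hypothetical helper name. Nagios Core normally creates nagios.cmd as a named pipe, so the sketch tests for that and then prints the free space on the filesystem.

```shell
#!/bin/sh
# Hypothetical helper: report whether the external command file exists as a named pipe.
check_cmd_file() {
    cmd_file="${1:-/usr/local/nagios/var/rw/nagios.cmd}"
    if [ -p "$cmd_file" ]; then
        echo "ok: $cmd_file is a named pipe"
    elif [ -e "$cmd_file" ]; then
        echo "warn: $cmd_file exists but is not a named pipe"
    else
        echo "missing: $cmd_file"
    fi
}

# Check the path from this thread (assumed install prefix /usr/local/nagios).
check_cmd_file

# Show free space for the Nagios tree, falling back to / if that path is absent.
df -h /usr/local/nagios 2>/dev/null || df -h /
```

A "warn" result would suggest that something recreated the file as a regular file, which the daemon cannot use as its command pipe.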
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Post by tgriep »

Let's run a verification of the Nagios configuration files to see if there are any errors.
Run this as root:

Code:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/nagios.cfg
Also, check the /usr/local/nagios/var/nagios.log file for any errors when you try to restart it.
Be sure to check out our Knowledgebase for helpful articles and solutions!
jfarnsworth
Posts: 11
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Post by jfarnsworth »

The nagios.cfg file is under a different location, so my command looks like:

Code:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
And has the output:

Code:

Nagios Core 4.3.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-08-24
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 3284 services.
        Checked 347 hosts.
        Checked 84 host groups.
        Checked 85 service groups.
        Checked 11 contacts.
        Checked 0 contact groups.
        Checked 119 commands.
        Checked 181 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 347 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 181 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
This instance was working fine before, but does the nagios.cfg file need to be under "/usr/local/nagios/" instead?

Also, I've checked the log file; the only thing I see that might be relevant is this:
[1540305250] wproc: Core Worker 61148: job 355 (pid=64529) timed out. Killing it
[1540305250] wproc: CHECK job 355 from worker Core Worker 61148 timed out after 30.01s
[1540305250] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1540305250] wproc: Core Worker 61148: job 355 (pid=64529): Dormant child reaped

Everything else has "SERVICE NOTIFICATION" prefixing the log.
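
A quick way to look past the notification entries, offered only as a sketch: `filter_notifications` is a hypothetical helper name, and the log path is the one mentioned earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print every log line that is not a notification entry.
filter_notifications() {
    grep -v -e 'SERVICE NOTIFICATION' -e 'HOST NOTIFICATION' "$1"
}

log=/usr/local/nagios/var/nagios.log
# Show the last 20 remaining lines, if the log is readable on this machine.
if [ -r "$log" ]; then
    filter_notifications "$log" | tail -n 20
fi
```

Whatever remains (startup, shutdown, worker, and error lines) is usually more useful for spotting why a restart or reload misbehaves.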
Last edited by jfarnsworth on Fri Oct 26, 2018 10:33 am, edited 1 time in total.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Post by tgriep »

The path to the nagios.cfg file that I gave is the default location. If it has been changed on your server, that is OK as long as all of the init scripts were changed to match.

Search for the nagios.service file under the /etc folder and see if its settings match where the nagios binary and configuration files are located on your server.
If they do, that is OK.
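
One possible way to do that search, as a sketch under assumptions: `find_nagios_service_files` is a hypothetical helper, and it also looks for a SysV init script named nagios, since some installs use that instead of a native unit file.

```shell
#!/bin/sh
# Hypothetical helper: list systemd unit files and SysV init scripts named for nagios.
find_nagios_service_files() {
    find "${1:-/etc}" \( -name 'nagios.service' -o -path '*/init.d/nagios' \) -print 2>/dev/null
}

# For each file found, show the lines that mention the binary or the main config file.
for f in $(find_nagios_service_files /etc); do
    echo "== $f"
    grep -n -e 'nagios\.cfg' -e 'bin/nagios' "$f" || true
done
```

The printed lines can then be compared against the real locations of the nagios binary and nagios.cfg on the server.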

Those messages from the nagios.log file do not show why it is not starting.

You may want to check the /var/log/messages file or the /var/log/syslog file to see if there are any errors. We would also need to know how Nagios was installed on the server, along with the server's operating system and release version.

You can try to start the nagios daemon manually to see if there are any errors.
Run this as root:

Code:

/usr/local/nagios/bin/nagios -d /usr/local/nagios/nagios.cfg
Let us know what you find out.
jfarnsworth
Posts: 11
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Post by jfarnsworth »

From /var/log/messages after trying to reload following a stop/start:

Code:

Oct 26 11:17:49 nagios-dca-45 nagios: Running configuration check...
Oct 26 11:17:49 nagios-xx su: (to nagios) root on none
Oct 26 11:17:49 nagios-xx systemd: Created slice User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Starting User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Started Session c71 of user nagios.
Oct 26 11:17:49 nagios-xx systemd: Starting Session c71 of user nagios.
Oct 26 11:17:49 nagios-xx systemd: Removed slice User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Stopping User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Stopped LSB: Starts and stops the Nagios monitoring server.
Oct 26 11:17:49 nagios-xx nagios: Stopping nagios (via systemctl):
From /var/log/messages after attempting to reload once:

Code:

Oct 26 11:22:12 nagios-dca-45 systemd: Unit nagios.service cannot be reloaded because it is inactive.
The nagios service is under /etc/rc.d/init.d/nagios, but it points to the right files.

The service was installed following this guide for a CentOS 7 VM.

Release: CentOS Linux release 7.5.1804 (Core)
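
The "cannot be reloaded because it is inactive" entry means systemd considered the unit stopped when the reload was requested. A small sketch that checks the state first: `suggest_action` is a hypothetical helper, while `systemctl is-active` is a standard systemctl subcommand.

```shell
#!/bin/sh
# Hypothetical helper: map a unit state string to the command worth trying.
suggest_action() {
    case "$1" in
        active) echo "reload" ;;
        *)      echo "start" ;;
    esac
}

# is-active prints the state (active, inactive, failed, ...); tolerate a missing systemctl.
state=$(systemctl is-active nagios.service 2>/dev/null || true)
echo "nagios.service state: ${state:-unknown}; suggested: systemctl $(suggest_action "$state") nagios.service"
```

If the state is anything other than active, a reload request is expected to be rejected, so starting the unit through systemctl comes first.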
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Post by tgriep »

Can you get the following files from the server and post them here?

Code:

/usr/local/nagios/etc/nagios.cfg
/etc/rc.d/init.d/nagios
Run this as root and post the output here.

Code:

ps -ef --cols=300
systemctl |grep -i nagios
The entry from the messages file suggests that the server is trying to use the nagios.service unit and that it is not enabled on the server.
Run this to enable it:

Code:

systemctl enable nagios.service

Then run this to stop nagios if it is running

Code:

systemctl stop nagios.service
Run this to start it

Code:

systemctl start nagios.service
Run this to check the status of nagios and post the output.

Code:

systemctl status nagios.service
If the status output says that it is not running, get this file and post it so we can view it.

Code:

/usr/local/nagios/var/nagios.log
jfarnsworth
Posts: 11
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Post by jfarnsworth »

Files attached
PS command output attached as file

Code:

systemctl |grep -i nagios
nagios.service                                                                                   loaded active running   LSB: Starts and stops the Nagios monitoring server
After enabling and resetting the service

Code:

systemctl status nagios.service
● nagios.service - LSB: Starts and stops the Nagios monitoring server
   Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
   Active: active (running) since Tue 2018-10-30 10:40:12 EDT; 4s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 91893 ExecReload=/etc/rc.d/init.d/nagios reload (code=killed, signal=TERM)
  Process: 92262 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nagios.service
           ├─ 91997 /usr/local/nagios/libexec/check_http -H 10.3.4.76 -p 8080 -t 30 -u /oms/monitor/appheartbeat.jsp -r PLT is responding. -w 5 -c 25
           ├─ 92182 /usr/local/nagios/libexec/check_ping -H 10.1.1.70 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92183 /bin/ping -n -U -W 30 -c 5 10.1.1.70
           ├─ 92220 /bin/sh /usr/local/nagios/custom-plugins/check_uptime.sh 10.1.1.70 not4public 0 0
           ├─ 92223 /bin/sh /usr/local/nagios/custom-plugins/check_uptime.sh 10.1.1.70 not4public 0 0
           ├─ 92224 /usr/local/nagios/libexec/check_snmp -H 10.1.1.70 -C not4public -t 20 -o .1.3.6.1.2.1.25.1.1.0 -w 0 -c 0
           ├─ 92225 cut -d   -f 1-55555
           ├─ 92226 /bin/snmpget -Le -t 5 -r 5 -m -v 1 -c 10.1.1.70:161 .1.3.6.1.2.1.25.1.1.0
           ├─ 92314 /usr/local/nagios/libexec/check_ping -H 10.3.6.65 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92315 /bin/ping -n -U -W 30 -c 5 10.3.6.65
           ├─ 92321 /usr/bin/perl -w /usr/local/nagios/custom-plugins/check_snmp_load.pl -H 10.3.9.49 -t 60 -C not4public -w 95 -c 99
           ├─ 92322 /usr/local/nagios/libexec/check_ping -H 10.3.6.54 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92323 /bin/ping -n -U -W 30 -c 5 10.3.6.54
           ├─ 92327 /usr/bin/perl -w /usr/local/nagios/custom-plugins/check_snmp_mem.pl -H 10.1.1.70 -C not4public -2 -w 95,0 -c 98,0
           ├─ 92373 /usr/local/nagios/libexec/check_ping -H 10.3.5.79 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92374 /bin/ping -n -U -W 30 -c 5 10.3.5.79
           ├─ 92377 /usr/local/nagios/libexec/check_http -H 10.3.5.2 -p 80 -t 30 -u /oms/monitor/appheartbeat.jsp -r PLT is responding. -w 5 -c 25
           ├─125570 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
           ├─125572 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─125573 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─125574 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─125575 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           └─125579 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Oct 30 10:40:11 nagios-dca-45.elogex.com systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Oct 30 10:40:12 nagios-dca-45.elogex.com su[92266]: (to nagios) root on none
Oct 30 10:40:12 nagios-dca-45.elogex.com su[92283]: (to nagios) root on none
Oct 30 10:40:12 nagios-dca-45.elogex.com nagios[92262]: Starting nagios: done.
Oct 30 10:40:12 nagios-dca-45.elogex.com systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.
The process doesn't bring back the nagios.cmd file under "/usr/local/nagios/var/rw/", and the reload command still fails. I followed the process for restoring the nagios.cmd file and repeated the steps you provided, but unfortunately still nothing.

Nothing obvious in the logs, just a bunch of SERVICE NOTIFICATION entries.
Attachments
ps.txt (12.16 KiB)
nagios.cfg (44.16 KiB)
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Post by tgriep »

The ps -ef command shows that the Nagios process is running on the server and I see checks running.

When you run the "reload" command, what is failing on the server?

Have you tried simply rebooting the server to see if it starts to function?
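
To capture what is failing during the reload, something along these lines may help. It is only a sketch: `run_and_report` is a hypothetical helper, and the unit name is taken from the output posted above. Note that it really does request a reload, so run it as root only when that is acceptable.

```shell
#!/bin/sh
# Hypothetical helper: run a command, show its combined output, then its exit status.
run_and_report() {
    echo "== $*"
    status=0
    "$@" 2>&1 || status=$?
    echo "== exit status: $status"
}

# Only attempt the systemd steps on hosts that actually have systemctl.
if command -v systemctl >/dev/null 2>&1; then
    run_and_report systemctl reload nagios.service
    run_and_report systemctl --no-pager status nagios.service
    run_and_report journalctl --no-pager -u nagios.service -n 50
fi
```

Posting the full output, including the exit status lines, shows whether the reload request was rejected by systemd or failed inside the init script.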
jfarnsworth
Posts: 11
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Post by jfarnsworth »

I'm not sure what's failing; all I get is this message:

Code:

Reloading nagios configuration (via systemctl):  Job for nagios.service invalid.
                                                           [FAILED]
I have tried rebooting the VM, but it doesn't seem to change anything.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Post by tgriep »

Can you get this file from the Nagios server and post it here so we can view it?

Code:

/etc/rc.d/init.d/nagios