[Reload] Job for nagios.service invalid


Postby jfarnsworth » Fri Oct 19, 2018 10:29 am

Hi,

The VM that runs this Nagios instance recently ran out of disk space. Around the same time, the "service nagios reload" command started failing with this message:

"Reloading nagios configuration (via systemctl): Job for nagios.service invalid.
[FAILED]"

The nagios.cmd file under "/usr/local/nagios/var/rw/" also disappeared and does not come back on "service nagios restart". No config errors are reported, and the disk has since been expanded. I followed the instructions in this post to get the nagios.cmd file back, but the reload command still fails.
Any idea what's going on?
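
Since the trouble started when the disk filled up, one sanity check is to confirm that the filesystem holding the Nagios directory really has free space again. The following is only a minimal sketch: it assumes the /usr/local/nagios prefix mentioned above, and `fs_usage_pct` is a made-up helper name, not something Nagios ships.

```shell
#!/bin/sh
# Print the capacity column that df reports for the filesystem holding a path.
# fs_usage_pct is a hypothetical helper name, not part of Nagios.
fs_usage_pct() {
    # -P forces the POSIX one-line-per-filesystem layout so field 5 is stable
    df -P "$1" | awk 'NR == 2 { print $5 }'
}

# Example (paths assumed from this thread):
#   fs_usage_pct /usr/local/nagios/var     # block usage of that filesystem
#   df -i /usr/local/nagios/var            # inode usage, which can also run out
```

A filesystem can be full in terms of inodes even when block usage looks fine, which is why the inode check is listed as well.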
jfarnsworth
 
Posts: 7
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Postby tgriep » Fri Oct 19, 2018 3:01 pm

Let's run a verification on the Nagios configuration files to see if there are any errors.
Run this as root:
Code: Select all
/usr/local/nagios/bin/nagios -v /usr/local/nagios/nagios.cfg


Also, check the /usr/local/nagios/var/nagios.log file for any errors when you try to restart it.
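
For the log check, something along these lines may help narrow things down. It is a rough sketch rather than an official procedure: `recent_log_problems` is a hypothetical helper, and the keyword pattern is only a guess at what is worth looking for, not a list of real Nagios messages.

```shell
#!/bin/sh
# Show the most recent log lines that mention an error, warning, or failure.
# recent_log_problems is a hypothetical helper; the pattern is a guess at
# useful keywords, not an official list of Nagios log messages.
recent_log_problems() {
    log=$1
    count=${2:-20}    # how many matching lines to keep (default 20)
    grep -iE 'error|warning|fail|unable' "$log" | tail -n "$count"
}

# Example (log path from this thread):
#   recent_log_problems /usr/local/nagios/var/nagios.log 50
```

Running it right after a failed restart keeps the relevant lines at the bottom of the output.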
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep
Madmin
 
Posts: 7237
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Postby jfarnsworth » Tue Oct 23, 2018 9:39 am

The nagios.cfg file is under a different location, so my command looks like:
Code: Select all
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg


And has the output:
Code: Select all
Nagios Core 4.3.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-08-24
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 3284 services.
        Checked 347 hosts.
        Checked 84 host groups.
        Checked 85 service groups.
        Checked 11 contacts.
        Checked 0 contact groups.
        Checked 119 commands.
        Checked 181 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 347 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 181 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check


This instance was working fine before, but does the nagios.cfg file need to be under "/usr/local/nagios/" instead?

Also, I've checked the log file. The only thing I see that might be relevant is this:
[1540305250] wproc: Core Worker 61148: job 355 (pid=64529) timed out. Killing it
[1540305250] wproc: CHECK job 355 from worker Core Worker 61148 timed out after 30.01s
[1540305250] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1540305250] wproc: Core Worker 61148: job 355 (pid=64529): Dormant child reaped

Everything else has "SERVICE NOTIFICATION" prefixing the log.
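
The bracketed numbers in nagios.log are Unix epoch seconds, so lining them up with /var/log/messages is easier once they are converted. Here is a small sketch, assuming GNU date (as on CentOS 7); `humanize_nagios_log` is a made-up name for illustration. For instance, 1540305250 above corresponds to 2018-10-23 14:34:10 UTC.

```shell
#!/bin/sh
# Rewrite the leading "[epoch]" of each nagios.log line as a readable UTC time.
# Assumes GNU date (the -d @SECONDS form).
# humanize_nagios_log is a hypothetical helper name.
humanize_nagios_log() {
    while IFS= read -r line; do
        case $line in
            \[[0-9]*\]*)
                ts=${line#\[}       # drop the leading "["
                ts=${ts%%\]*}       # keep everything before the first "]"
                rest=${line#*\]}    # text after the first "]"
                printf '[%s]%s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                # Lines without a leading timestamp pass through unchanged
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Example:
#   humanize_nagios_log < /usr/local/nagios/var/nagios.log | tail
```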
Last edited by jfarnsworth on Fri Oct 26, 2018 10:33 am, edited 1 time in total.
jfarnsworth
 
Posts: 7
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Postby tgriep » Tue Oct 23, 2018 3:00 pm

The path to the nagios.cfg file I gave is the default location but if it is changed on your server, that is OK if all of the init scripts were changed to match.

Search for the nagios.service file under the /etc folder and see if its settings match where the nagios binary and configuration files are located on your server.
If they do, that is OK.
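
One way to do that search is sketched below. Keep in mind that on a systemd host the unit may be generated from a SysV init script instead of existing as a nagios.service file under /etc, so the sketch looks for both. `find_service_files` is a hypothetical helper, and the directories in the example are the usual CentOS 7 locations, listed here as assumptions.

```shell
#!/bin/sh
# List candidate unit files and SysV init scripts for a service name under
# one or more directories. find_service_files is a hypothetical helper.
find_service_files() {
    name=$1
    shift
    # Match both "<name>.service" (systemd unit) and "<name>" (init script)
    find "$@" -type f \( -name "$name.service" -o -name "$name" \) 2>/dev/null
}

# Example (run as root; directories are assumptions):
#   find_service_files nagios /etc /usr/lib/systemd/system /run/systemd
#   find_service_files nagios /etc | xargs grep -nE 'bin/nagios|nagios\.cfg'
```

The second example prints the lines in each match that mention the binary or the main config file, which are the paths worth comparing.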

Those messages from the nagios.log file do not show why it is not starting.

You may want to check the /var/log/messages file or the /var/log/syslog file to see if there are any errors, but we would need to know how Nagios was installed on the server, along with the server's operating system and release version.

You can try and start the nagios daemon manually to see if there are any errors.
Run this as root.
Code: Select all
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg


Let us know what you find out.
tgriep
Madmin
 
Posts: 7237
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Postby jfarnsworth » Fri Oct 26, 2018 10:28 am

From /var/log/messages after trying to reload following a stop/start:
Code: Select all
Oct 26 11:17:49 nagios-dca-45 nagios: Running configuration check...
Oct 26 11:17:49 nagios-xx su: (to nagios) root on none
Oct 26 11:17:49 nagios-xx systemd: Created slice User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Starting User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Started Session c71 of user nagios.
Oct 26 11:17:49 nagios-xx systemd: Starting Session c71 of user nagios.
Oct 26 11:17:49 nagios-xx systemd: Removed slice User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Stopping User Slice of nagios.
Oct 26 11:17:49 nagios-xx systemd: Stopped LSB: Starts and stops the Nagios monitoring server.
Oct 26 11:17:49 nagios-xx nagios: Stopping nagios (via systemctl):


From /var/log/messages after attempting to reload once:
Code: Select all
Oct 26 11:22:12 nagios-dca-45 systemd: Unit nagios.service cannot be reloaded because it is inactive.
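
The "cannot be reloaded because it is inactive" message means systemd did not consider the unit started at that moment. A wrapper like the one below guards against that case. It is just a sketch: `reload_or_start` is a made-up name, and it relies only on `systemctl is-active --quiet`, which exits 0 when the unit is active.

```shell
#!/bin/sh
# Reload a unit only when systemd considers it active; otherwise start it.
# reload_or_start is a hypothetical wrapper name.
reload_or_start() {
    unit=$1
    if systemctl is-active --quiet "$unit"; then
        systemctl reload "$unit"
    else
        echo "$unit is not active; starting it instead of reloading" >&2
        systemctl start "$unit"
    fi
}

# Example (run as root):
#   reload_or_start nagios.service
```

systemd also has a built-in `systemctl reload-or-restart` that covers a similar case. The wrapper is spelled out here only to make the active/inactive distinction visible.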


The nagios service is under /etc/rc.d/init.d/nagios, but it points to the right files.

The service was installed following this guide for a CentOS 7 VM.

Release: CentOS Linux release 7.5.1804 (Core)
jfarnsworth
 
Posts: 7
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Postby tgriep » Fri Oct 26, 2018 12:58 pm

Can you get the following files from the server and post them here?
Code: Select all
/usr/local/nagios/etc/nagios.cfg
/etc/rc.d/init.d/nagios


Run this as root and post the output here.
Code: Select all
ps -ef --cols=300
systemctl |grep -i nagios


The entry from the messages file suggests that the server is trying to use the nagios.service file and it is not enabled on the server.
Run this to enable it
Code: Select all
systemctl enable nagios.service



Then run this to stop nagios if it is running
Code: Select all
systemctl stop nagios.service


Run this to start it
Code: Select all
systemctl start nagios.service


Run this to check the status of nagios and post the output.
Code: Select all
systemctl status nagios.service


If the status output says that it is not running, get this file and post it so we can view it.
Code: Select all
/usr/local/nagios/var/nagios.log
tgriep
Madmin
 
Posts: 7237
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Postby jfarnsworth » Tue Oct 30, 2018 9:56 am

Files attached
PS command output attached as file

Code: Select all
systemctl |grep -i nagios
nagios.service                                                                                   loaded active running   LSB: Starts and stops the Nagios monitoring server


After enabling and resetting the service
Code: Select all
systemctl status nagios.service
● nagios.service - LSB: Starts and stops the Nagios monitoring server
   Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
   Active: active (running) since Tue 2018-10-30 10:40:12 EDT; 4s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 91893 ExecReload=/etc/rc.d/init.d/nagios reload (code=killed, signal=TERM)
  Process: 92262 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nagios.service
           ├─ 91997 /usr/local/nagios/libexec/check_http -H 10.3.4.76 -p 8080 -t 30 -u /oms/monitor/appheartbeat.jsp -r PLT is responding. -w 5 -c 25
           ├─ 92182 /usr/local/nagios/libexec/check_ping -H 10.1.1.70 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92183 /bin/ping -n -U -W 30 -c 5 10.1.1.70
           ├─ 92220 /bin/sh /usr/local/nagios/custom-plugins/check_uptime.sh 10.1.1.70 not4public 0 0
           ├─ 92223 /bin/sh /usr/local/nagios/custom-plugins/check_uptime.sh 10.1.1.70 not4public 0 0
           ├─ 92224 /usr/local/nagios/libexec/check_snmp -H 10.1.1.70 -C not4public -t 20 -o .1.3.6.1.2.1.25.1.1.0 -w 0 -c 0
           ├─ 92225 cut -d   -f 1-55555
           ├─ 92226 /bin/snmpget -Le -t 5 -r 5 -m -v 1 -c 10.1.1.70:161 .1.3.6.1.2.1.25.1.1.0
           ├─ 92314 /usr/local/nagios/libexec/check_ping -H 10.3.6.65 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92315 /bin/ping -n -U -W 30 -c 5 10.3.6.65
           ├─ 92321 /usr/bin/perl -w /usr/local/nagios/custom-plugins/check_snmp_load.pl -H 10.3.9.49 -t 60 -C not4public -w 95 -c 99
           ├─ 92322 /usr/local/nagios/libexec/check_ping -H 10.3.6.54 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92323 /bin/ping -n -U -W 30 -c 5 10.3.6.54
           ├─ 92327 /usr/bin/perl -w /usr/local/nagios/custom-plugins/check_snmp_mem.pl -H 10.1.1.70 -C not4public -2 -w 95,0 -c 98,0
           ├─ 92373 /usr/local/nagios/libexec/check_ping -H 10.3.5.79 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 92374 /bin/ping -n -U -W 30 -c 5 10.3.5.79
           ├─ 92377 /usr/local/nagios/libexec/check_http -H 10.3.5.2 -p 80 -t 30 -u /oms/monitor/appheartbeat.jsp -r PLT is responding. -w 5 -c 25
           ├─125570 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
           ├─125572 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─125573 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─125574 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─125575 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           └─125579 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Oct 30 10:40:11 nagios-dca-45.elogex.com systemd[1]: Starting LSB: Starts and stops the Nagios monitoring server...
Oct 30 10:40:12 nagios-dca-45.elogex.com su[92266]: (to nagios) root on none
Oct 30 10:40:12 nagios-dca-45.elogex.com su[92283]: (to nagios) root on none
Oct 30 10:40:12 nagios-dca-45.elogex.com nagios[92262]: Starting nagios: done.
Oct 30 10:40:12 nagios-dca-45.elogex.com systemd[1]: Started LSB: Starts and stops the Nagios monitoring server.


This doesn't bring back the nagios.cmd file under "/usr/local/nagios/var/rw/", and the reload command still fails. I followed the process for restoring the nagios.cmd file and repeated the steps you provided, but unfortunately nothing changed.

Nothing obvious in the logs, just a bunch of SERVICE NOTIFICATION entries.
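
To double-check where Nagios expects the command file, the `command_file` setting in nagios.cfg can be read and the path tested for being a named pipe. This is a minimal sketch under that assumption. `check_command_pipe` is a hypothetical helper, and it does not try to explain why the pipe is missing.

```shell
#!/bin/sh
# Read the command_file setting from a nagios.cfg and report whether that
# path currently exists as a named pipe (FIFO).
# check_command_pipe is a hypothetical helper name.
check_command_pipe() {
    cfg=$1
    # Take the last uncommented "command_file=" line and strip the key
    pipe=$(sed -n 's/^[[:space:]]*command_file[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1)
    if [ -z "$pipe" ]; then
        echo "no command_file setting found in $cfg"
        return 2
    fi
    if [ -p "$pipe" ]; then
        echo "ok: $pipe is a named pipe"
    else
        echo "missing: $pipe is not a named pipe"
        return 1
    fi
}

# Example (config path from this thread):
#   check_command_pipe /usr/local/nagios/etc/nagios.cfg
#   ls -ld /usr/local/nagios/var/rw     # directory owner and permissions
```

If the path in nagios.cfg differs from "/usr/local/nagios/var/rw/nagios.cmd", that alone would be worth mentioning in the thread.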
Attachments
ps.txt
(12.16 KiB) Downloaded 10 times
ps.txt
(12.16 KiB) Downloaded 13 times
nagios.cfg
(44.16 KiB) Downloaded 12 times
jfarnsworth
 
Posts: 7
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Postby tgriep » Tue Oct 30, 2018 1:24 pm

The ps -ef command shows that the Nagios process is running on the server and I see checks running.

When you run the "reload" command, what is failing on the server?

Have you tried just rebooting the server to see if it starts to function?
tgriep
Madmin
 
Posts: 7237
Joined: Thu Oct 30, 2014 9:02 am

Re: [Reload] Job for nagios.service invalid

Postby jfarnsworth » Thu Nov 08, 2018 10:45 am

I'm not sure what's failing. All I get is this message:
Code: Select all
Reloading nagios configuration (via systemctl):  Job for nagios.service invalid.
                                                           [FAILED]


I have tried rebooting the VM, but it doesn't seem to change anything.
jfarnsworth
 
Posts: 7
Joined: Fri Oct 19, 2018 10:16 am

Re: [Reload] Job for nagios.service invalid

Postby tgriep » Thu Nov 08, 2018 11:04 am

Can you get this file from the Nagios server and post it here so we can view it?
Code: Select all
/etc/rc.d/init.d/nagios
tgriep
Madmin
 
Posts: 7237
Joined: Thu Oct 30, 2014 9:02 am
