Nagios Core "Not running" via web although services UP

Posted: Fri Apr 26, 2019 3:09 am
by heidebock
Hello all,
I realize this is a fairly common post, but I haven't been able to find a solution that applies directly to this problem.

I installed Nagios from source following this guide:
https://www.linuxtechi.com/install-conf ... os7-rhel7/

Running CentOS 7, Nagios version 4.3.2.

When I log into the Nagios front-end website, it appears the website is unable to communicate with the Nagios service at all:
"Not running" at the main page, "Error: Could not read object configuration data!" at every other page.

The only anomaly I see is that systemctl reports a Nagios job timing out even though the service is running. More detail is available in the Nagios log (see the code snippet for output).

I'm not sure where 'lab_inet_1' is coming from. I did a recursive grep on the entire /usr/local/nagios/ directory and could not find this host listed. Nagios is actually running on an eve-ng server, and lab_inet_[1-2] are two virtual routers running on eve (???). Not sure if it's relevant, but again, it's the only thing that doesn't look right.

Here is output from the server:

Code: Select all

[root@eve-netserv ~]# systemctl status nagios
● nagios.service - Nagios Network Monitoring
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-04-25 09:53:28 MDT; 16h ago
     Docs: https://www.nagios.org/documentation/
  Process: 21011 ExecStopPost=/usr/bin/rm -f /var/spool/nagios/cmd/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 21009 ExecStop=/bin/kill -TERM ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 20153 ExecReload=/bin/kill -HUP ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 21014 ExecStart=/usr/sbin/nagios -d /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 21012 ExecStartPre=/usr/sbin/nagios -v /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 21016 (nagios)
   CGroup: /system.slice/nagios.service
           ├─21016 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
           ├─21017 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           ├─21018 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           ├─21019 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           ├─21020 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           └─21021 /usr/sbin/nagios -d /etc/nagios/nagios.cfg

Apr 26 01:51:59 eve-netserv nagios[21016]: Warning: Check of host 'lab_inet_2' timed out after 30.01 seconds
Apr 26 01:51:59 eve-netserv nagios[21016]: wproc: Core Worker 21018: job 566 (pid=30811): Dormant child reaped
Apr 26 01:53:28 eve-netserv nagios[21016]: Auto-save of retention data completed successfully.
Apr 26 01:55:45 eve-netserv nagios[21020]: job 569 (pid=30876): read() returned error 11
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: Core Worker 21020: job 569 (pid=30876) timed out. Killing it
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: CHECK job 569 from worker Core Worker 21020 timed out after 30.02s
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc:   host=lab_inet_1; service=(null);
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Apr 26 01:55:45 eve-netserv nagios[21016]: Warning: Check of host 'lab_inet_1' timed out after 30.02 seconds
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: Core Worker 21020: job 569 (pid=30876): Dormant child reaped

[root@eve-netserv ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-04-25 09:53:18 MDT; 16h ago
     Docs: man:httpd(8)
           man:apachectl(8)
  Process: 20986 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
 Main PID: 20990 (httpd)
   Status: "Total requests: 196; Current requests/sec: 0; Current traffic:   0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─20990 /usr/sbin/httpd -DFOREGROUND
           ├─20992 /usr/sbin/httpd -DFOREGROUND
           ├─20993 /usr/sbin/httpd -DFOREGROUND
           ├─20994 /usr/sbin/httpd -DFOREGROUND
           ├─20995 /usr/sbin/httpd -DFOREGROUND
           ├─20996 /usr/sbin/httpd -DFOREGROUND
           ├─27326 /usr/sbin/httpd -DFOREGROUND
           └─30874 /usr/sbin/httpd -DFOREGROUND

Apr 25 09:53:18 eve-netserv systemd[1]: Starting The Apache HTTP Server...
Apr 25 09:53:18 eve-netserv httpd[20990]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using fe80::9725:a77f:c65f:df82. Set the 'ServerName' dire...s this message
Apr 25 09:53:18 eve-netserv systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.

# the below log correlates to the error seen via the 'systemctl' command - but I'm not sure where 'lab_inet_1' is coming from... I did a recursive
# grep on the entire /usr/local/nagios/ directory and was unable to find this host listed....  Nagios is actually running on an eve-ng server & 
# lab_inet_[1-2] are 2 virtual routers being run on eve (???)...
[root@eve-netserv ~]# tail -f /var/log/nagios/nagios.log
[1556265119] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1556265119] Warning: Check of host 'lab_inet_2' timed out after 30.01 seconds
[1556265119] wproc: Core Worker 21018: job 566 (pid=30811): Dormant child reaped
[1556265208] Auto-save of retention data completed successfully.
[1556265345] wproc: Core Worker 21020: job 569 (pid=30876) timed out. Killing it
[1556265345] wproc: CHECK job 569 from worker Core Worker 21020 timed out after 30.02s
[1556265345] wproc:   host=lab_inet_1; service=(null);
[1556265345] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1556265345] Warning: Check of host 'lab_inet_1' timed out after 30.02 seconds
[1556265345] wproc: Core Worker 21020: job 569 (pid=30876): Dormant child reaped

[root@eve-netserv nagios]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.3.2
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-05-09
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
	Checked 8 services.
	Checked 1 hosts.
	Checked 1 host groups.
	Checked 0 service groups.
	Checked 1 contacts.
	Checked 1 contact groups.
	Checked 24 commands.
	Checked 5 time periods.
	Checked 0 host escalations.
	Checked 0 service escalations.
Checking for circular paths...
	Checked 1 hosts
	Checked 0 service dependencies
	Checked 0 host dependencies
	Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
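
The nagios.log entries above are stamped with epoch seconds, while the systemctl output shows local time (MDT). A minimal sketch for converting a log line's stamp so the two can be lined up follows. It assumes GNU date (the `-d @SECONDS` form), and the function name `nagios_log_time` is made up for illustration.

```shell
#!/bin/sh
# Convert the leading [epoch] stamp of a nagios.log line into readable UTC.
# Assumption: GNU date is available (it is on CentOS 7).
nagios_log_time() {
    # Pull the digits out of the leading brackets: "[1556265345] ..." -> 1556265345
    epoch=$(printf '%s\n' "$1" | sed -n 's/^\[\([0-9][0-9]*\)\].*/\1/p')
    # Fail if the line does not start with a bracketed number
    [ -n "$epoch" ] || return 1
    date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC'
}

# Line copied from the log output in this post
nagios_log_time "[1556265345] Warning: Check of host 'lab_inet_1' timed out after 30.02 seconds"
# prints: 2019-04-26 07:55:45 UTC
```

07:55:45 UTC is 01:55:45 MDT, which matches the `Apr 26 01:55:45` entries in the systemctl output, so both views describe the same timeout.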

Re: Nagios Core "Not running" via web although services UP

Posted: Fri Apr 26, 2019 2:56 pm
by cdienger
Is this a new install or something that was working previously? The "Could not read object configuration data!" error indicates an issue somewhere in the config (/usr/local/nagios/etc/). Try removing the checks that are timing out from the config and let us know if that resolves the problem.
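
To find which file defines a check before removing it, one option is a recursive grep over the config directory. This is only a sketch: the helper name `find_host_refs` and the `NAGIOS_CFG_DIR` variable are invented for illustration, and `--include` assumes GNU grep. The directory is a parameter because the systemctl output earlier in the thread shows the daemon started with `/etc/nagios/nagios.cfg`, so that tree may be the one worth searching.

```shell
#!/bin/sh
# List config files (with line numbers) that mention a host name.
# Usage: find_host_refs CONFIG_DIR HOST_PATTERN
find_host_refs() {
    # -r recurse, -n show line numbers, --include limits the search to *.cfg (GNU grep)
    grep -rn --include='*.cfg' -- "$2" "$1"
}

# Hypothetical invocation: NAGIOS_CFG_DIR is not a Nagios setting, just a
# convenience variable for this sketch. It defaults to the directory of the
# config file named on the running daemon's command line.
cfg_dir=${NAGIOS_CFG_DIR:-/etc/nagios}
if [ -d "$cfg_dir" ]; then
    find_host_refs "$cfg_dir" 'lab_inet_' || echo "no reference found under $cfg_dir"
fi
```

Each hit is printed as `file:line:text`, which points at the object definition to comment out or delete before reloading Nagios.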

Re: Nagios Core "Not running" via web although services UP

Posted: Tue Apr 30, 2019 7:32 am
by heidebock
Thanks for the reply. This is a new install from source and has never worked. I don't really know where to look in /usr/local/nagios/etc/. I've grepped for the failed host and also for a reference to the etc/hosts file. How do I go about pinpointing the check in question and disabling it?

thanks!

Re: Nagios Core "Not running" via web although services UP

Posted: Tue Apr 30, 2019 11:50 am
by scottwilkerson
You said this is a new install from source, but your service is running the following command, and these are not the source locations:

Code: Select all

/usr/sbin/nagios -d /etc/nagios/nagios.cfg

Are you mixing together source and RPM installs? This will never turn out well.

I would highly recommend scrapping this server and starting over, following the official guide:
https://support.nagios.com/kb/article/n ... ce-96.html
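
The mismatch can be shown with a small sketch that compares the config path the daemon was started with against the one that was verified by hand. It assumes the source build used the default `/usr/local/nagios` prefix, and the function name `install_origin` is made up for illustration.

```shell
#!/bin/sh
# Report whether a path sits under the source-install prefix.
# Assumption: the source build used the default prefix /usr/local/nagios.
install_origin() {
    case "$1" in
        /usr/local/nagios/*) echo source ;;
        *)                   echo other ;;
    esac
}

# Both paths are quoted from the output posted in this thread.
running_cfg=/etc/nagios/nagios.cfg             # from the systemd ExecStart line
checked_cfg=/usr/local/nagios/etc/nagios.cfg   # passed to "nagios -v" by hand

if [ "$(install_origin "$running_cfg")" != "$(install_origin "$checked_cfg")" ]; then
    echo "mismatch: daemon loads $running_cfg but $checked_cfg was verified"
fi
```

With the paths from this thread the check reports a mismatch: the pre-flight check that passed was run against a different config tree from the one the running daemon reads.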