Nagios Core "Not running" via web although services UP

heidebock
Posts: 2
Joined: Fri Apr 26, 2019 2:54 am

Post by heidebock »

Hello all,
I realize this is a fairly common post, but I haven't been able to find a solution that applies directly to this problem.

I installed Nagios from source following this guide:
https://www.linuxtechi.com/install-conf ... os7-rhel7/

I'm running CentOS 7 with Nagios Core 4.3.2.

When I log in to the Nagios front-end website, it appears unable to communicate with the Nagios service at all:
"Not running" on the main page, and "Error: Could not read object configuration data!" on every other page.

The only anomaly I see is that systemctl shows a Nagios check job timing out even though the service is running. More detail is available in the Nagios log (see the output below).

I'm not sure where 'lab_inet_1' is coming from. I did a recursive grep on the entire /usr/local/nagios/ directory and could not find this host listed. Nagios is running on an eve-ng server, and lab_inet_[1-2] are two virtual routers running on eve. I'm not sure it's relevant, but it's the only thing that doesn't look right.

Here is output from the server:

[root@eve-netserv ~]# systemctl status nagios
● nagios.service - Nagios Network Monitoring
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-04-25 09:53:28 MDT; 16h ago
     Docs: https://www.nagios.org/documentation/
  Process: 21011 ExecStopPost=/usr/bin/rm -f /var/spool/nagios/cmd/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 21009 ExecStop=/bin/kill -TERM ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 20153 ExecReload=/bin/kill -HUP ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 21014 ExecStart=/usr/sbin/nagios -d /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 21012 ExecStartPre=/usr/sbin/nagios -v /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 21016 (nagios)
   CGroup: /system.slice/nagios.service
           ├─21016 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
           ├─21017 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           ├─21018 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           ├─21019 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           ├─21020 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
           └─21021 /usr/sbin/nagios -d /etc/nagios/nagios.cfg

Apr 26 01:51:59 eve-netserv nagios[21016]: Warning: Check of host 'lab_inet_2' timed out after 30.01 seconds
Apr 26 01:51:59 eve-netserv nagios[21016]: wproc: Core Worker 21018: job 566 (pid=30811): Dormant child reaped
Apr 26 01:53:28 eve-netserv nagios[21016]: Auto-save of retention data completed successfully.
Apr 26 01:55:45 eve-netserv nagios[21020]: job 569 (pid=30876): read() returned error 11
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: Core Worker 21020: job 569 (pid=30876) timed out. Killing it
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: CHECK job 569 from worker Core Worker 21020 timed out after 30.02s
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc:   host=lab_inet_1; service=(null);
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Apr 26 01:55:45 eve-netserv nagios[21016]: Warning: Check of host 'lab_inet_1' timed out after 30.02 seconds
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: Core Worker 21020: job 569 (pid=30876): Dormant child reaped

[root@eve-netserv ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-04-25 09:53:18 MDT; 16h ago
     Docs: man:httpd(8)
           man:apachectl(8)
  Process: 20986 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
 Main PID: 20990 (httpd)
   Status: "Total requests: 196; Current requests/sec: 0; Current traffic:   0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─20990 /usr/sbin/httpd -DFOREGROUND
           ├─20992 /usr/sbin/httpd -DFOREGROUND
           ├─20993 /usr/sbin/httpd -DFOREGROUND
           ├─20994 /usr/sbin/httpd -DFOREGROUND
           ├─20995 /usr/sbin/httpd -DFOREGROUND
           ├─20996 /usr/sbin/httpd -DFOREGROUND
           ├─27326 /usr/sbin/httpd -DFOREGROUND
           └─30874 /usr/sbin/httpd -DFOREGROUND

Apr 25 09:53:18 eve-netserv systemd[1]: Starting The Apache HTTP Server...
Apr 25 09:53:18 eve-netserv httpd[20990]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using fe80::9725:a77f:c65f:df82. Set the 'ServerName' dire...s this message
Apr 25 09:53:18 eve-netserv systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.

# the below log correlates to the error seen via the 'systemctl' command - but I'm not sure where 'lab_inet_1' is coming from... I did a recursive
# grep on the entire /usr/local/nagios/ directory and was unable to find this host listed....  Nagios is actually running on an eve-ng server & 
# lab_inet_[1-2] are 2 virtual routers being run on eve (???)...
[root@eve-netserv ~]# tail -f /var/log/nagios/nagios.log
[1556265119] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1556265119] Warning: Check of host 'lab_inet_2' timed out after 30.01 seconds
[1556265119] wproc: Core Worker 21018: job 566 (pid=30811): Dormant child reaped
[1556265208] Auto-save of retention data completed successfully.
[1556265345] wproc: Core Worker 21020: job 569 (pid=30876) timed out. Killing it
[1556265345] wproc: CHECK job 569 from worker Core Worker 21020 timed out after 30.02s
[1556265345] wproc:   host=lab_inet_1; service=(null);
[1556265345] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1556265345] Warning: Check of host 'lab_inet_1' timed out after 30.02 seconds
[1556265345] wproc: Core Worker 21020: job 569 (pid=30876): Dormant child reaped

[root@eve-netserv nagios]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.3.2
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-05-09
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
	Checked 8 services.
	Checked 1 hosts.
	Checked 1 host groups.
	Checked 0 service groups.
	Checked 1 contacts.
	Checked 1 contact groups.
	Checked 24 commands.
	Checked 5 time periods.
	Checked 0 host escalations.
	Checked 0 service escalations.
Checking for circular paths...
	Checked 1 hosts
	Checked 0 service dependencies
	Checked 0 host dependencies
	Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios Core "Not running" via web although services UP

Post by cdienger »

Is this a new install or something that was working previously? The error indicates an issue somewhere in the config (/usr/local/nagios/etc/). Try removing the checks that are timing out from the config and let us know if that resolves the problem.
heidebock
Posts: 2
Joined: Fri Apr 26, 2019 2:54 am

Re: Nagios Core "Not running" via web although services UP

Post by heidebock »

Thanks for the reply. This is a new install from source and has never worked. I don't really know where to look in /usr/local/nagios/etc/. I've grepped for the failed host and also for a reference to the etc/hosts file. How do I go about pinpointing the check in question and disabling it?

thanks!
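
One way to pinpoint where a host such as 'lab_inet_1' is defined is to search the config tree the running daemon was actually started with, not only /usr/local/nagios/. The systemctl output above shows the daemon reading /etc/nagios/nagios.cfg, so that directory is a reasonable place to look. The sketch below is only an illustration: `find_host_def` is a hypothetical helper name, not a Nagios tool, and it assumes a grep that supports `-r` (as on CentOS 7).

```shell
#!/bin/sh
# find_host_def is a hypothetical helper name, not part of Nagios.
# Usage: find_host_def HOSTNAME CONFIG_DIR
# Prints file:line:text for every line that mentions HOSTNAME as a whole word.
find_host_def() {
    host="$1"
    dir="$2"
    # -r: recurse into subdirectories
    # -n: show line numbers
    # -w: whole-word match, so lab_inet_1 does not match lab_inet_10
    grep -rnw -- "$host" "$dir"
}

# Example (assumption: the running daemon reads /etc/nagios, as the
# systemctl output suggests):
# find_host_def lab_inet_1 /etc/nagios
```

Any file the search reports is a candidate for the host definition; commenting out that `define host` block (and any services that reference it) and reloading Nagios would disable the check.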
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios Core "Not running" via web although services UP

Post by scottwilkerson »

You said this is a new install from source, but your service is running the following command, and these are not the source install locations:

/usr/sbin/nagios -d /etc/nagios/nagios.cfg
Are you mixing together source and rpm installs? This will never turn out well.

I would highly recommend scrapping this server and starting over, following the official guide:
https://support.nagios.com/kb/article/n ... ce-96.html
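
For readers who want to confirm the mismatch on their own system before rebuilding, here is a minimal sketch that pulls the config path out of a Nagios command line and checks whether it sits under the source prefix. Both function names are hypothetical, and the only layout assumption is the one visible in this thread: the source install lives under /usr/local/nagios, while the running service uses /usr/sbin/nagios and /etc/nagios.

```shell
#!/bin/sh
# Hypothetical helper: print the last word of a command line. For
# "nagios -d <cfg>" that is the main config file.
cfg_from_cmdline() {
    # Intentional word splitting of the command line string.
    set -- $1
    for last in "$@"; do :; done
    echo "$last"
}

# Hypothetical helper: report whether a path lies under the source
# install prefix used in this thread (/usr/local/nagios).
classify_nagios_path() {
    case "$1" in
        /usr/local/nagios/*) echo "source-prefix" ;;
        *)                   echo "other" ;;
    esac
}

# Example, using the command line from the systemctl output in this thread:
# cfg=$(cfg_from_cmdline '/usr/sbin/nagios -d /etc/nagios/nagios.cfg')
# classify_nagios_path "$cfg"
```

If the running daemon's config is classified as "other" while the file you validated with `nagios -v` is under /usr/local/nagios, the daemon and the pre-flight check are looking at two different installs.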