Nagios Core "Not running" via web although services UP
Posted: Fri Apr 26, 2019 3:09 am
Hello all,
I realize this is a fairly common post, but I haven't been able to find a solution that applies directly to this problem.
I installed Nagios from source following this guide:
https://www.linuxtechi.com/install-conf ... os7-rhel7/
I'm running CentOS 7 with Nagios version 4.3.2.
When I log into the Nagios front-end website, it appears the website is unable to communicate at all with the Nagios service:
"Not running" at the main page, "Error: Could not read object configuration data!" at every other page.
The only anomaly I see is that systemctl reports Nagios check jobs timing out, although the service itself is running. More detail is available in the Nagios log (see the output further down).
I'm not sure where 'lab_inet_1' is coming from. I ran a recursive grep over the entire /usr/local/nagios/ directory and could not find this host listed. Nagios is actually running on an EVE-NG server, and lab_inet_1 and lab_inet_2 are two virtual routers running on EVE. I'm not sure whether that's relevant, but it's the only thing that doesn't look right.
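One thing that might matter for that search: the systemctl output further down shows the daemon being started as /usr/sbin/nagios -d /etc/nagios/nagios.cfg, while my grep only covered /usr/local/nagios/. Here is a rough sketch for searching the tree the running daemon actually reads. The helper names cfg_from_cmdline and find_host are made up for illustration, and it assumes the main config file is the last argument on the daemon's command line:

```shell
#!/bin/sh
# Print the last whitespace-separated field of a command line.
# For "nagios -d <cfg>" that is the main config file (assumption).
cfg_from_cmdline() {
    # $1: full command line as shown by systemctl or ps
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# List files under a directory that mention a given host name.
find_host() {
    # $1: directory to search, $2: host name
    grep -rl -- "$2" "$1" 2>/dev/null
}

# Example, using the command line from the systemctl output in this post:
#   cfg=$(cfg_from_cmdline '/usr/sbin/nagios -d /etc/nagios/nagios.cfg')
#   find_host "$(dirname "$cfg")" lab_inet_1
```

If that turns up 'lab_inet_1' under /etc/nagios/, it would suggest the running service and the config I checked under /usr/local/nagios/ belong to two different installs, but that is only a guess at this point.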
Here is output from the server:
[root@eve-netserv ~]# systemctl status nagios
● nagios.service - Nagios Network Monitoring
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-04-25 09:53:28 MDT; 16h ago
Docs: https://www.nagios.org/documentation/
Process: 21011 ExecStopPost=/usr/bin/rm -f /var/spool/nagios/cmd/nagios.cmd (code=exited, status=0/SUCCESS)
Process: 21009 ExecStop=/bin/kill -TERM ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 20153 ExecReload=/bin/kill -HUP ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 21014 ExecStart=/usr/sbin/nagios -d /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
Process: 21012 ExecStartPre=/usr/sbin/nagios -v /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
Main PID: 21016 (nagios)
CGroup: /system.slice/nagios.service
├─21016 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
├─21017 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
├─21018 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
├─21019 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
├─21020 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
└─21021 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
Apr 26 01:51:59 eve-netserv nagios[21016]: Warning: Check of host 'lab_inet_2' timed out after 30.01 seconds
Apr 26 01:51:59 eve-netserv nagios[21016]: wproc: Core Worker 21018: job 566 (pid=30811): Dormant child reaped
Apr 26 01:53:28 eve-netserv nagios[21016]: Auto-save of retention data completed successfully.
Apr 26 01:55:45 eve-netserv nagios[21020]: job 569 (pid=30876): read() returned error 11
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: Core Worker 21020: job 569 (pid=30876) timed out. Killing it
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: CHECK job 569 from worker Core Worker 21020 timed out after 30.02s
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: host=lab_inet_1; service=(null);
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Apr 26 01:55:45 eve-netserv nagios[21016]: Warning: Check of host 'lab_inet_1' timed out after 30.02 seconds
Apr 26 01:55:45 eve-netserv nagios[21016]: wproc: Core Worker 21020: job 569 (pid=30876): Dormant child reaped
[root@eve-netserv ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-04-25 09:53:18 MDT; 16h ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 20986 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
Main PID: 20990 (httpd)
Status: "Total requests: 196; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
├─20990 /usr/sbin/httpd -DFOREGROUND
├─20992 /usr/sbin/httpd -DFOREGROUND
├─20993 /usr/sbin/httpd -DFOREGROUND
├─20994 /usr/sbin/httpd -DFOREGROUND
├─20995 /usr/sbin/httpd -DFOREGROUND
├─20996 /usr/sbin/httpd -DFOREGROUND
├─27326 /usr/sbin/httpd -DFOREGROUND
└─30874 /usr/sbin/httpd -DFOREGROUND
Apr 25 09:53:18 eve-netserv systemd[1]: Starting The Apache HTTP Server...
Apr 25 09:53:18 eve-netserv httpd[20990]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using fe80::9725:a77f:c65f:df82. Set the 'ServerName' dire...s this message
Apr 25 09:53:18 eve-netserv systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
# The log below correlates with the timeouts seen via 'systemctl', but I'm not sure where 'lab_inet_1' is coming from. I did a recursive
# grep on the entire /usr/local/nagios/ directory and was unable to find this host listed. Nagios is actually running on an EVE-NG server,
# and lab_inet_1 and lab_inet_2 are two virtual routers running on EVE.
[root@eve-netserv ~]# tail -f /var/log/nagios/nagios.log
[1556265119] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1556265119] Warning: Check of host 'lab_inet_2' timed out after 30.01 seconds
[1556265119] wproc: Core Worker 21018: job 566 (pid=30811): Dormant child reaped
[1556265208] Auto-save of retention data completed successfully.
[1556265345] wproc: Core Worker 21020: job 569 (pid=30876) timed out. Killing it
[1556265345] wproc: CHECK job 569 from worker Core Worker 21020 timed out after 30.02s
[1556265345] wproc: host=lab_inet_1; service=(null);
[1556265345] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1556265345] Warning: Check of host 'lab_inet_1' timed out after 30.02 seconds
[1556265345] wproc: Core Worker 21020: job 569 (pid=30876): Dormant child reaped
[root@eve-netserv nagios]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.3.2
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-05-09
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 8 services.
Checked 1 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 24 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 1 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
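The bracketed numbers in nagios.log are Unix epoch seconds, which makes them awkward to line up with the journal timestamps in the systemctl output. A small sketch for converting them, assuming GNU date (as shipped with CentOS 7); the function names epoch_to_utc and humanize_log are just illustrative:

```shell
#!/bin/sh
# Convert Unix epoch seconds to a UTC timestamp (requires GNU date for -d @N).
epoch_to_utc() {
    TZ=UTC date -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Read nagios.log lines on stdin and rewrite a leading "[epoch]" as a UTC time.
# Lines that do not start with a purely numeric "[...]" field pass through unchanged.
humanize_log() {
    while IFS= read -r line; do
        ts=${line#\[}      # drop the leading "["
        ts=${ts%%\]*}      # keep everything before the first "]"
        case $ts in
            ''|*[!0-9]*) printf '%s\n' "$line" ;;
            *) printf '[%s] %s\n' "$(epoch_to_utc "$ts")" "${line#*\] }" ;;
        esac
    done
}

# Example:
#   tail -n 20 /var/log/nagios/nagios.log | humanize_log
```

For example, `epoch_to_utc 1556265345` gives 2019-04-26 07:55:45 in UTC, which is 01:55:45 MDT, the same moment as the 'lab_inet_1' timeout in the journal output above.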