lose the UI for a few minutes, constantly

DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Post by DMKatIBM »

On one of my two Nagios deployments, I'm seeing a problem I haven't run into before.

If I leave the UI open (normally when viewing "Host Groups"), I will eventually get the "sad face" in Chrome where it can't connect to it. If I click on the "Host Groups" button again on the left, nothing gets updated on the screen. After waiting a few more seconds, if I keep clicking on it, it will eventually bring up the Host Groups list again.

The process ID doesn't change, so the nagios.service itself isn't restarting, though it may be getting a HUP signal or something.

Following the /var/log/messages, there are tons of these:

[...]
Feb 23 08:42:35 dal10-build-Nagios nagios: job 1 (pid=25363): read() returned error 11
Feb 23 08:42:35 dal10-build-Nagios nagios: job 1 (pid=25362): read() returned error 11
[...]

but those messages have been happening for a long time, so I would have seen this long ago if they were responsible. They're the result of a plugin that is timing out (it's a legit timeout, the host has connection issues).

Debug is already at level 128, but the debug file hasn't updated in about a week. There are no issues with disk space.

Where can I even begin to look at what's causing this?
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: lose the UI for a few minutes, constantly

Post by npolovenko »

Hello, @DMKatIBM. I don't think the read() error 11 has anything to do with this issue. It looks like a deprecated log entry that will be removed in the future: https://github.com/NagiosEnterprises/na ... issues/362

Could it be that the session is timing out? Can you open the /etc/php.ini file and increase the session.gc_maxlifetime? I'd suggest doubling the value that you already have there.
Then restart httpd with:

Code: Select all

service httpd restart

Let me know if this helps.
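Before and after the edit, it can help to confirm which value is actually set in the file. A minimal sketch, assuming /etc/php.ini is the file PHP loads on this server; `php_ini_value` is a hypothetical helper name, not part of PHP or Nagios:

```shell
# Print the value of a php.ini directive (the last uncommented occurrence wins).
# Usage: php_ini_value <directive> <ini-file>
# Assumption: plain "name = value" lines, with ";" starting a comment.
php_ini_value() {
    awk -F= -v key="$1" '
        /^[ \t]*;/ { next }               # skip whole-line comments
        {
            name = $1
            gsub(/[ \t]/, "", name)       # trim whitespace around the directive name
            if (name == key) {
                value = $2
                sub(/;.*/, "", value)     # drop a trailing comment, if any
                gsub(/[ \t]/, "", value)  # trim whitespace around the value
                found = value
            }
        }
        END { if (found != "") print found }
    ' "$2"
}
```

Running `php_ini_value session.gc_maxlifetime /etc/php.ini` prints the current setting, which makes it easy to verify the change before restarting httpd.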
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: lose the UI for a few minutes, constantly

Post by DMKatIBM »

The default setting was 1440. I changed it to 2880, but that main frame on the right still goes blank.

Where else could I look to see what could be causing this?
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: lose the UI for a few minutes, constantly

Post by DMKatIBM »

Some additional information on this:

The message it displays is "<host> took too long to respond"

So it seems that whatever is auto-refreshing that page is exceeding some timeout somewhere. Changing that maxlifetime didn't resolve it.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: lose the UI for a few minutes, constantly

Post by npolovenko »

@DMKatIBM, can you replicate the timeout error again, then immediately run the following commands and post the output:

Code: Select all

tail -25 /var/log/httpd/error_log
tail -25 /var/log/httpd/access_log

Are you using a proxy or have you modified any apache conf files recently? Have you always had this problem on this server?
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: lose the UI for a few minutes, constantly

Post by DMKatIBM »

I was running a tail on multiple logs, and between when it was working and when the screen went to "too long to respond", this is the only log that was written:

==> ssl_access_log <==
32.97.110.61 - nagiosadmin [27/Feb/2018:06:45:10 -0600] "GET /nagios/cgi-bin/extinfo.cgi?type=1&host=server-xyz HTTP/1.1" 200 11927
==> ssl_request_log <==
[27/Feb/2018:06:45:10 -0600] 32.97.110.61 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "GET /nagios/cgi-bin/extinfo.cgi?type=1&host=server-xyz HTTP/1.1" 11927

EDIT: It appears that another person on my team also had the Nagios window open, on a specific system, so the two logs above may be unrelated. They were a screen update for what he was looking at.

About a minute later it also outputs this log:

==> ssl_error_log <==
[Tue Feb 27 06:46:53.502345 2018] [autoindex:error] [pid 1291] [client 127.0.0.1:39156] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
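
AH01276 is logged when a request for a directory (here /var/www/html/) finds no index file and directory listings are disabled, so it is worth knowing which clients are generating it. A rough sketch for tallying these errors per client IP; `autoindex_errors_by_client` is a hypothetical helper name, and it assumes the `[client IP:port]` field format shown in the log line above:

```shell
# Count AH01276 (autoindex) errors per client IP in an Apache error log.
# Usage: autoindex_errors_by_client <error-log>
autoindex_errors_by_client() {
    awk '
        /AH01276/ && match($0, /\[client [^]]*\]/) {
            # Text between "[client " (8 characters) and the closing "]"
            client = substr($0, RSTART + 8, RLENGTH - 9)
            sub(/:[0-9]+$/, "", client)   # drop the source port
            count[client]++
        }
        END { for (c in count) print count[c], c }
    ' "$1" | sort -rn
}
```

For example, `autoindex_errors_by_client /var/log/httpd/ssl_error_log` lists each client with its error count, highest first. Entries from 127.0.0.1 would point at something on the Nagios server itself requesting `/`, rather than a browser.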
Last edited by DMKatIBM on Tue Feb 27, 2018 8:09 am, edited 1 time in total.
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: lose the UI for a few minutes, constantly

Post by DMKatIBM »

Also, no proxy server (all connections to remote hosts are direct), and I haven't modified anything in apache in quite some time.

I've been adding extra systems to Nagios to monitor, though. Currently have 71 systems and about 2600 services.

Another thing to note is that it's only the main.php frame that is throwing the "too long to respond" and going blank.
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: lose the UI for a few minutes, constantly

Post by DMKatIBM »

Okay, well here's something interesting.

So I decided to try Internet Explorer instead of Chrome, and it shows a lot more access logs when it's redrawing the frame. It doesn't output any of these logs in Chrome when it redraws it:

Code: Select all

==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/cgi-bin/status.cgi?hostgroup=all&style=overview HTTP/1.1" 200 80917
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/cgi-bin/status.cgi?hostgroup=all&style=overview HTTP/1.1" 80917
==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/stylesheets/common.css HTTP/1.1" 304 -
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/common.css HTTP/1.1" -
==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/stylesheets/status.css HTTP/1.1" 304 -
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/status.css HTTP/1.1" -
==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/stylesheets/nag_funcs.css HTTP/1.1" 304 -
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/nag_funcs.css HTTP/1.1" -
==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/js/jquery-1.7.1.min.js HTTP/1.1" 304 -
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/js/jquery-1.7.1.min.js HTTP/1.1" -
==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/images/detail.gif HTTP/1.1" 304 -
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/images/detail.gif HTTP/1.1" -
==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/js/nag_funcs.js HTTP/1.1" 304 -
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/js/nag_funcs.js HTTP/1.1" -
==> ssl_access_log <==
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/images/status2.gif HTTP/1.1" 304 -
==> ssl_request_log <==
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/images/status2.gif HTTP/1.1" -

Yet a few seconds after I posted that, it too came up with "this page can not be displayed" in the main.php frame.

Here are the logs from Apache at the time it failed:

Code: Select all

[root@dal10-build-Nagios httpd]# tail -20 ssl_error_log
[Tue Feb 27 06:46:53.502345 2018] [autoindex:error] [pid 1291] [client 127.0.0.1:39156] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 06:47:25.966986 2018] [autoindex:error] [pid 9447] [client 10.87.40.141:44406] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 06:51:53.509330 2018] [autoindex:error] [pid 1289] [client 127.0.0.1:42716] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 06:52:19.312239 2018] [autoindex:error] [pid 1291] [client 10.87.40.141:46438] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 06:52:26.141276 2018] [autoindex:error] [pid 9447] [client 10.87.40.141:47210] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 06:56:53.519980 2018] [autoindex:error] [pid 1292] [client 127.0.0.1:44642] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 06:57:26.318883 2018] [autoindex:error] [pid 1289] [client 10.87.40.141:49740] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:01:53.529426 2018] [autoindex:error] [pid 50188] [client 127.0.0.1:48140] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:02:24.521226 2018] [autoindex:error] [pid 50189] [client 10.87.40.141:52064] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:02:26.514722 2018] [autoindex:error] [pid 50190] [client 10.87.40.141:52546] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:06:53.539964 2018] [autoindex:error] [pid 1292] [client 127.0.0.1:50216] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:07:26.700482 2018] [autoindex:error] [pid 1289] [client 10.87.40.141:55086] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:11:53.548733 2018] [autoindex:error] [pid 1291] [client 127.0.0.1:53862] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:12:24.830960 2018] [autoindex:error] [pid 9447] [client 10.87.40.141:57282] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:12:26.895063 2018] [autoindex:error] [pid 50188] [client 10.87.40.141:57800] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:16:53.557872 2018] [autoindex:error] [pid 50189] [client 127.0.0.1:55748] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:17:27.082062 2018] [autoindex:error] [pid 1293] [client 10.87.40.141:60428] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:21:53.565259 2018] [autoindex:error] [pid 1293] [client 127.0.0.1:59372] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:22:24.056869 2018] [autoindex:error] [pid 50191] [client 10.87.40.141:34022] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[Tue Feb 27 07:22:27.261291 2018] [autoindex:error] [pid 50189] [client 10.87.40.141:34832] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html,index.php) found, and server-generated directory index forbidden by Options directive
[root@dal10-build-Nagios httpd]# tail -20 ssl_access_log
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/cgi-bin/status.cgi?hostgroup=all&style=overview HTTP/1.1" 200 80917
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/stylesheets/common.css HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/stylesheets/status.css HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/stylesheets/nag_funcs.css HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/js/nag_funcs.js HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/js/jquery-1.7.1.min.js HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/images/status2.gif HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:18:56 -0600] "GET /nagios/images/detail.gif HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/cgi-bin/status.cgi?hostgroup=all&style=overview HTTP/1.1" 200 80917
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/stylesheets/common.css HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/stylesheets/status.css HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/stylesheets/nag_funcs.css HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/js/jquery-1.7.1.min.js HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/images/detail.gif HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/js/nag_funcs.js HTTP/1.1" 304 -
199.246.40.53 - nagiosadmin [27/Feb/2018:07:20:27 -0600] "GET /nagios/images/status2.gif HTTP/1.1" 304 -
199.246.40.53 - - [27/Feb/2018:07:20:48 -0600] "-" 408 -
127.0.0.1 - - [27/Feb/2018:07:21:53 -0600] "GET / HTTP/1.1" 403 3985
10.87.40.141 - - [27/Feb/2018:07:22:24 -0600] "GET / HTTP/1.0" 403 3985
10.87.40.141 - - [27/Feb/2018:07:22:27 -0600] "GET / HTTP/1.1" 403 3985
[root@dal10-build-Nagios httpd]# tail -20 ssl_request_log
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/cgi-bin/status.cgi?hostgroup=all&style=overview HTTP/1.1" 80917
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/common.css HTTP/1.1" -
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/status.css HTTP/1.1" -
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/nag_funcs.css HTTP/1.1" -
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/js/nag_funcs.js HTTP/1.1" -
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/js/jquery-1.7.1.min.js HTTP/1.1" -
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/images/status2.gif HTTP/1.1" -
[27/Feb/2018:07:18:56 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/images/detail.gif HTTP/1.1" -
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/cgi-bin/status.cgi?hostgroup=all&style=overview HTTP/1.1" 80917
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/common.css HTTP/1.1" -
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/status.css HTTP/1.1" -
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/stylesheets/nag_funcs.css HTTP/1.1" -
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/js/jquery-1.7.1.min.js HTTP/1.1" -
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/images/detail.gif HTTP/1.1" -
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/js/nag_funcs.js HTTP/1.1" -
[27/Feb/2018:07:20:27 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "GET /nagios/images/status2.gif HTTP/1.1" -
[27/Feb/2018:07:20:48 -0600] 199.246.40.53 TLSv1.2 ECDHE-RSA-AES256-SHA384 "-" -
[27/Feb/2018:07:21:53 -0600] 127.0.0.1 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "GET / HTTP/1.1" 3985
[27/Feb/2018:07:22:24 -0600] 10.87.40.141 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "GET / HTTP/1.0" 3985
[27/Feb/2018:07:22:27 -0600] 10.87.40.141 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "GET / HTTP/1.1" 3985
[root@dal10-build-Nagios httpd]#
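
The `"-" 408` entry at 07:20:48 stands out in the access log. 408 is the HTTP Request Timeout status, which Apache typically logs when a client opens a connection and no complete request arrives in time. Whether that is the cause here is not established in this thread. A small sketch for pulling such lines out of a log; `lines_with_status` is a hypothetical helper name, and it assumes the format shown above, where the status code follows the quoted request:

```shell
# Print access-log lines whose HTTP status matches the given code.
# Usage: lines_with_status <status> <access-log>
# Splitting on the double quote keeps this working for "-" requests,
# which would shift the status position if fields were split on spaces.
lines_with_status() {
    awk -F'"' -v want="$1" '
        {
            # $3 is the text after the quoted request: " <status> <bytes>"
            split($3, rest, " ")
            if (rest[1] == want) print
        }
    ' "$2"
}
```

For example, `lines_with_status 408 /var/log/httpd/ssl_access_log` shows every timed-out connection, and the timestamps can then be compared with the moments the frame went blank.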
DMKatIBM
Posts: 22
Joined: Thu Jan 11, 2018 3:41 pm

Re: lose the UI for a few minutes, constantly

Post by DMKatIBM »

I've also tried altering this to 180 (default was 30) in /etc/php.ini:

max_execution_time = 180

This also resulted in no change.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: lose the UI for a few minutes, constantly

Post by npolovenko »

@DMKatIBM, can you run the following commands and show me the output:

Code: Select all

ls -l /var/www/html/
ls -ld /var/www/html/

Also, please post the following configuration file:

Code: Select all

/etc/httpd/conf.d/nagios.conf

Thank you