Cannot access Cluster Status page after 2.1.0 update

Posted: Wed Oct 02, 2019 10:19 am
by rferebee
Good morning. I updated my Log Server environment to 2.1.0 yesterday, and this morning I am unable to access the Cluster Status page. I'm getting an HTTP 500 error.

Any suggestions?

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Wed Oct 02, 2019 10:21 am
by rferebee
I also can't access the Index Status page. Same error.

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Wed Oct 02, 2019 2:52 pm
by mbellerue
Can you run ls -lh /var/www/html/nagioslogserver/application/views/admin/ and post the results here? I'd like to see if the permissions are good on the files for those pages.
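
For a long listing, the check can be narrowed to just the files the web server could not read. This is a minimal sketch, assuming Apache runs as a non-root user and therefore depends on the world-readable bit of these root-owned files. The function name `find_unreadable` and the scratch directory are illustrative only and are not part of Log Server.

```shell
# List regular files under a directory that lack the world-readable bit (o+r).
find_unreadable() {
    find "$1" -type f ! -perm -004
}

# Demonstration on a scratch directory (file names borrowed for illustration).
demo_dir=$(mktemp -d)
touch "$demo_dir/cluster.php" "$demo_dir/index_status.php"
chmod 644 "$demo_dir/cluster.php"
chmod 600 "$demo_dir/index_status.php"

# Only the file with mode 600 should be reported.
unreadable=$(find_unreadable "$demo_dir")
echo "$unreadable"
```

On the server itself the call would be `find_unreadable /var/www/html/nagioslogserver/application/views/admin/`; empty output means every file there is world-readable.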

Did the upgrade seem successful? No errors or warnings? Do you know if you were able to access these pages yesterday after the upgrade?

You might also grab a system profile if you're going to be ssh'd into the system. To generate a profile from the command line, run /usr/local/nagioslogserver/scripts/profile.sh, and that will drop a system profile in /tmp. You'll have to use a program like WinSCP to download it, though.

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Wed Oct 02, 2019 3:01 pm
by rferebee

Code:

root@nagioslscc2:/root> ls -lh /var/www/html/nagioslogserver/application/views/admin/
total 320K
-rw-r--r-- 1 root root 5.0K Oct  1 11:37 activate.php
-rw-r--r-- 1 root root  11K Oct  1 11:37 audit_log.php
drwxr-xr-x 2 root root   55 Oct  1 11:37 auth_servers
-rw-r--r-- 1 root root 7.4K Oct  1 11:37 cluster.php
-rw-r--r-- 1 root root  26K Oct  1 11:37 create_user.php
-rw-r--r-- 1 root root  15K Oct  1 11:37 custom_includes.php
-rw-r--r-- 1 root root  28K Oct  1 11:37 edit_user.php
-rw-r--r-- 1 root root 7.0K Oct  1 11:37 globals.php
-rw-r--r-- 1 root root 4.9K Oct  1 11:37 home.php
-rw-r--r-- 1 root root 8.3K Oct  1 11:37 host_lists.php
-rw-r--r-- 1 root root 7.2K Oct  1 11:37 import_users_final.php
-rw-r--r-- 1 root root 2.4K Oct  1 11:37 import_users.php
-rw-r--r-- 1 root root 7.3K Oct  1 11:37 import_users_select.php
-rw-r--r-- 1 root root 3.7K Oct  1 11:37 index_status.php
-rw-r--r-- 1 root root  24K Oct  1 11:37 indices_table.php
-rw-r--r-- 1 root root  46K Oct  1 11:37 instance_status.php
-rw-r--r-- 1 root root 2.6K Oct  1 11:37 leftbar.php
-rw-r--r-- 1 root root 8.4K Oct  1 11:37 license.php
-rw-r--r-- 1 root root 7.9K Oct  1 11:37 mail.php
-rw-r--r-- 1 root root 8.9K Oct  1 11:37 ncpa.php
-rw-r--r-- 1 root root 3.0K Oct  1 11:37 proxy.php
drwxr-xr-x 2 root root   68 Oct  1 11:37 rss
-rw-r--r-- 1 root root  28K Oct  1 11:37 snapshots.php
-rw-r--r-- 1 root root 9.1K Oct  1 11:37 subsystem.php
-rw-r--r-- 1 root root 5.4K Oct  1 11:37 system_status.php
-rw-r--r-- 1 root root 4.5K Oct  1 11:37 users.php
-rw-r--r-- 1 root root    0 Oct  1 11:37 wizards.php
I verified that the permissions match on all 3 of my Log Server nodes.

The upgrade seemed to complete without issue on every server. The only problem was that the Logstash service crashed after the update on two of the three. I specifically remember looking at the Cluster Status page after the upgrade to see how many shards needed to be reloaded.

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Wed Oct 02, 2019 4:44 pm
by mbellerue
This is the first thing that's jumping out at me.

Code:

PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 92 bytes) in /var/www/html/nagioslogserver/application/libraries/Elasticsearch.php on line 0, referer: http://nagioslogserver.state.nv.us/nagioslogserver/admin
From the profile it looks like you have plenty of actual memory available, so I'm wondering if it's running up against the Elasticsearch memory limit. Let's go ahead and try to restart Elasticsearch.

Code:

systemctl restart elasticsearch.service
Give that a shot and let me know how it goes.
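
As a side note on the error quoted above, the byte count works out to 128 MiB, which is also PHP's default `memory_limit`, so that setting may be worth a look as well. A small sketch of the arithmetic, assuming php.ini is at /etc/php.ini (the variable names are illustrative):

```shell
# Convert the byte count from the PHP fatal error into MiB.
limit_bytes=134217728
limit_mib=$((limit_bytes / 1024 / 1024))
echo "Limit reported in the error: ${limit_mib} MiB"

# Assumption: php.ini is at /etc/php.ini. Show the configured limit
# only if the file is present and readable on this machine.
php_ini=/etc/php.ini
if [ -r "$php_ini" ]; then
    grep -E '^[[:space:]]*memory_limit' "$php_ini" || true
fi
```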

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Wed Oct 02, 2019 5:17 pm
by rferebee
I tried restarting Elasticsearch, but it didn't resolve the issue.

Out of frustration I rebooted all the servers and it's still not loading those two pages.

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Thu Oct 03, 2019 9:16 am
by rferebee
It's strange, my snapshots don't seem to be running either.

The command subsystem screen says that they are, but no snapshots are being created. I just confirmed that I can write to my repository by creating a test folder in the repository directory from one of my log servers. So, I don't think it's a permissions issue.

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Thu Oct 03, 2019 9:33 am
by rferebee
It seems like something is wrong, but I don't know where to start:

Code:

root@nagioslscc2:/var/log/elasticsearch> curl -XGET 'http://localhost:9200/_snapshot/nlsrepcc/_all?pretty'
{
  "error" : "RepositoryMissingException[[nlsrepcc] missing]",
  "status" : 404
}

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Thu Oct 03, 2019 10:01 am
by mbellerue
Okay, I think we're going to need to enable PHP error logging for this. In /etc/php.ini, find the entry log_errors and make sure it is set to On. Below it, create another entry called error_log and set it to something like /var/log/php-errors.log. Restart the Apache daemon with systemctl restart httpd.service. Then try accessing the pages that aren't working, and see if anything is logged in the file you specified.
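
The php.ini change described above can be sketched on a scratch copy first. This is only an illustration: the sample file contents and the variable names `sample` and `edited` are assumptions, and nothing here touches the real /etc/php.ini.

```shell
# Build a small stand-in for php.ini (sample contents are assumed).
sample=$(mktemp)
edited=$(mktemp)
cat > "$sample" <<'EOF'
[PHP]
memory_limit = 128M
log_errors = Off
;error_log = syslog
EOF

# Set log_errors to On and place an error_log entry directly below it.
# Any existing uncommented error_log line is dropped to avoid duplicates.
awk '
/^[[:space:]]*log_errors[[:space:]]*=/ {
    print "log_errors = On"
    print "error_log = /var/log/php-errors.log"
    next
}
/^[[:space:]]*error_log[[:space:]]*=/ { next }
{ print }
' "$sample" > "$edited"

cat "$edited"
```

After making the equivalent edit in the real file and restarting Apache, `tail -f /var/log/php-errors.log` while reloading the failing pages should show whatever PHP reports.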

Re: Cannot access Cluster Status page after 2.1.0 update

Posted: Thu Oct 03, 2019 10:03 am
by mbellerue
rferebee wrote: It seems like something is wrong, but I don't know where to start:

Code:

root@nagioslscc2:/var/log/elasticsearch> curl -XGET 'http://localhost:9200/_snapshot/nlsrepcc/_all?pretty'
{
  "error" : "RepositoryMissingException[[nlsrepcc] missing]",
  "status" : 404
}
I did see this error message in the logs, but it looked like nlsrepcc was an NFS share. It doesn't seem to be mounted. Are you storing anything related to Log Server on this share?
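
One way to confirm whether the share is mounted is to look for its mount point in /proc/mounts. A rough sketch, assuming a Linux host; the function name `is_mounted` and the path in `repo_path` are placeholders, since the actual mount point is not shown in this thread.

```shell
# Succeed if the given path appears as a mount point in /proc/mounts (Linux).
# Note: paths containing spaces are escaped in /proc/mounts and would not match.
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' /proc/mounts
}

# Hypothetical mount point; substitute the directory the repository was registered with.
repo_path="/path/to/snapshot/repository"

if is_mounted "$repo_path"; then
    repo_state="mounted"
else
    repo_state="not mounted"
fi
echo "$repo_path is $repo_state"

# On the Log Server node, the registered repositories can be listed with:
#   curl -XGET 'http://localhost:9200/_snapshot/_all?pretty'
```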