
NLS upgrade failing - NLS not available

Posted: Fri Jul 10, 2020 6:10 am
by jabi27
We have a 2-node cluster, and during/after the upgrade we are not able to start or log in to NLS.

Here is what we did:
We have:
- nagios-logserver1.stil.dk
- nagios-logserver2.stil.dk

# disable shard allocation

Code: Select all

curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'
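For reference, the setting changed above accepts `all`, `primaries`, `new_primaries` and `none`, with `all` being the Elasticsearch default that turns allocation back on. The sketch below only builds the request body and wraps the same `_cluster/settings` call; the function names `allocation_payload` and `set_allocation` are made up for illustration, and nothing in this thread confirms that re-enabling allocation is what resolves the problem.

```shell
#!/bin/sh
# Hypothetical helpers (names are not part of NLS or Elasticsearch).

# Print the JSON body for a transient allocation setting.
# Rejects anything that is not a documented value for the setting.
allocation_payload() {
    case "$1" in
        all|primaries|new_primaries|none) ;;
        *) echo "unknown allocation mode: $1" >&2; return 1 ;;
    esac
    printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Send the setting to a node (default: localhost:9200, as in the post).
# The Content-Type header is ignored by old Elasticsearch releases and
# required by newer ones, so it is included as a precaution.
set_allocation() {
    payload=$(allocation_payload "$1") || return 1
    curl -s -XPUT "${2:-localhost:9200}/_cluster/settings" \
        -H 'Content-Type: application/json' -d "$payload"
}
```

Calling `set_allocation all` once both nodes are upgraded and back in the cluster would undo the `none` setting.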
server #2:

Code: Select all

mkdir /root/upgrade_2.1.6
cd /root/upgrade_2.1.6
wget -O upgrade.sh https://assets.nagios.com/downloads/nagios-log-server/upgrade.sh
chmod 700 upgrade.sh
./upgrade
Completed successfully
server #1:

Code: Select all

mkdir /root/upgrade_2.1.6
cd /root/upgrade_2.1.6
wget -O upgrade.sh https://assets.nagios.com/downloads/nagios-log-server/upgrade.sh
./upgrade
..
Kibana upgraded OK
..
Hanging forever ...
The line where the upgrade hangs seems to be:

Code: Select all

/usr/bin/php $proddir/www/index.php install/upgrade/$oldversion
Even though the first server was upgraded, we are not able to use it.

The system is down. Can you advise?

Best

/Jan

Re: NLS upgrade failing - NLS not available

Posted: Fri Jul 10, 2020 2:42 pm
by cdienger
Is the server that is hanging forever still hanging or did you exit out of it?

What version of NLS are you upgrading from?

Check that the services are running:

Code: Select all

systemctl status elasticsearch
systemctl status logstash
systemctl status httpd
Also check the status of the cluster:

Code: Select all

curl 'localhost:9200/_cat/nodes?v'
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
Run the second command a couple of times to see if the numbers are changing.
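If repeating the health check by hand is tedious, something like the following could print the shard counters every few seconds so changes stand out. This is only a sketch: `health_field` and `watch_health` are hypothetical names, and the parsing assumes the pretty-printed output with one field per line that `?pretty=true` returns.

```shell
#!/bin/sh
# Read _cluster/health JSON (pretty-printed) on stdin and print the
# numeric value of the field named in $1.
health_field() {
    sed -n "s/.*\"$1\" *: *\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Poll the health endpoint $2 times, $3 seconds apart, and print one
# summary line per poll. Defaults: 5 polls, 10 seconds.
watch_health() {
    url="${1:-http://localhost:9200/_cluster/health?pretty=true}"
    count="${2:-5}"
    delay="${3:-10}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # --max-time keeps a hung node from blocking the loop forever.
        body=$(curl -s --max-time 30 "$url") || body=""
        printf '%s active=%s initializing=%s unassigned=%s\n' \
            "$(date +%T)" \
            "$(printf '%s\n' "$body" | health_field active_shards)" \
            "$(printf '%s\n' "$body" | health_field initializing_shards)" \
            "$(printf '%s\n' "$body" | health_field unassigned_shards)"
        i=$((i + 1))
        sleep "$delay"
    done
}
```

Running `watch_health` with no arguments polls localhost; empty values in a line mean the request failed or timed out.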

Please PM me a profile from both systems. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

Code: Select all

/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may not be able to be uploaded through the system. This is usually due to the logs in the Logstash and/or Elasticsearch directories found in it. If it is too large, please open the profile, extract these directories/files and send them separately.
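To see which files make the profile so large before splitting it up, a rough approach is to unpack it into a scratch directory and list the biggest files. The helper name `profile_sizes` is made up, and the layout inside the archive is not assumed; the output simply shows whatever paths the tarball contains.

```shell
#!/bin/sh
# List the largest files (size in KB, then path) inside a .tar.gz.
# $1 = archive path, $2 = how many files to show (default 10).
profile_sizes() {
    archive=$1
    workdir=$(mktemp -d) || return 1
    if ! tar -xzf "$archive" -C "$workdir"; then
        rm -rf "$workdir"
        return 1
    fi
    # Files only, largest first; paths are relative to the archive root.
    (cd "$workdir" && find . -type f -exec du -k {} + \
        | sort -rn | head -n "${2:-10}")
    rm -rf "$workdir"
}
```

For example, `profile_sizes /tmp/system-profile.tar.gz 5` would show the five largest files. Note that this needs enough free space in the temporary directory to hold the unpacked profile.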

Re: NLS upgrade failing - NLS not available

Posted: Sat Jul 11, 2020 1:10 am
by jabi27
Hi

Yes the services are running:
root@nagios-logserver1:~# systemctl status elasticsearch | grep running
Active: active (running) since Fri 2020-07-10 12:31:44 CEST; 19h ago
root@nagios-logserver1:~# systemctl status logstash | grep running
Active: active (running) since Fri 2020-07-10 14:40:47 CEST; 17h ago
root@nagios-logserver1:~# systemctl status apache2 | grep running
Active: active (running) since Fri 2020-07-10 13:02:13 CEST; 18h ago
I do not know for sure which version we were coming from, but I think it was 2.1.2.

Code: Select all

root@nagios-logserver2:~# curl 'localhost:9200/_cat/nodes?v'
..
..
Hanging...

Code: Select all

root@nagios-logserver2:~# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "a70328ae-b00b-42d8-a48e-8607a24bb151",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 167,
  "active_shards" : 167,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 177,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
The upgrade hung after the Kibana step for around 20-30 minutes (lunch break). There was no activity from the process.

I will generate and upload the system profiles.

Thanks and Best

/Jan

Re: NLS upgrade failing - NLS not available

Posted: Sat Jul 11, 2020 1:13 am
by jabi27
Hi,

I forgot: yes, I pressed Ctrl-C to stop the process.

Best

/Jan

Re: NLS upgrade failing - NLS not available

Posted: Sat Jul 11, 2020 1:26 am
by jabi27
This eventually returned:

Code: Select all

root@nagios-logserver1:~#  curl 'localhost:9200/_cat/nodes?v'
host              ip              heap.percent ram.percent load node.role master name                                 
nagios-logserver1 195.231.242.169           99          14 1.43 d         *      4f455685-4ec4-42f4-932e-54121c3871af 

Re: NLS upgrade failing - NLS not available

Posted: Sat Jul 11, 2020 1:46 am
by jabi27
And now server 2 has finished:

Code: Select all

host              ip              heap.percent ram.percent load node.role master name                                 
nagios-logserver1 195.231.242.169           99          14 2.84 d         *      4f455685-4ec4-42f4-932e-54121c3871af 


Re: NLS upgrade failing - NLS not available

Posted: Mon Jul 13, 2020 9:33 am
by cdienger
Thanks for the update and data. I've taken ownership of the ticket you've opened for this case. We'll close out this thread and I will respond shortly to the ticket.