NLS upgrade failing - NLS not available

Post by **jabi27** » Fri Jul 10, 2020 6:10 am

We have a 2-node cluster, and during/after the upgrade we are not able to start or log in to NLS.

Here is what we did:
We have:
- nagios-logserver1.stil.dk
- nagios-logserver2.stil.dk

# disable shard allocation

Code: Select all

curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'
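For reference, this setting is transient: it stays in effect until it is changed back or the whole cluster restarts, and while it is `none` no shards are assigned to nodes. Below is a minimal sketch of switching it back once both nodes are upgraded. The helper names `allocation_payload` and `set_allocation` are hypothetical, and the sketch assumes Elasticsearch answers on localhost:9200 as in the command above. Whether this setting explains the unassigned shards reported later in the thread is not established here.

```shell
#!/bin/sh
# Hypothetical helper: build the request body for the transient
# allocation setting.
# $1 = all | primaries | new_primaries | none
allocation_payload() {
    case "$1" in
        all|primaries|new_primaries|none)
            printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}\n' "$1" ;;
        *)
            echo "unknown allocation mode: $1" >&2
            return 1 ;;
    esac
}

# Hypothetical helper: apply the setting on the local node, using the same
# endpoint as the command above.
set_allocation() {
    body=$(allocation_payload "$1") || return 1
    curl -s -XPUT 'localhost:9200/_cluster/settings' -d "$body"
}
```

After sourcing the file, `set_allocation all` would re-enable allocation and `set_allocation none` would repeat the step above.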

server #2:

Code: Select all

mkdir /root/upgrade_2.1.6
cd /root/upgrade_2.1.6
wget -O upgrade.sh https://assets.nagios.com/downloads/nagios-log-server/upgrade.sh
chmod 700 upgrade.sh
./upgrade.sh
Completed successfully

server #1:

Code: Select all

mkdir /root/upgrade_2.1.6
cd /root/upgrade_2.1.6
wget -O upgrade.sh https://assets.nagios.com/downloads/nagios-log-server/upgrade.sh
chmod 700 upgrade.sh
./upgrade.sh
..
Kibana upgraded OK
..
Hanging forever ...

The hanging line in upgrade seems to be:

Code: Select all

/usr/bin/php $proddir/www/index.php install/upgrade/$oldversion

Even though the first server was upgraded, we are not able to use it.

The system is down. Can you advise?

Best

/Jan

Post by **cdienger** » Fri Jul 10, 2020 2:42 pm

Is the server that is hanging forever still hanging or did you exit out of it?

What version of NLS are you upgrading from?

Check that the services are running:

Code: Select all

systemctl status elasticsearch
systemctl status logstash
systemctl status httpd

Also check the status of the cluster:

Code: Select all

curl 'localhost:9200/_cat/nodes?v'
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

Run the second command a couple of times to see whether the numbers are changing.
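The repeated check can also be scripted. The following is a minimal sketch, not an official Nagios tool: `health_field` and `watch_health` are hypothetical helper names, and the parsing assumes the pretty-printed output of the health command above with Elasticsearch on localhost:9200.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value of one field ($1) from the
# pretty-printed _cluster/health JSON read on stdin.
health_field() {
    sed -n "s/^ *\"$1\" *: *\([0-9][0-9]*\).*/\1/p"
}

# Hypothetical helper: sample the health endpoint several times and print
# the shard counters, so changes between samples are easy to spot.
# $1 = number of samples (default 5), $2 = seconds between samples (default 10)
watch_health() {
    count=${1:-5}
    delay=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        health=$(curl -s --max-time 30 'http://localhost:9200/_cluster/health?pretty=true')
        printf '%s initializing=%s unassigned=%s active=%s\n' \
            "$(date +%T)" \
            "$(printf '%s\n' "$health" | health_field initializing_shards)" \
            "$(printf '%s\n' "$health" | health_field unassigned_shards)" \
            "$(printf '%s\n' "$health" | health_field active_shards)"
        i=$((i + 1))
        sleep "$delay"
    done
}
```

Calling `watch_health 6 10` would take six samples ten seconds apart. If `unassigned` falls while `active` rises, the cluster is recovering; if the numbers stay fixed, it is not.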

Please PM me a profile from both systems. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

Code: Select all

/usr/local/nagioslogserver/scripts/profile.sh

This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may not be able to be uploaded through the system. This is usually due to the logs in the Logstash and/or Elasticsearch directories found in it. If it is too large, please open the profile, extract these directories/files and send them separately.

Post by **jabi27** » Sat Jul 11, 2020 1:10 am

Hi

Yes, the services are running:

root@nagios-logserver1:~# systemctl status elasticsearch | grep running
Active: active (running) since Fri 2020-07-10 12:31:44 CEST; 19h ago
root@nagios-logserver1:~# systemctl status logstash | grep running
Active: active (running) since Fri 2020-07-10 14:40:47 CEST; 17h ago
root@nagios-logserver1:~# systemctl status apache2 | grep running
Active: active (running) since Fri 2020-07-10 13:02:13 CEST; 18h ago

I do not know for sure which version we were coming from, but I think it was 2.1.2.

Code: Select all

root@nagios-logserver2:~# curl 'localhost:9200/_cat/nodes?v'

..
..
Hanging...

Code: Select all

root@nagios-logserver2:~# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "a70328ae-b00b-42d8-a48e-8607a24bb151",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 167,
  "active_shards" : 167,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 177,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
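One detail in this output: `number_of_nodes` is 1 although the cluster has two nodes, so at the time of this sample the other node had not joined. A minimal sketch of that comparison follows. `check_node_count` is a hypothetical helper name, and the parsing assumes the pretty-printed format shown above.

```shell
#!/bin/sh
# Hypothetical helper: compare number_of_nodes in the pretty-printed
# _cluster/health JSON (stdin) with the expected cluster size ($1).
# Returns 0 if enough nodes are present, 1 if fewer, 2 if the field is missing.
check_node_count() {
    expected=$1
    actual=$(sed -n 's/^ *"number_of_nodes" *: *\([0-9][0-9]*\).*/\1/p')
    if [ -z "$actual" ]; then
        echo "number_of_nodes not found in input"
        return 2
    fi
    if [ "$actual" -lt "$expected" ]; then
        echo "only $actual of $expected nodes have joined the cluster"
        return 1
    fi
    echo "all $expected nodes present"
}
```

For example, `curl -s 'http://localhost:9200/_cluster/health?pretty=true' | check_node_count 2` would report whether both nodes are visible from the node being queried.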

The upgrade hung after "Kibana upgraded OK" for around 20-30 minutes (lunch break). There was no activity from the process.

I will generate and upload the system profiles.

Thanks and Best

/Jan

Post by **jabi27** » Sat Jul 11, 2020 1:13 am

Hi,

I forgot to mention: yes, I stopped the hanging process with Ctrl-C.

Best

/Jan

Post by **jabi27** » Sat Jul 11, 2020 1:26 am

This eventually returned:

Code: Select all

root@nagios-logserver1:~#  curl 'localhost:9200/_cat/nodes?v'
host              ip              heap.percent ram.percent load node.role master name                                 
nagios-logserver1 195.231.242.169           99          14 1.43 d         *      4f455685-4ec4-42f4-932e-54121c3871af

Post by **jabi27** » Sat Jul 11, 2020 1:46 am

And now server 2 finished:

Code: Select all

host              ip              heap.percent ram.percent load node.role master name                                 
nagios-logserver1 195.231.242.169           99          14 2.84 d         *      4f455685-4ec4-42f4-932e-54121c3871af

Post by **cdienger** » Mon Jul 13, 2020 9:33 am

Thanks for the update and data. I've taken ownership of the ticket you've opened for this case. We'll close out this thread and I will respond shortly to the ticket.

Nagios Support Forum
