NLS upgrade failing - NLS not available
Posted: Fri Jul 10, 2020 6:10 am
by jabi27
We have a 2-node cluster, and during/after the upgrade we are not able to start or log in to NLS.
Here is what we did:
We have:
- nagios-logserver1.stil.dk
- nagios-logserver2.stil.dk
# disable shard allocation
Code: Select all
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'
server #2:
Code: Select all
mkdir /root/upgrade_2.1.6
cd /root/upgrade_2.1.6
wget -O upgrade.sh https://assets.nagios.com/downloads/nagios-log-server/upgrade.sh
chmod 700 upgrade.sh
./upgrade.sh
Completed successfully
server #1:
Code: Select all
mkdir /root/upgrade_2.1.6
cd /root/upgrade_2.1.6
wget -O upgrade.sh https://assets.nagios.com/downloads/nagios-log-server/upgrade.sh
./upgrade.sh
..
Kibana upgraded OK
..
Hanging forever ...
The line in upgrade.sh that hangs seems to be:
Code: Select all
/usr/bin/php $proddir/www/index.php install/upgrade/$oldversion
Even though the first server was upgraded, we are not able to use it.
The system is down. Can you advise?
Best
/Jan
Re: NLS upgrade failing - NLS not available
Posted: Fri Jul 10, 2020 2:42 pm
by cdienger
Is the server that is hanging forever still hanging or did you exit out of it?
What version of NLS are you upgrading from?
Check that the services are running:
Code: Select all
systemctl status elasticsearch
systemctl status logstash
systemctl status httpd
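If you prefer a single pass/fail summary over reading three status screens, the checks above can be sketched as a small helper. This is only an illustration: `check_services` is a made-up name, and the unit names are the ones listed above (on Debian/Ubuntu the Apache unit is usually called apache2 rather than httpd).

```shell
#!/usr/bin/env bash
# Sketch: print one line per systemd unit saying whether it is active.
# check_services is a hypothetical helper name, not part of NLS.
check_services() {
  local svc rc
  rc=0
  for svc in "$@"; do
    # "systemctl is-active --quiet" exits 0 only when the unit is active
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: running"
    else
      echo "$svc: NOT running"
      rc=1
    fi
  done
  # Non-zero return if at least one unit was not active
  return "$rc"
}
```

Example call: `check_services elasticsearch logstash httpd` (swap httpd for apache2 where that is the unit name).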
Also check the status of the cluster:
Code: Select all
curl 'localhost:9200/_cat/nodes?v'
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
Run the second command a couple times to see if the numbers are changing.
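To make the changes easier to spot, the repeated health check can be sketched as a small polling loop. This is a rough sketch, not an official tool: `health_field` and `watch_health` are hypothetical helper names, the parsing assumes the line-per-field layout produced by `pretty=true`, and the count and delay defaults are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll _cluster/health and print the fields that should change
# while shards recover. Helper names are hypothetical.

# Print the value of one top-level field from pretty-printed health JSON
# read on stdin (assumes one "key" : value pair per line).
health_field() {
  sed -n "s/^[[:space:]]*\"$1\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",]*\)\"\{0,1\},\{0,1\}[[:space:]]*\$/\1/p"
}

# watch_health [url] [count] [delay_seconds]
watch_health() {
  local url count delay i json
  url="${1:-http://localhost:9200/_cluster/health?pretty=true}"
  count="${2:-5}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$count" ]; do
    if json=$(curl -s --max-time 30 "$url"); then
      printf '%s status=%s active=%s initializing=%s unassigned=%s\n' \
        "$(date +%T)" \
        "$(printf '%s\n' "$json" | health_field status)" \
        "$(printf '%s\n' "$json" | health_field active_shards)" \
        "$(printf '%s\n' "$json" | health_field initializing_shards)" \
        "$(printf '%s\n' "$json" | health_field unassigned_shards)"
    else
      printf '%s no response from %s\n' "$(date +%T)" "$url"
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

Calling `watch_health` with no arguments polls localhost five times, ten seconds apart. If unassigned stays flat and initializing stays at 0 across runs, recovery is probably not progressing.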
Please PM me a profile from both systems. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:
Code: Select all
/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.
Note that this file can be very large and may not be able to be uploaded through the system. This is usually due to the logs in the Logstash and/or Elasticsearch directories found in it. If it is too large, please open the profile, extract these directories/files and send them separately.
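One way to find out which directories make the profile too large is to unpack it into a scratch directory and sort the contents by size. The sketch below assumes GNU-style tar, du, sort, and head; `profile_largest` is a hypothetical helper name, and the layout inside the archive is not assumed, it simply lists whatever is there.

```shell
#!/usr/bin/env bash
# Sketch: unpack a profile tarball and list its largest files/directories.
# profile_largest <tarball> <workdir> [top_n]
profile_largest() {
  local tarball workdir top
  tarball="$1"
  workdir="$2"
  top="${3:-10}"
  mkdir -p "$workdir" || return 1
  # Extract the gzip-compressed archive into the scratch directory
  tar -xzf "$tarball" -C "$workdir" || return 1
  # Sizes in KB for every file and directory, largest first
  du -ak "$workdir" | sort -rn | head -n "$top"
}
```

For example, `profile_largest /tmp/system-profile.tar.gz /tmp/profile-extract` prints the ten largest entries; the big log directories can then be packed and sent separately.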
Re: NLS upgrade failing - NLS not available
Posted: Sat Jul 11, 2020 1:10 am
by jabi27
Hi
Yes the services are running:
root@nagios-logserver1:~# systemctl status elasticsearch | grep running
Active: active (running) since Fri 2020-07-10 12:31:44 CEST; 19h ago
root@nagios-logserver1:~# systemctl status logstash | grep running
Active: active (running) since Fri 2020-07-10 14:40:47 CEST; 17h ago
root@nagios-logserver1:~# systemctl status apache2 | grep running
Active: active (running) since Fri 2020-07-10 13:02:13 CEST; 18h ago
I do not know for sure what version we were coming from, but I think it was 2.1.2.
Code: Select all
root@nagios-logserver2:~# curl 'localhost:9200/_cat/nodes?v'
..
..
Hanging...
Code: Select all
root@nagios-logserver2:~# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "a70328ae-b00b-42d8-a48e-8607a24bb151",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 167,
"active_shards" : 167,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 177,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
}
The upgrade was hanging at the "Kibana.." step for around 20-30 minutes (lunch break). There was no activity from the process during that time.
I will run and upload the system profiles.
Thanks and Best
/Jan
Re: NLS upgrade failing - NLS not available
Posted: Sat Jul 11, 2020 1:13 am
by jabi27
Hi,
I forgot: yes, I pressed Ctrl-C to stop the process.
Best
/Jan
Re: NLS upgrade failing - NLS not available
Posted: Sat Jul 11, 2020 1:26 am
by jabi27
This eventually came back:
Code: Select all
root@nagios-logserver1:~# curl 'localhost:9200/_cat/nodes?v'
host              ip              heap.percent ram.percent load node.role master name
nagios-logserver1 195.231.242.169           99          14 1.43 d         *      4f455685-4ec4-42f4-932e-54121c3871af
Re: NLS upgrade failing - NLS not available
Posted: Sat Jul 11, 2020 1:46 am
by jabi27
And now server 2 has finished:
Code: Select all
host              ip              heap.percent ram.percent load node.role master name
nagios-logserver1 195.231.242.169           99          14 2.84 d         *      4f455685-4ec4-42f4-932e-54121c3871af
Re: NLS upgrade failing - NLS not available
Posted: Mon Jul 13, 2020 9:33 am
by cdienger
Thanks for the update and data. I've taken ownership of the ticket you've opened for this case. We'll close out this thread and I will respond shortly to the ticket.