adding extra NLS datapath

rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Post by rocheryderm »

Hello

I'm on RHEL and running out of space. For reasons I cannot change, I can't extend the existing volume group that my ES data is on. I will need to rebuild my instances later, but in the short term I need to add more disk space to my existing instances.

I've read the document here: https://library.nagios.com/library/prod ... tore-path/

It hints that I can add an extra data path, but gives no example of that.

I am going to make an educated guess that I can modify the default configuration in /etc/sysconfig/elasticsearch from

Code:

DATA_DIR="$ES_HOME/data"

to

Code:

DATA_DIR="$ES_HOME/data,/newpath/elasticsearch/data"

which would continue to recognize the original path, along with the new path, "/newpath/elasticsearch/data".

Is this correct?

Mike
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: adding extra NLS datapath

Post by scottwilkerson »

That is correct.

Once you make the change you will need to restart ES for the changes to take effect.

With multiple paths configured, NLS will favor adding new data to the path with the most available storage.
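
A minimal sketch of the change discussed above, assuming the sysconfig file contains a single DATA_DIR="..." line as shown in the first post. The helper name append_data_path and the .bak backup suffix are illustrative only and are not part of NLS; the restart command in the comments assumes the service is named elasticsearch.

```shell
#!/bin/sh
# append_data_path FILE NEWPATH
# Appends NEWPATH to the comma-separated DATA_DIR value in FILE.
# Hypothetical helper: assumes one line of the form DATA_DIR="...".
append_data_path() {
  file=$1
  newpath=$2
  # Leave the file alone if the path is already mentioned in it.
  if grep -qF -- "$newpath" "$file"; then
    return 0
  fi
  # Keep a copy of the original next to the file before editing.
  cp -p "$file" "$file.bak" || return 1
  sed "s|^DATA_DIR=\"\(.*\)\"|DATA_DIR=\"\1,${newpath}\"|" "$file.bak" > "$file"
}

# Example (as root, on one node at a time):
#   mkdir -p /newpath/elasticsearch/data
#   chown -R nagios:nagios /newpath/elasticsearch/data
#   append_data_path /etc/sysconfig/elasticsearch /newpath/elasticsearch/data
#   service elasticsearch restart
```

The function only edits the file; creating the directory, setting ownership, and restarting are left as explicit manual steps so each node can be checked before moving on to the next.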
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Post by rocheryderm »

So... I performed the steps on 3 out of 4 instances. I hesitate to do the last one because of these errors and the current cluster condition (red).

Code:

[2019-11-14 14:16:28,113][INFO ][cluster.service          ] [39e75611-f913-4be5-969e-b6ad41fd5437] detected_master [e38e8f57-86f7-467a-a4ef-849d57c973ae][VX5-6N3xTbabphqE_qwfrg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1}, added {[e38e8f57-86f7-467a-a4ef-849d57c973ae][VX5-6N3xTbabphqE_qwfrg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1},[77596958-30db-4cb4-bf11-09e114a44012][gzig2kumQWK-gOdvuMK2Ig][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1},[dda7f85c-6641-4b98-b573-fbdf7121c025][fospnu7wQXiD6d3jkEd_bg][rbbusnls4p][inet[/151.120.113.54:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[e38e8f57-86f7-467a-a4ef-849d57c973ae][VX5-6N3xTbabphqE_qwfrg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1}])
[2019-11-14 14:16:28,437][INFO ][http                     ] [39e75611-f913-4be5-969e-b6ad41fd5437] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/151.120.113.53:9200]}
[2019-11-14 14:16:28,437][INFO ][node                     ] [39e75611-f913-4be5-969e-b6ad41fd5437] started
[2019-11-14 14:16:28,668][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,677][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,678][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,713][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,713][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
I get a few screenfuls of that "Failed to open" error, but the service is started.

Except now...

Code:

[root@rbbusnls1p ~]# curl -XGET 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "15edd11f-8263-4eb7-9054-8ace66feebb6",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 936,
  "active_shards" : 1069,
  "relocating_shards" : 0,
  "initializing_shards" : 8,
  "unassigned_shards" : 2439,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
[root@rbbusnls1p ~]#
Is this normal or "ok"? Or do I need to debug anything or wait for all unassigned shards to be re-assigned?

Thank you

Post by scottwilkerson »

You probably just need to wait for the unassigned shards to redistribute.
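
One way to watch that wait without the GUI is to poll the health endpoint from the earlier post. This is only a sketch: health_field and wait_for_green are hypothetical helper names, the 30-second interval is arbitrary, and the parsing assumes the one-field-per-line layout that ?pretty produced above.

```shell
#!/bin/sh
# health_field NAME: read `_cluster/health?pretty` JSON on stdin and
# print the value of one top-level field (string, number, or boolean).
health_field() {
  awk -F'"' -v key="$1" '
    $2 == key {
      if (NF >= 5) { print $4 }                       # quoted string value
      else { v = $3; gsub(/[ :,]/, "", v); print v }  # bare number or boolean
    }'
}

# wait_for_green [URL] [SECONDS]: print progress until status is green.
wait_for_green() {
  url=${1:-localhost:9200}
  interval=${2:-30}
  while :; do
    health=$(curl -s "$url/_cluster/health?pretty") || return 1
    status=$(printf '%s\n' "$health" | health_field status)
    echo "$(date '+%H:%M:%S') status=$status" \
         "unassigned=$(printf '%s\n' "$health" | health_field unassigned_shards)" \
         "initializing=$(printf '%s\n' "$health" | health_field initializing_shards)"
    [ "$status" = green ] && return 0
    sleep "$interval"
  done
}

# Example: wait_for_green localhost:9200 60
```

If the unassigned count stops dropping between polls for a long stretch, that is a sign recovery has stalled rather than just being slow.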

Generally speaking, I would recommend making changes to only one instance at a time, which should keep the cluster green at all times.

Are you seeing files being placed in /newpath/elasticsearch/data?

Also, you did set the permissions on the path correctly, right?

Code:

chown -R nagios:nagios /newpath/elasticsearch/data
chmod -R 0775 /newpath/elasticsearch/data
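
To check whether data is landing on each path, something like the following could be used. It is a rough sketch: data_path_usage is a hypothetical helper, and it expects the paths already expanded (it does not resolve $ES_HOME from the sysconfig file).

```shell
#!/bin/sh
# data_path_usage LIST: LIST is a comma-separated set of directories,
# in the same form as the DATA_DIR value. Prints "<KiB> <path>" for
# each directory, or "missing <path>" if it does not exist.
data_path_usage() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r dir; do
    [ -d "$dir" ] || { echo "missing $dir"; continue; }
    du -sk "$dir" | awk -v d="$dir" '{ print $1, d }'
  done
}

# Example (the first path is a placeholder for the expanded $ES_HOME/data):
#   data_path_usage "/path/to/es_home/data,/newpath/elasticsearch/data"
```

Running it a few minutes apart shows whether the new path is growing.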

Post by rocheryderm »

Yes... indices are showing up on the new path. I did not realize the significant load this would create on the cluster; I didn't expect it to move shards onto the new path immediately. It's so busy I can't even log on to the NLS GUI (timeout).

All SSDs and still... taking a loooooong time to assign shards.

I think I'll wait on the last instance until tomorrow...

Post by scottwilkerson »

Hmm, to be honest I didn't know that it was going to re-shuffle everything either or I would have warned you...

Post by rocheryderm »

scottwilkerson wrote: Hmm, to be honest I didn't know that it was going to re-shuffle everything either or I would have warned you...
Note to self... :lol:

Ah well, hopefully it's all worked out tomorrow...

Post by scottwilkerson »

rocheryderm wrote: Note to self... :lol:

Ah well, hopefully it's all worked out tomorrow...
I'm definitely making note... Let us know how it turns out tomorrow

Post by rocheryderm »

So...

Cluster stopped dealing with UNASSIGNED shards a few hours after I left work yesterday.
No progress since then. Nothing helpful in the log files /var/log/elasticsearch...

Decided to stop the whole cluster and restart.

Of course, **more** shards are unassigned thanks to that, but at least now the response to

Code:

curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

is shards with a status of "UNASSIGNED CLUSTER_RECOVERED", which the cluster seems to be re-assigning... at a glacial pace. Most troubling is that during this, the cluster times out when I try to log on to the GUI. Frustrating.
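
With thousands of shards, a count per reason is easier to follow than the raw grep output. A small sketch, assuming the same five columns requested in the curl command above; unassigned_by_reason is a hypothetical helper name.

```shell
#!/bin/sh
# unassigned_by_reason: read the output of
#   _cat/shards?h=index,shard,prirep,state,unassigned.reason
# on stdin and print "<count> <reason>" for UNASSIGNED shards,
# most common reason first.
unassigned_by_reason() {
  awk '$4 == "UNASSIGNED" { n[$5]++ }
       END { for (r in n) print n[r], r }' | sort -rn
}

# Example:
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | unassigned_by_reason
```

Re-running it periodically shows whether the CLUSTER_RECOVERED count is actually shrinking.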

The blades aren't anywhere near SSD, network, CPU, or memory capacity because of this. I feel like ES is throttling this behavior. Can you think of any way I could temporarily adjust something to pick up the pace?

Post by scottwilkerson »

I believe you can bump up the speed with the following:

Code:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "indices.recovery.max_bytes_per_sec" : "250mb"
  }
}'

Run this on any node in the cluster.
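
A small wrapper can apply a transient setting and then read the cluster settings back to confirm it took effect. This is a sketch under assumptions: transient_setting_json and set_transient are hypothetical helper names, keys and values are assumed to contain no quote characters, and newer Elasticsearch releases may also require a Content-Type: application/json header on the PUT.

```shell
#!/bin/sh
# transient_setting_json KEY VALUE: build the request body for one
# transient cluster setting. Assumes KEY and VALUE contain no quotes.
transient_setting_json() {
  printf '{"transient":{"%s":"%s"}}\n' "$1" "$2"
}

# set_transient KEY VALUE [URL]: apply the setting, then print the
# current cluster settings so the change can be verified.
set_transient() {
  url=${3:-localhost:9200}
  curl -s -XPUT "$url/_cluster/settings" \
       -d "$(transient_setting_json "$1" "$2")" || return 1
  echo
  curl -s -XGET "$url/_cluster/settings?pretty"
}

# Example:
#   set_transient indices.recovery.max_bytes_per_sec 250mb
```

The same call with a lower value can be used to dial the rate back down once the shards are assigned.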