Nagios Support Forum

Posted: **Thu Nov 14, 2019 10:31 am**

Hello

I'm on RHEL and running out of space. For reasons I cannot change, I can't extend the existing volume group that my ES data is on. I will need to rebuild my instances later, but in the short term I need to add more disk space to my existing instances.

I've read the document here: https://library.nagios.com/library/prod ... tore-path/

It hints that I can add an extra data path, but gives no example of that.

I am going to make an educated guess that I can change the default configuration in /etc/sysconfig/elasticsearch from

DATA_DIR="$ES_HOME/data"

to

DATA_DIR="$ES_HOME/data,/newpath/elasticsearch/data"

I expect this would continue to recognize the original path, along with the new path, "/newpath/elasticsearch/data".

Is this correct?

Mike
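
A minimal sketch of how the proposed edit could be previewed before touching the live file. The function name `append_data_path` is made up for illustration, and it assumes the file holds a single double-quoted `DATA_DIR="..."` line like the one shown above. It only prints the modified file to stdout and changes nothing on disk.

```shell
# Print FILE with EXTRA_PATH appended to the comma-separated DATA_DIR value.
# Usage: append_data_path FILE EXTRA_PATH
# Assumes a line of the form DATA_DIR="..." (hypothetical helper, not part of NLS).
append_data_path() {
    awk -v extra="$2" '
        # Replace the closing quote of the DATA_DIR line with ",<extra>" plus the quote.
        /^DATA_DIR="/ { sub(/"[ \t]*$/, "," extra "\"") }
        { print }
    ' "$1"
}
```

Piping the result into `diff /etc/sysconfig/elasticsearch -` should show the one-line change before it is applied by hand.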

Posted: **Thu Nov 14, 2019 11:29 am**

That is correct.

Once you make the change you will need to restart ES for the changes to take effect.

With multiple paths, NLS will favor adding new data to the path with the most available storage.
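
Since free space influences where new data lands, it may help to compare the paths before and after the restart. The sketch below is only an illustration: `data_path_free` is a hypothetical helper, and it expects literal directories, so something like `$ES_HOME` has to be expanded by the caller first.

```shell
# Print the available space for each entry of a comma-separated path list.
# Usage: data_path_free "/path/one,/path/two"
data_path_free() {
    local IFS=','
    local path
    for path in $1; do
        if [ -d "$path" ]; then
            # df -P prints one POSIX-format line per filesystem;
            # column 4 is the available space in 1K blocks.
            df -P "$path" | awk -v p="$path" 'NR == 2 { print p ": " $4 " KB available" }'
        else
            echo "$path: not a directory"
        fi
    done
}
```

For the layout in this thread, the call would look something like `data_path_free "$ES_HOME/data,/newpath/elasticsearch/data"`, with `ES_HOME` set in the calling shell.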

Posted: **Thu Nov 14, 2019 2:22 pm**

So... I performed the steps on 3 out of 4 instances. I hesitate to do the last one because of these errors and the current cluster condition (red).

[2019-11-14 14:16:28,113][INFO ][cluster.service          ] [39e75611-f913-4be5-969e-b6ad41fd5437] detected_master [e38e8f57-86f7-467a-a4ef-849d57c973ae][VX5-6N3xTbabphqE_qwfrg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1}, added {[e38e8f57-86f7-467a-a4ef-849d57c973ae][VX5-6N3xTbabphqE_qwfrg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1},[77596958-30db-4cb4-bf11-09e114a44012][gzig2kumQWK-gOdvuMK2Ig][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1},[dda7f85c-6641-4b98-b573-fbdf7121c025][fospnu7wQXiD6d3jkEd_bg][rbbusnls4p][inet[/151.120.113.54:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[e38e8f57-86f7-467a-a4ef-849d57c973ae][VX5-6N3xTbabphqE_qwfrg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1}])
[2019-11-14 14:16:28,437][INFO ][http                     ] [39e75611-f913-4be5-969e-b6ad41fd5437] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/151.120.113.53:9200]}
[2019-11-14 14:16:28,437][INFO ][node                     ] [39e75611-f913-4be5-969e-b6ad41fd5437] started
[2019-11-14 14:16:28,668][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,677][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,678][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,713][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot
[2019-11-14 14:16:28,713][INFO ][indices.store            ] [39e75611-f913-4be5-969e-b6ad41fd5437] Failed to open / find files while reading metadata snapshot

I get a few screenfuls of that "Failed to open" message, but the service is started.

Except now...

[root@rbbusnls1p ~]# curl -XGET 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "15edd11f-8263-4eb7-9054-8ace66feebb6",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 936,
  "active_shards" : 1069,
  "relocating_shards" : 0,
  "initializing_shards" : 8,
  "unassigned_shards" : 2439,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
[root@rbbusnls1p ~]#

Is this normal or "ok"? Or do I need to debug anything or wait for all unassigned shards to be re-assigned?

Thank you

Posted: **Thu Nov 14, 2019 2:58 pm**

You probably just need to wait for the unassigned shards to redistribute.

Generally speaking, I would recommend making changes to only one instance at a time, which should keep the cluster green.

Are you seeing files being placed in /newpath/elasticsearch/data?

Also, you did set the permissions on the path correctly, right?

chown -R nagios:nagios /newpath/elasticsearch/data
chmod -R 0775 /newpath/elasticsearch/data
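
The one-instance-at-a-time advice could be scripted roughly as follows: restart one node, then wait for the cluster to report green before moving on. This is a sketch under assumptions, not a tested NLS procedure. `health_status` and `wait_for_green` are hypothetical names, and the 30-second poll interval and 60 attempts are arbitrary.

```shell
# Extract the "status" value from _cluster/health JSON read on stdin.
health_status() {
    sed -n 's/.*"status"[ ]*:[ ]*"\([a-z]*\)".*/\1/p'
}

# Poll the health endpoint until it reports green or the attempts run out.
# Usage: wait_for_green [host:port] [attempts]
wait_for_green() {
    local url="${1:-localhost:9200}"
    local tries="${2:-60}"
    local status
    while [ "$tries" -gt 0 ]; do
        status=$(curl -s "$url/_cluster/health" | health_status)
        [ "$status" = "green" ] && return 0
        echo "cluster status: ${status:-unreachable}; waiting..."
        sleep 30
        tries=$((tries - 1))
    done
    return 1
}
```

After restarting Elasticsearch on one instance, running `wait_for_green && echo "safe to continue"` would hold off the next restart until recovery has finished.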

Posted: **Thu Nov 14, 2019 4:04 pm**

Yes... indices are showing up on the new path. I did not realize how much load this would create on the cluster; I didn't expect it to move shards onto the new path immediately. It's so busy I can't even log on to the NLS GUI (timeout).

All SSDs and still... taking a loooooong time to assign shards.

I think I'll wait on the last instance until tomorrow...

Posted: **Thu Nov 14, 2019 4:09 pm**

Hmm, to be honest, I didn't know that it was going to re-shuffle everything either, or I would have warned you...

Posted: **Thu Nov 14, 2019 5:27 pm**

scottwilkerson wrote: Hmm, to be honest, I didn't know that it was going to re-shuffle everything either, or I would have warned you...

Note to self...

Ah well, hopefully it's all worked out tomorrow...

Posted: **Thu Nov 14, 2019 5:34 pm**

rocheryderm wrote: Note to self...

Ah well, hopefully it's all worked out tomorrow...

I'm definitely making note... Let us know how it turns out tomorrow

Posted: **Fri Nov 15, 2019 10:09 am**

So...

Cluster stopped dealing with UNASSIGNED shards a few hours after I left work yesterday.
No progress since then. Nothing helpful in the log files /var/log/elasticsearch...

Decided to stop the whole cluster and restart.

Of course, **more** shards are unassigned thanks to that, but at least now the response to

curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

shows shards with a status of "UNASSIGNED CLUSTER_RECOVERED", which the cluster seems to be re-assigning... at a glacial pace. Most troubling is that during this, the GUI times out when I try to log on. Frustrating.

The blades aren't anywhere near their SSD, network, CPU, or memory capacity because of this. I feel like ES is throttling the recovery. Can you think of anything I could temporarily adjust to pick up the pace?
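
To track progress without scrolling through thousands of lines, the `_cat/shards` output can be reduced to a count per unassigned reason. This is a small sketch; `count_unassigned_by_reason` is a made-up name, and it assumes the column order requested in the command above (index, shard, prirep, state, unassigned.reason).

```shell
# Read _cat/shards output on stdin (columns: index shard prirep state reason)
# and print the number of UNASSIGNED shards per reason, largest count first.
count_unassigned_by_reason() {
    awk '$4 == "UNASSIGNED" { n[$5]++ } END { for (r in n) print n[r], r }' | sort -rn
}
```

It would be fed like this: `curl -s -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | count_unassigned_by_reason`. Running it every few minutes shows whether the counts are actually dropping.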

Posted: **Fri Nov 15, 2019 10:15 am**

I believe you can bump up the speed with the following:

curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "250mb"
  }
}'

Run this on any node in the cluster.
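
Because this is a transient setting, it does not survive a full cluster restart, so it is worth confirming that it took effect. The sketch below wraps the same call and adds a read-back. The function names and the `ES_URL` variable are assumptions for illustration, and the format check is deliberately simple (digits and lowercase letters only).

```shell
# Build the JSON body for a transient recovery-throttle change, e.g. "250mb".
recovery_rate_body() {
    case "$1" in
        ''|*[!0-9a-z]*) echo "invalid rate: $1" >&2; return 1 ;;
    esac
    printf '{"transient":{"indices.recovery.max_bytes_per_sec":"%s"}}\n' "$1"
}

# Apply the new rate to the cluster.
set_recovery_rate() {
    local body
    body=$(recovery_rate_body "$1") || return 1
    curl -s -XPUT "${ES_URL:-localhost:9200}/_cluster/settings" -d "$body"
}

# Show the persistent and transient settings currently in effect.
show_cluster_settings() {
    curl -s -XGET "${ES_URL:-localhost:9200}/_cluster/settings?pretty"
}
```

For example, `set_recovery_rate 250mb && show_cluster_settings` should list the new value under `transient`.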

adding extra NLS datapath