Re: Trying to figure out why logstash changed to active (exi
Posted: Tue Nov 19, 2019 6:52 pm
by rferebee
I was reading this thread:
https://support.nagios.com/forum/viewto ... 38&t=55892
It looks like our LS_HEAP_SIZE is only set to "500m". We have 64GB of memory available on each node. I know that Elasticsearch uses half automatically, but would it make sense to increase LS_HEAP_SIZE to something like 2048m, which is what was recommended to another customer?
Thank you.
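For reference, a minimal sketch of bumping that setting, assuming the old-style defaults file where LS_HEAP_SIZE lives (commonly /etc/sysconfig/logstash on RPM-based installs; check where your install actually defines it). The example edits a temp copy so it is safe to run anywhere:

```shell
# Sketch: raise LS_HEAP_SIZE from 500m to 2048m in a defaults-style file.
# A temp copy stands in for the real path, which varies by install.
conf=$(mktemp)
echo 'LS_HEAP_SIZE="500m"' > "$conf"
sed -i 's/^LS_HEAP_SIZE=.*/LS_HEAP_SIZE="2048m"/' "$conf"
grep '^LS_HEAP_SIZE' "$conf"
rm -f "$conf"
```

Logstash would need a restart afterward for the new heap size to take effect.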
Re: Trying to figure out why logstash changed to active (exi
Posted: Wed Nov 20, 2019 12:04 pm
by cdienger
Yes, absolutely. I thought we had provided this instruction before, which is why we didn't point it out in this thread. Let us know if that helps things.
Re: Trying to figure out why logstash changed to active (exi
Posted: Wed Nov 20, 2019 3:40 pm
by rferebee
Another question: is there a way to write this variable so that it uses half the total memory minus 1 gigabyte? In my case, since we have 64GB total, it would still end up being 31GB.
Code:
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
I'm hesitant to hard-code it to 31000m because, if someone were to decide to reduce the amount of system memory without telling me, it would totally hose the environment. Using a dynamic expression eliminates that very unlikely risk.
I want to reduce it to 31GBs because every time elasticsearch crashes it can't restart because there isn't enough memory available. The server itself is always using some and so is logstash.
Re: Trying to figure out why logstash changed to active (exi
Posted: Wed Nov 20, 2019 4:59 pm
by cdienger
It can be changed to:
Code:
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 - 1000)m
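To sanity-check the arithmetic in that one-liner: with 64GB of RAM (65536 MB as reported by `free -m`), half minus 1000 MB comes out just under 31GB. A sketch with the total hard-coded so the math is reproducible:

```shell
# Same expression as above, but with the output of `free -m` replaced by a
# hard-coded total of 65536 MB (64GB) so the result is deterministic.
total_mb=65536
ES_HEAP_SIZE=$(expr $total_mb / 2 - 1000)m
echo "$ES_HEAP_SIZE"   # 31768m, i.e. roughly 31GB
```

As a side benefit, landing just under 31GB also keeps the heap below the ~32GB threshold where the JVM loses compressed object pointers.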
Re: Trying to figure out why logstash changed to active (exi
Posted: Thu Nov 21, 2019 12:53 pm
by rferebee
Something happened this morning just before 8AM and my Log Server environment is running really poorly. CPU usage is way down from normal and it's not collecting logs like it should be even though it says logstash and elasticsearch are running on all 3 nodes...
Looking at the elasticsearch logs, I'm seeing this over and over again:
Code:
[2019-11-21 09:48:56,857][WARN ][monitor.jvm ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][72541][6055] duration [23.1s], collections [1]/[23.8s], total [23.1s]/[1.8h], memory [30.9gb]->[30.8gb]/[31.3gb], all_pools {[young] [21.8mb]->[7.1mb]/[399.4mb]}{[survivor] [0b]->[0b]/[49.8mb]}{[old] [30.8gb]->[30.8gb]/[30.8gb]}
[2019-11-21 09:49:16,413][WARN ][monitor.jvm ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][72542][6056] duration [18.8s], collections [1]/[19.5s], total [18.8s]/[1.8h], memory [30.8gb]->[30.9gb]/[31.3gb], all_pools {[young] [7.1mb]->[25.9mb]/[399.4mb]}{[survivor] [0b]->[0b]/[49.8mb]}{[old] [30.8gb]->[30.8gb]/[30.8gb]}
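A quick way to see how often and how long these stalls happen is to pull the duration field out of the [gc][old] lines. A sketch using shortened copies of the two excerpts above fed in via a here-doc (in practice you would grep the real elasticsearch log file instead):

```shell
# Extract old-generation GC pause durations from monitor.jvm warnings.
cat <<'EOF' | grep '\[gc\]\[old\]' | sed -n 's/.*duration \[\([^]]*\)\].*/\1/p'
[2019-11-21 09:48:56,857][WARN ][monitor.jvm ] [node] [gc][old][72541][6055] duration [23.1s], collections [1]/[23.8s]
[2019-11-21 09:49:16,413][WARN ][monitor.jvm ] [node] [gc][old][72542][6056] duration [18.8s], collections [1]/[19.5s]
EOF
```

Pauses in the 19-23 second range, with the old pool pinned at 30.8gb of 30.8gb in the lines above, suggest the heap is essentially full and the JVM is spending most of its time collecting.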
Any ideas?
Re: Trying to figure out why logstash changed to active (exi
Posted: Thu Nov 21, 2019 2:47 pm
by rferebee
I seem to be having very similar issues described in this thread:
https://support.nagios.com/forum/viewto ... 38&t=35491
The majority of the recommendations have already been addressed, but the symptoms apply nonetheless.
Re: Trying to figure out why logstash changed to active (exi
Posted: Thu Nov 21, 2019 5:51 pm
by cdienger
Can you send me the entire log for review? It appears to be doing garbage collection (removing items from memory), which can pause things. Ideally it shouldn't happen frequently or last long, but if you are still seeing it, restarting the elasticsearch service will free up the memory and get things back in order.
Re: Trying to figure out why logstash changed to active (exi
Posted: Fri Nov 22, 2019 11:51 am
by rferebee
PM sent.
Re: Trying to figure out why logstash changed to active (exi
Posted: Fri Nov 22, 2019 4:06 pm
by cdienger
Received and reviewing. Did you end up restarting the service? Has performance returned?
Re: Trying to figure out why logstash changed to active (exi
Posted: Fri Nov 22, 2019 4:20 pm
by rferebee
Actually, I had to reboot the entire environment. Despite all nodes indicating that both elasticsearch and logstash were up and running, there was almost zero log collection for about 2 hours (around 8AM to 10AM) yesterday.
Everything seems to be working today.