Trying to figure out why logstash changed to active (exited)

Post by **rferebee** » Tue Nov 19, 2019 6:52 pm

I was reading this thread: https://support.nagios.com/forum/viewto ... 38&t=55892

It looks like our LS_HEAP_SIZE is only set to "500m". We have 64GB of memory available on each node. I know that Elasticsearch uses half automatically, but would it make sense to increase LS_HEAP_SIZE to something like 2048m, which is what was recommended to another customer?

Thank you.

Post by **cdienger** » Wed Nov 20, 2019 12:04 pm

Yes, absolutely. I thought we had provided this instruction before, which is why we didn't point it out in this thread. Let us know if that helps things.
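As a sketch of making that change without hand-editing, the helper below rewrites an existing `LS_HEAP_SIZE=` line in a shell-style config file, or appends one if none exists. Assumptions: the setting lives in a file such as `/etc/sysconfig/logstash` on CentOS/RHEL installs (Debian-family systems typically use `/etc/default/logstash`), GNU `sed -i` is available, and the service is restarted afterwards. `set_heap_var` is a hypothetical helper name, not part of Nagios Log Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_heap_var FILE VAR VALUE
# Replaces an existing VAR=... assignment in a shell-style config file,
# or appends one if the file has none. Uses GNU sed's -i (in-place) flag.
set_heap_var() {
  local file=$1 var=$2 value=$3
  if grep -q "^${var}=" "$file"; then
    # Rewrite the whole assignment line, writing the value in double quotes
    sed -i "s|^${var}=.*|${var}=\"${value}\"|" "$file"
  else
    # No uncommented assignment found: append a new one
    printf '%s="%s"\n' "$var" "$value" >> "$file"
  fi
}

# Example, run as root (file path and service name are assumptions for a CentOS/RHEL install):
#   cp /etc/sysconfig/logstash /etc/sysconfig/logstash.bak
#   set_heap_var /etc/sysconfig/logstash LS_HEAP_SIZE 2048m
#   systemctl restart logstash
```

Taking a backup first keeps the change easy to revert if Logstash fails to start with the new heap size.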

Post by **rferebee** » Wed Nov 20, 2019 3:40 pm

Another question: is there a way to write this variable so that it uses half the total memory minus 1 gigabyte? In my case, since we have 64GB total, it would end up being 31GB.

```shell
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
```

I'm hesitant to hard-code it to 31000m because if, for some reason, someone reduced the system memory without telling me, it would totally hose the environment. A dynamic expression eliminates that (admittedly unlikely) risk.

I want to reduce it to 31GB because every time Elasticsearch crashes, it can't restart because there isn't enough memory available. The server itself is always using some memory, and so is Logstash.

Post by **cdienger** » Wed Nov 20, 2019 4:59 pm

It can be changed to:

```shell
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 - 1000)m
```
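To sanity-check what that expression yields, here is a minimal sketch that reproduces the same arithmetic with bash's built-in `$(( ))` instead of `expr`. `compute_heap` is a hypothetical helper; passing a total in MB lets you try values without touching the real machine, and omitting it falls back to `free -m`. Division binds tighter than subtraction in both forms, so a 64000 MB total becomes 64000 / 2 - 1000 = 31000m.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute_heap [TOTAL_MB]
# Prints half of total RAM minus 1000 MB, with an "m" suffix, as in the ES_HEAP_SIZE line.
compute_heap() {
  # Use the argument if given, otherwise read the "Mem:" total (in MB) from free -m
  local total_mb=${1:-$(free -m | awk '/^Mem:/{print $2}')}
  # Integer arithmetic: / is evaluated before -, same as in the expr version
  echo "$(( total_mb / 2 - 1000 ))m"
}

compute_heap 64000   # prints 31000m
```

On a real host `free -m` usually reports a little less than the nominal installed memory, so the actual figure will differ slightly from 31000m. Because the formula subtracts a fixed 1000 MB, it would reach zero or go negative on a host with 2000 MB or less, so a minimum floor may be worth adding if the same file is reused on smaller machines.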

Post by **rferebee** » Thu Nov 21, 2019 12:53 pm

Something happened this morning just before 8 AM, and my Log Server environment is running really poorly. CPU usage is way down from normal, and it's not collecting logs like it should, even though it reports that Logstash and Elasticsearch are running on all 3 nodes...

Looking at the Elasticsearch logs, I'm seeing this over and over again:

```
[2019-11-21 09:48:56,857][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][72541][6055] duration [23.1s], collections [1]/[23.8s], total [23.1s]/[1.8h], memory [30.9gb]->[30.8gb]/[31.3gb], all_pools {[young] [21.8mb]->[7.1mb]/[399.4mb]}{[survivor] [0b]->[0b]/[49.8mb]}{[old] [30.8gb]->[30.8gb]/[30.8gb]}
[2019-11-21 09:49:16,413][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][72542][6056] duration [18.8s], collections [1]/[19.5s], total [18.8s]/[1.8h], memory [30.8gb]->[30.9gb]/[31.3gb], all_pools {[young] [7.1mb]->[25.9mb]/[399.4mb]}{[survivor] [0b]->[0b]/[49.8mb]}{[old] [30.8gb]->[30.8gb]/[30.8gb]}
```
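Reading these warnings: each is an old-generation collection that ran for 23.1 and 18.8 seconds respectively, and the `{[old] ...}` pool sits at 30.8gb of 30.8gb both before and after, so almost nothing is being freed. To pull those two numbers out of a longer log, here is a minimal sketch assuming every line follows exactly the format shown above. `gc_summary` is a hypothetical helper, and the log path you pass it is up to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gc_summary LOGFILE
# Prints one line per "[gc][old]" warning: timestamp, pause duration, old-pool usage.
gc_summary() {
  grep -F '[gc][old]' "$1" | awk '{
    # Timestamp: the 23 characters inside the leading "[...]"
    ts = substr($0, 2, 23)
    # Collection time, e.g. "duration [23.1s]" -> 23.1s
    dur = "?"
    if (match($0, /duration \[[0-9.]+[a-z]+\]/))
      dur = substr($0, RSTART + 10, RLENGTH - 11)
    # Old pool, e.g. "{[old] [30.8gb]->[30.8gb]/[30.8gb]}" -> before->after/max
    old = "?"
    if (match($0, /\{\[old\] [^}]*\}/))
      old = substr($0, RSTART + 7, RLENGTH - 8)
    print ts, "pause=" dur, "old=" old
  }'
}

# Example (path is an assumption; point it at your node's Elasticsearch log):
#   gc_summary /path/to/elasticsearch.log
```

An old pool whose "after" value stays equal to its maximum across many collections is the pattern to look for; a restart, as suggested in the thread, clears it only temporarily if the heap is genuinely too small for the workload.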

Any ideas?

Post by **rferebee** » Thu Nov 21, 2019 2:47 pm

I seem to be having issues very similar to those described in this thread: https://support.nagios.com/forum/viewto ... 38&t=35491

The majority of the recommendations have already been addressed, but the symptoms apply nonetheless.

Post by **cdienger** » Thu Nov 21, 2019 5:51 pm

Can you send me the entire log for review? It appears to be doing garbage collection (removing items from memory), which can pause things. Ideally it shouldn't happen frequently or last long, but if you are still seeing it, restarting the Elasticsearch service will free up the memory and get things back in order.

Post by **rferebee** » Fri Nov 22, 2019 11:51 am

PM sent.

Post by **cdienger** » Fri Nov 22, 2019 4:06 pm

Received and reviewing. Did you end up restarting the service? Has performance returned?

Post by **rferebee** » Fri Nov 22, 2019 4:20 pm

Actually, I had to reboot the entire environment. Despite all nodes indicating that both Elasticsearch and Logstash were up and running, there was almost zero log collection for about 2 hours (around 8 AM to 10 AM) yesterday.

Everything seems to be working today.
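Since the thread is about a service showing `active (exited)`, and here the services looked "up" while collecting almost nothing, it can help to check systemd's sub-state rather than trusting the headline status. In `active (exited)`, systemd still considers the unit active, but the process it started has exited. A minimal sketch, assuming systemd-managed units named `elasticsearch` and `logstash` (adjust to your install); `unit_substate` and `check_unit` are hypothetical helpers. `systemctl show -p SubState` is parsed with `cut` rather than `--value` because older systemd releases, such as the one shipped with CentOS 7, lack that flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a unit's SubState (e.g. running, exited, dead)
unit_substate() {
  # `systemctl show -p SubState UNIT` prints a line like "SubState=running"
  systemctl show -p SubState "$1" | cut -d= -f2
}

# Hypothetical helper: report the unit's state; return 1 unless it is actually running
check_unit() {
  local state
  state=$(unit_substate "$1")
  if [ "$state" = "running" ]; then
    echo "$1: running"
  else
    echo "$1: NOT running (SubState=$state)"
    return 1
  fi
}

# Example usage (unit names are assumptions):
#   for u in elasticsearch logstash; do check_unit "$u"; done
```

A `running` sub-state only means the process is alive. As the GC warnings earlier in this thread show, a node can still be stuck in long collections, so this is a first check rather than a full health check.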

Nagios Support Forum
