Trying to figure out why logstash changed to active (exited)

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee

I was reading this thread: https://support.nagios.com/forum/viewto ... 38&t=55892

It looks like our LS_HEAP_SIZE is only set to "500m". We have 64GB of memory available on each node. I know that Elasticsearch uses half of that automatically, but would it make sense to increase LS_HEAP_SIZE to something like 2048m, which is what was recommended to another customer?

Thank you.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Trying to figure out why logstash changed to active (exi

Post by cdienger

Yes, absolutely. I thought we had provided this instruction before, which is why we didn't point it out in this thread. Let us know if that helps.
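A minimal sketch of making that change on each node. It assumes LS_HEAP_SIZE is set as a `KEY="value"` line in /etc/sysconfig/logstash on this install (verify the path on your own nodes); the helper name `set_sysconfig_var` is hypothetical, and `sed -i` as used here is the GNU sed form found on CentOS/RHEL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY="value" in a sysconfig-style file, keeping a
# backup copy of the original as FILE.bak.
set_sysconfig_var() {
  local file=$1 key=$2 value=$3
  cp -p -- "$file" "$file.bak"          # keep the previous version
  if grep -q "^${key}=" "$file"; then
    # Replace the existing (uncommented) assignment in place (GNU sed -i)
    sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
  else
    # Append the assignment if the key is not present yet
    printf '%s="%s"\n' "$key" "$value" >> "$file"
  fi
}

# Example (as root on each node; the path is an assumption, then restart logstash):
# set_sysconfig_var /etc/sysconfig/logstash LS_HEAP_SIZE 2048m
```

The new value only takes effect after the logstash service is restarted on that node.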
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee

Another question: is there a way to write this variable so that it uses half the total memory minus 1 GB? In my case, since we have 64 GB total, it would end up being 31 GB.

Code:

ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
I'm hesitant to hard-code it to 31000m because if someone were to reduce the system memory without telling me, it would totally hose the environment. A dynamic expression eliminates that (admittedly unlikely) risk.

I want to reduce it to 31 GB because every time Elasticsearch crashes, it can't restart because there isn't enough memory available. The server itself always uses some memory, and so does Logstash.
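For the restart failures described above, a rough pre-start check can compare the intended heap against the memory the kernel currently reports as available. This is a sketch under assumptions: it reads `MemAvailable` (in kB) from /proc/meminfo, the function name `heap_fits` is hypothetical, and passing the check is only a hint, not a guarantee that the JVM will start.

```shell
#!/usr/bin/env bash
# Hypothetical check: does a heap of HEAP_MB megabytes fit in MemAvailable?
# Usage: heap_fits HEAP_MB [MEMINFO_FILE]   (defaults to /proc/meminfo)
# Returns 0 if it fits, 1 if not, 2 if MemAvailable could not be read.
heap_fits() {
  local heap_mb=$1 meminfo=${2:-/proc/meminfo}
  local avail_kb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
  if [ -z "$avail_kb" ]; then
    echo "MemAvailable not found in $meminfo" >&2
    return 2
  fi
  local avail_mb=$(( avail_kb / 1024 ))   # kB -> MB (integer)
  echo "available: ${avail_mb}m, requested heap: ${heap_mb}m"
  [ "$avail_mb" -ge "$heap_mb" ]
}

# Example: heap_fits 31000 || echo "not enough free memory for that heap"
```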

Re: Trying to figure out why logstash changed to active (exi

Post by cdienger

It can be changed to:

Code:

ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 - 1000)m
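The same "half of RAM minus 1000 MB" arithmetic can be wrapped in a small function so it can be checked before it goes into the config. This is a sketch: the function name `es_heap_size` is hypothetical, and the optional argument only stands in for the `free -m` total so the math can be tested without changing the real machine. As in the `expr` version, division binds tighter than subtraction and the result is an integer.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the rule from the post:
#   heap = (total MB from `free -m`) / 2 - 1000, printed with an "m" suffix.
# An optional argument replaces the `free -m` total (useful for testing).
es_heap_size() {
  local total_mb=${1:-$(free -m | awk '/^Mem:/{print $2}')}
  # Integer arithmetic, same precedence as: expr TOTAL / 2 - 1000
  echo "$(( total_mb / 2 - 1000 ))m"
}

# Example: es_heap_size 64000   prints 31000m
```

Running `es_heap_size` with no argument on a node shows the value the config line would produce there.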

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee

Something happened this morning just before 8AM, and my Log Server environment is running really poorly. CPU usage is way down from normal, and it's not collecting logs as it should, even though logstash and elasticsearch are reported as running on all 3 nodes.

Looking at the elasticsearch logs, I'm seeing this over and over again:

Code:

[2019-11-21 09:48:56,857][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][72541][6055] duration [23.1s], collections [1]/[23.8s], total [23.1s]/[1.8h], memory [30.9gb]->[30.8gb]/[31.3gb], all_pools {[young] [21.8mb]->[7.1mb]/[399.4mb]}{[survivor] [0b]->[0b]/[49.8mb]}{[old] [30.8gb]->[30.8gb]/[30.8gb]}
[2019-11-21 09:49:16,413][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][72542][6056] duration [18.8s], collections [1]/[19.5s], total [18.8s]/[1.8h], memory [30.8gb]->[30.9gb]/[31.3gb], all_pools {[young] [7.1mb]->[25.9mb]/[399.4mb]}{[survivor] [0b]->[0b]/[49.8mb]}{[old] [30.8gb]->[30.8gb]/[30.8gb]}
Any ideas?
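To scan a full Elasticsearch log for warnings like these, a sketch that extracts the timestamp, pause duration and old-generation occupancy from each old-GC line. The field layout is taken from the two lines quoted above; the function name `summarize_gc` and the log path in the example are hypothetical, and `sed -E` assumes GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Elasticsearch log lines on stdin and print
# "<timestamp> pause=<duration> old=<used>/<max>" for each [gc][old] warning.
summarize_gc() {
  grep '\[gc\]\[old\]' |
    sed -E 's#^\[([^]]+)\].* duration \[([^]]+)\].*[{]\[old\] \[[^]]+\]->\[([^]]+)\]/\[([^]]+)\][}].*#\1 pause=\2 old=\3/\4#'
}

# Example (placeholder path): summarize_gc < /path/to/elasticsearch.log
```

For the first line quoted above this prints `2019-11-21 09:48:56,857 pause=23.1s old=30.8gb/30.8gb`.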

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee

I seem to be having very similar issues described in this thread: https://support.nagios.com/forum/viewto ... 38&t=35491

The majority of the recommendations have already been addressed, but the symptoms apply nonetheless.

Re: Trying to figure out why logstash changed to active (exi

Post by cdienger

Can you send me the entire log for review? It appears to be doing garbage collection (removing items from memory), which can pause things. Ideally it shouldn't happen frequently or last long, but if you are still seeing it, restarting the elasticsearch service will free up the memory and get things back in order.
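One way to put a number on "frequently or last long" is to compare each GC pause with the sampling interval it was reported for. This sketch assumes that in these monitor.jvm warnings `duration [X]` is the GC time and `collections [n]/[Y]` gives the number of collections over a sampling interval of length Y; the function name `gc_share` and the log path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each GC warning on stdin, print the pause, the
# sampling interval, and the percentage of that interval spent in GC.
gc_share() {
  awk '
    # Convert a time value such as "850ms", "23.1s", "2m" or "1.8h" to seconds
    function secs(t) {
      if (t ~ /ms$/) return substr(t, 1, length(t) - 2) / 1000
      if (t ~ /s$/)  return substr(t, 1, length(t) - 1) + 0
      if (t ~ /m$/)  return substr(t, 1, length(t) - 1) * 60
      if (t ~ /h$/)  return substr(t, 1, length(t) - 1) * 3600
      return t + 0
    }
    /\[gc\]/ && match($0, /duration \[[0-9.]+[a-z]+\]/) {
      dur = substr($0, RSTART + 10, RLENGTH - 11)        # e.g. "23.1s"
      if (!match($0, /collections \[[0-9]+\]\/\[[0-9.]+[a-z]+\]/)) next
      iv = substr($0, RSTART, RLENGTH)                   # "collections [1]/[23.8s]"
      sub(/.*\/\[/, "", iv); sub(/\]$/, "", iv)          # -> "23.8s"
      if (secs(iv) == 0) next
      ts = substr($0, 2, 23)                             # "2019-11-21 09:48:56,857"
      printf "%s %s of %s spent in GC (%.0f%%)\n", ts, dur, iv, 100 * secs(dur) / secs(iv)
    }'
}

# Example (placeholder path): gc_share < /path/to/elasticsearch.log
```

Applied to the two warnings quoted earlier in the thread, it reports 97% and 96%, i.e. the node was spending almost all of each interval in old-generation collections.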

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee

PM sent.

Re: Trying to figure out why logstash changed to active (exi

Post by cdienger

Received and reviewing. Did you end up restarting the service? Has performance returned?

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee

Actually, I had to reboot the entire environment. Even though all nodes indicated that both elasticsearch and logstash were up and running, there was almost zero log collection for about 2 hours (around 8AM to 10AM) yesterday.

Everything seems to be working today.