Heap Space: OutOfMemoryError

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Heap Space: OutOfMemoryError

Post by CFT6Server »

We had some issues with the infrastructure this morning and the nodes had to be rebooted. Logstash is crashing on one of the nodes with the following message:

Code:

 Exception in thread "Ruby-0-Thread-40: /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:92" java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jruby.runtime.ThreadContext.popRubyClass(ThreadContext.java:697)
        at org.jruby.runtime.ThreadContext.postYield(ThreadContext.java:1257)
        at org.jruby.runtime.ContextAwareBlockBody.post(ContextAwareBlockBody.java:29)
        at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:198)
        at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
        at org.jruby.runtime.Block.call(Block.java:101)
        at org.jruby.RubyProc.call(RubyProc.java:290)
        at org.jruby.RubyProc.call(RubyProc.java:228)
        at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
        at java.lang.Thread.run(Thread.java:745)
Exception in thread "elasticsearch[30ab2b2c-439f-4bcc-977d-7c0e9a90f3a5][generic][T#1]" Exception in thread "elasticsearch[30ab2b2c-439f-4bcc-977d-7c0e9a90f3a5][generic][T#4]" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace
The other nodes are fine. This is the JVM info.
JVM.JPG
JVM on another node that's working
JVM_good.JPG
I tried giving it more RAM so it could recover, but I continue to get errors on this node and Logstash keeps crashing.

Code:

WARN: org.elasticsearch.transport.netty: [30ab2b2c-439f-4bcc-977d-7c0e9a90f3a5] exception caught on transport layer [[id: 0xc0622298, /127.0.0.1:48013 :> localhost/127.0.0.1:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport message format
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Heap Space: OutOfMemoryError

Post by jolson »

We'll need to increase the HEAP_SIZE of Logstash itself. Give this a try.

Open up /etc/sysconfig/logstash

Change:
#LS_HEAP_SIZE="256m"

To:
LS_HEAP_SIZE="1024m"

And run a service logstash restart. Let me know if this helps with your heap problems. Thanks!
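If you want to script that change, here is a minimal sketch. It is demonstrated on a throwaway temp file so nothing on the box is touched by accident; the same sed line works against /etc/sysconfig/logstash itself (back the file up first), followed by the service restart:

```shell
# Demonstrate the edit on a temporary copy; run the same sed against
# /etc/sysconfig/logstash (after backing it up), then restart Logstash.
conf=$(mktemp)
printf '#LS_HEAP_SIZE="256m"\n' > "$conf"       # the line as shipped
sed -i 's/^#*LS_HEAP_SIZE=.*/LS_HEAP_SIZE="1024m"/' "$conf"
cat "$conf"                                     # LS_HEAP_SIZE="1024m"
```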
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Heap Space: OutOfMemoryError

Post by CFT6Server »

Not sure if this is related, but the nodes are now throwing indices.fielddata.breaker errors. It looks like we are tripping the fielddata limit. I would like to increase this to 70% instead. These are the current breaker stats.

Code:

"cluster_name" : "80e9022e-f73f-429e-8927-f23d0d88dfd2",
  "nodes" : {
    "kcJDKIbyTUWwnXtbyJ9gpQ" : {
      "timestamp" : 1438022601405,
      "name" : "30ab2b2c-439f-4bcc-977d-7c0e9a90f3a5",
      "transport_address" : "inet[/10.242.102.108:9300]",
      "host" : "kdcnagls1n2.bchydro.bc.ca",
      "ip" : [ "inet[/10.242.102.108:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "fielddata_breaker" : {
        "maximum_size_in_bytes" : 5026951987,
        "maximum_size" : "4.6gb",
        "estimated_size_in_bytes" : 4876893485,
        "estimated_size" : "4.5gb",
        "overhead" : 1.03,
        "tripped" : 271
      }
    },
    "uZ8wjGAYQFykeK7MhxIhMQ" : {
      "timestamp" : 1438022601392,
      "name" : "e63648a3-d912-4f5d-a867-1b99282a5e7c",
      "transport_address" : "inet[/10.242.102.109:9300]",
      "host" : "kdcnagls1n3.bchydro.bc.ca",
      "ip" : [ "inet[/10.242.102.109:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "fielddata_breaker" : {
        "maximum_size_in_bytes" : 5026951987,
        "maximum_size" : "4.6gb",
        "estimated_size_in_bytes" : 4511806069,
        "estimated_size" : "4.2gb",
        "overhead" : 1.03,
        "tripped" : 424
      }
    },
    "Qc57wXjdTC-2LWeqy54XMw" : {
      "timestamp" : 1438022601409,
      "name" : "4521585a-88af-47c9-81e5-c4d13cffb148",
      "transport_address" : "inet[/10.242.102.107:9300]",
      "host" : "kdcnagls1n1.bchydro.bc.ca",
      "ip" : [ "inet[/10.242.102.107:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "fielddata_breaker" : {
        "maximum_size_in_bytes" : 5026951987,
        "maximum_size" : "4.6gb",
        "estimated_size_in_bytes" : 4891010595,
        "estimated_size" : "4.5gb",
        "overhead" : 1.03,
        "tripped" : 493
      }
    }
  }
}
Are there any other settings you would recommend on top of the limit (e.g. cache.size)?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Heap Space: OutOfMemoryError

Post by jolson »

You should definitely read through this page of documentation before adjusting the fielddata breaker setting: https://www.elastic.co/guide/en/elastic ... usage.html
This setting is a safeguard, not a solution for insufficient memory.
If you don’t have enough memory to keep your fielddata resident in memory, Elasticsearch will constantly have to reload data from disk, and evict other data to make space. Evictions cause heavy disk I/O and generate a large amount of garbage in memory, which must be garbage collected later on.
If possible, increase the amount of memory allocated to your Nagios Log Server node, and the fielddata limit will adjust automatically. If you'd like to place an upper limit on fielddata manually instead, add this setting to the config/elasticsearch.yml file:

indices.fielddata.cache.size: 40%
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Heap Space: OutOfMemoryError

Post by CFT6Server »

We already have 16GB allocated to these nodes. The current limit is calculated from the JVM heap size, which is 50% of RAM (8GB), so out of that heap, 4.6GB is allotted to fielddata. These defaults are quite conservative, and we are definitely running into the fielddata breaker limit, so we are looking at increasing it to accommodate our queries. In our case, I think the limit isn't large enough for Elasticsearch to even start evicting data from memory and going to disk. Have you guys run into this?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Heap Space: OutOfMemoryError

Post by jolson »

Have you guys run into this?
I have not run into this before, and I have a couple of thoughts.

The recommended way to address this would be increasing the amount of physical RAM in the box.

A few other ways to address this:

You could manually set the Elasticsearch HEAP_SIZE value to a higher number; the configuration file for this is /etc/sysconfig/elasticsearch. With 16GB of RAM, I would say that you could probably set it to ~8-10g.
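A minimal sketch of that edit, demonstrated on a throwaway copy so nothing is touched by accident (the ES_HEAP_SIZE variable name is the one used by the stock Elasticsearch sysconfig file; verify it matches yours before editing the real file):

```shell
# Demo on a temporary copy; apply the same sed to
# /etc/sysconfig/elasticsearch, then restart Elasticsearch.
conf=$(mktemp)
printf '#ES_HEAP_SIZE=2g\n' > "$conf"           # assumed shipped default
sed -i 's/^#*ES_HEAP_SIZE=.*/ES_HEAP_SIZE=8g/' "$conf"
cat "$conf"                                     # ES_HEAP_SIZE=8g
```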

You could implement a higher fielddata breaker limit as described in my previous post - I would start with something rather conservative (60%) and increase the value as necessary.
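The breaker limit can also be raised at runtime through the cluster settings API, which avoids a restart while you wait out shard recovery. A hedged sketch: the setting name below (indices.fielddata.breaker.limit) is the Elasticsearch 1.x spelling, which matches the fielddata_breaker stats format your cluster is returning; it was renamed indices.breaker.fielddata.limit in 1.4, so check your version first.

```shell
# Raise the fielddata breaker to 60% cluster-wide without a restart.
# "transient" reverts on a full cluster restart; use "persistent" to keep it.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "indices.fielddata.breaker.limit" : "60%"
  }
}'
```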

I don't have a lot of experience with this particular setting, but I spoke with a developer and his recommendations are in line with mine.

Thanks!

Jesse
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Heap Space: OutOfMemoryError

Post by CFT6Server »

Thanks! Let me play around with these settings. To apply them I have to do a rolling restart on the cluster, so it will take a while. I am still waiting for the unassigned shards to clear from the first node.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Heap Space: OutOfMemoryError

Post by CFT6Server »

Looks like the first node I rebooted is stuck allocating shards, and nothing is happening when I watch the resource usage on that node. Nothing is showing in the logs, either. Could it be allocating the unassigned shards without generating any visible resource usage?

Code:

# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "80e9022e-f73f-429e-8927-f23d0d88dfd2",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 86,
  "active_shards" : 136,
  "relocating_shards" : 0,
  "initializing_shards" : 6,
  "unass
igned_shards" : 30
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Heap Space: OutOfMemoryError

Post by jolson »

CFT6Server wrote: Looks like the first node I rebooted is stuck allocating shards and nothing is happening when I watch the resource usage on that node. Nothing is showing in the logs. Could it be allocating the unassigned shards but not generating any resource usage?

Code:

# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "80e9022e-f73f-429e-8927-f23d0d88dfd2",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 86,
  "active_shards" : 136,
  "relocating_shards" : 0,
  "initializing_shards" : 6,
  "unass
igned_shards" : 30
It certainly could. Let me know if your shards are still stuck in any unassigned or initializing states.
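If you want a quicker read on whether recovery is actually progressing, the _cat endpoints are easier to watch than the JSON stats. A sketch, assuming the API is on the default localhost:9200 (both endpoints exist in Elasticsearch 1.x):

```shell
# Count shards still unassigned or initializing, then peek at active
# recoveries; re-run periodically to see whether the numbers move.
curl -s 'http://localhost:9200/_cat/shards' | grep -c -E 'UNASSIGNED|INITIALIZING'
curl -s 'http://localhost:9200/_cat/recovery?v' | head -n 20
```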