logstash dying

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
vmesquita
Posts: 315
Joined: Fri Aug 10, 2012 12:52 pm

logstash dying

Post by vmesquita »

The logstash process keeps dying on one of the nodes; I restart it, and after a while it dies again. This time I left the console open and captured these messages:

Code:

Oct 14, 2016 12:08:34 PM org.elasticsearch.plugins.PluginsService <init>
INFO: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] loaded [], sites []
Oct 14, 2016 12:08:37 PM org.elasticsearch.plugins.PluginsService <init>
INFO: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] loaded [], sites []
Oct 14, 2016 12:08:37 PM org.elasticsearch.plugins.PluginsService <init>
INFO: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] loaded [], sites []
Oct 14, 2016 12:08:37 PM org.elasticsearch.plugins.PluginsService <init>
INFO: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] loaded [], sites []
Oct 14, 2016 12:08:37 PM org.elasticsearch.plugins.PluginsService <init>
INFO: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] loaded [], sites []
Exception in thread "Ruby-0-Thread-46: /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:406" Exception in thread "elasticsearch[6a7ce4ea-e1b9-47a1-af18-1c4d47243d20][generic][T#1]" Exception in thread "Ruby-0-Thread-39: /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:406" Oct 14, 2016 12:30:54 PM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] failed to get node info for [#transport#-1][sa585][inet[localhost/127.0.0.1:9300]], disconnecting...
java.lang.OutOfMemoryError: GC overhead limit exceeded

java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jruby.runtime.ThreadContext.popRubyClass(ThreadContext.java:702)
        at org.jruby.runtime.ThreadContext.postYield(ThreadContext.java:1269)
        at org.jruby.runtime.ContextAwareBlockBody.post(ContextAwareBlockBody.java:29)
        at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:198)
        at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
        at org.jruby.runtime.Block.call(Block.java:101)
        at org.jruby.RubyProc.call(RubyProc.java:290)
        at org.jruby.RubyProc.call(RubyProc.java:228)
        at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
        at java.lang.Thread.run(Thread.java:745)
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jruby.runtime.ThreadContext.popRubyClass(ThreadContext.java:702)
        at org.jruby.runtime.ThreadContext.postYield(ThreadContext.java:1269)
        at org.jruby.runtime.ContextAwareBlockBody.post(ContextAwareBlockBody.java:29)
        at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:198)
        at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
        at org.jruby.runtime.Block.call(Block.java:101)
        at org.jruby.RubyProc.call(RubyProc.java:290)
        at org.jruby.RubyProc.call(RubyProc.java:228)
        at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
        at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "LogStash::Runner" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.jruby.RubyString.+(org/jruby/RubyString.java:1174)
        at Time.xmlschema(/usr/local/nagioslogserver/logstash/vendor/jruby/lib/ruby/1.9/time.rb:533)
        at Time.xmlschema(/usr/local/nagioslogserver/logstash/vendor/jruby/lib/ruby/1.9/time.rb:533)
        at LogStash::Timestamp.to_iso8601(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.1-java/lib/logstash/timestamp.rb:89)
        at LogStash::Timestamp.to_iso8601(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.1-java/lib/logstash/timestamp.rb:89)
        at org.elasticsearch.common.xcontent.XContentBuilder.writeValue(org/elasticsearch/common/xcontent/XContentBuilder.java:1272)
        at org.elasticsearch.common.xcontent.XContentBuilder.writeMap(org/elasticsearch/common/xcontent/XContentBuilder.java:1163)
        at org.elasticsearch.common.xcontent.XContentBuilder.map(org/elasticsearch/common/xcontent/XContentBuilder.java:1072)
        at org.elasticsearch.action.index.IndexRequest.source(org/elasticsearch/action/index/IndexRequest.java:379)
        at org.elasticsearch.action.index.IndexRequest.source(org/elasticsearch/action/index/IndexRequest.java:368)
        at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:606)
Oct 14, 2016 12:30:56 PM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] failed to get node info for [#transport#-1][sa585][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [1057] timed out after [13544ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Oct 14, 2016 12:30:56 PM org.elasticsearch.transport.netty.NettyTransport exceptionCaught
WARNING: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] exception caught on transport layer [[id: 0x41d0c3b9, /127.0.0.1:35161 => localhost/127.0.0.1:9300]], closing connection
java.lang.OutOfMemoryError: GC overhead limit exceeded

Oct 14, 2016 12:30:56 PM org.elasticsearch.transport.netty.NettyTransport exceptionCaught
WARNING: [6a7ce4ea-e1b9-47a1-af18-1c4d47243d20] exception caught on transport layer [[id: 0x41d0c3b9, /127.0.0.1:35161 :> localhost/127.0.0.1:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport message format, got (0,0,0,0)
        at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:47)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:482)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
        at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:58)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
        at org.elasticsearch.common.netty.channel.Channels.close(Channels.java:812)
        at org.elasticsearch.common.netty.channel.AbstractChannel.close(AbstractChannel.java:206)
        at org.elasticsearch.transport.netty.NettyTransport.exceptionCaught(NettyTransport.java:611)
        at org.elasticsearch.transport.netty.MessageChannelHandler.exceptionCaught(MessageChannelHandler.java:237)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
        at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:48)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:566)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Any ideas?
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: logstash dying

Post by rkennedy »

The issue is due to running out of memory -

Code:

java.lang.OutOfMemoryError: GC overhead limit exceeded
You'll want to increase the amount of memory on the machine, or reduce the number of days that you keep indexes open on the Backup & Maintenance page. What are your current Backup & Maintenance settings?
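If it helps to confirm, a couple of quick checks on the affected node can show how much memory is installed and what heap the logstash JVM actually received. This is only a sketch; the process grep pattern assumes the stock install:

```shell
# Total and free memory on the node, in MB
free -m
# The -Xmx heap flag the running logstash JVM was started with, if any
ps aux | grep '[l]ogstash' | grep -o 'Xmx[0-9]*[mg]' || echo "logstash not running"
```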
Former Nagios Employee
vmesquita
Posts: 315
Joined: Fri Aug 10, 2012 12:52 pm

Re: logstash dying

Post by vmesquita »

Hello!

Sorry for the late reply. After your message, we realized that one of the nodes had 8 GB while the other had 16 GB, so we assumed that increasing the memory to 16 GB would probably fix the issue. However, now that we have finally increased the server memory, the issue keeps happening. Here's our Backup and Maintenance config:

Optimize Indexes older than 2 days
Close indexes older than 7 days
Delete indexes older than 0 days
Repository to store backups in
Delete backups older than 720 days
Enable Maintenance and Backups Yes

I am thinking of dropping the "Close index" parameter to 2 just to see if the crashes stop. Then we can increase it little by little. Does that make sense?
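For what it's worth, one way to watch whether lowering that setting actually reduces the number of open indices is to list them from the node. A sketch, assuming Elasticsearch answers on the default HTTP port locally (whether closed indices appear in the listing depends on the Elasticsearch version):

```shell
# List the cluster's indices with their health and on-disk sizes
curl -s 'http://localhost:9200/_cat/indices?v'
```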
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: logstash dying

Post by mcapra »

That makes sense. You might also try increasing the logstash heap size and open-files limit by modifying the following values in /etc/init.d/logstash:

Code:

LS_HEAP_SIZE="1000m"
LS_OPEN_FILES=65535
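After editing those values, the service needs a restart for the new heap size to take effect. A sketch, assuming the stock init script path; the 2000m value is just an example, not a recommendation:

```shell
# Raise the heap in the init script, then restart so the JVM picks it up
sudo sed -i 's/^LS_HEAP_SIZE=.*/LS_HEAP_SIZE="2000m"/' /etc/init.d/logstash
sudo service logstash restart
```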
Former Nagios employee
https://www.mcapra.com/
vmesquita
Posts: 315
Joined: Fri Aug 10, 2012 12:52 pm

Re: logstash dying

Post by vmesquita »

That did the trick, so far. Thanks.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: logstash dying

Post by mcapra »

Awesome! Did you have any additional questions about this issue, or is it OK if we close this thread and mark it as resolved?
Former Nagios employee
https://www.mcapra.com/
vmesquita
Posts: 315
Joined: Fri Aug 10, 2012 12:52 pm

Re: logstash dying

Post by vmesquita »

You can close the thread. :)
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: logstash dying

Post by rkennedy »

Will do! Feel free to make a new one if you have questions in the future.
Former Nagios Employee
Locked