Logstash crashing
Posted: Wed Nov 25, 2015 11:33 am
We have a 2-server cluster where logstash keeps crashing. The logstash logs show nothing useful, but the elasticsearch log file on the main instance shows the entries below.
I've made minor changes to the entries for privacy purposes:
node01 is the second node
node00 is the first node (where I took the log entries from)
1.2.3.4 is the IP address for node01
I'm not sure what's wrong with the shard. If there's any way you can help bring my cluster back to a working state, I'd very much appreciate it. If you need any additional information, please let me know. I'm also open to a remote session to get this resolved as quickly as possible. At the moment, node01 has been taken offline to keep things working; node00 is online and working by itself.
Code:
[2015-11-24 21:43:53,376][INFO ][indices.recovery ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 0 numDocs: 0 vs. true
[2015-11-24 21:43:53,396][WARN ][cluster.action.shard ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][1] received shard failed for [kibana-int][1], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][1]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][1] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][1] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/1/translog/translog-1431537652243 (Permission denied)]; ]]
[2015-11-24 21:43:53,942][INFO ][indices.recovery ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
[2015-11-24 21:43:53,967][WARN ][cluster.action.shard ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][4] received shard failed for [kibana-int][4], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][4]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][4] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][4] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/4/translog/translog-1431537652325 (Permission denied)]; ]]
[2015-11-24 21:43:54,516][INFO ][indices.recovery ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
[2015-11-24 21:43:54,543][WARN ][cluster.action.shard ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][3] received shard failed for [kibana-int][3], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][3]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/.168.115.67:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][3] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][3] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/3/translog/translog-1431537652221 (Permission denied)]; ]]
[2015-11-24 21:43:55,080][INFO ][indices.recovery ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 3 numDocs: 3 vs. true
[2015-11-24 21:43:55,107][WARN ][cluster.action.shard ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][2] received shard failed for [kibana-int][2], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][2]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][2] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][2] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/2/translog/translog-1431537652277 (Permission denied)]; ]]
These log entries keep scrolling through the elasticsearch log on node00 many times a second and won't stop until I stop elasticsearch and logstash on node01.
The elasticsearch logs on node01 show:
Code:
[2015-11-24 21:43:55,102][WARN ][indices.cluster ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] [[kibana-int][2]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [kibana-int][2]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node00-][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [kibana-int][2] Phase[2] Execution failed
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:917)
at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]
Caused by: org.elasticsearch.index.translog.TranslogException: [kibana-int][2] failed to create new translog file
at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:253)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:199)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:307)
at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:290)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/2/translog/translog-1431537652277 (Permission denied)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.elasticsearch.index.translog.fs.RafReference.<init>(RafReference.java:49)
at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:251)
... 13 more
[2015-11-24 21:43:55,118][INFO ][node ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] stopping ...
[2015-11-24 21:43:55,181][WARN ][netty.channel.DefaultChannelPipeline] An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:577)
at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:348)
at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:105)
at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:76)
at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:526)
at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:462)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-11-24 21:43:55,182][WARN ][indices.cluster ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] [[logstash-2015.10.27][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-2015.10.27][3]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node00-][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.TransportException: transport stopped, action: internal:index/shard/recovery/start_recovery
at org.elasticsearch.transport.TransportService$2.run(TransportService.java:178)
... 3 more
[2015-11-24 21:43:55,183][WARN ][cluster.action.shard ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] failed to send failed shard to [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node00-][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1}
org.elasticsearch.transport.SendRequestTransportException: [abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:cluster/shard/failure]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:249)
at org.elasticsearch.cluster.action.shard.ShardStateAction.innerShardFailed(ShardStateAction.java:98)
at org.elasticsearch.cluster.action.shard.ShardStateAction.shardFailed(ShardStateAction.java:85)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.sendFailShard(IndicesClusterStateService.java:878)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.failAndRemoveShard(IndicesClusterStateService.java:870)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.handleRecoveryFailure(IndicesClusterStateService.java:825)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.access$300(IndicesClusterStateService.java:84)
at org.elasticsearch.indices.cluster.IndicesClusterStateService$PeerRecoveryListener.onRecoveryFailure(IndicesClusterStateService.java:819)
at org.elasticsearch.indices.recovery.RecoveryStatus.fail(RecoveryStatus.java:175)
at org.elasticsearch.indices.recovery.RecoveriesCollection.failRecovery(RecoveriesCollection.java:122)
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.TransportException: TransportService is closed stopped can't send request
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:270)
... 17 more
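One thing that stands out to me: every failure in both logs bottoms out in the same FileNotFoundException (Permission denied) on translog files under /usr/local/nagioslogserver/elasticsearch/data/.../indices/kibana-int/<shard>/translog/ on node01, so it looks like the elasticsearch process on node01 can no longer open those files. If it helps, I can run something like the sketch below on node01 and post the output (the EXPECTED_USER value is just my assumption; whatever account the elasticsearch java process actually runs as is what matters):
Code:
#!/usr/bin/env python
# Sketch: list owner, group and mode of the translog files under the
# Elasticsearch data path on node01 so they can be compared against the
# account the elasticsearch java process runs as. The data path comes
# straight from the log entries above; EXPECTED_USER is an assumption
# (check with `ps -o user= -C java` on node01 for the real value).
import grp
import os
import pwd
import stat

DATA_PATH = "/usr/local/nagioslogserver/elasticsearch/data"
EXPECTED_USER = "nagios"  # assumption, not confirmed

for root, dirs, files in os.walk(DATA_PATH):
    # only look at the per-shard translog directories
    if os.path.basename(root) != "translog":
        continue
    for name in files:
        path = os.path.join(root, name)
        st = os.lstat(path)
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            owner = str(st.st_uid)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)
        flag = "" if owner == EXPECTED_USER else "  <-- unexpected owner"
        print("%s %s:%s %o%s" % (path, owner, group, stat.S_IMODE(st.st_mode), flag))
If any of those files come back with an unexpected owner or mode, that would at least explain the "failed to create new translog file" errors during recovery.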