Logstash crashing

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
mike4vr
Posts: 89
Joined: Wed Feb 04, 2015 2:23 pm

Logstash crashing

Post by mike4vr »

We have a two-server cluster where logstash keeps crashing. The Logstash logs show nothing useful, but the elasticsearch log file on the main instance shows the following:

Code: Select all

[2015-11-24 21:43:53,376][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 0 numDocs: 0 vs. true
[2015-11-24 21:43:53,396][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][1] received shard failed for [kibana-int][1], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][1]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][1] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][1] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/1/translog/translog-1431537652243 (Permission denied)]; ]]
[2015-11-24 21:43:53,942][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
[2015-11-24 21:43:53,967][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][4] received shard failed for [kibana-int][4], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][4]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][4] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][4] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/4/translog/translog-1431537652325 (Permission denied)]; ]]
[2015-11-24 21:43:54,516][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
[2015-11-24 21:43:54,543][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][3] received shard failed for [kibana-int][3], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][3]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][3] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][3] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/3/translog/translog-1431537652221 (Permission denied)]; ]]
[2015-11-24 21:43:55,080][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 3 numDocs: 3 vs. true
[2015-11-24 21:43:55,107][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][2] received shard failed for [kibana-int][2], node[t3mh4oeOQVq9zQNYiy7Icw], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][2]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][2] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][2] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/2/translog/translog-1431537652277 (Permission denied)]; ]]
I've made minor changes in the entries for privacy purposes:
- node00 is the first node (where I took the log entries from)
- node01 is the second node
- 5.6.7.8 is the IP address for node00
- 1.2.3.4 is the IP address for node01

These log entries keep scrolling through the elasticsearch log on node00 many times a second, and they won't stop until I stop elasticsearch and logstash on node01.
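
Every one of these failures bottoms out in a FileNotFoundException (Permission denied) on a translog file, so it looks like ownership/permissions rather than data corruption. For what it's worth, here's how I'd compare the owner of one of those files against the user the elasticsearch process actually runs as (path copied from the trace above; I believe Nagios Log Server runs elasticsearch as the nagios user, but I'm not certain):

Code: Select all

# ps -o user= -C java
# ls -l /usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/1/translog/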

The elasticsearch logs on node01 show:

Code: Select all

[2015-11-24 21:43:55,102][WARN ][indices.cluster          ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] [[kibana-int][2]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [kibana-int][2]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node00-][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}
        at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
        at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
        at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [kibana-int][2] Phase[2] Execution failed
        at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:917)
        at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
        at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
        at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]
Caused by: org.elasticsearch.index.translog.TranslogException: [kibana-int][2] failed to create new translog file
        at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:253)
        at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:199)
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
        at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
        at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
        at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
        at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
        at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:307)
        at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:290)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/2/translog/translog-1431537652277 (Permission denied)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
        at org.elasticsearch.index.translog.fs.RafReference.<init>(RafReference.java:49)
        at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:251)
        ... 13 more
[2015-11-24 21:43:55,118][INFO ][node                     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] stopping ...
[2015-11-24 21:43:55,181][WARN ][netty.channel.DefaultChannelPipeline] An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
        at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
        at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:577)
        at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
        at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
        at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:348)
        at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:105)
        at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:76)
        at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:526)
        at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:462)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2015-11-24 21:43:55,182][WARN ][indices.cluster          ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] [[logstash-2015.10.27][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-2015.10.27][3]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node00-][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][t3mh4oeOQVq9zQNYiy7Icw][-node01-][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}
        at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
        at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
        at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.TransportException: transport stopped, action: internal:index/shard/recovery/start_recovery
        at org.elasticsearch.transport.TransportService$2.run(TransportService.java:178)
        ... 3 more
[2015-11-24 21:43:55,183][WARN ][cluster.action.shard     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] failed to send failed shard to [abb8401f-0039-4a01-8742-b25a4fdf7e8f][2SsDkHY8S4--qUrfsaKOhg][-node00-][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1}
org.elasticsearch.transport.SendRequestTransportException: [abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:cluster/shard/failure]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:249)
        at org.elasticsearch.cluster.action.shard.ShardStateAction.innerShardFailed(ShardStateAction.java:98)
        at org.elasticsearch.cluster.action.shard.ShardStateAction.shardFailed(ShardStateAction.java:85)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.sendFailShard(IndicesClusterStateService.java:878)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.failAndRemoveShard(IndicesClusterStateService.java:870)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.handleRecoveryFailure(IndicesClusterStateService.java:825)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.access$300(IndicesClusterStateService.java:84)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService$PeerRecoveryListener.onRecoveryFailure(IndicesClusterStateService.java:819)
        at org.elasticsearch.indices.recovery.RecoveryStatus.fail(RecoveryStatus.java:175)
        at org.elasticsearch.indices.recovery.RecoveriesCollection.failRecovery(RecoveriesCollection.java:122)
        at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
        at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
        at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.TransportException: TransportService is closed stopped can't send request
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:270)
        ... 17 more
I'm not sure what's wrong with the shard. If there's any way you can help bring my cluster back to a working state, I'd very much appreciate it. If you need any additional information, please let me know. I'm also open to a remote session to get this resolved as quickly as possible. At the moment, node01 has been taken offline to keep things working. node00 is online and working by itself.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash crashing

Post by jolson »

If Logstash is crashing, it's _almost always_ caused by a lack of system memory. Is there any chance that you're running low on memory in your cluster with both instances turned on?

Code: Select all

free -m
I'm also interested in the following from both of your nodes.

Code: Select all

cat /usr/local/nagioslogserver/var/cluster_hosts
cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
ls -l /var/log/logstash/
mike4vr
Posts: 89
Joined: Wed Feb 04, 2015 2:23 pm

Re: Logstash crashing

Post by mike4vr »

node00

Code: Select all

# free -m
             total       used       free     shared    buffers     cached
Mem:         32110      31610        499          0        188      13163
-/+ buffers/cache:      18258      13852
Swap:         4031          0       4031
node01 (elasticsearch and logstash are currently turned off)

Code: Select all

# free -m
             total       used       free     shared    buffers     cached
Mem:         32110      15909      16200          0        161      15126
-/+ buffers/cache:        622      31487
Swap:         4031          0       4031
node00

Code: Select all

# cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
5.6.7.8
1.2.3.4

Code: Select all

# cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 13 May 2015 15:43:15 -0700
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '98ec1d45-73e0-4296-ba7f-d7d71953b7e8'
        host => 'localhost'
        document_type => '%{type}'
        node_name => 'abb8401f-0039-4a01-8742-b25a4fdf7e8f'
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

Code: Select all

# ls -l /var/log/logstash/
total 4468
-rw-r--r--. 1 nagios nagios 2658222 Nov 25 08:50 logstash.log
-rw-r--r--  1 nagios nagios  440108 Nov 19 03:50 logstash.log-20151119.gz
-rw-r--r--  1 nagios nagios  422600 Nov 20 03:24 logstash.log-20151120.gz
-rw-r--r--  1 nagios nagios  412243 Nov 21 03:24 logstash.log-20151121.gz
-rw-r--r--  1 nagios nagios  128746 Nov 22 03:44 logstash.log-20151122.gz
-rw-r--r--  1 nagios nagios   55936 Nov 23 03:27 logstash.log-20151123.gz
-rw-r--r--  1 nagios nagios  175285 Nov 24 03:16 logstash.log-20151124.gz
-rw-r--r--  1 nagios nagios  264176 Nov 25 03:12 logstash.log-20151125.gz
node01

Code: Select all

# cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
5.6.7.8
1.2.3.4

Code: Select all

# cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 13 May 2015 15:43:16 -0700
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '98ec1d45-73e0-4296-ba7f-d7d71953b7e8'
        host => 'localhost'
        document_type => '%{type}'
        node_name => 'e53e50f6-8704-45ae-96d4-1ef4a0f6e93c'
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

Code: Select all

# ls -l /var/log/logstash/
total 1164
-rw-r--r--  1 nagios nagios      0 Nov 25 03:29 logstash.log
-rw-r--r--  1 nagios nagios    305 Oct  4 03:40 logstash.log-20151004.gz
-rw-r--r--  1 nagios nagios    144 Oct  6 03:24 logstash.log-20151006.gz
-rw-r--r--  1 nagios nagios    980 Nov 10 03:36 logstash.log-20151110.gz
-rw-r--r--  1 nagios nagios   8156 Nov 21 03:35 logstash.log-20151121.gz
-rw-r--r--  1 nagios nagios  19139 Nov 22 03:16 logstash.log-20151122.gz
-rw-r--r--  1 nagios nagios  14754 Nov 23 03:32 logstash.log-20151123.gz
-rw-r--r--  1 nagios nagios 136462 Nov 25 03:29 logstash.log-20151125.gz
-rw-r--r--. 1 nagios nagios 988603 May 13  2015 logstash.log.old
Memory was 16 GB on both nodes, which I upgraded to 32 GB last night assuming that was the problem. Logstash still crashes after the upgrade.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash crashing

Post by jolson »

Everything looks good with regards to memory and configuration. If you could PM me your full logstash.log file from the active node in your cluster, I would appreciate it. I'd also like to see your elasticsearch log:

Code: Select all

tail -n500 /var/log/elasticsearch/*.log

What is your cluster health status? (Should be yellow/green):

Code: Select all

curl 'localhost:9200/_cluster/health?level=indices&pretty'
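For reference, the top of the response should look roughly like this (trimmed to the cluster-level fields of the 1.x health API; the numbers here are only illustrative, and level=indices adds a per-index breakdown under an "indices" key):

Code: Select all

{
  "cluster_name" : "98ec1d45-73e0-4296-ba7f-d7d71953b7e8",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 450,
  "active_shards" : 450,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 15
}
If "status" is red, at least one primary shard is unavailable; yellow means only replicas are missing.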
mike4vr
Posts: 89
Joined: Wed Feb 04, 2015 2:23 pm

Re: Logstash crashing

Post by mike4vr »

PM sent.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash crashing

Post by jolson »

It looks like you have many indexes in a bad state (anything that comes back as 'red' is in a corrupt state). If those indexes are not important to you, please remove them:

Code: Select all

curl -XDELETE localhost:9200/$indexname

Replace $indexname with the name of the red index you intend on removing.

Bad indexes can cause a lot of Elasticsearch strangeness, which in turn can make logstash crash. We need to get your cluster to a yellow/green state before we consider other options. Let me know if removing the indexes is not possible.

For your reference, a list of red indexes:
logstash-2015.11.24
logstash-2015.11.12
logstash-2015.11.21
logstash-2015.11.20
logstash-2015.11.19
logstash-2015.11.18
logstash-2015.11.17
logstash-2015.11.16
logstash-2015.11.15
logstash-2015.11.14
logstash-2015.11.13
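
If more indexes go red later, you can generate that list yourself. Here's a sketch using the _cat API (h= selects just the health and index columns; double-check the output before feeding it to anything destructive):

Code: Select all

curl -s 'localhost:9200/_cat/indices?h=health,index' | awk '$1 == "red" {print $2}'
And if you're certain none of the red ones matter, the DELETE can be looped over that output - but please eyeball the list first:

Code: Select all

for i in $(curl -s 'localhost:9200/_cat/indices?h=health,index' | awk '$1 == "red" {print $2}'); do
    curl -XDELETE "localhost:9200/$i"
done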
mike4vr
Posts: 89
Joined: Wed Feb 04, 2015 2:23 pm

Re: Logstash crashing

Post by mike4vr »

- deleted indexes
- stopped elasticsearch and logstash on node00
- started elasticsearch and logstash on node00
log output on node00

Code: Select all

[2015-11-25 11:06:29,710][INFO ][node                     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] version[1.6.0], pid[15593], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-11-25 11:06:29,710][INFO ][node                     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] initializing ...
[2015-11-25 11:06:29,731][INFO ][plugins                  ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] loaded [knapsack-1.5.2.0-f340ad1], sites []
[2015-11-25 11:06:29,801][INFO ][env                      ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] using [1] data paths, mounts [[/ (/dev/mapper/VolGroup-lv_root)]], net usable_space [550.9gb], net total_space [957.6gb], types [ext4]
[2015-11-25 11:06:34,573][INFO ][node                     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] initialized
[2015-11-25 11:06:34,573][INFO ][node                     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] starting ...
[2015-11-25 11:06:34,811][INFO ][transport                ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/5.6.7.8:9300]}
[2015-11-25 11:06:34,823][INFO ][discovery                ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] 98ec1d45-73e0-4296-ba7f-d7d71953b7e8/-rPEtSJ5SeeZ5YlsjbwLlA
[2015-11-25 11:06:37,886][INFO ][cluster.service          ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] new_master [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1}, reason: zen-disco-join (elected_as_master)
[2015-11-25 11:06:38,135][INFO ][http                     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-11-25 11:06:38,136][INFO ][node                     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] started
[2015-11-25 11:06:38,484][INFO ][gateway                  ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] recovered [93] indices into cluster_state
[2015-11-25 11:06:41,653][DEBUG][action.search.type       ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] All shards failed for phase: [query_fetch]
org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][0] null
	at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:160)
	at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:57)
	at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:47)
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:104)
	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
	at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:98)
	at org.elasticsearch.client.FilterClient.execute(FilterClient.java:66)
	at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.execute(BaseRestHandler.java:92)
	at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:338)
	at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:84)
	at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:53)
	at org.elasticsearch.rest.RestController.executeHandler(RestController.java:225)
	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:170)
	at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
	at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
	at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:327)
	at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
	at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
	at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
	at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
	at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
	at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
[2015-11-25 11:06:41,672][DEBUG][action.search.type       ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] All shards failed for phase: [query_fetch]
org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][0] null
	at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:160)
	at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:57)
	at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:47)
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:104)
	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
	at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:98)
	at org.elasticsearch.client.FilterClient.execute(FilterClient.java:66)
	at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.execute(BaseRestHandler.java:92)
	at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:338)
	at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:84)
	at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:53)
	at org.elasticsearch.rest.RestController.executeHandler(RestController.java:225)
	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:170)
	at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
	at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
	at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:327)
	at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.messageReceived(HttpPipeliningHandler.java:60)
	at org.elasticsearch.common.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
	at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
	at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
	at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
	at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
	at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
- started elasticsearch and logstash on node01
log output on node01

Code: Select all

[2015-11-25 11:07:03,984][INFO ][node                     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] version[1.6.0], pid[12310], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-11-25 11:07:03,985][INFO ][node                     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] initializing ...
[2015-11-25 11:07:04,010][INFO ][plugins                  ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] loaded [knapsack-1.5.2.0-f340ad1], sites []
[2015-11-25 11:07:04,091][INFO ][env                      ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] using [1] data paths, mounts [[/ (/dev/mapper/VolGroup-lv_root)]], net usable_space [526gb], net total_space [956.6gb], types [ext4]
[2015-11-25 11:07:09,076][INFO ][node                     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] initialized
[2015-11-25 11:07:09,076][INFO ][node                     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] starting ...
[2015-11-25 11:07:09,358][INFO ][transport                ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/1.2.3.4:9300]}
[2015-11-25 11:07:09,383][INFO ][discovery                ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] 98ec1d45-73e0-4296-ba7f-d7d71953b7e8/WgF7jnRzTTOXPuDEuOr4FA
[2015-11-25 11:07:12,585][INFO ][cluster.service          ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] detected_master [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1}, added {[abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1}])
[2015-11-25 11:07:12,944][INFO ][http                     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-11-25 11:07:12,944][INFO ][node                     ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] started
[2015-11-25 11:09:24,093][WARN ][indices.cluster          ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] [[kibana-int][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [kibana-int][0]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}
	at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
	at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
	at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [kibana-int][0] Phase[2] Execution failed
	at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:917)
	at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
	at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
	at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
	at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
	at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]
Caused by: org.elasticsearch.index.translog.TranslogException: [kibana-int][0] failed to create new translog file
	at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:253)
	at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:199)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
	at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
	at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:307)
	at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:290)
	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/0/translog/translog-1431537652179 (Permission denied)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
	at org.elasticsearch.index.translog.fs.RafReference.<init>(RafReference.java:49)
	at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:251)
	... 13 more
[2015-11-25 11:09:42,341][WARN ][indices.cluster          ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] [[kibana-int][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [kibana-int][1]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}
	at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
	at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
	at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [kibana-int][1] Phase[2] Execution failed
	at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:917)
	at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
	at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
	at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
	at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
	at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]
Caused by: org.elasticsearch.index.translog.TranslogException: [kibana-int][1] failed to create new translog file
	at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:253)
	at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:199)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
	at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
	at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:307)
	at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:290)
	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/1/translog/translog-1431537652243 (Permission denied)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
	at org.elasticsearch.index.translog.fs.RafReference.<init>(RafReference.java:49)
	at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:251)
	... 13 more
log output on node00 after node01 was brought online

Code: Select all

[2015-11-25 11:07:12,457][INFO ][cluster.service          ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] added {[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(join from node[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}])
[2015-11-25 11:09:23,268][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 8162792 numDocs: 8162792 vs. true
[2015-11-25 11:09:24,013][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
[2015-11-25 11:09:24,104][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][0] received shard failed for [kibana-int][0], node[WgF7jnRzTTOXPuDEuOr4FA], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][0]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][0] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][0] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/0/translog/translog-1431537652179 (Permission denied)]; ]]
[2015-11-25 11:09:42,308][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 0 numDocs: 0 vs. true
[2015-11-25 11:09:42,348][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][1] received shard failed for [kibana-int][1], node[WgF7jnRzTTOXPuDEuOr4FA], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][1]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][1] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][1] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/1/translog/translog-1431537652243 (Permission denied)]; ]]
[2015-11-25 11:09:42,934][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 8162700 numDocs: 8162700 vs. true
[2015-11-25 11:09:43,672][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 8161020 numDocs: 8161020 vs. true
[2015-11-25 11:09:49,094][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
[2015-11-25 11:09:49,145][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][4] received shard failed for [kibana-int][4], node[WgF7jnRzTTOXPuDEuOr4FA], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][4]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][4] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][4] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/4/translog/translog-1431537652325 (Permission denied)]; ]]
[2015-11-25 11:09:49,716][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
I'm also getting this echoed to my shell on node01, for some reason:

Code: Select all

# Nov 25, 2015 11:09:37 AM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] failed to get node info for [#transport#-1][node01][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [26] timed out after [5000ms]
	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Nov 25, 2015 11:09:37 AM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] failed to get node info for [#transport#-1][node01][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [123] timed out after [5000ms]
	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Nov 25, 2015 11:09:37 AM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] failed to get node info for [#transport#-1][node01][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [121] timed out after [5001ms]
	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Nov 25, 2015 11:09:37 AM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] failed to get node info for [#transport#-1][node01][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [122] timed out after [5003ms]
	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Nov 25, 2015 11:09:37 AM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] failed to get node info for [#transport#-1][node01][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [126] timed out after [5002ms]
	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Logstash hasn't crashed yet, but all these errors are concerning and very confusing.
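Since the immediate worry is logstash dying silently, a simple liveness check on each node is the init script status; this assumes the stock service name that ships with Nagios Log Server, so adjust if yours differs.

Code: Select all

service logstash status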
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash crashing

Post by jolson »

The errors have to do with elasticsearch trying to query before all of your indexes were loaded into memory; it's nothing to be concerned about unless they keep going after your cluster hits a green state.
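For reference, one quick way to confirm whether the cluster has reached green is the cluster health API. The host and port here assume a stock install where elasticsearch answers HTTP on localhost:9200.

Code: Select all

curl -s 'http://localhost:9200/_cluster/health?pretty'

A "status" : "green" in the response means every primary and replica shard is allocated; "yellow" means replicas are still unassigned, and "red" means at least one primary is missing.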

I'm betting that your ES logs are pretty quiet now - is that true? Check your logstash logs as well - they should both be relatively quiet (they only log errors).

Let me know if you continue to experience crashing. Thanks!
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
User avatar
mike4vr
Posts: 89
Joined: Wed Feb 04, 2015 2:23 pm

Re: Logstash crashing

Post by mike4vr »

They're not quiet at all, unfortunately.
The elasticsearch log on node01 throws this several times a second:

Code: Select all

[2015-11-25 12:21:28,873][WARN ][indices.cluster          ] [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c] [[kibana-int][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [kibana-int][3]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}
	at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
	at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
	at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:561)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [kibana-int][3] Phase[2] Execution failed
	at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:917)
	at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:780)
	at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
	at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
	at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
	at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]
Caused by: org.elasticsearch.index.translog.TranslogException: [kibana-int][3] failed to create new translog file
	at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:253)
	at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:199)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
	at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
	at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:307)
	at org.elasticsearch.indices.recovery.RecoveryTarget$PrepareForTranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:290)
	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/3/translog/translog-1431537652221 (Permission denied)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
	at org.elasticsearch.index.translog.fs.RafReference.<init>(RafReference.java:49)
	at org.elasticsearch.index.translog.fs.FsTranslog.newTranslog(FsTranslog.java:251)
	... 13 more
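For what it's worth, the root cause in that trace is the final Permission denied on the translog file, which usually means part of the elasticsearch data tree is owned by the wrong user (for example, after a restore or maintenance script was run as root). A minimal sanity check might look like the following; the nagios service user is an assumption based on a default Nagios Log Server install, so confirm it with ps first.

Code: Select all

# Confirm which user the elasticsearch JVM actually runs as
ps -o user=,args= -C java | grep elasticsearch

# Inspect ownership of the affected translog directory
ls -l /usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/3/translog/

# If the files are owned by root, re-own the data tree to the
# service user (assumption: nagios) and restart elasticsearch
chown -R nagios:nagios /usr/local/nagioslogserver/elasticsearch/data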
The elasticsearch logs on node00 throw this several times every second:

Code: Select all

[2015-11-25 12:22:31,328][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][2] received shard failed for [kibana-int][2], node[WgF7jnRzTTOXPuDEuOr4FA], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][2]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][2] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][2] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/2/translog/translog-1431537652277 (Permission denied)]; ]]
[2015-11-25 12:22:31,854][INFO ][indices.recovery         ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] Recovery with sync ID 4 numDocs: 4 vs. true
[2015-11-25 12:22:31,867][WARN ][cluster.action.shard     ] [abb8401f-0039-4a01-8742-b25a4fdf7e8f] [kibana-int][0] received shard failed for [kibana-int][0], node[WgF7jnRzTTOXPuDEuOr4FA], [R], s[INITIALIZING], indexUUID [RZ7Jphw1RPSwKpaGWDrLGg], reason [shard failure [failed recovery][RecoveryFailedException[[kibana-int][0]: Recovery failed from [abb8401f-0039-4a01-8742-b25a4fdf7e8f][-rPEtSJ5SeeZ5YlsjbwLlA][node00][inet[/5.6.7.8:9300]]{max_local_storage_nodes=1} into [e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][WgF7jnRzTTOXPuDEuOr4FA][node01][inet[/1.2.3.4:9300]]{max_local_storage_nodes=1}]; nested: RemoteTransportException[[abb8401f-0039-4a01-8742-b25a4fdf7e8f][inet[/5.6.7.8:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][0] Phase[2] Execution failed]; nested: RemoteTransportException[[e53e50f6-8704-45ae-96d4-1ef4a0f6e93c][inet[/1.2.3.4:9300]][internal:index/shard/recovery/prepare_translog]]; nested: TranslogException[[kibana-int][0] failed to create new translog file]; nested: FileNotFoundException[/usr/local/nagioslogserver/elasticsearch/data/98ec1d45-73e0-4296-ba7f-d7d71953b7e8/nodes/0/indices/kibana-int/0/translog/translog-1431537652179 (Permission denied)]; ]]
Logstash is quiet on node00, but I get this on node01:

Code: Select all

{:timestamp=>"2015-11-25T11:40:29.764000-0800", :message=>"Failed to flush outgoing items", :outgoing_count=>538, :exception=>org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [], :backtrace=>["org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(org/elasticsearch/client/transport/TransportClientNodesService.java:279)", "org.elasticsearch.client.transport.TransportClientNodesService.execute(org/elasticsearch/client/transport/TransportClientNodesService.java:198)", "org.elasticsearch.client.transport.support.InternalTransportClient.execute(org/elasticsearch/client/transport/support/InternalTransportClient.java:106)", "org.elasticsearch.client.support.AbstractClient.bulk(org/elasticsearch/client/support/AbstractClient.java:163)", "org.elasticsearch.client.transport.TransportClient.bulk(org/elasticsearch/client/transport/TransportClient.java:356)", "org.elasticsearch.action.bulk.BulkRequestBuilder.doExecute(org/elasticsearch/action/bulk/BulkRequestBuilder.java:164)", "org.elasticsearch.action.ActionRequestBuilder.execute(org/elasticsearch/action/ActionRequestBuilder.java:91)", "org.elasticsearch.action.ActionRequestBuilder.execute(org/elasticsearch/action/ActionRequestBuilder.java:65)", "java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:606)", "LogStash::Outputs::Elasticsearch::Protocols::NodeClient.bulk(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch/protocol.rb:224)", "LogStash::Outputs::Elasticsearch::Protocols::NodeClient.bulk(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch/protocol.rb:224)", "LogStash::Outputs::ElasticSearch.submit(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:466)", "LogStash::Outputs::ElasticSearch.submit(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:466)", "LogStash::Outputs::ElasticSearch.submit(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:465)", "LogStash::Outputs::ElasticSearch.submit(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:465)", "LogStash::Outputs::ElasticSearch.flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:490)", "LogStash::Outputs::ElasticSearch.flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:490)", "LogStash::Outputs::ElasticSearch.flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:489)", "LogStash::Outputs::ElasticSearch.flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-0.2.8-java/lib/logstash/outputs/elasticsearch.rb:489)", "Stud::Buffer.buffer_flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:219)", 
"Stud::Buffer.buffer_flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:219)", "org.jruby.RubyHash.each(org/jruby/RubyHash.java:1341)", "Stud::Buffer.buffer_flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:216)", "Stud::Buffer.buffer_flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:216)", "Stud::Buffer.buffer_flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:193)", "Stud::Buffer.buffer_flush(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:193)", "RUBY.buffer_initialize(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:112)", "org.jruby.RubyKernel.loop(org/jruby/RubyKernel.java:1511)", "RUBY.buffer_initialize(/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:110)", "java.lang.Thread.run(java/lang/Thread.java:745)"], :level=>:warn}
Also, the cluster status still shows RED.
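The NoNodeAvailableException above means the logstash transport client could not reach any elasticsearch node on the transport port, so nothing can be flushed. A quick check, assuming the standard transport port 9300, is to verify something is actually listening on node01:

Code: Select all

netstat -tlnp | grep 9300

If nothing is listening, elasticsearch itself is down or still failing recovery as in the logs above, and logstash will keep failing to flush until it recovers.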
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash crashing

Post by jolson »

I think a remote session is in order so that I can see the problem and do some troubleshooting in real time. Could you send us an email at [email protected] and reference this thread, please? I'll pick up the ticket and we can get a remote session scheduled. Thanks!
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
Locked