Logstash Crashing after 2015R2.0 update

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
polarbear1
Posts: 73
Joined: Mon Apr 13, 2015 4:26 pm

Logstash Crashing after 2015R2.0 update

Post by polarbear1 »

I was running happily on the previous stable release, and all of my configuration remains the same, but ever since updating to the new release Logstash has been crashing fairly regularly. I have two nodes in my cluster, and both are doing this.

crash:

Code: Select all

 Exception in thread "input|syslog|tcp|192.168.1.52:50526}" java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jruby.runtime.ThreadContext.popRubyClass(ThreadContext.java:702)
        at org.jruby.runtime.ThreadContext.postYield(ThreadContext.java:1269)
        at org.jruby.runtime.ContextAwareBlockBody.post(ContextAwareBlockBody.java:29)
        at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:198)
        at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
        at org.jruby.runtime.Block.call(Block.java:101)
        at org.jruby.RubyProc.call(RubyProc.java:290)
        at org.jruby.RubyProc.call(RubyProc.java:228)
        at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
        at java.lang.Thread.run(Thread.java:745)
ConcurrencyError: interrupted waiting for mutex: null
                       lock at org/jruby/ext/thread/Mutex.java:94
          execute_task_once at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/delay.rb:83
                       wait at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/delay.rb:60
                      value at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/obligation.rb:47
           global_timer_set at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/configuration.rb:58
  finalize_global_executors at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/configuration.rb:137
                 Concurrent at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/configuration.rb:165
Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace

Code: Select all

Jul 20, 2015 9:36:19 AM org.elasticsearch.plugins.PluginsService <init>
INFO: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] loaded [], sites []
Jul 20, 2015 9:36:21 AM org.elasticsearch.plugins.PluginsService <init>
INFO: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] loaded [], sites []
Jul 20, 2015 9:36:21 AM org.elasticsearch.plugins.PluginsService <init>
INFO: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] loaded [], sites []
Jul 20, 2015 9:36:21 AM org.elasticsearch.plugins.PluginsService <init>
INFO: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] loaded [], sites []
Jul 20, 2015 9:36:21 AM org.elasticsearch.plugins.PluginsService <init>
INFO: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] loaded [], sites []
Jul 20, 2015 9:38:39 AM org.elasticsearch.transport.netty.NettyInternalESLogger warn
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded

Jul 20, 2015 9:37:57 AM org.elasticsearch.transport.netty.NettyInternalESLogger warn
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded

Error: Your application used more memory than the safety cap of 500M.
Error: Your application used more memory than the safety cap of 500M.
Jul 20, 2015 9:39:31 AM org.elasticsearch.transport.netty.NettyTransport exceptionCaught
WARNING: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] exception caught on transport layer [[id: 0x1a0c7404, /127.0.0.1:55679 => localhost/127.0.0.1:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space

Jul 20, 2015 9:39:31 AM org.elasticsearch.transport.netty.NettyTransport exceptionCaught
WARNING: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] exception caught on transport layer [[id: 0x7c0fc4f6, /127.0.0.1:55740 => localhost/127.0.0.1:9300]], closing connection
java.lang.OutOfMemoryError: GC overhead limit exceeded

Jul 20, 2015 9:39:31 AM org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler doSample
INFO: [ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b] failed to get node info for [#transport#-1][schpnag1][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [10] timed out after [24496ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
So it's obviously running out of memory, but we are using 8 GB for ES_HEAP_SIZE (half of the machine's 16 GB total), and it was stable for a long time before the update.
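One thing worth checking (a sketch; the process-matching patterns are assumptions): the 500M safety cap in the crash output belongs to the Logstash JVM, not Elasticsearch, so ES_HEAP_SIZE=8g never applies to it. Listing the -Xmx flag of each running Java process makes the difference visible:

```shell
# List the -Xmx heap cap each Java process was started with, so the
# Logstash and Elasticsearch heaps can be compared side by side.
# The bracketed patterns keep grep from matching its own command line.
ps -eo args | grep -E '[l]ogstash|[e]lasticsearch' | grep -oE -- '-Xmx[0-9]+[mg]'
```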

/etc/sysconfig/elasticsearch:

Code: Select all

# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"

# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=8g

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
DATA_DIR="/nagios/data"

# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"

# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
        GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
        GET_ES_CONFIG_RETURN=$?

        if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
                echo $GET_ES_CONFIG_MESSAGE
                exit 1
        else
                ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
        fi
fi
/etc/sysconfig/logstash:

Code: Select all

###############################
# Default settings for logstash
###############################

# Override Java location
#JAVACMD=/usr/bin/java

# Set a home directory
APP_DIR=/usr/local/nagioslogserver
LS_HOME="$APP_DIR/logstash"

# set ES_CLUSTER
ES_CLUSTER=$(cat $APP_DIR/var/cluster_uuid)

# Arguments to pass to java
#LS_HEAP_SIZE="256m"
LS_JAVA_OPTS="-Djava.io.tmpdir=$APP_DIR/tmp"

# Logstash filter worker threads
#LS_WORKER_THREADS=1

# pidfiles aren't used for upstart; this is for sysv users.
#LS_PIDFILE=/var/run/logstash.pid

# user id to be invoked as; for upstart: edit /etc/init/logstash.conf
LS_USER=nagios
LS_GROUP=nagios

# logstash logging
#LS_LOG_FILE=/var/log/logstash/logstash.log
#LS_USE_GC_LOGGING="true"

# logstash configuration directory
LS_CONF_DIR="$LS_HOME/etc/conf.d"

# Open file limit; cannot be overridden in upstart
#LS_OPEN_FILES=2048

# Nice level
#LS_NICE=0

# Increase Filter workers to 4 threads
LS_OPTS=" -w 4"

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" ];then
        GET_LOGSTASH_CONFIG_MESSAGE=$( php /usr/local/nagioslogserver/scripts/get_logstash_config.php )
        GET_LOGSTASH_CONFIG_RETURN=$?
        if [ "$GET_LOGSTASH_CONFIG_RETURN" != "0" ]; then
                echo $GET_LOGSTASH_CONFIG_MESSAGE
                exit 1
        fi
fi
Last edited by polarbear1 on Mon Jul 20, 2015 9:42 am, edited 1 time in total.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash Crashing after 2015R2.0 update

Post by jolson »

What does your configuration look like? Some plugins are known to have memory leaks, particularly the multiline codec.

Code: Select all

cat /usr/local/nagioslogserver/logstash/etc/conf.d/*
See if increasing Logstash's heap size (LS_HEAP_SIZE) helps.

Open up your logstash configuration file:

Code: Select all

vi /etc/sysconfig/logstash
Change:
#LS_HEAP_SIZE="256m"

To:
LS_HEAP_SIZE="1024m"

Restart logstash:
service logstash restart
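The edit can also be scripted; a minimal sketch, assuming the line is still in its default commented-out form, with a backup kept alongside:

```shell
# Uncomment LS_HEAP_SIZE and raise it to 1024m, saving the original
# file as /etc/sysconfig/logstash.bak in case a rollback is needed.
sed -i.bak 's/^#LS_HEAP_SIZE="256m"$/LS_HEAP_SIZE="1024m"/' /etc/sysconfig/logstash
service logstash restart
```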

Let me know if this improves the reliability of logstash. Thanks!
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
polarbear1
Posts: 73
Joined: Mon Apr 13, 2015 4:26 pm

Re: Logstash Crashing after 2015R2.0 update

Post by polarbear1 »

Code: Select all

[root@schpnag1 scripts]# cat /usr/local/nagioslogserver/logstash/etc/conf.d/*
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Thu, 16 Jul 2015 14:00:14 -0500
#

#
# Global inputs
#

input {
    syslog {
        type => 'syslog'
        port => 5544
    }
    tcp {
        type => 'eventlog'
        port => 3515
        codec => json {
            charset => 'CP1252'
        }
    }
    tcp {
        type => 'import_raw'
        tags => 'import_raw'
        port => 2056
    }
    tcp {
        type => 'import_json'
        tags => 'import_json'
        port => 2057
        codec => json
    }
}

#
# Local inputs
#


#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Thu, 16 Jul 2015 14:00:14 -0500
#

#
# Global filters
#

filter {
    if [program] == 'apache_access' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}']
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'apache_access' ]
            convert => [ 'bytes', 'integer' ]
            convert => [ 'response', 'integer' ]
        }
    }

    if [program] == 'apache_error' {
        grok {
            match => [ 'message', '\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}']
        }
        mutate {
            replace => [ 'type', 'apache_error' ]
        }
    }
}

#
# Local filters
#


#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Thu, 16 Jul 2015 14:00:14 -0500
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '4f703585-84ab-40e0-9ff9-f72c904bdc38'
        host => 'localhost'
        document_type => '%{type}'
        node_name => 'ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b'
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

Here's that for now. I'll make the change and report back once I know how it's behaving.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash Crashing after 2015R2.0 update

Post by jolson »

Sounds good, thanks! I don't see any known problematic plugins in your configuration; let us know if you have further troubles.
polarbear1
Posts: 73
Joined: Mon Apr 13, 2015 4:26 pm

Re: Logstash Crashing after 2015R2.0 update

Post by polarbear1 »

About 24 hours in with the Logstash heap increased to 1024m, and no crashes. I think we can call this one fixed.

Thanks.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logstash Crashing after 2015R2.0 update

Post by jolson »

Great - I'll lock it up. Let me know if you need this thread re-opened at any point.