logstash failed action with response of 400, dropping action

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
burkm
Posts: 31
Joined: Thu Jan 21, 2016 5:10 pm

logstash failed action with response of 400, dropping action

Post by burkm »

Hello,

The logstash.log file is filling rapidly with very long messages beginning with "failed action with response of 400, dropping action". Literally 99.9% of the messages are like this:

Code:

{:timestamp=>"2017-03-06T15:12:23.292000-0700", :message=>"failed action with response of 400, dropping action: [\"index\", {:_id=>nil, :_index=>\"logstash-2017.03.06\", :_type=>\"syslog\", :_routing=>nil}, #<LogStash::Event:0x49bd9288 @metadata_accessors=#<LogStash::Util::Accessors:0x77e9b08c @store={\"retry_count\"=>0}, @lut={}>, @cancelled=false, @data={\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, @metadata={\"retry_count\"=>0}, @accessors=#<LogStash::Util::Accessors:0x12f372ac @store={\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, @lut={\"type\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"type\"], \"host\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, 
\"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"host\"], \"message\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"message\"], \"priority\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"priority\"], \"timestamp\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"timestamp\"], \"logsource\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"logsource\"], \"program\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, 
\"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"program\"], \"tags\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"tags\"], \"severity\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"severity\"], \"facility\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"facility\"], \"timestamp8601\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"timestamp8601\"], \"@timestamp\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", 
\"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"@timestamp\"], \"facility_label\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"facility_label\"], \"severity_label\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"severity_label\"], \"[program]\"=>[{\"message\"=>\"blk_cloned_rq_check_limits: over max size limit.\\n\", \"@version\"=>\"1\", \"@timestamp\"=>\"2017-03-06T22:12:22.000Z\", \"type\"=>\"syslog\", \"host\"=>\"10.30.216.7\", \"priority\"=>3, \"timestamp\"=>\"Mar  6 15:12:22\", \"logsource\"=>\"nitrogen\", \"program\"=>\"kernel\", \"severity\"=>3, \"facility\"=>0, \"facility_label\"=>\"kernel\", \"severity_label\"=>\"Error\"}, \"program\"]}>>]", :level=>:warn}
I see this question has been asked before, so I tried the suggestion of changing the port in /etc/rsyslog.d/nagioslogserver.conf to this:

Code:

*.* @@localhost:2057
but that didn't seem to help.
This is not just a nuisance message - it's actually filling logstash.log up at a rate of >1G/hr (sometimes much faster) on the primary node.
The secondary node has no such messages in the log file.

Some things about this system:
2 node cluster, CentOS 7.3, NLS 1.4.4

Load average is light:

Code:

08:50:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
   ...
Average:            8       695      0.36      0.36      0.36         0
$ cat /etc/hosts

Code:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

$ head -n100 /usr/local/nagioslogserver/logstash/etc/conf.d/*

Code:

==> /usr/local/nagioslogserver/logstash/etc/conf.d/000_inputs.conf <==
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 22 Jun 2016 17:27:47 -0600
#

#
# Global inputs
#

input {
    syslog {
        type => 'syslog'
        port => 5544
    }
    tcp {
        type => 'eventlog'
        port => 3515
        codec => json {
            charset => 'CP1252'
        }
    }
    tcp {
        type => 'import_raw'
        tags => 'import_raw'
        port => 2056
    }
    tcp {
        type => 'import_json'
        tags => 'import_json'
        port => 2057
        codec => json
    }
    syslog {
        type => 'syslog'
        port => 1514
    }
    http {
        port => 8544
        codec => 'json'
    }
}

#
# Local inputs
#



==> /usr/local/nagioslogserver/logstash/etc/conf.d/500_filters.conf <==
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 22 Jun 2016 17:27:47 -0600
#

#
# Global filters
#

filter {
    if [program] == 'apache_access' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}']
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'apache_access' ]
             convert => [ 'bytes', 'integer' ]
             convert => [ 'response', 'integer' ]
        }
    }

    if [program] == 'apache_error' {
        grok {
            match => [ 'message', '\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}']
        }
        mutate {
            replace => [ 'type', 'apache_error' ]
        }
    }
    if [program] == 'nginx_access' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}']
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'nginx_access' ]
             convert => [ 'bytes', 'integer' ]
             convert => [ 'response', 'integer' ]
        }
    }

    if [program] == 'nginx_error' {
        grok {
            match => [ 'message', '\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\] \[%{WORD:class}\] \[%{WORD:originator} %{IP:clientip}\] %{GREEDYDATA:errmsg}']
        }
        mutate {
            replace => [ 'type', 'nginx_error' ]
        }
    }
}

#
# Local filters
#



==> /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf <==
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 22 Jun 2016 17:27:47 -0600
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '76900ee2-f769-413c-9948-850204a96b32'
        host => 'localhost'
        document_type => '%{type}'
        node_name => '707a55e2-e8cb-4073-a800-2fbc5bd4c6df'
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

Not sure if this is relevant, but in a 15-min snapshot, 11841 messages came from 127.0.0.1, with only 379 from the next highest host.

This problem started just a few days ago, and I am not aware of any recent changes.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: logstash failed action with response of 400, dropping ac

Post by mcapra »

Can we see the contents of your Elasticsearch logs? They should be located in /var/log/elasticsearch.
Former Nagios employee
https://www.mcapra.com/
burkm
Posts: 31
Joined: Thu Jan 21, 2016 5:10 pm

Re: logstash failed action with response of 400, dropping ac

Post by burkm »

Here's a sample from the main log; the two slowlogs are empty.
Although the first part of each message varies, they all end with the same MapperParsingException:

Code:

[2017-03-06 16:11:07,778][DEBUG][action.bulk              ] [707a55e2-e8cb-4073-a800-2fbc5bd4c6df] [logstash-2017.03.06][2] failed to execute bulk item (index) index {[logstash-2017.03.06][syslog][AVql5AIA8OQvuOv9Cyt_], source[{"message":"Remote 10.30.216.22 accepted a Version 3 Packet\n","@version":"1","@timestamp":"2017-03-06T23:11:07.000Z","type":"syslog","host":"10.30.217.55","priority":13,"timestamp":"Mar  6 16:11:07","logsource":"cesium","program":"check_nrpe","severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [timestamp]
        at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
        at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
        at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
        at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:466)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [Mar  6 16:11:07], tried both date format [dateOptionalTime], and timestamp number with locale []
        at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:617)
        at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:535)
        at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
        at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
        ... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "Mar  6 16:11:07"
        at org.elasticsearch.common.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
        at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:780)
        at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:612)
        ... 16 more
The messages are accumulating at a rate of about 1/s.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: logstash failed action with response of 400, dropping ac

Post by mcapra »

This is a known issue with the logstash-input-syslog plugin. I would suggest pointing that host (10.30.217.55) to a generic tcp/udp input for the time being (NLS uses 2056 by default). If you need help writing a filter to capture the fields, provide some sample messages being collected and I can assist with that.
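For what it's worth, on the sending host this is usually a one-line change to its rsyslog forwarding rule; a minimal sketch, assuming the host forwards with rsyslog (the file path and server name here are placeholders, and `@@` forwards over TCP while a single `@` forwards over UDP):

```
# /etc/rsyslog.d/forward.conf on the sending host (path is illustrative)
# @@ = TCP, @ = UDP; 2056 is the default NLS raw tcp input mentioned above
*.* @@nls-server.example.com:2056
```

After restarting rsyslog on that host, the messages should arrive tagged `import_raw` and bypass the syslog input entirely.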
Former Nagios employee
https://www.mcapra.com/
burkm
Posts: 31
Joined: Thu Jan 21, 2016 5:10 pm

Re: logstash failed action with response of 400, dropping ac

Post by burkm »

That host is just one of many generating these Java errors. Do you suggest that I change them all from port 5544 to port 2056?
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: logstash failed action with response of 400, dropping ac

Post by mcapra »

Yes. You'll probably need to write a custom filter rule to properly handle the date formatting presented by these logs. We can assist with that if needed. I just want to be sure that we can at least get the logs going to the right place first.
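As a starting point, such a filter might look like the sketch below. This is only a rough outline, assuming the standard BSD syslog layout visible in the logs above; the `SYSLOGLINE` grok pattern and the two date formats (single- and double-digit days) would need to be verified against your actual messages:

```
filter {
    if 'import_raw' in [tags] {
        grok {
            # SYSLOGLINE extracts timestamp, logsource, program, and message
            match => [ 'message', '%{SYSLOGLINE}' ]
        }
        date {
            # handle both "Mar  6 15:12:22" and "Mar 16 15:12:22"
            match => [ 'timestamp', 'MMM  d HH:mm:ss', 'MMM dd HH:mm:ss' ]
        }
        mutate {
            # rename so the string timestamp no longer collides with the
            # date-mapped [timestamp] field in the existing daily index
            rename => [ 'timestamp', 'syslog_timestamp' ]
        }
    }
}
```

The rename at the end addresses the MapperParsingException directly: the `[timestamp]` field in the index was mapped as a date, so indexing a raw string like "Mar  6 16:11:07" into it fails.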
Former Nagios employee
https://www.mcapra.com/
Locked