Logstash errors and crashes
Posted: Tue Feb 09, 2016 12:50 pm
I have been noticing that Logstash on one of our nodes is constantly crashing. I captured the logs from yesterday and found a few issues.
At this point I suspect the constant read errors are causing Logstash to fall over. I'm just not sure why it would do so all of a sudden after running for a while.
This node handles syslog input and file input of Apache log files.
Seeing a lot of these...
Code:
{:timestamp=>"2016-02-08T13:31:47.687000-0800", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2016-02-08T13:31:47.687000-0800", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2016-02-08T13:31:47.687000-0800", :message=>"retrying failed action with response code: 503", :level=>:warn}
Then I am seeing these warnings...
Code:
{:timestamp=>"2016-02-08T17:21:07.070000-0800", :message=>"retrying failed action with response code: 429", :level=>:warn}
{:timestamp=>"2016-02-08T17:21:07.070000-0800", :message=>"retrying failed action with response code: 429", :level=>:warn}
{:timestamp=>"2016-02-08T17:21:07.071000-0800", :message=>"retrying failed action with response code: 429", :level=>:warn}
{:timestamp=>"2016-02-08T17:21:07.071000-0800", :message=>"retrying failed action with response code: 429", :level=>:warn}
I then see a lot of permission denied errors...
Code:
{:timestamp=>"2016-02-08T23:55:26.035000-0800", :message=>"failed to open /nfs/shared/somelogs: Permission denied - /nfs/shared/somelogs", :level=>:warn}
Then it looks like the listener eventually dies...
Code:
{:timestamp=>"2016-02-08T23:55:26.509000-0800", :message=>"syslog listener died", :protocol=>:tcp, :address=>"0.0.0.0:5544", :exception=>#<SocketError: problem when accepting>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:174:in `accept'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-0.1.6/lib/logstash/inputs/syslog.rb:155:in `tcp_listener'", "org/jruby/RubyKernel.java:1511:in `loop'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-0.1.6/lib/logstash/inputs/syslog.rb:154:in `tcp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-0.1.6/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-0.1.6/lib/logstash/inputs/syslog.rb:101:in `run'"], :level=>:warn}
{:timestamp=>"2016-02-08T23:55:26.791000-0800", :message=>"A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::File path=>[\"<PATHS>"], type=>\"iplanet\", tags=>[\"iplanet-DMZ\"], start_position=>\"end\", delimiter=>\"\\n\">\n Error: Bad file descriptor - Bad file descriptor", :level=>:error}
{:timestamp=>"2016-02-08T23:55:26.824000-0800", :message=>"UDP listener died", :exception=>#<SocketError: recvfrom: name or service not known>, :backtrace=>["/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-udp-0.1.4/lib/logstash/inputs/udp.rb:79:in `udp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-udp-0.1.4/lib/logstash/inputs/udp.rb:49:in `run'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.1-java/lib/logstash/pipeline.rb:176:in `inputworker'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.1-java/lib/logstash/pipeline.rb:170:in `start_input'"], :level=>:warn}
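With this many repeated warnings, it may help to tally the log by level and message to see what dominates in the run-up to a crash (for reference, HTTP 503 is "Service Unavailable" and 429 is "Too Many Requests"). The following is only a rough sketch, assuming the log uses the same `:message=>"..."` / `:level=>:...` layout as the excerpts above and that each entry sits on a single line. The function name `tally_messages` is made up for illustration.

```python
import re
from collections import Counter

# Matches the :message=>"..." field; tolerates escaped quotes inside the message.
MESSAGE_RE = re.compile(r':message=>"((?:[^"\\]|\\.)*)"')
# Matches the :level=>:warn / :level=>:error field.
LEVEL_RE = re.compile(r':level=>:(\w+)')


def tally_messages(lines):
    """Count (level, message) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        msg = MESSAGE_RE.search(line)
        if not msg:
            continue  # not a structured entry (or a wrapped fragment)
        level = LEVEL_RE.search(line)
        # Keep only the text before the first literal "\n" so that long
        # plugin dumps group under their opening sentence.
        text = msg.group(1).split("\\n", 1)[0]
        counts[(level.group(1) if level else "unknown", text)] += 1
    return counts
```

To use it, pass an open file handle for the Logstash log (its location depends on the install) and print `tally_messages(fh).most_common(10)`.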
Then the last entry was an error looking up GeoIP data
Code:
{:timestamp=>"2016-02-08T23:55:26.878000-0800", :message=>"Unknown error while looking up GeoIP data", :exception=>#<Errno::EBADF: Bad file descriptor - Bad file descriptor>, :field=>nil, :event=>#<LogStash::Event:0x22437c03 @metadata={}, @accessors=#<LogStash::Util::Accessors:0x3ebf475c
Now some of these access and error files don't have many entries in them and only have the header....
format=%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%] "%Req->reqpb.clf-request%" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%
snapshot of the permissions...
Code:
-rw-r--r-- 1 30002 300 143 Feb 1 23:55 access.201602022355
-rw-r--r-- 1 30002 300 143 Feb 2 23:55 access.201602032355
-rw-r--r-- 1 30002 300 143 Feb 3 23:55 access.201602042355
-rw-r--r-- 1 30002 300 143 Feb 4 23:55 access.201602052355
-rw-r--r-- 1 30002 300 143 Feb 5 23:55 access.201602062355
-rw-r--r-- 1 30002 300 143 Feb 6 23:55 access.201602072355
-rw-r--r-- 1 30002 300 143 Feb 7 23:55 access.201602082355
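Since the listing shows the files themselves as world-readable, one way to narrow down the "Permission denied" warnings is to actually try reading each path as the account Logstash runs under. This is a minimal sketch, not a diagnosis: `try_read` and `unreadable` are hypothetical helper names, and it attempts a real open instead of relying on permission bits, because a network filesystem such as NFS can refuse access that the bits appear to allow.

```python
import os


def try_read(path):
    """Try to read `path`; return None on success, else a short error string."""
    try:
        if os.path.isdir(path):
            os.listdir(path)  # directories: check that they can be listed
        else:
            with open(path, "rb") as fh:
                fh.read(1)  # files: check that at least one byte can be read
        return None
    except OSError as exc:
        return f"{type(exc).__name__}: {exc.strerror}"


def unreadable(paths):
    """Map each path that could not be read to the error it produced."""
    failures = {}
    for path in paths:
        err = try_read(path)
        if err is not None:
            failures[path] = err
    return failures
```

Running `unreadable([...])` over the directories and files named in the file input, while logged in as the Logstash user, would show which component is refusing access.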