Logstash crashing repeatedly with MUTEX error

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Logstash crashing repeatedly with MUTEX error

Post by krobertson71 »

Here is the error we are seeing:

Code: Select all

ConcurrencyError: interrupted waiting for mutex: null
                       lock at org/jruby/ext/thread/Mutex.java:94
          execute_task_once at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/delay.rb:83
                       wait at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/delay.rb:60
                      value at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/obligation.rb:47
           global_timer_set at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/configuration.rb:58
  finalize_global_executors at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/configuration.rb:137
                 Concurrent at /usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.8.0-java/lib/concurrent/configuration.rb:165

Researching this issue turns up some ELK reports involving /etc/resolv.conf or the JRuby resolv:host_class.

This problem seems to be getting much worse.

NLS 1.4.0
Red Hat EL6
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by mcapra »

The problem appears to be with the dns filter. The gist of it, for posterity:
The problem I see is that we are using the Timeout library to abort lookups that take too long. This timeout interrupts the lookup thread and, I believe, results in this ConcurrencyError: interrupted waiting for mutex: null error.
In other words, the dns filter attempts a reverse lookup (converting an IP address into an FQDN), the request times out, and the error itself comes from how the filter handles that timeout.
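One rough way to check whether reverse lookups on the Nagios Log Server machine are slow enough to hit a timeout is to time-box the same kind of IP-to-hostname lookup from the shell. This is a minimal sketch, assuming bash, the coreutils timeout command, and getent; the function name check_reverse_lookup and its 2-second default are hypothetical and are not the dns filter's own settings. getent resolves through glibc rather than the Ruby resolver the filter uses, so it only approximates what Logstash sees.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time-box one reverse (IP -> hostname) lookup.
# getent resolves through glibc/NSS (normally /etc/hosts, then the
# nameservers in /etc/resolv.conf); the Logstash dns filter uses Ruby's
# resolver instead, so this is only an approximation of what it sees.
check_reverse_lookup() {
    local ip="$1"
    local limit="${2:-2}"   # seconds; assumed default, not the filter's setting
    local answer rc
    if [ -z "$ip" ]; then
        echo "usage: check_reverse_lookup IP [SECONDS]" >&2
        return 3
    fi
    answer=$(timeout "$limit" getent hosts "$ip")
    rc=$?
    case "$rc" in
        0)   echo "OK: $ip -> $(echo "$answer" | awk '{print $2}')" ;;
        124) echo "TIMEOUT: no answer for $ip within ${limit}s" ;;   # exit code used by timeout
        *)   echo "FAILED: no answer for $ip (exit $rc)" ;;
    esac
    return "$rc"
}
```

For example, running check_reverse_lookup 10.0.103.180 2 against the addresses of hosts sending logs; repeated TIMEOUT results would point back at DNS.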

We might be able to make some recommendations to work around it though. Can you share the output of the following commands executed from the CLI of your Nagios Log Server machine:

Code: Select all

grep '' /usr/local/nagioslogserver/logstash/etc/conf.d/*
cat /etc/resolv.conf
cat /etc/hosts
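If it is easier to attach a single file, the three commands can be wrapped so their output lands in one file with a header before each section. This is a minimal sketch; the function name collect_nls_diagnostics and the default output path /tmp/nls_diagnostics.txt are hypothetical. It uses grep -H so file names are printed even when conf.d holds only one file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the three diagnostic commands and collect their
# output, each under a header, in one file that can be attached to a post.
collect_nls_diagnostics() {
    local out="${1:-/tmp/nls_diagnostics.txt}"   # assumed output path
    local confdir="${2:-/usr/local/nagioslogserver/logstash/etc/conf.d}"
    {
        echo "=== grep -H '' $confdir/* ==="
        # -H prints the file name even if conf.d holds a single file
        grep -H '' "$confdir"/* 2>&1
        echo "=== cat /etc/resolv.conf ==="
        cat /etc/resolv.conf 2>&1
        echo "=== cat /etc/hosts ==="
        cat /etc/hosts 2>&1
    } > "$out"
    echo "wrote $out"
}
```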
Former Nagios employee
https://www.mcapra.com/
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by krobertson71 »

cat /etc/hosts

Code: Select all

127.0.0.1			localhost.localdomain localhost
		nagilgp01.dcri.duke.net	nagilgp01.dcri.duke.net.dcri.duke.edu
		nagilgp01.dcri.duke.net.dcri.duke.edu	nagilgp01.dcri.duke.net
::1       localhost       loopback
cat /etc/resolv.conf

Code: Select all

domain dcri.duke.edu
nameserver 152.16.48.220
nameserver 152.16.48.78
search dcri.duke.edu dcri.duke.net dhe.duke.edu
conf.d results attached.

I did notice that conf.d did not list some of the local filters I have created. For example, I have drops for mountd and syslog debug messages, and I also use the KV filter. Why would these not show up in conf.d?

conf.d.output.txt
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by mcapra »

Ah, run this command (and share the output) to capture all the local inputs/filters:

Code: Select all

curl -XGET 'http://localhost:9200/nagioslogserver/node/_search?size=2000'

krobertson71 wrote: Why would this not show up in conf.d?
Probably because those inputs/filters are defined on a different node than the one the previous command was run on. The command above should grab everything, albeit in a less pretty format.
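To look at a single document instead of the whole search result, Elasticsearch's document GET endpoint can be combined with the pretty parameter, which indents the JSON. This is a minimal sketch, assuming Elasticsearch answers on localhost:9200 as in the command above; the function name nls_node_doc is hypothetical, and the IDs to pass are the _id values from the _search output (for example global or a node UUID).

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a single document from the nagioslogserver
# index's "node" type, e.g. "global" or a node UUID taken from the _id
# fields of the _search output. "?pretty" asks Elasticsearch to indent
# the JSON; escaped "\r\n" sequences inside the "raw" strings remain.
nls_node_doc() {
    local id="${1:-global}"
    local es="${2:-http://localhost:9200}"   # assumed Elasticsearch address
    curl -s -XGET "$es/nagioslogserver/node/$id?pretty"
}
```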
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by krobertson71 »

Here you go:

Code: Select all

 curl -XGET 'http://localhost:9200/nagioslogserver/node/_search?size=2000'
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[{"_index":"nagioslogserver","_type":"node","_id":"global","_score":1.0,"_source":{"config_inputs":[{"raw":"syslog {\r\n    type => 'syslog'\r\n    port => 5544\r\n}","name":"Syslog (Default)","active":"1"},{"raw":"tcp {\r\n    type => 'eventlog'\r\n    port => 3515\r\n    codec => json {\r\n        charset => 'CP1252'\r\n    }\r\n}","name":"Windows Event Log (Default)","active":"1"},{"raw":"tcp {\r\n    type => 'import_raw'\r\n    tags => 'import_raw'\r\n    port => 2056\r\n}","name":"Import Files - Raw (Default)","active":"1"},{"raw":"tcp {\r\n    type => 'import_json'\r\n    tags => 'import_json'\r\n    port => 2057\r\n    codec => json\r\n}","name":"Import Files - JSON (Default)","active":"1"},{"raw":"tcp {\r\n    type => 'import_raw'\r\n    tags => 'auditd'\r\n    port => 2999\r\n}","name":"AuditD ","active":"1"}],"config_filters":[{"raw":"if [program] == 'apache_access' {\r\n    grok {\r\n        match => [ 'message', '%{COMBINEDAPACHELOG}']\r\n    }\r\n    date {\r\n        match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]\r\n    }\r\n    mutate {\r\n        replace => [ 'type', 'apache_access' ]\r\n         convert => [ 'bytes', 'integer' ]\r\n         convert => [ 'response', 'integer' ]\r\n    }\r\n}\r\n \r\nif [program] == 'apache_error' {\r\n    grok {\r\n        match => [ 'message', '\\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\\] \\[%{WORD:class}\\] \\[%{WORD:originator} %{IP:clientip}\\] %{GREEDYDATA:errmsg}']\r\n    }\r\n    mutate {\r\n        replace => [ 'type', 'apache_error' ]\r\n    }\r\n}","name":"Apache (Default)","active":"1"},{"raw":"if [severity] == 6 {\r\n    drop { }\r\n}","name":"Syslog Info Drop","active":"1"},{"raw":"if \"auditd\" in [tags] {\r\n    kv { }\r\n}","name":"Auditd Processer KV","active":"1"},{"raw":"if [program] == 'mountd' {\r\n    drop { }\r\n}","name":"mountd drop filter","active":"1"},{"raw":"if [type] == 'syslog' {\r\n          if [severity_label] == 'Debug' {\r\n    drop { }\r\n   }\r\n}","name":"syslog debug filter","active":"1"}],"config_outputs":[]}},{"_index":"nagioslogserver","_type":"node","_id":"b2733b10-233a-4593-9428-85145cd54c77","_score":1.0,"_source":{"last_updated":1486409328,"ls_version":"1.4.0","ls_release":140,"elasticsearch":{"status":"running","pid":"5001","message":"Search engine (elasticsearch) is running."},"logstash":{"status":"running","pid":"5529","message":"Log collector (logstash) is running."},"address":"10.0.103.180","hostname":"nagilgp01.dcri.duke.net"}},{"_index":"nagioslogserver","_type":"node","_id":"11fe29cc-9353-4cc1-a368-14a0b6977937","_score":1.0,"_source":{"last_updated":1486409329,"ls_version":"1.4.0","ls_release":140,"elasticsearch":{"status":"running","pid":"6308","message":"Search engine (elasticsearch) is running."},"logstash":{"status":"running","pid":"6420","message":"Log collector (logstash) is running."},"address":"10.136.132.107","hostname":"nagilgp02.dhe.duke.edu"}}]}}
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by mcapra »

Hmm, neither of these nodes appears to have any local input/filter rules defined, and I can't find a dns filter defined anywhere. How did you go about configuring the local inputs/filters? Do you know of any machine that's using a dns filter?
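To double-check for dns filters on each node, both the generated conf.d files and a saved copy of the _search output can be searched for a dns block. This is a minimal sketch; the function name find_dns_filters and the file /tmp/nls_nodes.json are hypothetical, and the pattern is deliberately loose, so it may also flag names that merely end in "dns".

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "dns {" blocks per file. Works on generated
# conf.d files and on a saved copy of the _search JSON, where filter bodies
# are escaped strings. The pattern is loose ("dns", optional whitespace,
# "{"), so names that merely end in "dns" may also match.
find_dns_filters() {
    local hits
    hits=$(grep -Ho 'dns[[:space:]]*{' "$@" 2>/dev/null | sort | uniq -c)
    if [ -n "$hits" ]; then
        echo "$hits"
        return 0
    fi
    echo "no dns filter blocks found"
    return 1
}
```

For example: curl -s 'http://localhost:9200/nagioslogserver/node/_search?size=2000' > /tmp/nls_nodes.json, then find_dns_filters /tmp/nls_nodes.json /usr/local/nagioslogserver/logstash/etc/conf.d/* on each node.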
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by krobertson71 »

Here is a screenshot of the inputs on our first node. I am not sure what you mean by "they are not defined".
Screen Shot 2017-02-06 at 4.24.19 PM.png
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by mcapra »

I meant that I don't see any specific local inputs/filters stored in the database anywhere, the sort of thing you define here:
2017_02_06_15_30_05_Admin_Dashboard_Nagios_Log_Server.png
I'm also not seeing any dns filters leveraged anywhere, so I think we can rule that out. Can you send over a more comprehensive set of Logstash logs? This command should do the trick:

Code: Select all

zip -r /tmp/logstash_42295.zip /var/log/logstash/

Then attach the logstash_42295.zip file to your post so that we can review it.
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by krobertson71 »

Here you go.

I also wanted to mention that if this MUTEX error is not the cause, it would be nice to narrow down what the issue actually is. I would like to monitor for it so we can catch it earlier, rather than only getting notified after a crash.
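For catching this earlier, one option is a small Nagios-style check that scans the Logstash log for the patterns discussed in this thread and exits with the standard plugin codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a minimal sketch; the function name check_logstash_log, the chosen patterns and severities, and the log file name are assumptions. It rescans the whole file on every run, so old entries keep alerting until the log rotates; a production check would need to track what it has already seen.

```shell
#!/usr/bin/env bash
# Hypothetical Nagios-style check: scan a Logstash log for the patterns seen
# in this thread and return the usual plugin codes
# (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Rescans the whole file each run.
check_logstash_log() {
    local log="$1"
    local mutex enc
    if [ ! -r "$log" ]; then
        echo "UNKNOWN - cannot read $log"
        return 3
    fi
    # grep -c prints 0 (and exits 1) when nothing matches; only the count is used
    mutex=$(grep -c 'ConcurrencyError' "$log")
    enc=$(grep -c 'different character encoding' "$log")
    if [ "$mutex" -gt 0 ]; then
        echo "CRITICAL - $mutex ConcurrencyError line(s), $enc encoding warning(s)"
        return 2
    elif [ "$enc" -gt 0 ]; then
        echo "WARNING - $enc encoding warning(s)"
        return 1
    fi
    echo "OK - no known error patterns in $log"
    return 0
}
```

For example, check_logstash_log /var/log/logstash/logstash.log, substituting whichever file under /var/log/logstash/ your install actually writes.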
logstash_42295.zip
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Logstash crashing repeatedly with MUTEX error

Post by mcapra »

Hmm, I don't actually see any MUTEX errors in this log. Perhaps this was happening on a different Nagios Log Server instance than the one you sent the Logstash logs for?
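To find out which instance actually logged the error, ConcurrencyError lines can be counted in every file under /var/log/logstash/ on each node, including gzip-rotated ones. This is a minimal sketch; the function name count_mutex_errors is hypothetical, and it assumes rotated logs end in .gz.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ConcurrencyError lines in each file under a
# Logstash log directory, reading gzip-rotated (*.gz) files as well.
# Run it on every node; each output line is: node file count.
count_mutex_errors() {
    local dir="${1:-/var/log/logstash}"
    local node f n
    node=$(uname -n)
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.gz) n=$(gzip -dc "$f" | grep -c 'ConcurrencyError') ;;
            *)    n=$(grep -c 'ConcurrencyError' "$f") ;;
        esac
        echo "$node $f $n"
    done
}
```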

I think this particular message might be more relevant than the MUTEX errors:

Code: Select all

{:timestamp=>"2017-02-03T22:36:15.724000-0500", :message=>"Received an event that has a different character encoding than you configured.", :text=>"<7>Feb  3 22:36:15 ctmsssfp02 kernel: 27000000 424d53ff 0000e324 c00298c0 . . . ' \\xFF S M B $ \\xE3 . . \\xC0 . . \\xC0\\n", :expected_charset=>"UTF-8", :level=>:warn}

It appears that one of your machines is sending messages that aren't UTF-8 encoded. Do you know what charset the ctmsssfp02 machine uses? We may need to give it a dedicated input that accounts for the different charset.
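To see whether ctmsssfp02 is the only sender involved, the warnings can be tallied by source host. This is a minimal sketch; the function name encoding_warnings_by_host is hypothetical, and it assumes each warning sits on one line and that the :text field begins with a syslog priority followed by a "Mon dd hh:mm:ss host" header, as in the line above. Messages in other formats will be counted under the wrong field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list which syslog hosts triggered the
# "different character encoding" warning, most frequent first.
# Assumes :text starts with "<PRI>Mon dd hh:mm:ss host", so after the
# priority is stripped, the host is the fourth whitespace-separated field.
encoding_warnings_by_host() {
    local log="$1"
    grep 'different character encoding' "$log" \
        | sed -n 's/.*:text=>"<[0-9]*>//p' \
        | awk '{print $4}' \
        | sort | uniq -c | sort -rn \
        | awk '{print $2, $1}'
}
```

For example, encoding_warnings_by_host /var/log/logstash/logstash.log (file name assumed) prints one "host count" pair per line.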