Logstash stopping abruptly

Posted: Wed Aug 02, 2017 12:13 pm
by uma K
Hi, my Logstash keeps stopping abruptly with the error below, even though all nodes are up and running.

{:timestamp=>"2017-08-02T09:21:57.769000-0700", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
{:timestamp=>"2017-08-02T09:21:57.769000-0700", :message=>"Failed to flush outgoing items", :outgoing_count=>337, :exception=>org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [], :backtrace=>["org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(org/elasticsearch/client/transport/TransportClientNodesService.java:279)", "org.elasticsearch.client.transport.TransportClientNodesService.execute(org/elasticsearch/client/transport/TransportClientNodesService.java:198)",

Below is the output of top on bcn1:

top - 10:07:32 up 142 days, 11 min, 2 users, load average: 1.12, 0.59, 0.45
Tasks: 121 total, 1 running, 120 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.0%us, 0.6%sy, 1.5%ni, 94.5%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8193024k total, 8067724k used, 125300k free, 18320k buffers
Swap: 262136k total, 9656k used, 252480k free, 2327556k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14362 nagios 39 19 2294m 704m 14m S 168.8 8.8 2:19.31 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/app/nagioslogserver/
22296 apache 20 0 335m 12m 3176 S 4.0 0.2 0:04.68 /usr/sbin/httpd
22321 apache 20 0 334m 11m 3120 S 4.0 0.1 0:04.56 /usr/sbin/httpd
21966 nagios 20 0 19.4g 5.0g 709m S 2.0 64.4 73:19.55 /usr/bin/java -Xms4000m -Xmx4000m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX
1 root 20 0 19232 1252 1056 S 0.0 0.0 0:00.88 /sbin/init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kthreadd]

Re: Logstash stopping abruptly

Posted: Wed Aug 02, 2017 1:47 pm
by mcapra
It would appear as though Logstash is unable to talk to Elasticsearch for some period of time.

Can you share your Elasticsearch logs? They can typically be found here:

Code: Select all

/var/log/elasticsearch/*.log
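If those logs are large, grepping for discovery, master election, and exception messages can narrow things down quickly (a sketch; adjust the path if your install puts the logs elsewhere):

Code: Select all

grep -iE "master|discovery|exception" /var/log/elasticsearch/*.log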

Re: Logstash stopping abruptly

Posted: Wed Aug 02, 2017 4:16 pm
by uma K

Code: Select all

[2017-08-02 13:29:52,667][DEBUG][action.search.type       ] [abd0aca5-8cbf-4f11-988e-be0d778f5f95] All shards failed for phase: [query]
org.elasticsearch.transport.RemoteTransportException: [a40fda6a-2269-44c8-9c95-77eaf5a865dd][inet[/136.133.238.46:9300]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchParseException: [logstash-2017.08.01][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"facets":{"0":{"date_histogram":{"field":"@timestamp","interval":"10m"},"global":true,"facet_f
ilter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"PartsView\/PartsView"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1501619394749,"to":1501705794749}}}],"must_not":[{"terms":{"host.raw":["136.133.230
.76"]}},{"terms":{"host.raw":["136.133.230.204"]}},{"terms":{"host.raw":["136.133.230.207"]}},{"terms":{"host.raw":["136.133.230.180"]}},{"terms":{"host.raw":["136.133.231.113"]}},{"terms":{"host.raw":["136.133.230.205"]}},{"terms":{"hos
t.raw":["136.133.230.182"]}},{"terms":{"host.raw":["136.133.231.111"]}},{"terms":{"host.raw":["0:0:0:0:0:0:0:1"]}},{"terms":{"host.raw":["136.133.230.134"]}},{"terms":{"host.raw":["136.133.230.200"]}},{"terms":{"host.raw":["136.133.236.1
47"]}},{"terms":{"host.raw":["136.133.236.147"]}},{"terms":{"host.raw":["136.133.236.147"]}},{"terms":{"host.raw":["136.133.230.221"]}},{"terms":{"host.raw":["136.133.231.211"]}},{"terms":{"host.raw":["136.133.231.213"]}},{"terms":{"host
.raw":["136.133.230.192"]}},{"terms":{"host.raw":["136.133.230.235"]}},{"terms":{"host.raw":["136.133.230.238"]}},{"terms":{"host.raw":["136.133.236.203"]}},{"terms":{"host.raw":["136.133.230.239"]}},{"terms":{"host.raw":["136.133.230.24
6"]}},{"terms":{"host":["136.133.230.139"]}},{"terms":{"host":["136.133.230.139"]}},{"terms":{"host.raw":["136.133.230.143"]}},{"terms":{"host.raw":["136.133.230.143"]}},{"terms":{"host.raw":["136.133.230.223"]}},{"terms":{"host.raw":["1
36.133.230.159"]}},{"terms":{"host.raw":["136.133.230.159"]}},{"terms":{"host.raw":["136.133.230.154"]}},{"terms":{"host.raw":["136.133.230.191"]}},{"terms":{"host.raw":["136.133.231.59"]}},{"terms":{"host.raw":["136.133.231.93"]}},{"ter
ms":{"host.raw":["136.133.236.166"]}},{"terms":{"host.raw":["136.133.236.62"]}},{"terms":{"host.raw":["136.133.236.204"]}},{"terms":{"host.raw":["136.133.236.149"]}},{"terms":{"host.raw":["136.133.231.239"]}},{"terms":{"host.raw":["136.1
33.230.194"]}},{"terms":{"host.raw":["136.133.231.5"]}},{"terms":{"host.raw":["136.133.230.4"]}},{"terms":{"host.raw":["136.133.230.6"]}},{"terms":{"host.raw":["136.133.230.6"]}},{"terms":{"host.raw":["136.133.230.5"]}},{"terms":{"host.r
aw":["136.133.230.5"]}},{"terms":{"host.raw":["136.133.175.247"]}},{"terms":{"host.raw":["136.133.175.248"]}},{"terms":{"host.raw":["136.133.175.248"]}},{"terms":{"host.raw":["136.133.131.249"]}},{"terms":{"host.raw":["136.133.131.249"]}
},{"terms":{"host.raw":["136.133.131.249"]}},{"terms":{"host.raw":["136.133.24.249"]}},{"terms":{"host.raw":["136.133.24.249"]}},{"terms":{"host.raw":["136.133.171.8"]}},{"terms":{"host.raw":["136.133.171.8"]}},{"terms":{"host.raw":["136
.133.160.249"]}},{"terms":{"host.raw":["136.133.151.249"]}},{"terms":{"host.raw":["136.133.133.31"]}},{"terms":{"host.raw":["136.133.117.249"]}},{"terms":{"host.raw":["136.133.141.249"]}}]}}}}}}}},"size":0}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:735)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:560)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:532)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:294)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:776)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:767)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [logstash-2017.08.01] Failed to parse query [PartsView/PartsView]
        at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:250)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:302)
        at org.elasticsearch.index.query.FilteredQueryParser.parse(FilteredQueryParser.java:71)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:302)
        at org.elasticsearch.index.query.FQueryFilterParser.parse(FQueryFilterParser.java:66)
        at org.elasticsearch.index.query.QueryParseContext.executeFilterParser(QueryParseContext.java:368)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerFilter(QueryParseContext.java:349)
        at org.elasticsearch.index.query.IndexQueryParserService.parseInnerFilter(IndexQueryParserService.java:295)
        at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:86)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:719)
        ... 10 more
Caused by: org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'PartsView/PartsView': Lexical error at line 1, column 20.  Encountered: <EOF> after : "/PartsView"
        at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:137)
        at org.apache.lucene.queryparser.classic.MapperQueryParser.parse(MapperQueryParser.java:891)
        at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:233)
        ... 19 more
Caused by: org.apache.lucene.queryparser.classic.TokenMgrError: Lexical error at line 1, column 20.  Encountered: <EOF> after : "/PartsView"
        at org.apache.lucene.queryparser.classic.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1133)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_scan_token(QueryParser.java:601)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_3R_2(QueryParser.java:484)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_3_1(QueryParser.java:491)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_2_1(QueryParser.java:477)
        at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:228)
        at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:183)
        at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:172)
        at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:127)

Re: Logstash stopping abruptly

Posted: Wed Aug 02, 2017 4:19 pm
by uma K
I have 4 nodes in total, but when I check the nodes with the command below on each server, only 2 nodes are displayed, even though all 4 nodes have the same cluster ID.

curl -XGET localhost:9200/_nodes/jvm?pretty

It looks like the 4 nodes are not all communicating with each other; only 2 nodes are in sync.
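For a quick view of how many nodes each server believes are in the cluster, the cluster health API reports the node count directly (a sketch, using the same localhost:9200 endpoint as the command above):

Code: Select all

curl -XGET localhost:9200/_cluster/health?pretty

On a healthy 4-node cluster every server should report "number_of_nodes" : 4; if two servers each report 2, the cluster has partitioned.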

Re: Logstash stopping abruptly

Posted: Thu Aug 03, 2017 1:04 am
by tacolover101
ruh roh, it looks like you may have hit a split brain, which is where two halves of the cluster each elect their own master and start running solo. can you upload an NLS profile from each of your machines?

something to help prevent this in the future: run your cluster with an odd number of nodes, such as 3 or 5, and set your minimum master nodes to 2 or 3 respectively, so that a partitioned minority can't elect its own master. this still lets you take down machines for maintenance and upgrades as needed. elastic does a great job of explaining it here - https://www.elastic.co/guide/en/elastic ... r-election
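For reference, a minimal sketch of that setting on the Elasticsearch 1.x versions bundled with NLS; the quorum rule is (master-eligible nodes / 2) + 1, and the path may differ on your install:

Code: Select all

# /etc/elasticsearch/elasticsearch.yml (set on every master-eligible node)
# for a 3-node cluster: quorum = (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2

With this in place, a node that can only see a minority of the cluster refuses to elect a master instead of going split-brain.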

Re: Logstash stopping abruptly

Posted: Thu Aug 03, 2017 9:42 am
by tmcdonald
Thanks for the assist, @tacolover101! OP, let us know if you need further assistance.

Re: Logstash stopping abruptly

Posted: Fri Aug 04, 2017 4:13 pm
by uma K
I increased the watermark level to 90% and now my server is collecting logs.

I executed the command below and all 4 nodes are displayed:
curl -XGET localhost:9200/_nodes/jvm?pretty

Can you help me understand the reason for this split brain?
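For anyone following along, a disk watermark change like the one mentioned above is typically applied through the cluster settings API (a sketch against the local node; the values are illustrative, and transient settings reset on a full cluster restart):

Code: Select all

curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}'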

Re: Logstash stopping abruptly

Posted: Mon Aug 07, 2017 8:20 am
by mcapra
We would need the complete historical logs during the event from each node to be able to say for sure what caused it. Typically, it's either network issues or some instability on one or many nodes that causes them to crash.

Re: Logstash stopping abruptly

Posted: Mon Aug 07, 2017 10:24 am
by dwhitfield
Can you PM me the two profiles?

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.