Logstash stopping abruptly

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
uma K
Posts: 63
Joined: Tue Feb 14, 2017 12:41 pm

Logstash stopping abruptly

Post by uma K »

Hi, my Logstash keeps stopping abruptly with the error below, even though all nodes are up and running.

{:timestamp=>"2017-08-02T09:21:57.769000-0700", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
{:timestamp=>"2017-08-02T09:21:57.769000-0700", :message=>"Failed to flush outgoing items", :outgoing_count=>337, :exception=>org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [], :backtrace=>["org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(org/elasticsearch/client/transport/TransportClientNodesService.java:279)", "org.elasticsearch.client.transport.TransportClientNodesService.execute(org/elasticsearch/client/transport/TransportClientNodesService.java:198)",

Below is the result for top-bcn1

top - 10:07:32 up 142 days, 11 min, 2 users, load average: 1.12, 0.59, 0.45
Tasks: 121 total, 1 running, 120 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.0%us, 0.6%sy, 1.5%ni, 94.5%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8193024k total, 8067724k used, 125300k free, 18320k buffers
Swap: 262136k total, 9656k used, 252480k free, 2327556k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14362 nagios 39 19 2294m 704m 14m S 168.8 8.8 2:19.31 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/app/nagioslogserver/
22296 apache 20 0 335m 12m 3176 S 4.0 0.2 0:04.68 /usr/sbin/httpd
22321 apache 20 0 334m 11m 3120 S 4.0 0.1 0:04.56 /usr/sbin/httpd
21966 nagios 20 0 19.4g 5.0g 709m S 2.0 64.4 73:19.55 /usr/bin/java -Xms4000m -Xmx4000m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX
1 root 20 0 19232 1252 1056 S 0.0 0.0 0:00.88 /sbin/init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kthreadd]
User avatar
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Logstash stopping abruptly

Post by mcapra »

It would appear as though Logstash is unable to talk to Elasticsearch for some period of time.

Can you share your Elasticsearch logs? They can typically be found here:

Code: Select all

/var/log/elasticsearch/*.log
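If you just want the recent errors rather than the whole log, something like this (assuming the default log location) usually narrows it down:

```shell
# Pull the most recent error/exception lines from the Elasticsearch logs
grep -E 'ERROR|Exception' /var/log/elasticsearch/*.log | tail -n 50
```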
Former Nagios employee
https://www.mcapra.com/
uma K
Posts: 63
Joined: Tue Feb 14, 2017 12:41 pm

Re: Logstash stopping abruptly

Post by uma K »

Code: Select all

[2017-08-02 13:29:52,667][DEBUG][action.search.type       ] [abd0aca5-8cbf-4f11-988e-be0d778f5f95] All shards failed for phase: [query]
org.elasticsearch.transport.RemoteTransportException: [a40fda6a-2269-44c8-9c95-77eaf5a865dd][inet[/136.133.238.46:9300]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchParseException: [logstash-2017.08.01][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"facets":{"0":{"date_histogram":{"field":"@timestamp","interval":"10m"},"global":true,"facet_f
ilter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"PartsView\/PartsView"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1501619394749,"to":1501705794749}}}],"must_not":[{"terms":{"host.raw":["136.133.230
.76"]}},{"terms":{"host.raw":["136.133.230.204"]}},{"terms":{"host.raw":["136.133.230.207"]}},{"terms":{"host.raw":["136.133.230.180"]}},{"terms":{"host.raw":["136.133.231.113"]}},{"terms":{"host.raw":["136.133.230.205"]}},{"terms":{"hos
t.raw":["136.133.230.182"]}},{"terms":{"host.raw":["136.133.231.111"]}},{"terms":{"host.raw":["0:0:0:0:0:0:0:1"]}},{"terms":{"host.raw":["136.133.230.134"]}},{"terms":{"host.raw":["136.133.230.200"]}},{"terms":{"host.raw":["136.133.236.1
47"]}},{"terms":{"host.raw":["136.133.236.147"]}},{"terms":{"host.raw":["136.133.236.147"]}},{"terms":{"host.raw":["136.133.230.221"]}},{"terms":{"host.raw":["136.133.231.211"]}},{"terms":{"host.raw":["136.133.231.213"]}},{"terms":{"host
.raw":["136.133.230.192"]}},{"terms":{"host.raw":["136.133.230.235"]}},{"terms":{"host.raw":["136.133.230.238"]}},{"terms":{"host.raw":["136.133.236.203"]}},{"terms":{"host.raw":["136.133.230.239"]}},{"terms":{"host.raw":["136.133.230.24
6"]}},{"terms":{"host":["136.133.230.139"]}},{"terms":{"host":["136.133.230.139"]}},{"terms":{"host.raw":["136.133.230.143"]}},{"terms":{"host.raw":["136.133.230.143"]}},{"terms":{"host.raw":["136.133.230.223"]}},{"terms":{"host.raw":["1
36.133.230.159"]}},{"terms":{"host.raw":["136.133.230.159"]}},{"terms":{"host.raw":["136.133.230.154"]}},{"terms":{"host.raw":["136.133.230.191"]}},{"terms":{"host.raw":["136.133.231.59"]}},{"terms":{"host.raw":["136.133.231.93"]}},{"ter
ms":{"host.raw":["136.133.236.166"]}},{"terms":{"host.raw":["136.133.236.62"]}},{"terms":{"host.raw":["136.133.236.204"]}},{"terms":{"host.raw":["136.133.236.149"]}},{"terms":{"host.raw":["136.133.231.239"]}},{"terms":{"host.raw":["136.1
33.230.194"]}},{"terms":{"host.raw":["136.133.231.5"]}},{"terms":{"host.raw":["136.133.230.4"]}},{"terms":{"host.raw":["136.133.230.6"]}},{"terms":{"host.raw":["136.133.230.6"]}},{"terms":{"host.raw":["136.133.230.5"]}},{"terms":{"host.r
aw":["136.133.230.5"]}},{"terms":{"host.raw":["136.133.175.247"]}},{"terms":{"host.raw":["136.133.175.248"]}},{"terms":{"host.raw":["136.133.175.248"]}},{"terms":{"host.raw":["136.133.131.249"]}},{"terms":{"host.raw":["136.133.131.249"]}
},{"terms":{"host.raw":["136.133.131.249"]}},{"terms":{"host.raw":["136.133.24.249"]}},{"terms":{"host.raw":["136.133.24.249"]}},{"terms":{"host.raw":["136.133.171.8"]}},{"terms":{"host.raw":["136.133.171.8"]}},{"terms":{"host.raw":["136
.133.160.249"]}},{"terms":{"host.raw":["136.133.151.249"]}},{"terms":{"host.raw":["136.133.133.31"]}},{"terms":{"host.raw":["136.133.117.249"]}},{"terms":{"host.raw":["136.133.141.249"]}}]}}}}}}}},"size":0}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:735)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:560)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:532)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:294)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:776)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:767)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [logstash-2017.08.01] Failed to parse query [PartsView/PartsView]
        at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:250)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:302)
        at org.elasticsearch.index.query.FilteredQueryParser.parse(FilteredQueryParser.java:71)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:302)
        at org.elasticsearch.index.query.FQueryFilterParser.parse(FQueryFilterParser.java:66)
        at org.elasticsearch.index.query.QueryParseContext.executeFilterParser(QueryParseContext.java:368)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerFilter(QueryParseContext.java:349)
        at org.elasticsearch.index.query.IndexQueryParserService.parseInnerFilter(IndexQueryParserService.java:295)
        at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:86)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:719)
        ... 10 more
Caused by: org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'PartsView/PartsView': Lexical error at line 1, column 20.  Encountered: <EOF> after : "/PartsView"
        at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:137)
        at org.apache.lucene.queryparser.classic.MapperQueryParser.parse(MapperQueryParser.java:891)
        at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:233)
        ... 19 more
Caused by: org.apache.lucene.queryparser.classic.TokenMgrError: Lexical error at line 1, column 20.  Encountered: <EOF> after : "/PartsView"
        at org.apache.lucene.queryparser.classic.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1133)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_scan_token(QueryParser.java:601)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_3R_2(QueryParser.java:484)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_3_1(QueryParser.java:491)
        at org.apache.lucene.queryparser.classic.QueryParser.jj_2_1(QueryParser.java:477)
        at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:228)
        at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:183)
        at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:172)
        at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:127)
Last edited by tmcdonald on Wed Aug 02, 2017 4:59 pm, edited 1 time in total.
Reason: Please use [code][/code] tags around long output
uma K
Posts: 63
Joined: Tue Feb 14, 2017 12:41 pm

Re: Logstash stopping abruptly

Post by uma K »

I have 4 nodes in total, but when I check the nodes with the command below on each server, only 2 nodes are displayed, even though all 4 nodes have the same cluster ID.

curl -XGET localhost:9200/_nodes/jvm?pretty

It looks like all 4 nodes are not communicating with each other; only 2 nodes are in sync.
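For reference, cluster health can also be checked on any node with:

```shell
# Show overall cluster status and the number of nodes this node can see
curl -XGET 'localhost:9200/_cluster/health?pretty'
```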
User avatar
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Logstash stopping abruptly

Post by tacolover101 »

ruh roh, it looks like you may have hit a split-brain, where each side of the cluster starts running solo with its own master. can you upload an NLS profile from each of your machines?

something to help prevent this in the future: run your cluster with an odd number of nodes, such as 3 or 5, then set your minimum master nodes to 2 or 3 (a quorum) so that a partitioned minority can't elect its own master. that still lets you take down machines for maintenance and upgrades as needed. elastic does a great job of explaining it here - https://www.elastic.co/guide/en/elastic ... r-election
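for example (hypothetical values - adjust the quorum to match your own node count), on the older elasticsearch that NLS ships you can set it on the fly via the cluster settings api:

```shell
# With 3 master-eligible nodes, require a quorum of 2 so a
# partitioned minority cannot elect its own master
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'
```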
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Logstash stopping abruptly

Post by tmcdonald »

Thanks for the assist, @tacolover101! OP, let us know if you need further assistance.
Former Nagios employee
uma K
Posts: 63
Joined: Tue Feb 14, 2017 12:41 pm

Re: Logstash stopping abruptly

Post by uma K »

I increased the watermark level to 90% and now my server is collecting logs.

When I execute the command below, all 4 nodes are now displayed:
curl -XGET localhost:9200/_nodes/jvm?pretty

Can you help me understand the reason for this split-brain occurrence?
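For reference, this is roughly how I raised the watermark via the cluster settings api (example values; the setting name is for the older elasticsearch shipped with NLS):

```shell
# Raise the disk high watermark to 90% so shard allocation resumes
# (transient setting: reverts after a full cluster restart)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disk.watermark.high": "90%" }
}'
```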
User avatar
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Logstash stopping abruptly

Post by mcapra »

We would need the complete logs from each node covering the time of the event to say for sure what caused it. Typically it's either network issues or instability on one or more nodes that causes them to drop out of the cluster.
Former Nagios employee
https://www.mcapra.com/
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN
Contact:

Re: Logstash stopping abruptly

Post by dwhitfield »

Can you PM me the two profiles?

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.
Locked