Error: All shards failed for phase
Posted: Tue Mar 19, 2019 3:01 am
Hello,
Our Nagios Log Server stopped working (the elasticsearch service appeared to have stopped).
We restarted the elasticsearch service, and many messages of the following type appeared in the log:
MSG1:
All shards failed for phase: [query] org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][4] null
Then, at some point, the following message appeared:
MSG2:
[2019-03-19 07:46:56,742][DEBUG][action.search.type ] [04c4efb4-9365-45d3-9c7b-162e3cbcc051] All shards failed for phase: [query]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [nagioslogserver][0] CurrentState[RECOVERING] operations only allowed when started/relocated
After this message, we were able to use Nagios Log Server.
Q1: What do MSG1 and MSG2 mean?
Q2: How can we find out what happened, so that we can avoid this kind of issue in the future?
Important note: we very often see many messages of the following type:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000)
Nagios Log Server Forum discussion on the 'rejected execution' issue:
https://support.nagios.com/forum/viewto ... 37&t=49189
Q3: Is the current issue ('All shards failed for phase') related to the 'rejected execution (queue capacity 1000)' issue?
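To check whether the two problems occur together, I was thinking of counting the exception types in the Elasticsearch log. Below is a minimal sketch of that idea, not a tested tool: the function name count_exceptions and the sample lines are my own for illustration (the sample lines are shortened copies of the messages above), and in practice the lines would be read from the Elasticsearch log file, whose path depends on the installation.

```python
import re
from collections import Counter

# Matches a fully qualified Elasticsearch exception class name and
# captures only the final class name (e.g. EsRejectedExecutionException).
EXCEPTION_RE = re.compile(r"org\.elasticsearch\.(?:\w+\.)*(\w+Exception)")


def count_exceptions(lines):
    """Count how often each Elasticsearch exception class appears in the given lines.

    Hypothetical helper for illustration; `lines` can be any iterable of
    strings, for example an open log file.
    """
    counts = Counter()
    for line in lines:
        for name in EXCEPTION_RE.findall(line):
            counts[name] += 1
    return counts


# Sample lines taken from the messages quoted in this post.
sample = [
    "All shards failed for phase: [query] org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][4] null",
    "org.elasticsearch.index.shard.IllegalIndexShardStateException: [nagioslogserver][0] CurrentState[RECOVERING] operations only allowed when started/relocated",
    "org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000)",
    "org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000)",
]

counts = count_exceptions(sample)
for name, n in counts.most_common():
    print(name, n)
```

If the 'rejected execution' count rises shortly before the 'All shards failed' messages, that would at least suggest the two are connected, but I am not sure this is the right way to look at it.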
I am running Nagios Log Server on a single node: Nagios Log Server 1.4.4, Elasticsearch 1.6.0.
Thank you!
Regards,
Liviu