Hello,
Our Nagios Log Server stopped working (the Elasticsearch service appeared to have stopped).
We restarted the Elasticsearch service, and many messages of the following type appeared in the log:
MSG1:
All shards failed for phase: [query] org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][4] null
Then, at some point, the following message appeared:
MSG2:
[2019-03-19 07:46:56,742][DEBUG][action.search.type ] [04c4efb4-9365-45d3-9c7b-162e3cbcc051] All shards failed for phase: [query]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [nagioslogserver][0] CurrentState[RECOVERING] operations only allowed when started/relocated
After this message, we were able to use Nagios Log Server.
Q1: What do MSG1 and MSG2 mean?
Q2: How can we find out what happened, so we can avoid these kinds of issues in the future?
Important note: we very often receive many messages of the following type:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000)
Nagios Log Server Forum discussion on the 'rejected execution' issue:
https://support.nagios.com/forum/viewto ... 37&t=49189
Q3: Is the current issue ('All shards failed for phase') related to the 'rejected execution (queue capacity 1000)' issue?
I am running Nagios Log Server on a single node: Nagios Log Server 1.4.4, Elasticsearch 1.6.0.
Thank you!
Regards,
Liviu
Error: All shards failed for phase
Re: Error: All shards failed for phase
Hello, @li_alm. The first two messages you showed can be normal during Elasticsearch startup. The third error message, however, could indicate a lack of system resources, such as CPU or RAM. How much RAM and how many CPU cores does this server have? Can you generate a system profile by running the attached script from the /tmp/ folder on the Log Server? It should generate a system profile archive that you can share with us in this thread.
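As a rough way to check Q3 against your own logs, the Python sketch below counts how often the startup-type messages (MSG1, MSG2) and the rejected-execution message appear. The matching substrings come from the messages quoted in this thread; the category names and the `classify_line`/`summarize` helpers are hypothetical, and the grouping is only a heuristic based on the explanation above, not an official Elasticsearch classification.

```python
from collections import Counter

# Substrings copied from the log messages quoted in this thread.
# Hypothetical grouping: "startup" vs "resource_pressure" follows the
# support reply, it is a heuristic and not an Elasticsearch category.
PATTERNS = {
    "startup": [
        "NoShardAvailableActionException",  # MSG1: shard not available yet
        "CurrentState[RECOVERING]",         # MSG2: shard still recovering
    ],
    "resource_pressure": [
        "EsRejectedExecutionException",     # a thread pool queue was full
    ],
}


def classify_line(line):
    """Return the category name for a log line, or None if nothing matches."""
    for category, needles in PATTERNS.items():
        if any(needle in line for needle in needles):
            return category
    return None


def summarize(lines):
    """Count matching log lines per category (any iterable of strings)."""
    counts = Counter()
    for line in lines:
        category = classify_line(line)
        if category is not None:
            counts[category] += 1
    return counts
```

For example, passing an open file object for your Elasticsearch log (the path depends on the installation) to `summarize` returns a `Counter`. Many `resource_pressure` hits outside of restart windows would suggest the rejections are a separate, load-related problem rather than part of startup.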
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Error: All shards failed for phase
Hello, @npolovenko,
Thank you for your reply.
I have two completely independent Nagios deployments, and both behave the same way (many "rejected" messages in the logs).
Deployment1:
1 CPU core (Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz)
2 GB RAM
Deployment2:
1 CPU core (Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz)
4 GB RAM
I ran the script you gave me.
I attached the result.
Regards,
Liviu
Re: Error: All shards failed for phase
The profile you provided doesn't contain any rejected messages, but it does show that Java heap usage is at 65%, which is fairly high for an idle machine. A large query could cause a spike in heap usage and trigger the rejected-execution message. Can you increase the memory on this machine to 4 GB to match the other one? By default, Elasticsearch uses only half of the total system memory, so with only 2 GB on the system, Elasticsearch is limited to 1 GB.
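To make the arithmetic in that reply concrete, here is a small Python sketch. It assumes the "half of system memory" default stated above and assumes the amount of live heap data stays the same after the upgrade, which is a simplification; the variable names are illustrative only.

```python
def default_heap_gb(total_ram_gb):
    # Assumption taken from the reply: heap defaults to half of system RAM.
    return total_ram_gb / 2


observed_ram_gb = 2
observed_heap_usage = 0.65  # 65% heap usage reported by the system profile

heap_gb = default_heap_gb(observed_ram_gb)  # 1.0 GB of heap
used_gb = heap_gb * observed_heap_usage     # about 0.65 GB in use while idle
headroom_gb = heap_gb - used_gb             # about 0.35 GB left for query spikes

# Same amount of live data after raising the VM to 4 GB of RAM.
new_heap_gb = default_heap_gb(4)            # 2.0 GB of heap
new_usage = used_gb / new_heap_gb           # 0.325, i.e. roughly 33% usage
```

Under these assumptions, doubling the RAM roughly quadruples the idle headroom (from about 0.35 GB to about 1.35 GB), which is why a large query is less likely to push the heap to its limit.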
Re: Error: All shards failed for phase
OK, @cdienger, thanks. We will try to increase the RAM on the machine that has only 2 GB.
My main concern was MSG1 and MSG2 (see my initial post), because I had the impression that Nagios Log Server would not start.
Regards,
Liviu
Re: Error: All shards failed for phase
Those messages are typical of a service restarting. You can verify the services are up from the command line:
service elasticsearch status
or in the web UI under Admin > System > System Status.
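The service status only tells you whether the process is running, not whether shards have finished recovering (which is when MSG1/MSG2-style errors stop). The Python sketch below reads the Elasticsearch `_cluster/health` API and interprets its `status` and `initializing_shards` fields. The URL `http://localhost:9200` is an assumption about where Elasticsearch listens on your Log Server, and the `startup_state`/`fetch_health` helpers and their labels are hypothetical.

```python
import json
import urllib.request

# Assumption: Elasticsearch listens on localhost:9200 on the Log Server.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def startup_state(health):
    """Interpret a _cluster/health response (already parsed into a dict).

    "recovering": shards are still initializing, so startup errors are expected
    "red":        some primary shards are unassigned and none are initializing
    "ready":      primaries are active; "yellow" is common on a single node
                  when indices are configured with replicas that cannot be placed
    """
    if health.get("initializing_shards", 0) > 0:
        return "recovering"
    if health.get("status") == "red":
        return "red"
    return "ready"


def fetch_health(url=HEALTH_URL, timeout=5):
    """Fetch and parse the cluster health JSON over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

On the Log Server itself, `print(startup_state(fetch_health()))` should report `recovering` while shards are still coming up after a restart; a `red` result that persists after recovery finishes is worth investigating further.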