tar czfv elasticsearchlogs.tgz /var/log/elasticsearch/
Please upload the resulting files.
Also, do you know roughly when the disconnect may have happened?
TwitsBlog Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
elasticsearchlogs_1.tgz from my 1st NodeA
elasticsearchlogs_2.tgz from my 2nd NodeB
The last time I rebooted both nodes, the cluster "broke" after a few hours, i.e. the Cluster Status went Yellow with an unassigned shard, and in the Instance Status the other node showed "!".
I did exactly as you instructed. It is ok for now (but this is always the case after a restart).
We'll have to wait and see... I'll send feedback.
Thanks a lot.
I cannot see anything in the logs that points to an obvious error. Would it be alright if you turned the logging level up and reproduced the issue once more?
vi /usr/local/nagioslogserver/elasticsearch/config/logging.yml
Change "es.logger.level: INFO" to "es.logger.level: DEBUG". Once changed, restart both nodes.
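If you would rather not edit the file by hand, the same change could be scripted roughly as below. This is only a sketch: the helper name set_es_debug is made up for illustration, it assumes GNU sed (for the -i option), and the restart command assumes the service is registered as "elasticsearch" on your nodes.

```shell
# Hypothetical helper: switch es.logger.level from INFO to DEBUG in a logging.yml.
# A .bak copy of the original file is kept next to it.
set_es_debug() {
    conf="$1"
    cp "$conf" "$conf.bak" &&
    sed -i 's/^\([[:space:]]*es\.logger\.level:[[:space:]]*\)INFO/\1DEBUG/' "$conf"
}

# Usage (path taken from the post above):
# set_es_debug /usr/local/nagioslogserver/elasticsearch/config/logging.yml
# service elasticsearch restart   # assumption: service name is "elasticsearch"
```

Only the es.logger.level line is touched; other lines that mention INFO are left alone.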
After the nodes have disconnected again, upload your log files using the same method as before.
Also, if you could run the following command when you notice high CPU usage, it could be helpful:
I've changed the log level to DEBUG and rebooted the servers. For now the cluster seems to be ok (we'll have to wait though).
I was expecting a large amount of logs at the DEBUG level, but this is not the case! Also, in the logstash log I get the "not part of the cluster" WARN. (I attach the current logs from both nodes.)
Another strange thing was that after 2 reboots of node A the logstash process didn't start, so I had to start it manually.
Since the cluster was down, I had unassigned shards. After the reboot the shards were "synchronized", but 1 shard is still unassigned, so the Cluster Health status is still yellow.
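To see which shard is still unassigned, something like the following could be run on either node. It is a sketch under assumptions: Elasticsearch is taken to answer on localhost:9200 (the default HTTP port), the helper name unassigned_shards is made up for illustration, and the filter relies on the shard state being the fourth column of the default _cat/shards output.

```shell
# Hypothetical helper: keep only the _cat/shards rows whose state column
# (4th field in the default output) is UNASSIGNED.
unassigned_shards() {
    awk '$4 == "UNASSIGNED"'
}

# Usage (assumes Elasticsearch listens on localhost:9200):
# curl -s 'http://localhost:9200/_cluster/health?pretty'
# curl -s 'http://localhost:9200/_cat/shards' | unassigned_shards
```

The first curl call shows the overall status and the unassigned shard count; the second lists the index and shard number of each unassigned copy.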