Re: Cluster 2nd Node OFF
Posted: Thu Mar 05, 2015 2:55 am
by teirekos
Node A
Code:
[root@NagiosLogServer elasticsearch]# curl -XGET 'http://127.0.0.1:9200/?pretty'
{
"status" : 200,
"name" : "1048634e-2f8f-4ec5-9432-edba342d51dd",
"version" : {
"number" : "1.3.2",
"build_hash" : "dee175dbe2f254f3f26992f5d7591939aaefd12f",
"build_timestamp" : "2014-08-13T14:29:30Z",
"build_snapshot" : false,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}
Node B
Code:
[root@NagiosLogServer2 logstash]# curl -XGET 'http://127.0.0.1:9200/?pretty'
{
"status" : 200,
"name" : "845bc07c-ed91-4920-8e23-747c9cc699f5",
"version" : {
"number" : "1.3.2",
"build_hash" : "dee175dbe2f254f3f26992f5d7591939aaefd12f",
"build_timestamp" : "2014-08-13T14:29:30Z",
"build_snapshot" : false,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}
Re: Cluster 2nd Node OFF
Posted: Thu Mar 05, 2015 4:43 pm
by cmerchant
Are these the results of the queries while the 2nd node is connected?
Also, have you modified the permissions for logstash to allow access to privileged port 514?
Re: Cluster 2nd Node OFF
Posted: Thu Mar 05, 2015 4:44 pm
by jolson
Would you please collect some Elasticsearch logs for us? Run the following on both nodes:
Code:
tar czfv elasticsearchlogs.tgz /var/log/elasticsearch/
Please upload the resulting files.
Also, do you know roughly when the disconnect may have happened?
Re: Cluster 2nd Node OFF
Posted: Fri Mar 06, 2015 4:25 am
by teirekos
elasticsearchlogs_1.tgz from my 1st NodeA
elasticsearchlogs_2.tgz from my 2nd NodeB
The last time I rebooted both nodes, the cluster "broke" after a few hours, i.e. the Cluster Status went Yellow with an unassigned shard, and in the Instance Status the other node shows "!".
Re: Cluster 2nd Node OFF
Posted: Fri Mar 06, 2015 12:18 pm
by scottwilkerson
teirekos,
Let's make the following change to your Elasticsearch configuration file, /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
On each instance change this
Code:
# discovery.zen.minimum_master_nodes: 1
To this
Code:
discovery.zen.minimum_master_nodes: 2
Then let's restart Elasticsearch on each instance.
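For reference, the value 2 follows the usual quorum guidance for zen discovery: (number of master-eligible nodes / 2) + 1, using integer division. A minimal sketch of that arithmetic is below; the function name quorum is only illustrative and not part of Elasticsearch or Nagios Log Server.

```shell
# Illustrative helper (hypothetical name): compute the quorum value
# usually recommended for discovery.zen.minimum_master_nodes.
quorum() {
    # $1 = number of master-eligible nodes in the cluster
    echo $(( $1 / 2 + 1 ))
}

# With the two instances in this thread:
quorum 2
```

Note that with only two nodes and a minimum of 2, neither node can elect a master on its own while the other is unreachable, so this setting trades availability for protection against split brain.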
Re: Cluster 2nd Node OFF
Posted: Mon Mar 09, 2015 9:34 am
by teirekos
I did exactly what you instructed me. It is ok for now (but this is always the case after a restart).
We'll have to wait and see... I'll send feedback.
Thanks a lot.
Re: Cluster 2nd Node OFF
Posted: Mon Mar 09, 2015 11:14 am
by teirekos
Same problem a few hours after the restart. I have attached the latest Elasticsearch logs.
Re: Cluster 2nd Node OFF
Posted: Mon Mar 09, 2015 2:54 pm
by jolson
I cannot see anything in the logs that points to an obvious error. Would it be alright if you turned the logging level up and reproduced the issue once more?
Code:
vi /usr/local/nagioslogserver/elasticsearch/config/logging.yml
Change "es.logger.level: INFO" to "es.logger.level: DEBUG". Once changed, restart both nodes.
After the nodes have disconnected again, upload your log files using the same method as before.
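If you would rather not edit the file by hand, the same change can be sketched with sed. This assumes the file contains a line beginning with "es.logger.level:" as described above; the function name set_es_log_level is only illustrative.

```shell
# Illustrative helper (hypothetical name): rewrite the es.logger.level line
# in a logging.yml file, keeping a .bak copy of the original next to it.
set_es_log_level() {
    # $1 = path to logging.yml, $2 = new level, e.g. DEBUG or INFO
    sed -i.bak "s/^es\.logger\.level:.*/es.logger.level: $2/" "$1"
}

# Example (run on each node, then restart Elasticsearch):
# set_es_log_level /usr/local/nagioslogserver/elasticsearch/config/logging.yml DEBUG
```

The same function with INFO as the second argument reverts the level once the logs have been collected.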
Also, if you could run the following command when you notice high CPU usage, it could be helpful:
Code:
curl -XGET localhost:9200/_nodes/hot_threads
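Since high CPU usage can be easy to miss, here is a rough sketch of capturing the hot threads output automatically once the load average passes a threshold. The function names, the threshold of 4.0, and the /tmp output directory are all assumptions for illustration; adjust them to your system.

```shell
# Illustrative helper (hypothetical name): succeed when a load value
# exceeds a threshold. Uses awk because the shell cannot compare decimals.
load_exceeds() {
    # $1 = current load average, $2 = threshold
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load > max) }'
}

# Illustrative helper (hypothetical name): save hot_threads output
# to a timestamped file in the given directory.
capture_hot_threads() {
    # $1 = output directory
    out="$1/hot_threads_$(date +%Y%m%d_%H%M%S).txt"
    curl -s -XGET 'localhost:9200/_nodes/hot_threads' > "$out"
}

# Example, e.g. run from cron every minute:
# load=$(cut -d' ' -f1 /proc/loadavg)
# load_exceeds "$load" 4.0 && capture_hot_threads /tmp
```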
Re: Cluster 2nd Node OFF
Posted: Tue Mar 10, 2015 10:04 am
by teirekos
I've changed the log level to DEBUG and rebooted the servers. For now the cluster seems to be ok (we'll have to wait though).
I was expecting a large amount of logs at the debug level, but this is not the case! Also, in the logstash log I get the "not part of the cluster" WARN. (I attach the open logs from both nodes.)
Another strange thing was that after 2 reboots of node A the logstash process didn't start, so I had to start it manually.
Since the cluster was down I had unassigned shards. After the reboot the shards were "synchronized" but now only 1 shard is left as unassigned thus the Cluster Health status is still yellow.
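To keep an eye on the remaining unassigned shard, something like the sketch below could be used. It assumes the _cat/shards output has the shard state in the fourth column, which is the default layout as far as I know; the function name count_unassigned is only illustrative.

```shell
# Illustrative helper (hypothetical name): count shards whose state column
# reads UNASSIGNED in _cat/shards output supplied on stdin.
count_unassigned() {
    awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}

# Example:
# curl -s -XGET 'localhost:9200/_cat/shards' | count_unassigned
```

A result of 0 should correspond to the Cluster Health returning to green, assuming nothing else is wrong.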
Re: Cluster 2nd Node OFF
Posted: Tue Mar 10, 2015 5:03 pm
by cmerchant
I'm noticing that the timestamps on Node A and Node B are different at the point when one node disconnects.
Can you confirm that you have the same clock settings between the nodes?
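One quick way to check is to compare the epoch time on both nodes. The sketch below assumes you can ssh from Node A to Node B as root; the function name skew_seconds is only illustrative.

```shell
# Illustrative helper (hypothetical name): absolute difference in seconds
# between two epoch timestamps.
skew_seconds() {
    # $1, $2 = epoch seconds, e.g. from `date +%s` on each node
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example, run on Node A (ssh access to Node B is an assumption):
# skew_seconds "$(date +%s)" "$(ssh root@NagiosLogServer2 date +%s)"
```

The ssh round trip adds a little delay, so a result of a second or two is probably nothing to worry about; a difference of minutes or more would suggest the clocks or time zones are not in sync.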