Cluster 2nd Node OFF

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

Node A

Code: Select all

[root@NagiosLogServer elasticsearch]# curl -XGET 'http://127.0.0.1:9200/?pretty'
{
  "status" : 200,
  "name" : "1048634e-2f8f-4ec5-9432-edba342d51dd",
  "version" : {
    "number" : "1.3.2",
    "build_hash" : "dee175dbe2f254f3f26992f5d7591939aaefd12f",
    "build_timestamp" : "2014-08-13T14:29:30Z",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}
Node B

Code: Select all

[root@NagiosLogServer2 logstash]# curl -XGET 'http://127.0.0.1:9200/?pretty'
{
  "status" : 200,
  "name" : "845bc07c-ed91-4920-8e23-747c9cc699f5",
  "version" : {
    "number" : "1.3.2",
    "build_hash" : "dee175dbe2f254f3f26992f5d7591939aaefd12f",
    "build_timestamp" : "2014-08-13T14:29:30Z",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Cluster 2nd Node OFF

Post by cmerchant »

Were these query results captured while the 2nd node was connected?
Also, have you modified the permissions for logstash to allow access to privileged port 514?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Cluster 2nd Node OFF

Post by jolson »

Would you please collect some Elasticsearch logs for us? Run the following on both nodes:

Code: Select all

tar czfv elasticsearchlogs.tgz /var/log/elasticsearch/
Please upload the resulting files.

Also, do you know roughly when the disconnect happened?
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

elasticsearchlogs_1.tgz from my 1st NodeA
elasticsearchlogs_2.tgz from my 2nd NodeB

The last time I rebooted both nodes, the cluster "broke" after a few hours: the Cluster Status went Yellow with an unassigned shard, and in the Instance Status the other node showed "!".
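
For reference, the yellow status and the unassigned shard described above can also be checked from the command line. This is only a minimal sketch, assuming Elasticsearch answers on 127.0.0.1:9200 as in the earlier curl output; the `ES_URL` variable and the function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: Elasticsearch listens on the same address used earlier in this thread.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Print overall status (green/yellow/red), node count and unassigned shard count.
cluster_health() {
    curl -s -XGET "$ES_URL/_cluster/health?pretty"
}

# Read `_cat/shards` output on stdin and keep only rows whose state column
# (the 4th column) is UNASSIGNED.
unassigned_only() {
    awk '$4 == "UNASSIGNED"'
}

# List the shards that are not currently allocated to any node.
unassigned_shards() {
    curl -s -XGET "$ES_URL/_cat/shards" | unassigned_only
}
```

Running `cluster_health` and then `unassigned_shards` on each node right after a disconnect should show whether both nodes agree on the cluster state.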
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster 2nd Node OFF

Post by scottwilkerson »

teirekos,

Let's make the following change to your Elasticsearch configuration file, /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml

On each instance, change this:

Code: Select all

# discovery.zen.minimum_master_nodes: 1
To this:

Code: Select all

discovery.zen.minimum_master_nodes: 2
Then let's restart Elasticsearch on each instance:

Code: Select all

service elasticsearch restart
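
If you would rather script the edit than make it by hand, something along these lines should work. It is a rough sketch, not an official procedure: the `set_min_master_nodes` helper and the `ES_CONFIG` variable are hypothetical names, and it assumes the setting appears on a single line, commented or not.

```shell
#!/bin/sh
# Path taken from the post above.
ES_CONFIG="${ES_CONFIG:-/usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml}"

# Uncomment (if needed) and set discovery.zen.minimum_master_nodes in a file.
# $1 = config file, $2 = desired value. A backup is left next to it as <file>.bak.
set_min_master_nodes() {
    file="$1"
    count="$2"
    sed -i.bak "s/^[#[:space:]]*discovery\.zen\.minimum_master_nodes:.*/discovery.zen.minimum_master_nodes: ${count}/" "$file"
}
```

Usage on each instance would be `set_min_master_nodes "$ES_CONFIG" 2` followed by the restart shown above.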
Former Nagios employee
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

I did exactly what you instructed. It is OK for now (but this is always the case after a restart).
We'll have to wait and see... I'll send feedback.
Thanks a lot.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

Same problem a few hours after the restart. I attach the latest Elasticsearch logs...
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Cluster 2nd Node OFF

Post by jolson »

I cannot see anything in the logs that points to an obvious error. Would it be alright if you turned the logging level up and reproduced the issue once more?

Code: Select all

vi /usr/local/nagioslogserver/elasticsearch/config/logging.yml
Change "es.logger.level: INFO" to "es.logger.level: DEBUG". Once changed, restart both nodes.
After the nodes have disconnected again, upload your log files using the same method as before.

Also, if you could run the following command when you notice high CPU usage, it could be helpful:

Code: Select all

curl -XGET localhost:9200/_nodes/hot_threads
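
To make those snapshots easier to attach, the output can be written to a timestamped file. A small sketch only, assuming the node answers on 127.0.0.1:9200; `capture_hot_threads` and `ES_URL` are illustrative names, not part of Nagios Log Server.

```shell
#!/bin/sh
# Assumption: Elasticsearch listens on the same address used elsewhere in this thread.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Save one hot_threads snapshot into the given directory (default: current
# directory) under a UTC-timestamped name, then print the file name.
capture_hot_threads() {
    outdir="${1:-.}"
    outfile="$outdir/hot_threads_$(date -u +%Y%m%dT%H%M%SZ).txt"
    curl -s -XGET "$ES_URL/_nodes/hot_threads" > "$outfile"
    echo "$outfile"
}
```

Calling `capture_hot_threads /tmp` a few times while the CPU is busy gives a set of files that can be uploaded together with the logs.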
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

I've changed the log level to DEBUG and rebooted the servers. For now the cluster seems to be OK (we'll have to wait, though).
I was expecting a large volume of logs at the DEBUG level, but that is not the case! Also, in the logstash log I get the "not part of the cluster" WARN. (I attach the open logs from both nodes.)
Another strange thing: after 2 reboots of node A the logstash process didn't start, so I had to start it manually.
While the cluster was down I had unassigned shards. After the reboot the shards were "synchronized", but 1 shard is still unassigned, so the Cluster Health status is still yellow.
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Cluster 2nd Node OFF

Post by cmerchant »

I'm noticing that the timestamps on node A and node B differ at the point where one node disconnects.

Can you confirm that you have the same clock settings between the nodes?
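
One quick way to compare the clocks is to read the UTC epoch time on each node and look at the difference. The helper below is just an illustrative sketch: `clock_offset` is a made-up name, and the ssh call in the comment assumes root ssh access from node A to NagiosLogServer2, which may not match your setup.

```shell
#!/bin/sh
# Print the absolute difference, in seconds, between two epoch timestamps.
clock_offset() {
    a="$1"
    b="$2"
    diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    echo "$diff"
}

# Example, run on node A (assumes ssh access to the second node):
#   clock_offset "$(date -u +%s)" "$(ssh root@NagiosLogServer2 date -u +%s)"
```

A result of more than a second or two would suggest the nodes are not synchronized to the same time source.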
Locked