Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 10:03 pm
by mark.payne
I am evaluating NLS and managed to get a single server installed and in an acceptable state.
Now I am trying to add another server to the cluster.

I enter the hostname and cluster ID, then click finish installation; however, it keeps saying "Could not establish connection, this could be due to a slow connection, or you may want to re-enter your cluster information."
Before trying this, I made sure that TCP ports 9300-9400 are open both ways, and I can telnet from the new server to the existing server on port 9300 successfully.

So I ran tcpdump on the existing server and retried the connect to existing cluster on the new server: nothing came in at all, so the problem has to be on the new server at this point.
I then ran tcpdump on the new server and retried the connect to existing cluster, but still nothing is being sent out.

When connecting to the existing cluster, the finish install button is not doing anything... Please help.

Running CentOS 7.1
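
The telnet reachability test described above can also be scripted. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the new server; the function name `port_open` is made up for illustration and is not part of NLS.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
    host=$1
    port=$2
    # Inside the bash -c script, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (substitute the existing server's address):
#   port_open 192.168.xxx.131 9300 && echo reachable || echo unreachable
```

A successful connection only shows that the network path is open. It says nothing about whether the local Elasticsearch service is actually running and attempting to join.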

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 10:11 pm
by Box293
What is the output of this on both servers?

cat /usr/local/nagioslogserver/var/cluster_hosts

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 10:33 pm
by mark.payne
Existing server:
localhost
192.168.xxx.131

New server:
localhost
192.168.xxx.131
192.168.xxx.131
192.168.xxx.131
192.168.xxx.131
192.168.xxx.131

xxx being the same value for all.
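
To compare the two files without the repeated lines getting in the way, the distinct entries can be listed on their own. This is a minimal read-only sketch; the function name `unique_hosts` is hypothetical, and the thread does not explain why the new server's file contains repeats.

```shell
# Print each distinct non-empty line of a file once, in first-seen order.
# The file itself is not modified.
unique_hosts() {
    awk 'NF && !seen[$0]++' "$1"
}

# Example:
#   unique_hosts /usr/local/nagioslogserver/var/cluster_hosts
```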

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 11:20 pm
by Box293
OK, so that looks fine.

What is the output of this on both servers?

netstat -na|grep LISTEN
Can you show us the firewall status and rules?
Also, as a test, are you able to stop the local firewall on CentOS?

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 11:25 pm
by mark.payne
Existing server:
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp6 0 0 :::2056 :::* LISTEN
tcp6 0 0 :::5544 :::* LISTEN
tcp6 0 0 :::2057 :::* LISTEN
tcp6 0 0 127.0.0.1:9200 :::* LISTEN
tcp6 0 0 :::80 :::* LISTEN
tcp6 0 0 :::9300 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 :::3515 :::* LISTEN


New Server:
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp6 0 0 :::80 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN


Stopping the firewall doesn't help.
As explained in the first post, tcpdump on the new server doesn't show any TCP traffic when trying to "Connect to Existing Cluster".
It's like it's not even trying to do anything.
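
In the two listings above, the existing server has listeners on 9200 and 9300 while the new server has neither, which suggests Elasticsearch is not running on the new server. A small filter makes that comparison explicit. This is a sketch only; `check_listeners` is a hypothetical helper, and it assumes the local address is the fourth column, as in the output shown.

```shell
# Read `netstat -na | grep LISTEN` style output on stdin and report, for each
# port given as an argument, whether any local address ends in ":<port>".
check_listeners() {
    input=$(cat)
    for port in "$@"; do
        if printf '%s\n' "$input" |
            awk -v p="$port" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
        then
            echo "port $port: listening"
        else
            echo "port $port: NOT listening"
        fi
    done
}

# Example:
#   netstat -na | grep LISTEN | check_listeners 9200 9300
```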

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 11:43 pm
by mark.payne
I noticed that when I try to connect to the existing cluster, the elasticsearch service exits and starts again.

Below is the log:

[2015-10-30 17:27:03,024][INFO ][node ] [ead9d787-7f47-41c8-b504-76ccabba3bd2] version[1.6.0], pid[4966], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-10-30 17:27:03,024][INFO ][node ] [ead9d787-7f47-41c8-b504-76ccabba3bd2] initializing ...
[2015-10-30 17:27:03,038][INFO ][plugins ] [ead9d787-7f47-41c8-b504-76ccabba3bd2] loaded [knapsack-1.5.2.0-f340ad1], sites []
[2015-10-30 17:27:03,076][ERROR][bootstrap ] Exception
org.elasticsearch.ElasticsearchIllegalStateException: Failed to created node environment
at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:164)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.nio.file.AccessDeniedException: /mnt/sdb1/nagioslogserver/elasticsearch/data/88244341-3928-4ea7-9363-93d3c9b771ca
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:383)
at java.nio.file.Files.createDirectory(Files.java:630)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:734)
at java.nio.file.Files.createDirectories(Files.java:720)
at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:126)
at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:162)
... 4 more

After I installed I added another drive and moved the directory and data to the new drive.
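
An AccessDeniedException while creating a directory can come from the data directory itself or from any parent directory on the new drive that the service account cannot enter. The following sketch lists the mode and ownership of every component of the path. It assumes GNU `stat` (as shipped with CentOS); `show_path_permissions` is a made-up helper name, and which account Elasticsearch runs as should be checked on the actual system.

```shell
# Walk from the given path up towards /, printing mode, owner:group and name
# for each component that exists, and flagging components that do not.
show_path_permissions() {
    path=$1
    while [ -n "$path" ] && [ "$path" != "/" ] && [ "$path" != "." ]; do
        if [ -e "$path" ]; then
            stat -c '%A %U:%G %n' "$path"
        else
            echo "missing: $path"
        fi
        path=$(dirname "$path")
    done
}

# Example:
#   show_path_permissions /mnt/sdb1/nagioslogserver/elasticsearch/data
```

Every directory in the output needs at least execute permission for the service account, and the last existing one needs write permission so the node directory can be created.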

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 11:47 pm
by Box293
Great, thanks for that. Sometimes we need to double check things just to be sure.
mark.payne wrote:After I installed I added another drive and moved the directory and data to the new drive.
Was this on the new or existing server?
Did you follow any documented procedure for this?

What are the permissions of:

ll /mnt/sdb1/nagioslogserver/elasticsearch/data/88244341-3928-4ea7-9363-93d3c9b771ca
From the server where you are seeing this in the logs.

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 11:52 pm
by mark.payne
Reverted the move of the data directory and can now add to the cluster.
Moved the data back to the new drive and all is well.

I guess I shouldn't be moving data around until the cluster is set up...
I'm not sure how this would work when you already have a large amount of data to replicate and the drive you initially installed NLS on doesn't have enough space.

One thing is that there doesn't seem to be any documentation around firewall rules for cluster communication. Might want to add some documentation around this.
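
As a rough illustration of the firewall rule in question, the sketch below prints the firewalld commands that would open the cluster port range on CentOS 7. The 9300-9400 range is taken from the first post rather than from official NLS documentation, firewalld is assumed to be the active firewall, and `print_cluster_firewall_commands` is a hypothetical helper. It prints the commands instead of running them so they can be reviewed first.

```shell
# Print the firewalld commands that would open a TCP port range permanently.
# Nothing is executed; pipe the output to `sh` as root to apply it.
print_cluster_firewall_commands() {
    range=${1:-9300-9400}
    echo "firewall-cmd --permanent --add-port=${range}/tcp"
    echo "firewall-cmd --reload"
}

# Example:
#   print_cluster_firewall_commands            # review the commands
#   print_cluster_firewall_commands | sudo sh  # apply them
```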

Re: Cannot add new install to existing cluster

Posted: Thu Oct 29, 2015 11:54 pm
by mark.payne
Box293 wrote:Great, thanks for that. Sometimes we need to double check things just to be sure.
mark.payne wrote:After I installed I added another drive and moved the directory and data to the new drive.
Was this on the new or existing server?
Did you follow any documented procedure for this?

What are the permissions of:

ll /mnt/sdb1/nagioslogserver/elasticsearch/data/88244341-3928-4ea7-9363-93d3c9b771ca
From the server where you are seeing this in the logs.
I moved the data on the new server.
Yes, I followed the documentation. Same thing I had already done on the existing server.
Permissions are drwxr-xr-x 3 nagios users 4096 Oct 30 17:45 nodes

Cluster status is Green :)

Re: Cannot add new install to existing cluster

Posted: Fri Oct 30, 2015 9:31 am
by jolson
mark.payne wrote:I'm not sure how this would work when you already have a large amount of data to replicate and the drive you initially installed NLS on doesn't have enough space.
I'm not sure I understand this question. Could you clarify a bit please?
mark.payne wrote:Cluster status is Green :)
Happy to hear it! That must mean that your instances connected properly?