Nagios Support Forum

Posted: **Mon Sep 09, 2019 5:13 pm**

Hello

I'm trying to add three more Nagios Log Server (NLS) instances to my existing instance.

See attached images and profile.

Every time I run through the "Connect to an Existing Cluster" choice from the new instances, I get this error:

"Could not establish connection, this could be due to a slow connection, or you may want to re-enter your cluster information."

I am never able to progress past this point.

I am specifically not sending anything to the servers from clients so the servers have nothing else to do but respond to this choice. They are not busy at all.

On the existing Instance, it LOOKS like the command was successful, because the additional instances do show up.

What should I be looking at? Any hints/tips?

I can't find anything obvious in the logs in /var/log/elasticsearch

Mike

Support edit: Downloaded system-profile.tar.gz and shared with team

Posted: **Tue Sep 10, 2019 8:04 am**

Looking at this again... second instance still claims that Elasticsearch isn't running, but data has certainly balanced evenly between both servers... (see screenshot attached)

Posted: **Tue Sep 10, 2019 10:29 am**

From the profile and that screenshot, it looks like the addition was mostly successful, but the two servers aren't able to communicate fully. Is there a firewall running on the second server? Can you reach TCP port 9300 on the second server from the first server?

    telnet <SecondServerNameOrIP> 9300

or

    nmap <SecondServerNameOrIP> -p 9300
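Where neither telnet nor nmap is available, the same reachability check can be sketched with Python's standard library. This is only a minimal sketch, not a Nagios tool: the function name `port_is_open` is hypothetical, and a successful connect only shows that something is listening on the port, not that Elasticsearch transport traffic is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it performs a plain TCP connect, roughly what
    `telnet host port` would establish, and closes the socket immediately.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the first server, `port_is_open("<SecondServerNameOrIP>", 9300)` should return `True` if the transport port on the second server is reachable.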

Posted: **Tue Sep 10, 2019 1:08 pm**

Telnet is banned here in favor of SSH.

Here's an nmap response:

    [root@rbbusnls1p ~]# nmap rbbusnls2p -p 9300

    Starting Nmap 6.40 ( http://nmap.org ) at 2019-09-10 14:06 EDT
    Nmap scan report for rbbusnls2p (151.120.113.52)
    Host is up (0.00016s latency).
    PORT     STATE SERVICE
    9300/tcp open  vrace
    MAC Address: 00:17:A4:77:04:1C (Hewlett-Packard Company)

    Nmap done: 1 IP address (1 host up) scanned in 0.10 seconds
    [root@rbbusnls1p ~]#

I'm fairly certain this is not a firewall issue. Blade-to-blade traffic, so latency is minimal as we aren't even leaving the chassis.

Posted: **Tue Sep 10, 2019 2:18 pm**

If you point your browser to that hostname, rbbusnls2p, does it bring up the Log Server page?

Posted: **Tue Sep 10, 2019 3:05 pm**

Yes, both the master and the secondary instance respond correctly in the browser.

Posted: **Tue Sep 10, 2019 3:50 pm**

Is there a way to complete this manually? The suspense is killing me.

Posted: **Tue Sep 10, 2019 4:41 pm**

The thing is, it looks like it's complete, and we're just running into the interface thinking things aren't connected. On the second server, go to Admin -> Instance Status, and let's see if it's using any storage.

Posted: **Tue Sep 10, 2019 4:51 pm**

Hi... my other screenshots should have demonstrated that but here's another screenshot

Wait a minute, I can't do anything on the second server, it's stuck at the Install or Connect screen.

It looks like both nodes are active.

Additionally, here are excerpts from the Elasticsearch log files (/var/log/elasticsearch...) on the master and secondary nodes:

Master node (rbbusnls1p):

[2019-09-10 17:42:11,992][INFO ][cluster.service          ] [77596958-30db-4cb4-bf11-09e114a44012] removed {[fb689df9-6555-48f7-ada4-70fb56095c6f][hc3JnLGjTq60ID_icK0gdg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-node_left([fb689df9-6555-48f7-ada4-70fb56095c6f][hc3JnLGjTq60ID_icK0gdg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1})
[2019-09-10 17:42:40,509][INFO ][cluster.service          ] [77596958-30db-4cb4-bf11-09e114a44012] added {[3fcf7609-0455-49a5-81c3-c7baaa51776a][BgCAY0VpQxC3PxPonH5gBQ][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(join from node[[3fcf7609-0455-49a5-81c3-c7baaa51776a][BgCAY0VpQxC3PxPonH5gBQ][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1}])

Secondary node (rbbusnls2p):

[2019-09-10 17:42:15,906][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] stopping ...
[2019-09-10 17:42:16,033][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] stopped
[2019-09-10 17:42:16,033][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] closing ...
[2019-09-10 17:42:16,041][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] closed
[2019-09-10 17:42:37,424][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] version[1.7.6], pid[51593], build[c730b59/2016-11-18T15:21:16Z]
[2019-09-10 17:42:37,425][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] initializing ...
[2019-09-10 17:42:37,503][INFO ][plugins                  ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] loaded [knapsack-1.7.3.0-d0ea246], sites []
[2019-09-10 17:42:37,543][INFO ][env                      ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] using [1] data paths, mounts [[/usr/local/nagioslogserver/elasticsearch/data (/dev/mapper/vg00-lv_elastic)]], net usable_space [944.7gb], net total_space [1.6tb], types [xfs]
[2019-09-10 17:42:41,009][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] initialized
[2019-09-10 17:42:41,010][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] starting ...
[2019-09-10 17:42:41,370][INFO ][transport                ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/151.120.113.52:9300]}
[2019-09-10 17:42:41,386][INFO ][discovery                ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] 15edd11f-8263-4eb7-9054-8ace66feebb6/BgCAY0VpQxC3PxPonH5gBQ
[2019-09-10 17:42:44,555][INFO ][cluster.service          ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] detected_master [77596958-30db-4cb4-bf11-09e114a44012][4BAV0gZFRkmvGxVxvB1ycA][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1}, added {[77596958-30db-4cb4-bf11-09e114a44012][4BAV0gZFRkmvGxVxvB1ycA][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[77596958-30db-4cb4-bf11-09e114a44012][4BAV0gZFRkmvGxVxvB1ycA][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1}])
[2019-09-10 17:42:44,925][INFO ][http                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2019-09-10 17:42:44,926][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] started
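Read together, the two excerpts show rbbusnls2p leaving the cluster at 17:42:11 and rejoining at 17:42:40 under a different node name. To pull that kind of join/leave history out of a longer log, something like the following sketch could be used. It assumes the Elasticsearch 1.x `cluster.service` line layout seen above; the names `membership_events`, `LINE_RE`, and `NODE_RE` are hypothetical, and only the first node listed in each `added {...}` or `removed {...}` group is captured.

```python
import re

# Header of a cluster.service line: [timestamp][LEVEL][cluster.service ] [local node] rest
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[[^\]]*\]\[cluster\.service\s*\] "
    r"\[(?P<local>[^\]]+)\] (?P<rest>.*)$"
)

# First node descriptor after "added {" or "removed {":
# [node name][node id][host][inet[/ip:port]]
NODE_RE = re.compile(
    r"(?P<action>added|removed) "
    r"\{\[(?P<node>[^\]]+)\]\[(?P<node_id>[^\]]+)\]\[(?P<host>[^\]]+)\]"
    r"\[inet\[[^/\]]*/(?P<addr>[^\]]+)\]\]"
)


def membership_events(lines):
    """Return one dict per cluster.service line that reports a node added or removed."""
    events = []
    for line in lines:
        head = LINE_RE.match(line)
        if head is None:
            continue  # not a cluster.service line
        node = NODE_RE.search(head.group("rest"))
        if node is None:
            continue  # cluster.service line without a membership change
        events.append({
            "time": head.group("ts"),
            "action": node.group("action"),
            "node": node.group("node"),
            "host": node.group("host"),
            "address": node.group("addr"),
        })
    return events
```

Passing the lines of a log file opened from the Elasticsearch log directory to `membership_events` yields a chronological list that makes repeated leave/rejoin cycles for a single host easy to spot.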

Posted: **Tue Sep 10, 2019 5:10 pm**

Way back when I was toying purely with Elasticsearch... sometimes my queries from Kibana would fail after 30000ms.

This process does take 30-45 seconds (from hitting enter to getting the error) - is it possible a query is timing out or failing?

I've scoured every log file I can find. It looks like your code to add this node to the cluster is encrypted, so I can't be of much more help.

Is there something I can do to manually complete this, without having to use the GUI?
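This does not complete the join, but one way to see what Elasticsearch itself reports without going through the GUI is to query the cluster health endpoint on each node. The sketch below assumes the HTTP listener is on 127.0.0.1:9200, as the secondary's log shows, so it has to run on the node itself. `/_cluster/health` is a standard Elasticsearch endpoint; the function names `cluster_health` and `summarize` are hypothetical.

```python
import json
from urllib.request import ProxyHandler, build_opener

# Talk to the local node directly, ignoring any proxy settings in the environment.
_opener = build_opener(ProxyHandler({}))


def cluster_health(base_url: str = "http://127.0.0.1:9200", timeout: float = 10.0) -> dict:
    """Fetch /_cluster/health from an Elasticsearch node and return the parsed JSON."""
    with _opener.open(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def summarize(health: dict) -> str:
    """Condense the health response to cluster name, status, and node count."""
    return "cluster={cluster_name} status={status} nodes={number_of_nodes}".format(**health)
```

If `summarize(cluster_health())` reports the same cluster name and the expected node count on both servers, that would support the theory that the cluster itself has formed and only the interface is out of step.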

Unsuccessful Add of Instance
