Unsuccessful Add of Instance

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Unsuccessful Add of Instance

Post by rocheryderm »

Hello

Trying to add 3 more NLS instances to my existing instance.

See attached images and profile.

Every time I run through the "Connect to an Existing Cluster" choice from the new instances, I get this error:

Code:

"Could not establish connection, this could be due to a slow connection, or you may want to re-enter your cluster information."
- I am never able to progress past this point.

I am specifically not sending anything to the servers from clients so the servers have nothing else to do but respond to this choice. They are not busy at all.

On the existing Instance, it LOOKS like the command was successful, because the additional instances do show up.

What should I be looking at? Any hints/tips?

I can't find anything obvious in the logs in /var/log/elasticsearch

Mike

Support edit: Downloaded system-profile.tar.gz and shared with team
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: Unsuccessful Add of Instance

Post by rocheryderm »

Looking at this again... second instance still claims that Elasticsearch isn't running, but data has certainly balanced evenly between both servers... (see screenshot attached)
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Unsuccessful Add of Instance

Post by mbellerue »

From the profile and that screenshot, it looks like the addition was mostly successful; the instances just aren't able to communicate fully. Is there a firewall running on the second server? Can you reach TCP port 9300 on the second server from the first server?

Code:

telnet <SecondServerNameOrIP> 9300
or

Code:

nmap <SecondServerNameOrIP> -p 9300
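
If neither telnet nor nmap is available, a similar reachability check can be done with bash's built-in /dev/tcp redirection. This is only a minimal sketch: the helper name check_port is made up for illustration, it is not part of Nagios Log Server, and it assumes bash and the coreutils timeout command are installed on the first server.

```shell
# Hypothetical helper, not part of Nagios Log Server.
# Returns 0 if a TCP connection to host:port succeeds within 3 seconds,
# and non-zero if the connection is refused or times out.
check_port() {
  # bash opens a TCP socket when /dev/tcp/HOST/PORT is used in a redirection.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage (same target as the nmap example; replace the placeholder first):
# check_port SecondServerNameOrIP 9300 && echo "9300 open" || echo "9300 closed"
```

Like the nmap scan, this only shows that something is listening on the port, not that the cluster traffic itself is healthy.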
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: Unsuccessful Add of Instance

Post by rocheryderm »

Telnet is banned here in favor of SSH.

Here's the nmap response:

Code:

[root@rbbusnls1p ~]# nmap rbbusnls2p -p 9300

Starting Nmap 6.40 ( http://nmap.org ) at 2019-09-10 14:06 EDT
Nmap scan report for rbbusnls2p (151.120.113.52)
Host is up (0.00016s latency).
PORT     STATE SERVICE
9300/tcp open  vrace
MAC Address: 00:17:A4:77:04:1C (Hewlett-Packard Company)

Nmap done: 1 IP address (1 host up) scanned in 0.10 seconds
[root@rbbusnls1p ~]#
I'm fairly certain this is not a firewall issue. Blade-to-blade traffic, so latency is minimal as we aren't even leaving the chassis.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Unsuccessful Add of Instance

Post by mbellerue »

If you point your browser to that hostname, rbbusnls2p, does it bring up the Log Server page?
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: Unsuccessful Add of Instance

Post by rocheryderm »

Yes, both the master and the secondary instance respond correctly in the browser.
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: Unsuccessful Add of Instance

Post by rocheryderm »

Is there a way to complete this manually? The suspense is killing me.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Unsuccessful Add of Instance

Post by mbellerue »

The thing is, it looks like it's complete, and we're just running into the interface thinking things aren't connected. On the second server, go to Admin -> Instance Status, and let's see if it's using any storage.
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: Unsuccessful Add of Instance

Post by rocheryderm »

Hi... my other screenshots should have demonstrated that, but here's another one.

Wait a minute, I can't do anything on the second server; it's stuck at the Install or Connect screen.

It looks like both nodes are active.

Additionally, here are excerpts from the Elasticsearch log files (/var/log/elasticsearch...) on the master and secondary nodes, in that order:

Code:

[2019-09-10 17:42:11,992][INFO ][cluster.service          ] [77596958-30db-4cb4-bf11-09e114a44012] removed {[fb689df9-6555-48f7-ada4-70fb56095c6f][hc3JnLGjTq60ID_icK0gdg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-node_left([fb689df9-6555-48f7-ada4-70fb56095c6f][hc3JnLGjTq60ID_icK0gdg][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1})
[2019-09-10 17:42:40,509][INFO ][cluster.service          ] [77596958-30db-4cb4-bf11-09e114a44012] added {[3fcf7609-0455-49a5-81c3-c7baaa51776a][BgCAY0VpQxC3PxPonH5gBQ][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(join from node[[3fcf7609-0455-49a5-81c3-c7baaa51776a][BgCAY0VpQxC3PxPonH5gBQ][rbbusnls2p][inet[/151.120.113.52:9300]]{max_local_storage_nodes=1}])

Code:

[2019-09-10 17:42:15,906][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] stopping ...
[2019-09-10 17:42:16,033][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] stopped
[2019-09-10 17:42:16,033][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] closing ...
[2019-09-10 17:42:16,041][INFO ][node                     ] [fb689df9-6555-48f7-ada4-70fb56095c6f] closed
[2019-09-10 17:42:37,424][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] version[1.7.6], pid[51593], build[c730b59/2016-11-18T15:21:16Z]
[2019-09-10 17:42:37,425][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] initializing ...
[2019-09-10 17:42:37,503][INFO ][plugins                  ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] loaded [knapsack-1.7.3.0-d0ea246], sites []
[2019-09-10 17:42:37,543][INFO ][env                      ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] using [1] data paths, mounts [[/usr/local/nagioslogserver/elasticsearch/data (/dev/mapper/vg00-lv_elastic)]], net usable_space [944.7gb], net total_space [1.6tb], types [xfs]
[2019-09-10 17:42:41,009][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] initialized
[2019-09-10 17:42:41,010][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] starting ...
[2019-09-10 17:42:41,370][INFO ][transport                ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/151.120.113.52:9300]}
[2019-09-10 17:42:41,386][INFO ][discovery                ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] 15edd11f-8263-4eb7-9054-8ace66feebb6/BgCAY0VpQxC3PxPonH5gBQ
[2019-09-10 17:42:44,555][INFO ][cluster.service          ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] detected_master [77596958-30db-4cb4-bf11-09e114a44012][4BAV0gZFRkmvGxVxvB1ycA][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1}, added {[77596958-30db-4cb4-bf11-09e114a44012][4BAV0gZFRkmvGxVxvB1ycA][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[77596958-30db-4cb4-bf11-09e114a44012][4BAV0gZFRkmvGxVxvB1ycA][rbbusnls1p][inet[/151.120.113.51:9300]]{max_local_storage_nodes=1}])
[2019-09-10 17:42:44,925][INFO ][http                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2019-09-10 17:42:44,926][INFO ][node                     ] [3fcf7609-0455-49a5-81c3-c7baaa51776a] started
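
To make the join/leave sequence in excerpts like these easier to follow, the cluster.service lines can be reduced to a timestamp, the event, and the host name. The filter below is a rough sketch only: the function name node_events is invented for illustration, and the pattern assumes the Elasticsearch 1.x log layout shown above and a sed that supports -E (GNU sed does).

```shell
# Hypothetical helper: reads Elasticsearch log lines on stdin and prints
# "<timestamp> <added|removed> <hostname>" for each cluster.service
# membership event. The hostname is the third bracketed field of the node
# description, as in the excerpts above.
node_events() {
  sed -nE 's/^\[([^]]+)]\[[A-Z ]+]\[cluster\.service *] .* (added|removed) \{\[[^]]*]\[[^]]*]\[([^]]*)].*/\1 \2 \3/p'
}

# Usage (substitute the actual log file for the placeholder):
# node_events < /path/to/elasticsearch.log
```

Run against the master's log, this should list the 17:42:11 removal and the 17:42:40 re-add of rbbusnls2p as two short lines, which makes repeated restarts of the second node easier to spot.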
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: Unsuccessful Add of Instance

Post by rocheryderm »

Way back when I was toying purely with Elasticsearch... sometimes my queries from Kibana would fail after 30000ms.

This process does take 30-45 seconds (from hitting enter to getting the error) - is it possible a query is timing out or failing?

I've scoured every log file I can find. It looks like your code to add this node to the cluster is encrypted, so I can't be of much more help.

Is there something I can do to manually complete this, without having to use the GUI?