Failed to connect new LS to existing cluster. Then, unfortunately, it did
Posted: Tue Jun 04, 2024 1:39 pm
by iankidd
I was having problems connecting a new LS to my existing 3-node cluster: I got a connection timeout error every time I tried to join. I went through the forums and tried everything I found, but kept getting 504 errors. Out of desperation, I restarted Elasticsearch on the new node. When I went to http://{new_LS_IP}/nagioslogserver/install to try again, I was presented with a login screen instead of the option to build a standalone system or connect to a cluster. I logged in as nagiosadmin and found that, despite the earlier 504 error, the new node was now joined to the cluster ... sort of.
In Admin -> Instance Status, Elasticsearch and Logstash both show as down for the new system, even though they're running. Restarting them from the command line doesn't change this.
When I click on the IP of the new node in Admin -> Instance Status and look at the file system configuration, the mount is /, which I do not want. All the other nodes point to the mount point /nagioslog_data, which does exist on the new system.
I tried going to http://{new_LS_IP}/nagioslogserver/install, but that just sends me back to the home page. At this point, I don't know if this is salvageable.
Posted: Tue Jun 04, 2024 1:53 pm
by jmichaelson
I'm not convinced that this isn't salvageable yet.
First off, let's see if and where there is any data. Normally data is in /usr/local/nagioslogserver/elasticsearch/data. On your other nodes that's not the case. I'm going to guess that on this new node it is. Can you verify that?
Next, how did you get the existing nodes pointing to /nagioslog_data? Was it a symlink, configuration file change, or something else?
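To answer both questions on a given node, a small shell sketch like the one below reports which data directory Elasticsearch is configured to use, how much data sits in the default location and in the configured one, and which mount point each lives on. It assumes the setting is the DATA_DIR variable in /etc/sysconfig/elasticsearch (which is where it turns out to be later in this thread); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report where Elasticsearch data lives on this node.
# Assumption: the data location is set by DATA_DIR in
# /etc/sysconfig/elasticsearch. Function names are hypothetical.

DEFAULT_DATA_DIR="/usr/local/nagioslogserver/elasticsearch/data"

# Print the value of the last uncommented DATA_DIR= line in a sysconfig file,
# with surrounding quotes removed; prints nothing if it is not set.
configured_data_dir() {
  local cfg="${1:-/etc/sysconfig/elasticsearch}"
  grep -E '^[[:space:]]*DATA_DIR=' "$cfg" 2>/dev/null | tail -n 1 | cut -d= -f2- | tr -d "\"'"
}

# For the default and the configured data directories, show the size and
# the mount point of the filesystem holding each one.
report_data_dirs() {
  local cfg="${1:-/etc/sysconfig/elasticsearch}"
  local configured dir
  configured="$(configured_data_dir "$cfg")"
  echo "DATA_DIR in $cfg: ${configured:-<not set>}"
  for dir in "$DEFAULT_DATA_DIR" "$configured"; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      du -sh "$dir"
      # df -P prints one line per filesystem; field 6 is the mount point.
      echo "  mounted on: $(df -P "$dir" | awk 'NR == 2 {print $6}')"
    fi
  done
}
```

Source the file and run `report_data_dirs` as root on each node, so `du` can read the data directories; a node whose data lands on "/" rather than "/nagioslog_data" shows up immediately.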
Posted: Tue Jun 04, 2024 2:53 pm
by iankidd
There is data in the location you specified on the new node:
# ls -l /usr/local/nagioslogserver/elasticsearch/data/4261ea49-46b2-43f8-b7b5-4378a217f92c/nodes/0
total 4
drwxr-xr-x 55 nagios nagios 4096 Jun 4 11:02 indices
-rw-r--r-- 1 nagios nagios 0 Jun 4 11:02 node.lock
drwxr-xr-x 2 nagios nagios 26 Jun 4 11:31 _state
As for how I set up /nagioslog_data, I couldn't remember; I built these systems back in 2018. I poked around and found that I had modified /etc/sysconfig/elasticsearch:
# grep DATA_DIR /etc/sysconfig/elasticsearch
DATA_DIR="/nagioslog_data/elasticsearch/data"
Looks like I was wrong to believe there would be a place to set the data location as part of the installation.
Posted: Tue Jun 04, 2024 3:21 pm
by iankidd
I'm thinking that the following steps could fix the data location issue:
(1) shut down Elasticsearch
(2) migrate the data from /usr/local/nagioslogserver/elasticsearch/data to /nagioslog_data/elasticsearch/data
(3) edit DATA_DIR in /etc/sysconfig/elasticsearch
(4) restart Elasticsearch
Does that sound right?
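The four steps can be sketched in shell roughly as follows, copying rather than moving so the original stays in place as a fallback. This is a sketch under assumptions not stated in the thread: it runs as root, Elasticsearch is a systemctl-managed service named `elasticsearch`, and GNU `cp`, `chown` and `sed` are available. `set_data_dir` and `migrate_data_dir` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the four-step plan. Assumptions: run as root, Elasticsearch is a
# systemctl-managed service named "elasticsearch", GNU coreutils and sed.

# Point DATA_DIR at a new directory, replacing an existing uncommented
# DATA_DIR= line or appending one if none exists.
set_data_dir() {
  local cfg="$1" new_dir="$2"
  if grep -q '^DATA_DIR=' "$cfg"; then
    # "|" is the sed delimiter because the path itself contains "/".
    sed -i "s|^DATA_DIR=.*|DATA_DIR=\"$new_dir\"|" "$cfg"
  else
    printf 'DATA_DIR="%s"\n' "$new_dir" >> "$cfg"
  fi
}

# Steps (1)-(4): stop, copy, repoint, start. Returns at the first failure;
# the original data is never modified, but Elasticsearch stays stopped.
migrate_data_dir() {
  local src="$1" dst="$2" cfg="${3:-/etc/sysconfig/elasticsearch}"
  systemctl stop elasticsearch || return 1
  mkdir -p "$dst" || return 1
  # -a preserves ownership, permissions and timestamps; "$src/." copies the
  # contents of $src into $dst instead of creating $dst/data.
  cp -a "$src/." "$dst/" || return 1
  # Give the new top-level directory the same owner and group as the old one.
  chown --reference="$src" "$dst" || return 1
  set_data_dir "$cfg" "$dst" || return 1
  systemctl start elasticsearch
}
```

Usage would be `migrate_data_dir /usr/local/nagioslogserver/elasticsearch/data /nagioslog_data/elasticsearch/data`. Keeping the config edit in its own function makes it easy to run (or undo) separately from the copy.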
Posted: Wed Jun 05, 2024 4:23 pm
by jmichaelson
That is exactly what I would suggest. I'd recommend a copy rather than a move so you can revert in the event of a problem. You can delete the original after you verify that it's all working.
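Before deleting the original, one way to check that the copy is complete is to compare the two trees while Elasticsearch is still stopped; once it starts writing to the new location the trees diverge and the comparison stops being meaningful. Note that `diff -rq` compares file contents only, not ownership or permissions. `verify_copy` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: confirm a copied directory tree matches the original before the
# original is deleted. Run it after copying and before starting Elasticsearch.

verify_copy() {
  local src="$1" dst="$2"
  # -r recurses into subdirectories; -q only reports which files differ.
  # diff exits 0 only if both trees hold the same files with the same contents.
  if diff -rq "$src" "$dst" >/dev/null; then
    echo "copy verified: $dst matches $src"
  else
    echo "copy differs from $src; keep the original" >&2
    return 1
  fi
}
```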
Posted: Thu Jun 06, 2024 2:07 pm
by iankidd
Thanks! I liked the idea of a copy rather than a move, so I stopped Elasticsearch, created a tarball of /usr/local/nagioslogserver/elasticsearch/data, copied the data over to /nagioslog_data/elasticsearch/data, modified DATA_DIR in /etc/sysconfig/elasticsearch, and then started Elasticsearch.
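The tarball step can be sketched as a pair of helpers: one archives the original data directory and one unpacks it again if the new location has to be abandoned. The function names are hypothetical. When run as root, GNU `tar` records file owners in the archive and restores them on extraction.

```shell
#!/usr/bin/env bash
# Sketch: archive a data directory before changing anything, and restore it
# from the archive if needed. Run as root so ownership is kept on restore.

# Create a gzip-compressed archive whose paths are relative to the parent of
# $src (for example "data/..."), not absolute.
backup_data_dir() {
  local src="$1" archive="$2"
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# Unpack the archive into $parent, recreating $parent/<basename of src>.
restore_data_dir() {
  local archive="$1" parent="$2"
  tar -xzf "$archive" -C "$parent"
}
```

For example, `backup_data_dir /usr/local/nagioslogserver/elasticsearch/data /root/es-data-backup.tar.gz`, where the archive path is only an illustration; it should sit outside the directory being archived and on a filesystem with enough free space.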
Based on what I'm seeing, that particular issue now looks resolved.
Regarding Elasticsearch and Logstash showing as down in the GUI, I looked at /usr/local/nagioslogserver/var/poller.log and saw sudo errors. I granted the sudo rights and that issue cleared up. (I had totally forgotten about that requirement as well. Looks like I forgot quite a bit in the six years since I built my initial log servers!)
I pointed a VM at the new LS to send it syslog data, but nothing showed up. I looked in /usr/local/nagioslogserver/logstash/etc/conf.d/ and saw that there were no configuration files on the new LS. So I went to GUI -> Configure -> "Apply Configuration", and the configuration files appeared, but there was still no syslog data. I also noticed that nothing was listening on port 514 on the new LS.
I edited /etc/sysconfig/logstash with vi, changed LS_USER=nagios to LS_USER=root, and restarted Logstash, which seemed to do the trick. Odd that I didn't need to do that on the old LSs...
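A quick way to pull the sudo-related lines out of the poller log is sketched below. It assumes the relevant errors contain the word "sudo"; the thread does not show the actual messages. To see which rights are missing, one option is to run `sudo -l -U <user>` as root on an existing node and on the new node and compare the output, where `<user>` is whichever account the log lines name. `recent_sudo_errors` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: show the most recent sudo-related lines from the poller log.
# Assumption: the errors contain the word "sudo" (matched case-insensitively).

recent_sudo_errors() {
  local log="${1:-/usr/local/nagioslogserver/var/poller.log}"
  local count="${2:-20}"
  grep -i 'sudo' "$log" | tail -n "$count"
}
```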
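Both checks in that paragraph can be scripted. The sketch below reads the output of `ss -tuln` on standard input and reports whether any TCP or UDP socket is listening on a given port, and it counts the regular files in the Logstash conf.d directory. It assumes the iproute2 `ss` layout in which, when both TCP and UDP are listed, a Netid column comes first and the local address:port is the fifth field. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check for a listener on a port and for Logstash config files.

# Read `ss -tuln` output on stdin; succeed if some socket's local address
# ends in ":PORT". With -t and -u together, ss prints a Netid column first,
# so the local address:port is field 5.
listening_on_port() {
  local port="$1"
  awk -v suffix=":$port" '
    length($5) >= length(suffix) &&
    substr($5, length($5) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }'
}

# Count regular files directly inside a Logstash conf.d directory.
count_conf_files() {
  local dir="${1:-/usr/local/nagioslogserver/logstash/etc/conf.d}"
  find "$dir" -maxdepth 1 -type f | wc -l
}
```

For example, `ss -tuln | listening_on_port 514 || echo "nothing on 514"`; running `ss -tulnp` as root additionally shows which process owns each socket.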
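A likely explanation for why switching to root helped: on Linux, ports below 1024 are privileged by default, so a process running as an unprivileged user such as nagios cannot bind to port 514 unless it has the CAP_NET_BIND_SERVICE capability. The sketch below flags that combination from the Logstash sysconfig file; it assumes the run-as user is set by LS_USER there, and `check_ls_user_for_port` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: warn when Logstash is configured to run as a non-root user but is
# expected to listen on a privileged port (below 1024), such as 514.
# Assumption: the run-as user is set by LS_USER in /etc/sysconfig/logstash.

check_ls_user_for_port() {
  local cfg="${1:-/etc/sysconfig/logstash}" port="${2:-514}" user
  user="$(grep -E '^[[:space:]]*LS_USER=' "$cfg" | tail -n 1 | cut -d= -f2- | tr -d "\"'")"
  if [ -z "$user" ]; then
    echo "LS_USER is not set in $cfg"
    return 2
  fi
  if [ "$port" -lt 1024 ] && [ "$user" != "root" ]; then
    echo "LS_USER=$user: port $port is privileged; binding needs root or CAP_NET_BIND_SERVICE"
    return 1
  fi
  echo "LS_USER=$user: OK for port $port"
}
```

The edit itself can be made non-interactively with `sed -i 's/^LS_USER=nagios$/LS_USER=root/' /etc/sysconfig/logstash`, followed by a Logstash restart.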
I've checked the following:
the output of curl -XGET 'http://localhost:9200/_cat/shards?v' shows that the new LS has indexes in "started" state
I then opened some old indexes; the command above now shows them in "started" state, and some of them are on the new LS
cluster status is green
all elasticsearch/logstash indicators are green in admin -> instance status
I'm now collecting logs from the VM I pointed at the new LS and can pull up its data when I run a query through GUI -> Dashboards
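The shard check can be narrowed to the new node by filtering the `_cat/shards` output. The sketch below counts STARTED shards whose IP column or last field (the node name) matches a given value. It assumes the column order index, shard, prirep, state, docs, store, ip, node that `_cat/shards?v` prints, with a header line first; matching on the IP is safer if node names contain spaces. `shards_on_node` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: count STARTED shards held by one node, reading `_cat/shards?v`
# output on stdin. Assumes the column order:
#   index shard prirep state docs store ip node
# A row matches if its ip column (field 7) or its last field equals $1.

shards_on_node() {
  local node="$1"
  awk -v n="$node" '
    NR > 1 && $4 == "STARTED" && ($7 == n || $NF == n) { c++ }
    END { print c + 0 }'
}
```

For example, `curl -s 'http://localhost:9200/_cat/shards?v' | shards_on_node <new_LS_IP>`. Related checks: `curl -s 'http://localhost:9200/_cat/health?h=status'` prints the cluster status, and from the VM, util-linux `logger -n <new_LS_IP> -P 514 -d "test message"` sends a single UDP syslog message to look for in the dashboards.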
Is there anything else you think I should test? If not, I think I'm good!
Posted: Thu Jun 06, 2024 2:53 pm
by jmichaelson
Nothing that I can think of. I wish we could figure out how you ended up in that state, since the install scripts normally handle all of the items you mentioned. Glad everything is working for you now!