New instance failed to join cluster
Posted: Wed Aug 28, 2019 5:52 pm
by rferebee
Hello,
I'm trying to add my last CentOS 7 upgraded server to my Log Server cluster. The first time I added it, it hit some kind of issue, and now I can't get back into the GUI to try to re-add it.
I deleted it from the Instance Status page of my main cluster GUI, but the new server is now telling me that the Elasticsearch service isn't running, and it won't let me proceed without starting it.
It looks like when I tried to join it the first time it didn't create all the necessary directories, so it won't even let me start Elasticsearch. The only directory inside of /usr/local/nagioslogserver is 'lost+found'.
Is there a way to force this box to "start over" and take me back to the New Instance/Connect screen?
Thank you.
Re: New instance failed to join cluster
Posted: Thu Aug 29, 2019 10:46 am
by rferebee
I attached a screenshot of what I see when I try to access the GUI of the new server.
Here's everything in the /usr/local/nagioslogserver directory:
Code:
root@nagioslscc2_temp:/root>cd /usr/local/nagioslogserver/
root@nagioslscc2_temp:/usr/local/nagioslogserver>ls
lost+found
Here's what it looks like when I check the status of elasticsearch and logstash:
Code:
root@nagioslscc2_temp:/usr/local/nagioslogserver>systemctl status elasticsearch
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-08-28 15:41:21 PDT; 17h ago
Docs: man:systemd-sysv-generator(8)
Process: 13702 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=1/FAILURE)
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
Aug 28 15:41:21 nagioslscc2_temp elasticsearch[13702]: Could not open input file: /usr/local/nagioslogserver/scripts/get_es_config.php
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: elasticsearch.service: control process exited, code=exited status=1
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Failed to start LSB: This service manages the elasticsearch daemon.
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Unit elasticsearch.service entered failed state.
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: elasticsearch.service failed.
root@nagioslscc2_temp:/usr/local/nagioslogserver>systemctl status logstash
● logstash.service - LSB: Logstash
Loaded: loaded (/etc/rc.d/init.d/logstash; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-08-28 15:39:07 PDT; 17h ago
Docs: man:systemd-sysv-generator(8)
Process: 3255 ExecStart=/etc/rc.d/init.d/logstash start (code=exited, status=1/FAILURE)
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Starting LSB: Logstash...
Aug 28 15:39:07 nagioslscc2_temp logstash[3255]: cat: /usr/local/nagioslogserver/var/cluster_uuid: No such file or directory
Aug 28 15:39:07 nagioslscc2_temp logstash[3255]: Could not open input file: /usr/local/nagioslogserver/scripts/get_logstash_config.php
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: logstash.service: control process exited, code=exited status=1
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Failed to start LSB: Logstash.
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Unit logstash.service entered failed state.
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: logstash.service failed.
So, for whatever reason, the server thinks it's still connected to the cluster and is trying to collect logs even though it isn't and can't?
It took 3 weeks to get this server up and running from my Unix team (don't ask)... I'm really hoping we can salvage this and it doesn't need to be rebuilt.
Re: New instance failed to join cluster
Posted: Thu Aug 29, 2019 12:44 pm
by mbellerue
You mentioned that there's a lost+found directory in /usr/local/nagioslogserver/. Is that a mount point for additional storage? Is it possible that the Log Server files got installed to /usr/local/nagioslogserver/, and then the additional storage was mounted on top of them? Maybe try umount /usr/local/nagioslogserver/ and then see what's in that directory. Just to be sure, because we can recover from that pretty easily.
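A lost+found directory at the top level usually means you are looking at the root of an ext2/3/4 filesystem, so it is worth confirming the mount before unmounting anything. Here is a minimal sketch of that check, assuming a Linux host with /proc/mounts. The function name is_mount_point is made up for this example and is not part of Log Server:

```shell
#!/bin/sh
# is_mount_point DIR [MOUNTS_FILE]
# Exits 0 if DIR is listed as a mount target in MOUNTS_FILE, 1 otherwise.
# MOUNTS_FILE defaults to /proc/mounts; the optional second argument is
# an assumption added so the check can be tried against a sample file.
is_mount_point() {
    dir=${1%/}                 # drop one trailing slash: "/a/b/" -> "/a/b"
    [ -n "$dir" ] || dir=/     # "/" becomes empty above, so restore it
    mounts=${2:-/proc/mounts}
    # Field 2 of every line in /proc/mounts is the mount target.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"
}
```

Usage would be something like `is_mount_point /usr/local/nagioslogserver && echo "storage is mounted here"`. On a stock system, `findmnt /usr/local/nagioslogserver` should give you the same answer along with the source device.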
Re: New instance failed to join cluster
Posted: Thu Aug 29, 2019 12:55 pm
by rferebee
Yes, it is a mount point to a larger storage drive.
The instance was accidentally added to the cluster before the storage was mounted, and it looks like that's exactly what the problem is.
With the storage unmounted, all of these directories are present:
Code:
root@nagioslscc2_temp:/usr/local/nagioslogserver>ls
elasticsearch etc logstash mibs scripts snapshots tmp var
If I just remove the /nagioslogserver directory will it "reset" everything?
Re: New instance failed to join cluster
Posted: Thu Aug 29, 2019 1:20 pm
by mbellerue
It won't reset everything, because there is some configuration in other places. The easiest way to deal with this is to create a temporary mount point, say at /mnt/temp/, mount your additional storage there, then,
Code:
cd /usr/local/nagioslogserver/
# -a keeps ownership and permissions; "." also picks up dotfiles that ./* would skip
cp -a . /mnt/temp/
cd /
That will get everything on to your additional storage in exactly the way that Log Server is expecting it. You then unmount the additional storage from /mnt/temp, and re-mount it at /usr/local/nagioslogserver/. Then start your services.
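The whole sequence might look like the sketch below. It is only an illustration: /dev/sdb1 is a placeholder for whatever device holds your additional storage, and the helper names are made up for this example. It uses `cp -a` so that ownership, permissions, and dotfiles come along with the copy, which is an assumption about what the services need rather than something confirmed for your box.

```shell
#!/bin/sh
# Placeholder values: adjust to the real device and paths.
DEVICE=${DEVICE:-/dev/sdb1}          # hypothetical device name
TARGET=/usr/local/nagioslogserver
STAGING=/mnt/temp

# copy_tree SRC DST: copy everything inside SRC into DST, including
# dotfiles, preserving ownership, modes, and timestamps (cp -a).
copy_tree() {
    cp -a "$1"/. "$2"/
}

# relocate_to_storage: run as root, with the storage NOT currently
# mounted at TARGET. Each step runs only if the previous one succeeded.
relocate_to_storage() {
    mkdir -p "$STAGING" &&
    mount "$DEVICE" "$STAGING" &&
    copy_tree "$TARGET" "$STAGING" &&
    umount "$STAGING" &&
    mount "$DEVICE" "$TARGET" &&
    systemctl start elasticsearch &&
    systemctl start logstash
}
```

Nothing runs until you call `relocate_to_storage` yourself. Keep in mind that a manual mount does not survive a reboot unless the same mount is also listed in /etc/fstab.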
Re: New instance failed to join cluster
Posted: Mon Sep 16, 2019 5:50 pm
by rferebee
This issue was resolved internally. We had to rebuild the VM.
Please lock this thread. Thank you.
Re: New instance failed to join cluster
Posted: Tue Sep 17, 2019 7:56 am
by scottwilkerson
rferebee wrote: This issue was resolved internally. We had to rebuild the VM.
Please lock this thread. Thank you.
Great!
Locking