Hello,
I'm trying to add the last of my upgraded CentOS 7 servers to my Log Server cluster. The first time I added it, something went wrong, and now I can't get back into the GUI to try re-adding it.
I deleted it from the Instance Status page of my main cluster's GUI, but the new server now says the Elasticsearch service isn't running and won't let me proceed until it's started.
It looks like the first join attempt didn't create all the necessary directories, so Elasticsearch won't even start. The only directory inside /usr/local/nagioslogserver is 'lost+found'.
Is there a way to force this box to "start over" and take me back to the New Instance/Connect screen?
Thank you.
New instance failed to join cluster
Re: New instance failed to join cluster
I attached a screenshot of what I see when I try to access the GUI of the new server.
Here's everything in the /usr/local/nagioslogserver directory:
Code:
root@nagioslscc2_temp:/root>cd /usr/local/nagioslogserver/
root@nagioslscc2_temp:/usr/local/nagioslogserver>ls
lost+found
Here's what it looks like when I check the status of elasticsearch and logstash:
Code:
root@nagioslscc2_temp:/usr/local/nagioslogserver>systemctl status elasticsearch
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-08-28 15:41:21 PDT; 17h ago
Docs: man:systemd-sysv-generator(8)
Process: 13702 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=1/FAILURE)
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
Aug 28 15:41:21 nagioslscc2_temp elasticsearch[13702]: Could not open input file: /usr/local/nagioslogserver/scripts/get_es_config.php
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: elasticsearch.service: control process exited, code=exited status=1
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Failed to start LSB: This service manages the elasticsearch daemon.
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Unit elasticsearch.service entered failed state.
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: elasticsearch.service failed.
root@nagioslscc2_temp:/usr/local/nagioslogserver>systemctl status logstash
● logstash.service - LSB: Logstash
Loaded: loaded (/etc/rc.d/init.d/logstash; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-08-28 15:39:07 PDT; 17h ago
Docs: man:systemd-sysv-generator(8)
Process: 3255 ExecStart=/etc/rc.d/init.d/logstash start (code=exited, status=1/FAILURE)
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Starting LSB: Logstash...
Aug 28 15:39:07 nagioslscc2_temp logstash[3255]: cat: /usr/local/nagioslogserver/var/cluster_uuid: No such file or directory
Aug 28 15:39:07 nagioslscc2_temp logstash[3255]: Could not open input file: /usr/local/nagioslogserver/scripts/get_logstash_config.php
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: logstash.service: control process exited, code=exited status=1
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Failed to start LSB: Logstash.
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Unit logstash.service entered failed state.
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: logstash.service failed.
So, for whatever reason, the server thinks it's still connected to the cluster and is trying to collect logs, even though it isn't and can't?
It took my Unix team 3 weeks to get this server up and running (don't ask)... I'm really hoping we can salvage this and it doesn't need to be rebuilt.
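Every failure in the systemctl output above is a missing-file error for a path under /usr/local/nagioslogserver (the two PHP config scripts and var/cluster_uuid), which fits the directory being empty apart from lost+found. A quick check of just those paths might look like the sketch below. `check_ls_files` is a hypothetical helper name, and the three paths are only the ones named in this log, not a complete list of what Log Server needs.

```shell
#!/bin/sh
# Hypothetical helper: check for the files the elasticsearch and logstash
# init scripts tried to read in the systemctl output. Only these three paths
# are checked; they are taken from the log, not from Log Server documentation.
check_ls_files() {
    base="${1:-/usr/local/nagioslogserver}"
    missing=0
    for rel in scripts/get_es_config.php \
               scripts/get_logstash_config.php \
               var/cluster_uuid; do
        if [ -e "$base/$rel" ]; then
            echo "ok      $base/$rel"
        else
            echo "MISSING $base/$rel"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero when at least one file was not found.
    [ "$missing" -eq 0 ]
}
```

Run it as `check_ls_files` (or pass another base directory). If all three show as MISSING, the services are failing because the files aren't visible at that path, not because of anything in the cluster's state.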
Re: New instance failed to join cluster
You mentioned that there's a lost+found directory in /usr/local/nagioslogserver/. Is that a mount point for additional storage? If so, it's possible the Log Server files were installed into /usr/local/nagioslogserver/ first and the additional storage was mounted over them afterwards, which would hide them. Try running umount /usr/local/nagioslogserver/ and then check what's in that directory. If that turns out to be the case, we can recover from it pretty easily.
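Before unmounting anything, the theory can be checked with a small read-only helper like the sketch below. `inspect_mount` is a hypothetical name, not a Nagios tool; it assumes the `mountpoint` and `findmnt` utilities from util-linux are installed, as they normally are on CentOS 7.

```shell
#!/bin/sh
# Hypothetical read-only helper: report whether a directory is a mount point
# and list what is currently visible in it. Nothing is mounted or unmounted.
inspect_mount() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "$dir does not exist"
        return 1
    fi
    if mountpoint -q "$dir" 2>/dev/null; then
        echo "$dir is a mount point"
        # Device, filesystem type and options of whatever is mounted there.
        findmnt --target "$dir"
    else
        echo "$dir is not a mount point"
    fi
    # While something is mounted here, this lists the mounted filesystem,
    # not the files underneath; those only reappear after umount.
    ls -A "$dir"
}
```

Run as root with `inspect_mount /usr/local/nagioslogserver`. A result of "is a mount point" followed by only lost+found matches the situation described in this thread.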
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: New instance failed to join cluster
Yes, it is a mount point to a larger storage drive.
The instance was accidentally added to the cluster before the storage was mounted, and it looks like that's exactly the problem.
All of these directories are present:
Code:
root@nagioslscc2_temp:/usr/local/nagioslogserver>ls
elasticsearch etc logstash mibs scripts snapshots tmp var
If I just remove the /nagioslogserver directory, will it "reset" everything?
Re: New instance failed to join cluster
It won't reset everything, because some of the configuration lives in other places. The easiest way to deal with this is to create a temporary mount point, say /mnt/temp/, mount your additional storage there, and then run:
Code:
cd /usr/local/nagioslogserver/
# -a (archive) keeps ownership, permissions and timestamps; "./." also copies hidden files
cp -a ./. /mnt/temp/
# step out so the shell isn't sitting in the directory you are about to mount over
cd /
That will put everything on your additional storage exactly the way Log Server expects it. Then unmount the additional storage from /mnt/temp, re-mount it at /usr/local/nagioslogserver/, and start your services.
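The copy-then-remount procedure from the reply above can be sketched as a single script. This is a sketch under stated assumptions, not an official Nagios procedure: `/dev/sdb1` is a hypothetical device name standing in for the additional storage, the service names are the two from the systemctl output earlier in the thread, and stopping the services before copying is an added precaution. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the copy-then-remount procedure.
# ASSUMPTION: DEVICE is a hypothetical placeholder; set it to the real device.
# DRY_RUN=1 (the default) only prints commands; DRY_RUN=0 executes them as root.
DEVICE="${DEVICE:-/dev/sdb1}"
TARGET="/usr/local/nagioslogserver"
TEMP="/mnt/temp"
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, otherwise execute it and stop on failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

relocate_to_storage() {
    # The storage must not be mounted at TARGET yet, or the original files
    # stay hidden underneath it and nothing useful gets copied.
    if [ "$DRY_RUN" != "1" ] && mountpoint -q "$TARGET"; then
        echo "$TARGET is already a mount point; run umount first" >&2
        return 1
    fi
    # Keep anything from writing into the directory while it is copied.
    run systemctl stop logstash
    run systemctl stop elasticsearch
    run mkdir -p "$TEMP"
    run mount "$DEVICE" "$TEMP"
    # -a keeps ownership, permissions and timestamps; "/." includes dotfiles.
    run cp -a "$TARGET/." "$TEMP/"
    run umount "$TEMP"
    run mount "$DEVICE" "$TARGET"
    run systemctl start elasticsearch
    run systemctl start logstash
}

relocate_to_storage
```

Saved as, say, relocate.sh (a hypothetical filename), review the printed plan first, then run it as root with `DRY_RUN=0 DEVICE=/dev/yourdevice sh relocate.sh`. For the mount to survive a reboot, any /etc/fstab entry for the device should point at /usr/local/nagioslogserver rather than /mnt/temp.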
Re: New instance failed to join cluster
This issue was resolved internally. We had to rebuild the VM.
Please lock this thread. Thank you.
scottwilkerson
DevOps Engineer
Location: Nagios Enterprises
Re: New instance failed to join cluster
rferebee wrote: "This issue was resolved internally. We had to rebuild the VM. Please lock this thread. Thank you."
Great! Locking.