New instance failed to join cluster

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Post by rferebee »

Hello,

I'm trying to add the last of my servers upgraded to CentOS 7 to my Log Server cluster. The first time I added it, it hit some kind of issue, and now I can't get back into the GUI to re-add it.

I deleted it from the Instance Status page of my main cluster GUI, but the new server now tells me that the Elasticsearch service isn't running and won't let me proceed without starting it.

It looks like when I tried to join it the first time it didn't create all the necessary directories, so it won't even let me start Elasticsearch. The only directory inside of /usr/local/nagioslogserver is 'lost+found'.

Is there a way to force this box to "start over" and take me back to the New Instance/Connect screen?

Thank you.

Re: New instance failed to join cluster

Post by rferebee »

I attached a screenshot of what I see when I try to access the GUI of the new server.

Here's everything in the /usr/local/nagioslogserver directory:

Code:

root@nagioslscc2_temp:/root>cd /usr/local/nagioslogserver/
root@nagioslscc2_temp:/usr/local/nagioslogserver>ls
lost+found

Here's what it looks like when I check the status of elasticsearch and logstash:

Code:

root@nagioslscc2_temp:/usr/local/nagioslogserver>systemctl status elasticsearch
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
   Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-08-28 15:41:21 PDT; 17h ago
     Docs: man:systemd-sysv-generator(8)
  Process: 13702 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=1/FAILURE)

Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
Aug 28 15:41:21 nagioslscc2_temp elasticsearch[13702]: Could not open input file: /usr/local/nagioslogserver/scripts/get_es_config.php
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: elasticsearch.service: control process exited, code=exited status=1
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Failed to start LSB: This service manages the elasticsearch daemon.
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: Unit elasticsearch.service entered failed state.
Aug 28 15:41:21 nagioslscc2_temp systemd[1]: elasticsearch.service failed.
root@nagioslscc2_temp:/usr/local/nagioslogserver>systemctl status logstash
● logstash.service - LSB: Logstash
   Loaded: loaded (/etc/rc.d/init.d/logstash; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-08-28 15:39:07 PDT; 17h ago
     Docs: man:systemd-sysv-generator(8)
  Process: 3255 ExecStart=/etc/rc.d/init.d/logstash start (code=exited, status=1/FAILURE)

Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Starting LSB: Logstash...
Aug 28 15:39:07 nagioslscc2_temp logstash[3255]: cat: /usr/local/nagioslogserver/var/cluster_uuid: No such file or directory
Aug 28 15:39:07 nagioslscc2_temp logstash[3255]: Could not open input file: /usr/local/nagioslogserver/scripts/get_logstash_config.php
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: logstash.service: control process exited, code=exited status=1
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Failed to start LSB: Logstash.
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: Unit logstash.service entered failed state.
Aug 28 15:39:07 nagioslscc2_temp systemd[1]: logstash.service failed.

So, for whatever reason, the server thinks it's still connected to the cluster and is trying to collect logs even though it isn't and can't?

It took my Unix team 3 weeks to get this server up and running (don't ask)... I'm really hoping we can salvage this and it doesn't need to be rebuilt.
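Both services fail because files they expect under /usr/local/nagioslogserver cannot be opened. As a rough sketch, the check below looks for the three paths named in the log output above and reports which are present. The function name check_nls_files is made up for this illustration, and the optional argument exists only so the check can be pointed at another directory.

```shell
# Sketch: report which of the files named in the service logs are missing.
# check_nls_files is a hypothetical helper, not part of Nagios Log Server.
check_nls_files() {
    root="${1:-/usr/local/nagioslogserver}"
    missing=0
    for rel in scripts/get_es_config.php scripts/get_logstash_config.php var/cluster_uuid; do
        if [ -e "$root/$rel" ]; then
            echo "present: $root/$rel"
        else
            echo "missing: $root/$rel"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing is missing, so callers can branch on the result.
    [ "$missing" -eq 0 ]
}
```

Running check_nls_files with no argument checks the default install path. If all three paths are reported missing, the install tree is probably not visible at that location at all.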
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: New instance failed to join cluster

Post by mbellerue »

You mentioned that there's a lost+found directory in /usr/local/nagioslogserver/. Is that a mount point for additional storage? Is it possible that the Log Server files got installed to /usr/local/nagioslogserver/, and then the additional storage was mounted on top of them? Maybe try umount /usr/local/nagioslogserver/, then see what's in that directory. Just to be sure, because we can recover from that pretty easily.
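Before unmounting anything, it can help to confirm whether the directory really is a mount point. The sketch below does this by looking the path up in /proc/self/mounts. It assumes a Linux system and a path without spaces, and is_mount_point and describe_nls_dir are names invented for this example.

```shell
# Sketch: report whether a directory is itself a mount point by looking it up
# in /proc/self/mounts (Linux only; paths containing spaces are not handled).
is_mount_point() {
    # Resolve symlinks and trailing slashes so the path matches the mount table.
    dir=$(cd "$1" 2>/dev/null && pwd -P) || return 2
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' /proc/self/mounts
}

describe_nls_dir() {
    target="${1:-/usr/local/nagioslogserver}"
    if is_mount_point "$target"; then
        echo "$target is a mount point; anything installed before the mount is hidden underneath it"
    else
        echo "$target is a plain directory on its parent filesystem"
    fi
}
```

Calling describe_nls_dir with no argument checks the install path from this thread. If it reports a mount point, the umount suggested above is what reveals whatever was written to the underlying directory.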
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: New instance failed to join cluster

Post by rferebee »

Yes, it is a mount point to a larger storage drive.

The instance was accidentally added to the cluster before the storage was mounted, and it looks like that's exactly what the problem is.

All of these directories are present:

Code:

root@nagioslscc2_temp:/usr/local/nagioslogserver>ls
elasticsearch  etc  logstash  mibs  scripts  snapshots  tmp  var

If I just remove the /nagioslogserver directory, will it "reset" everything?

Re: New instance failed to join cluster

Post by mbellerue »

It won't reset everything, because there is some configuration in other places. The easiest way to deal with this is to create a temporary mount point, say at /mnt/temp/, mount your additional storage there, then run:

Code:

cd /usr/local/nagioslogserver/
# -a keeps ownership, permissions, and timestamps; "." also picks up hidden files
cp -a . /mnt/temp/
cd /

That will get everything onto your additional storage in exactly the layout that Log Server expects. You then unmount the additional storage from /mnt/temp and re-mount it at /usr/local/nagioslogserver/. Then start your services.
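Put together, the steps above might look like the sketch below. It assumes the additional storage is not currently mounted at /usr/local/nagioslogserver and that the services are the elasticsearch and logstash units shown earlier in the thread. The device path passed in is a placeholder, and run and relocate_nls_data are hypothetical helper names. It copies with cp -a so that ownership and hidden files carry over, which is a choice made for this sketch. With DRY_RUN left at its default of 1, it only prints the commands.

```shell
# Sketch of the relocation steps. DRY_RUN=1 (the default) only prints each
# command; set DRY_RUN=0 and run as root to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

relocate_nls_data() {
    device="$1"                            # placeholder, e.g. /dev/sdb1 (hypothetical)
    nls_dir="/usr/local/nagioslogserver"
    tmp_mount="/mnt/temp"

    # Each step runs only if the previous one succeeded.
    run mkdir -p "$tmp_mount" &&
    run mount "$device" "$tmp_mount" &&
    run cp -a "$nls_dir/." "$tmp_mount/" &&
    run umount "$tmp_mount" &&
    run mount "$device" "$nls_dir" &&
    run systemctl start elasticsearch &&
    run systemctl start logstash
}
```

For example, relocate_nls_data /dev/sdb1 prints the seven commands in order so they can be reviewed before anything is changed.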

Re: New instance failed to join cluster

Post by rferebee »

This issue was resolved internally. We had to rebuild the VM.

Please lock this thread. Thank you.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: New instance failed to join cluster

Post by scottwilkerson »

rferebee wrote: This issue was resolved internally. We had to rebuild the VM. Please lock this thread. Thank you.

Great!

Locking