Re: Unsuccessful Add of Instance
Posted: Wed Sep 11, 2019 12:42 pm
by mbellerue
I don't know of any way to force the second server to complete the installation process, without it actually completing the installation process. I'm not sure it would be a good idea to attempt, since we don't know at what point it failed.
We can attack this one of two ways. You can try removing the second server from the cluster. It's possible that something got messed up during the addition of the node to the cluster, and all subsequent attempts to get the second server to complete its task are getting hung up on the fact that the second server already exists in key configuration files.
https://assets.nagios.com/downloads/nag ... luster.pdf
Or, we can dig a little further. If you can get us system profiles from the two servers, we can dig into them and see if we can find out what happened. For the first instance it's easy since you have access to the web interface. Just go to Admin -> System Status -> Download System Profile. For the second instance, you'll have to ssh in, and generate the profile manually by running,
Code:
/usr/local/nagioslogserver/scripts/profile.sh
Which will create the system-profile.tar.gz file in /tmp/.
Re: Unsuccessful Add of Instance
Posted: Wed Sep 11, 2019 1:46 pm
by rocheryderm
Thank you. I tried the procedure to remove a node, but this didn't help - adding the node failed exactly the same way as before.
Please find attached the profiles for the master (rbbusnls1p) and the additional instance (rbbusnls2p).
Support Edit: downloaded rbbusnls1p-system-profile.tar.gz, and rbbusnls2p-system-profile.tar.gz and shared with team
Re: Unsuccessful Add of Instance
Posted: Thu Sep 12, 2019 10:03 am
by mbellerue
Alright, we've looked over the profiles, and it looks like everything is in place. Let's just check a couple of things to see if we can find out why adding the server failed. Can you check the umask of root, and the permissions of the files in the following directories:
Code:
ls -lah /var/www/html/nagioslogserver/
ls -lah /usr/local/nagioslogserver/
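For the umask itself, here is a minimal sketch, assuming a POSIX shell opened as root on the second server. The helper name show_umask is only illustrative and is not part of Nagios Log Server.

```shell
# Show the current shell's umask in octal and symbolic form.
# Run this in a root shell to see root's value.
show_umask() {
    printf 'octal:    %s\n' "$(umask)"
    printf 'symbolic: %s\n' "$(umask -S)"
}
show_umask
```

Both lines describe the same mask; the symbolic form lists the permissions that newly created files and directories are allowed to keep.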
However, if you want, at this point you can also run the following commands, and it should complete your installation.
Code:
touch /usr/local/nagioslogserver/.installed
curl -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed'
Re: Unsuccessful Add of Instance
Posted: Thu Sep 12, 2019 2:29 pm
by rocheryderm
hello @mbellerue
Thank you for all the time you and the team have spent so far.
The ".installed" file was already there on both the master and the secondary server
Code:
[root@rbbusnls2p nagioslogserver]# ls -lah /var/www/html/nagioslogserver/
total 4.0K
drwxr-xr-x 5 root root 67 Sep 9 15:25 .
drwxr-xr-x 3 root root 46 Sep 9 15:27 ..
drwxr-xr-x 17 apache apache 230 Sep 9 16:03 application
-rw-r--r-- 1 root root 12 Sep 9 15:25 lsversion
drwxr-xr-x 8 root root 130 Sep 9 15:25 system
drwxr-xr-x 9 apache apache 185 Sep 9 15:27 www
[root@rbbusnls2p nagioslogserver]# ls -lah /usr/local/nagioslogserver/
total 0
drwxrwxr-x 10 nagios nagios 138 Sep 9 15:27 .
drwxr-xr-x. 15 root root 175 Sep 9 15:25 ..
drwxr-xr-x 7 nagios nagios 128 Sep 9 15:27 elasticsearch
drwxrwxr-x 2 nagios nagios 6 Sep 9 15:25 etc
-rw-r--r-- 1 root root 0 Sep 12 15:18 .installed
drwxr-xr-x 6 nagios nagios 171 Sep 9 15:26 logstash
drwxrwxr-x 2 nagios nagios 62 Sep 9 15:25 mibs
drwxrwxr-x 2 nagios nagios 263 Sep 9 15:25 scripts
drwxrwxr-x 2 nagios nagios 6 Sep 9 15:25 snapshots
drwxrwxr-x 3 nagios nagios 27 Sep 9 15:27 tmp
drwxrwxr-x 2 nagios nagios 115 Sep 11 14:54 var
[root@rbbusnls2p nagioslogserver]#
I followed your commands on the secondary server
Code:
[root@rbbusnls2p nagioslogserver]# touch /usr/local/nagioslogserver/.installed
[root@rbbusnls2p nagioslogserver]# curl -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed'
{"_index":"nagioslogserver","_type":"cf_option","_id":"is_installed","found":false}
No change - do I need to restart, or should it be immediate? Did you mean to have an XPUT instead of an XGET?
Re: Unsuccessful Add of Instance
Posted: Thu Sep 12, 2019 3:32 pm
by scottwilkerson
One more path to touch
Code:
touch /var/www/html/nagioslogserver/application/cache/installed
Once logged in, you will also need to apply configuration so it is picked up by the second node.
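To double-check that both marker files mentioned in this thread are in place on the new node, something like the following could be used. This is only a sketch: the function name check_markers and its optional path-prefix argument are hypothetical, added so the check can be tried against a scratch directory.

```shell
# Report whether each install marker file mentioned in this thread exists.
# The optional first argument is a path prefix (hypothetical, only useful
# for trying the function against a scratch directory).
check_markers() {
    prefix="${1:-}"
    missing=0
    for f in \
        /usr/local/nagioslogserver/.installed \
        /var/www/html/nagioslogserver/application/cache/installed
    do
        if [ -e "${prefix}${f}" ]; then
            echo "present: ${f}"
        else
            echo "missing: ${f}"
            missing=1
        fi
    done
    return "$missing"
}
check_markers || echo "at least one marker file is missing"
```

The function returns non-zero when either file is absent, so it can also be used in a script before attempting to log in to the second node.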
Re: Unsuccessful Add of Instance
Posted: Thu Sep 12, 2019 4:14 pm
by rocheryderm
BOOM. THANK YOU.
Re: Unsuccessful Add of Instance
Posted: Thu Sep 12, 2019 4:17 pm
by scottwilkerson
rocheryderm wrote: BOOM. THANK YOU.
Great!
Re: Unsuccessful Add of Instance
Posted: Thu Sep 12, 2019 4:33 pm
by scottwilkerson
I wanted to add one more thing. You seemed to be missing a key in your nagioslogserver index, which could cause this to happen again in the future.
I would suggest you run the following from the command line on one of the instances:
Code:
curl -XPOST 'http://localhost:9200/nagioslogserver/cf_option/is_installed' -d '{"created":"2019-09-12 00:00:00","created_by":0,"value":"1"}'
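To confirm afterwards that the key exists, the GET response can be checked for "found":true. The following is a rough sketch, assuming the compact JSON formatting seen in the responses earlier in this thread; the function name is_installed_found is hypothetical, and plain grep is used so nothing beyond the base system is assumed.

```shell
# Exit 0 if an Elasticsearch GET response read from stdin reports "found":true.
# Assumes the compact JSON (no spaces around the colon) seen in this thread.
is_installed_found() {
    grep -q '"found":true'
}

# Usage against a live node (URL as used earlier in this thread):
#   curl -s -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed' \
#       | is_installed_found && echo "key present"

# Sample response copied from earlier in the thread:
sample='{"_index":"nagioslogserver","_type":"cf_option","_id":"is_installed","found":false}'
if printf '%s' "$sample" | is_installed_found; then
    echo "key present"
else
    echo "key missing"
fi
```

With the sample response above, which predates the POST, this prints "key missing".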
Re: Unsuccessful Add of Instance
Posted: Thu Sep 12, 2019 5:36 pm
by rocheryderm
Thank you - this is great. Now have 4 servers up. But I'm concerned, shouldn't I see something in the "Actions" column? Like... the ability to delete instances?
Code:
curl -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed'
{"_index":"nagioslogserver","_type":"cf_option","_id":"is_installed","_version":1,"found":true,"_source":{"created":"2019-09-12 00:00:00","created_by":0,"value":"1"}}}
Re: Unsuccessful Add of Instance
Posted: Fri Sep 13, 2019 6:33 am
by scottwilkerson
rocheryderm wrote: But I'm concerned, shouldn't I see something in the "Actions" column? Like... the ability to delete instances?
This only shows up if the instances are not connected.