Re: Unsuccessful Add of Instance

Posted: Wed Sep 11, 2019 12:42 pm
by mbellerue
I don't know of any way to force the second server to complete the installation process, without it actually completing the installation process. I'm not sure it would be a good idea to attempt, since we don't know at what point it failed.

We can attack this one of two ways. You can try removing the second server from the cluster. It's possible that something got messed up during the addition of the node to the cluster, and all subsequent attempts to get the second server to complete its task are getting hung up on the fact that the second server already exists in key configuration files.
https://assets.nagios.com/downloads/nag ... luster.pdf

Or, we can dig a little further. If you can get us system profiles from the two servers, we can dig into them and see if we can find out what happened. For the first instance it's easy, since you have access to the web interface: just go to Admin -> System Status -> Download System Profile. For the second instance, you'll have to SSH in and generate the profile manually by running:

Code:

/usr/local/nagioslogserver/scripts/profile.sh
This creates the system-profile.tar.gz file in /tmp/.
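
A minimal sketch of that manual step, assuming the script path given above and that the archive ends up at /tmp/system-profile.tar.gz as described. The function name collect_profile and its two optional arguments are hypothetical; they exist only so the paths can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: run the profile script and confirm the archive exists.
# Both paths default to the ones mentioned in this thread.
collect_profile() {
    script="${1:-/usr/local/nagioslogserver/scripts/profile.sh}"
    archive="${2:-/tmp/system-profile.tar.gz}"

    if [ ! -x "$script" ]; then
        echo "profile script not found or not executable: $script" >&2
        return 1
    fi

    # Run the script; stop here if it reports a failure.
    "$script" || return 1

    # Only report success if a non-empty archive is actually there.
    if [ -s "$archive" ]; then
        echo "profile written to $archive"
    else
        echo "expected archive missing or empty: $archive" >&2
        return 1
    fi
}
```

Run as root on the second instance, calling collect_profile with no arguments uses the default paths.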

Re: Unsuccessful Add of Instance

Posted: Wed Sep 11, 2019 1:46 pm
by rocheryderm
Thank you. I tried the procedure to remove a node, but this didn't help - adding the node failed exactly the same way as before.

Please find attached the profiles for the master (rbbusnls1p) and the additional instance (rbbusnls2p).

Support Edit: downloaded rbbusnls1p-system-profile.tar.gz, and rbbusnls2p-system-profile.tar.gz and shared with team

Re: Unsuccessful Add of Instance

Posted: Thu Sep 12, 2019 10:03 am
by mbellerue
Alright, we've looked over the profiles, and it looks like everything is in place. Let's just check a couple of things to see if we can find out why adding the server failed. Can you check the umask of root, and the permissions of the files in the following directories:

Code:

ls -lah /var/www/html/nagioslogserver/
ls -lah /usr/local/nagioslogserver/
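
A rough sketch of gathering both pieces of information in one go. The function names report_umask and show_perms are hypothetical; the directories to pass in are the two listed above.

```shell
#!/bin/sh
# Print the umask of the current user (run as root to get root's umask).
report_umask() {
    echo "umask: $(umask)"
}

# List mode, owner and group for a directory and its immediate children,
# including dotfiles such as .installed.
show_perms() {
    find "$1" -maxdepth 1 -exec ls -ld {} +
}
```

For example: report_umask; show_perms /var/www/html/nagioslogserver; show_perms /usr/local/nagioslogserver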
However, if you want, at this point you can also run the following commands, which should complete your installation.

Code:

touch /usr/local/nagioslogserver/.installed
curl -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed'

Re: Unsuccessful Add of Instance

Posted: Thu Sep 12, 2019 2:29 pm
by rocheryderm
hello @mbellerue

Thank you for all the time you and the team have spent so far.

The ".installed" file was already there on both the master and the secondary server.

Code:

[root@rbbusnls2p nagioslogserver]# ls -lah /var/www/html/nagioslogserver/
total 4.0K
drwxr-xr-x  5 root   root    67 Sep  9 15:25 .
drwxr-xr-x  3 root   root    46 Sep  9 15:27 ..
drwxr-xr-x 17 apache apache 230 Sep  9 16:03 application
-rw-r--r--  1 root   root    12 Sep  9 15:25 lsversion
drwxr-xr-x  8 root   root   130 Sep  9 15:25 system
drwxr-xr-x  9 apache apache 185 Sep  9 15:27 www
[root@rbbusnls2p nagioslogserver]# ls -lah /usr/local/nagioslogserver/
total 0
drwxrwxr-x  10 nagios nagios 138 Sep  9 15:27 .
drwxr-xr-x. 15 root   root   175 Sep  9 15:25 ..
drwxr-xr-x   7 nagios nagios 128 Sep  9 15:27 elasticsearch
drwxrwxr-x   2 nagios nagios   6 Sep  9 15:25 etc
-rw-r--r--   1 root   root     0 Sep 12 15:18 .installed
drwxr-xr-x   6 nagios nagios 171 Sep  9 15:26 logstash
drwxrwxr-x   2 nagios nagios  62 Sep  9 15:25 mibs
drwxrwxr-x   2 nagios nagios 263 Sep  9 15:25 scripts
drwxrwxr-x   2 nagios nagios   6 Sep  9 15:25 snapshots
drwxrwxr-x   3 nagios nagios  27 Sep  9 15:27 tmp
drwxrwxr-x   2 nagios nagios 115 Sep 11 14:54 var
[root@rbbusnls2p nagioslogserver]#
I followed your commands on the secondary server:

Code:

[root@rbbusnls2p nagioslogserver]# touch /usr/local/nagioslogserver/.installed
[root@rbbusnls2p nagioslogserver]# curl -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed'
{"_index":"nagioslogserver","_type":"cf_option","_id":"is_installed","found":false}
No change - do I need to restart, or should it be immediate? Did you mean to have an XPUT instead of an XGET?
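
For reference, the GET above only reads the document, and the "found" field in the response says whether the is_installed key exists. A small sketch of checking that field from a script, assuming the response format shown above; the function name es_doc_found is hypothetical, and it does a plain text match rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an Elasticsearch GET response reports "found":true.
# This is a plain text match on the response body, not a full JSON parse.
es_doc_found() {
    printf '%s' "$1" | grep -q '"found":true'
}

# Example against the local node used in this thread (uncomment on a real instance):
# resp=$(curl -s -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed')
# if es_doc_found "$resp"; then
#     echo "is_installed key present"
# else
#     echo "is_installed key missing"
# fi
```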

Re: Unsuccessful Add of Instance

Posted: Thu Sep 12, 2019 3:32 pm
by scottwilkerson
One more path to touch

Code:

touch /var/www/html/nagioslogserver/application/cache/installed
Once logged in, you will also need to apply configuration so it is picked up by the second node.
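
To confirm that both marker files touched in this thread are in place, something like the following could be used. It is only a sketch: the function name check_install_markers and its optional prefix argument are hypothetical, and the two paths are simply the ones mentioned in this thread, not necessarily everything the installer checks.

```shell
#!/bin/sh
# Hypothetical helper: report whether the two marker files named in this thread exist.
# An optional prefix lets the check run against a copy of the tree.
check_install_markers() {
    prefix="${1:-}"
    missing=0
    for marker in \
        /usr/local/nagioslogserver/.installed \
        /var/www/html/nagioslogserver/application/cache/installed
    do
        if [ -e "$prefix$marker" ]; then
            echo "present: $marker"
        else
            echo "missing: $marker"
            missing=1
        fi
    done
    # Non-zero exit status if any marker is missing.
    return "$missing"
}
```

Calling check_install_markers with no argument checks the live paths on the node.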

Re: Unsuccessful Add of Instance

Posted: Thu Sep 12, 2019 4:14 pm
by rocheryderm
BOOM. THANK YOU.

Re: Unsuccessful Add of Instance

Posted: Thu Sep 12, 2019 4:17 pm
by scottwilkerson
rocheryderm wrote: BOOM. THANK YOU.
Great!

Re: Unsuccessful Add of Instance

Posted: Thu Sep 12, 2019 4:33 pm
by scottwilkerson
I wanted to add one more thing: you seemed to be missing a key in your nagioslogserver index, which could cause this to happen again in the future.

I would suggest you run the following on one of the instances from the command line:

Code:

curl -XPOST 'http://localhost:9200/nagioslogserver/cf_option/is_installed' -d '{"created":"2019-09-12 00:00:00","created_by":0,"value":"1"}'
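
A sketch of the same request with the body built separately, so the JSON can be inspected before it is sent. The function name is_installed_payload is hypothetical; the field names and values are copied from the command above, and only the timestamp is a parameter.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body for the is_installed document.
# Field names and values are copied from the command above; only the
# timestamp is a parameter.
is_installed_payload() {
    printf '{"created":"%s","created_by":0,"value":"1"}' "$1"
}

# On a real instance (uncomment to send):
# curl -XPOST 'http://localhost:9200/nagioslogserver/cf_option/is_installed' \
#      -d "$(is_installed_payload '2019-09-12 00:00:00')"
```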

Re: Unsuccessful Add of Instance

Posted: Thu Sep 12, 2019 5:36 pm
by rocheryderm
Thank you - this is great. I now have 4 servers up. But I'm concerned, shouldn't I see something in the "Actions" column? Like... the ability to delete instances?

Code:

curl -XGET 'http://localhost:9200/nagioslogserver/cf_option/is_installed'
{"_index":"nagioslogserver","_type":"cf_option","_id":"is_installed","_version":1,"found":true,"_source":{"created":"2019-09-12 00:00:00","created_by":0,"value":"1"}}}

Re: Unsuccessful Add of Instance

Posted: Fri Sep 13, 2019 6:33 am
by scottwilkerson
rocheryderm wrote: But I'm concerned, shouldn't I see something in the "Actions" column? Like... the ability to delete instances?
This only shows up if they are not connected.