
Re: backups not functioning after update to 1.4.0

Posted: Mon Feb 08, 2016 8:54 am
by krobertson71
Does this repository need to be on a totally separate server/host? Or can I have a backup partition on one of the nodes mounted to the other node and achieve the same effect?

Both of the backup directories are named the same on each node, and nagios:nagios has full permissions. So no matter which node receives the backup call, it should still work, unless it is looking for it to be a share.

Re: backups not functioning after update to 1.4.0

Posted: Mon Feb 08, 2016 1:16 pm
by krobertson71
We tried to set up an NFS share between the two NLS nodes. We are getting UID mismatches, as the nagios user on each node has a different UID.

This was done by the installer when each node was created; it just grabbed the next available UID?

Is there documentation or any help you can provide that would help with this issue?
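A quick way to confirm the mismatch is to print the numeric IDs on each node and compare them (a minimal check; it assumes a standard Linux userland with getent):

```shell
# Print the numeric UID:GID of the nagios account on this node; run the
# same command on every node and compare the output (the fallback message
# covers a node where the account has not been created yet).
getent passwd nagios | cut -d: -f3,4 || echo "no nagios user"
```

For NFSv3 (and NFSv4 without idmapping) it is these numbers, not the user name, that the server compares, so they must match on every node.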

Re: backups not functioning after update to 1.4.0

Posted: Mon Feb 08, 2016 2:56 pm
by krobertson71
We were able to get around the permission issues with the UID by just giving the directory 777 permissions (temporary solution only). Ran the following command on an index but still got a PARTIAL.

Right now we have backups2 created on our 01 node and it is mounted on our 02 node. Still having the UID issue, as the logstash installer just grabs the next UID in line.

Code: Select all

[nagios@node02 backups2]$ curator snapshot --repository logbackup indices --prefix logstash.2016.02.07
2016-02-08 14:37:44,902 INFO      Job starting: snapshot indices
2016-02-08 14:37:44,902 WARNING   Overriding default connection timeout.  New timeout: 21600
2016-02-08 14:37:44,918 INFO      Action snapshot will be performed on the following indices: [u'logstash-2016.02.07']
2016-02-08 14:37:44,997 INFO      Snapshot name: curator-20160208193744
2016-02-08 14:40:59,464 ERROR     Snapshot PARTIAL completed with state: PARTIAL
2016-02-08 14:40:59,464 WARNING   Job did not complete successfully.

Re: backups not functioning after update to 1.4.0

Posted: Mon Feb 08, 2016 5:29 pm
by Box293
krobertson71 wrote:can I have a backup partition on one of the nodes mounted to the other node and achieve the same effect?
Yes, that is OK; as long as it's mounted at the same path on both nodes it's fine.
krobertson71 wrote:Right now we have backups2 created on our 01 node and it is mounted on our 02 node. Still having the UID issue as the logstash installer just randomly grabs the next in line.
So is backups2 mounted at something like /mnt/backups2 on both nodes?

Re: backups not functioning after update to 1.4.0

Posted: Tue Feb 09, 2016 9:01 am
by krobertson71
No.

We already had a physical /backups directory in place on each host before this upgrade. So the Linux admin created a backups2 directory within the backups directory ("/backups/backups2") on node1 and exposed it via NFS to node2.

The problem is with the UID. The UIDs for the nagios user are different on each host. To work around this we had to give the mounted "/backups/backups2" on node2 full 777 permissions just to get around the permission issues the UID mismatch caused. After all of that, I still only got a PARTIAL.

Are we setting up NFS wrong? How can we get this right? Do the UIDs of the nagios user have to match on both nodes, or is there a way to use NFS so the nagios user on node2 can have the proper access?
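For reference, a sketch of how the export and mount could look (hostnames, paths, and ID values here are examples, not our actual config). If the UIDs cannot be aligned right away, the exports(5) squash options can map all remote access to one local account as a stopgap, though matching UIDs is the proper fix:

```
# /etc/exports on node01 (anonuid/anongid should be the UID/GID that
# the nagios user has on node01):
/backups/backups2  node02(rw,sync,no_subtree_check,all_squash,anonuid=498,anongid=498)

# on node02, mount it at the same path as on node01:
mount -t nfs node01:/backups/backups2 /backups/backups2
```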

Re: backups not functioning after update to 1.4.0

Posted: Tue Feb 09, 2016 11:01 am
by jolson
krobertson71 wrote:Do the UIDs of the nagios user have to match on both nodes or is there a way to use NFS so the nagios user on node2 can have the proper access?
Proper setup requires the nagios user to have the same UID on both nodes. Could you try the following procedure on both nodes to ensure that the 'nagios' user has proper access to your new NFS share?

Code: Select all

su - nagios
cd /backups/backups2
touch file
rm file
Be sure that the nagios user can perform that procedure on both nodes, or else you might end up with PARTIAL backups.

Re: backups not functioning after update to 1.4.0

Posted: Tue Feb 09, 2016 11:18 am
by krobertson71
Yes, this works, but when you do it on 02, since the UID for nagios is different than on 01 and is assigned to another user there, we have to give the directory world read and write to pull it off. We can't leave it like this.

The Log Server installer grabs whatever UID is available and assigns it to the nagios user. This is what has caused the issue. Nothing in the documentation for upgrading or installing says anything about matching UIDs (granted, this is a known requirement for NFS), so following your directions creates this issue.

So does the nagios user need to be created prior to install? Do we need to create the nagcmd and nagios groups ahead of time when building a node, before we run the installer?

Trying to figure out a way out of this so we have backups again, as mentioned in other threads, without having to wipe and start over.

Again, this leads back to installation documentation that does not stress this point. If the cluster already exists (for those of us who had the older version with its acknowledged backup issues) and we upgrade to a version that now requires NFS and now checks for this fact (it did not before, since backups would run as long as there was local write access), we are stuck between a rock and a hard place.

Would "breaking" the cluster and rebuilding help? This goes back to my question about creating users ahead of time. I could remove the 02 node from the cluster, since all the events are still on 01, then rebuild 02 and create the nagios user first before installing.

If this is possible, what is the proper method for removing a node from the cluster so Log Server knows it is now only a single-node installation?

Help.

Re: backups not functioning after update to 1.4.0

Posted: Tue Feb 09, 2016 6:10 pm
by jolson
From the backup document: https://assets.nagios.com/downloads/nag ... enance.pdf
This location must be a shared network path writable by the nagios user and available to ALL instances in your cluster.
It has always been this way, and it's been a hard requirement since we built the system. We didn't add any additional enforcement in the newer versions of Nagios Log Server, so I'm truly not sure what's causing the problem you're experiencing.

The issue with using local repositories is not that it won't work - it will work fine. The issue is that if you're using local repositories you'll have an independent backup on each node in your cluster instead of one single backup for the entire cluster.
Would "Breaking" the cluster and rebuilding help?
Possibly, but it may be easier to just change the UID of the 'nagios' user.

Run the following as root:

Code: Select all

service logstash stop
service elasticsearch stop
service crond stop
killall -u nagios
usermod -u <NEWUID> nagios
groupmod -g <NEWGID> nagios
find / -user <OLDUID> -exec chown -h <NEWUID> {} \;
find / -group <OLDGID> -exec chgrp -h <NEWGID> {} \;
usermod -g <NEWGID> nagios
After the above has been run, you should be good to start all of the processes once more:

Code: Select all

service logstash start
service elasticsearch start
service crond start

Re: backups not functioning after update to 1.4.0

Posted: Wed Feb 10, 2016 9:16 am
by krobertson71
I think the issue here is that I am not explaining myself properly.

Let's say I am building a third NLS node. I run the installer. It does not prompt for a UID or GID so they can match the other nodes in the cluster. The installer grabs whatever UID is next in line to be used, which almost guarantees it will differ from the other nodes.

That is why I was asking if we could create the users ahead of time, before running the installer.
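Pre-creating the accounts with pinned IDs would look something like this, run as root on the new node before the installer (a sketch: 1050/1051 are arbitrary example values, to be replaced with IDs free on every node, and I have not confirmed the installer will adopt a pre-existing nagios user):

```shell
# Create the groups and the nagios account with fixed IDs so they match
# the rest of the cluster (example IDs; run as root before the installer).
groupadd -g 1050 nagios
groupadd -g 1051 nagcmd
useradd -u 1050 -g nagios -G nagcmd -m nagios
```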

Re: backups not functioning after update to 1.4.0

Posted: Wed Feb 10, 2016 9:31 am
by krobertson71
Oh, and I forgot to mention this; it probably makes a difference.

I do not have root on these servers. We have a Linux admin team that manages all our Linux hosts. I have to hand your installation documentation over to them so they can build the server.

Now refer back to my last post and see if that fits together better.