backups not functioning after update to 1.4.0
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: backups not functioning after update to 1.4.0
Does this repository need to be on a totally separate server/host? Or can I have a backup partition on one of the nodes mounted to the other node and achieve the same effect?
Both of the backup directories are named the same on each node, and nagios:nagios has full permissions. So no matter which node receives the backup call, it "should" still work, unless it is looking for it to be a share.
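Whether the path is a share or not, a quick way to rule out permission problems is to probe the repository path as the nagios user on each node. This is just a sketch (the path is an assumption; substitute your actual backup directory):

```shell
# Probe a snapshot repository path: confirm it exists and that the
# invoking user can create and remove a file in it.
check_repo_path() {
    path="$1"
    [ -d "$path" ] || { echo "missing: $path"; return 1; }
    probe="$path/.probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $path"
    else
        echo "not writable: $path"
        return 1
    fi
}

# e.g. run on every node:  su - nagios -c '... check_repo_path /backups'
```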
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: backups not functioning after update to 1.4.0
We tried to set up an NFS share to the two NLS nodes. We are getting UID mismatches, as the nagios user on each node has a different UID.
This was done by the installer when each node was created; it just grabbed the next available UID?
Is there documentation or any help you can provide that would help with this issue?
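For context on why this bites: NFS matches file owners by numeric UID, not by account name. A trivial sketch of the check (the UID values are placeholders; collect the real ones with `id -u nagios` on each node):

```shell
# NFS compares numeric UIDs, so the nagios account must have the same UID
# on every node that touches the share.
uids_match() {
    [ "$1" = "$2" ]
}

uid01=497   # placeholder: output of `id -u nagios` on node01
uid02=498   # placeholder: output of `id -u nagios` on node02
if uids_match "$uid01" "$uid02"; then
    echo "UIDs match: snapshots to the share should work"
else
    echo "UID mismatch ($uid01 vs $uid02): remap with usermod before sharing"
fi
```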
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: backups not functioning after update to 1.4.0
We were able to get around the permission issues with the UID by just setting 777 (temporary solution only). Ran the following command on an index but still got a PARTIAL.
Right now we have backups2 created on our 01 node and it is mounted on our 02 node. Still having the UID issue, as the Log Server installer just randomly grabs the next UID in line.
Code: Select all
[nagios@node02 backups2]$ curator snapshot --repository logbackup indices --prefix logstash.2016.02.07
2016-02-08 14:37:44,902 INFO Job starting: snapshot indices
2016-02-08 14:37:44,902 WARNING Overriding default connection timeout. New timeout: 21600
2016-02-08 14:37:44,918 INFO Action snapshot will be performed on the following indices: [u'logstash-2016.02.07']
2016-02-08 14:37:44,997 INFO Snapshot name: curator-20160208193744
2016-02-08 14:40:59,464 ERROR Snapshot PARTIAL completed with state: PARTIAL
2016-02-08 14:40:59,464 WARNING Job did not complete successfully.
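A PARTIAL state means some shards failed to snapshot; Elasticsearch's standard `GET _snapshot/<repository>/<snapshot>` API returns a `failures` array that names the shard, node, and error (often the permission problem). A small helper sketch, using the repository and snapshot names from the log above (host is an assumption):

```shell
# Build the status URL for Elasticsearch's _snapshot API; the "failures"
# array in the JSON response explains why a snapshot ended PARTIAL.
snapshot_url() {
    host="$1"; repo="$2"; snap="$3"
    echo "http://${host}:9200/_snapshot/${repo}/${snap}?pretty"
}

# e.g.: curl -s "$(snapshot_url localhost logbackup curator-20160208193744)"
```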
- Box293
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: backups not functioning after update to 1.4.0
krobertson71 wrote: can I have a backup partition on one of the nodes mounted to the other node and achieve the same effect?
Yes, that is OK. As long as it's mounted at the same path on both nodes, it's fine.
krobertson71 wrote: Right now we have backups2 created on our 01 node and it is mounted on our 02 node. Still having the UID issue as the logstash installer just randomly grabs the next in line.
So is backups2 mounted at something like /mnt/backups2 on both nodes?
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: backups not functioning after update to 1.4.0
No.
We already had a physical /backups directory in place on each host before this upgrade. So the Linux admin created a backups2 directory within it ("/backups/backups2") on node1 and exposed it via NFS to node2.
The problem is the UID. The UIDs for the nagios user are different on each host. To work around this, we had to give the mounted "/backups/backups2" on node2 full 777 just to get around the permission issues the UID mismatch caused. After all of that, I still only got a PARTIAL.
Are we setting up the NFS wrong? How can we get this right? Do the UIDs of the nagios user have to match on both nodes, or is there a way to use NFS so the nagios user on node2 can have the proper access?
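For reference, the export itself could look like either of the following. These lines are an illustration to adapt, not taken from the Nagios docs; the hostnames and the UID value 497 are placeholders, and the `all_squash`/`anonuid` variant uses NFS's standard exports options to map every client access to the server's nagios IDs instead of requiring matching UIDs:

```
# /etc/exports on node01 -- plain export (requires matching nagios UIDs):
/backups/backups2 node02(rw,sync,no_root_squash)

# /etc/exports on node01 -- alternative: squash all client users to the
# server's nagios UID/GID (497 is an example; see `id nagios` on node01):
/backups/backups2 node02(rw,sync,all_squash,anonuid=497,anongid=497)

# /etc/fstab on node02:
node01:/backups/backups2 /backups/backups2 nfs defaults 0 0
```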
Re: backups not functioning after update to 1.4.0
krobertson71 wrote: Do the UID's of the Nagios user have to match on both nodes or is there a way to use NFS so the nagios user on node2 can have the proper access?
Proper setup requires the nagios user to have the same UID on both nodes. Could you try the following procedure on both nodes to ensure that the 'nagios' user has proper access to your new NFS share?
Code: Select all
su - nagios
cd /backups/backups2
touch file
rm file
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: backups not functioning after update to 1.4.0
Yes, this works, but when we do it on 02, since the nagios UID there is different from the one on 01 and is assigned to another user, we had to give the directory world read and write to pull it off. We can't leave it like this.
The Log Server installer grabs whatever UID is available and assigns it to the nagios user. This is what caused the issue. Nothing in the documentation for upgrading or installing says anything about matching UIDs (granted, this is a known requirement for NFS), so following your directions creates this issue.
So does the nagios user need to be created prior to install? Do we need to create the nagcmd and nagios groups ahead of time when building a node, before we run the installer?
Trying to figure out a way out of this so we have backups again, as mentioned in other threads, without having to wipe and start over.
Again, this leads back to installation documentation that does not stress this point. Our cluster already exists; we had the older version with its acknowledged backup issues, and we upgraded to a version that now requires NFS and checks for it (it did not before, since backups would run with local write access). We are stuck between a rock and a hard place.
Would "breaking" the cluster and rebuilding help? This goes back to my question about creating users ahead of time. I could remove the 02 node from the cluster, since all the events are still on 01, rebuild 02, and create the nagios user before installing.
If this is possible, what is the proper method for removing a node from the cluster so Log Server knows it is now only a single-node installation?
Help.
Re: backups not functioning after update to 1.4.0
From the backup document: https://assets.nagios.com/downloads/nag ... enance.pdf
The issue with using local repositories is not that it won't work - it will work fine. The issue is that if you're using local repositories you'll have an independent backup on each node in your cluster instead of one single backup for the entire cluster.
From that document: "This location must be a shared network path writable by the nagios user and available to ALL instances in your cluster."
It has always been this way, and it's been a hard requirement since we built the system. We didn't add any additional enforcement in the newer versions of Nagios Log Server, so I'm truly not sure what's causing the problem you're experiencing.
krobertson71 wrote: Would "Breaking" the cluster and rebuilding help?
Possibly, but it may be easier to just change the UID of the 'nagios' user.
Run the following as root:
Code: Select all
service logstash stop
service elasticsearch stop
service crond stop
killall -u nagios
usermod -u <NEWUID> nagios
groupmod -g <NEWGID> nagios
find / -user <OLDUID> -exec chown -h <NEWUID> {} \;
find / -group <OLDGID> -exec chgrp -h <NEWGID> {} \;
usermod -g <NEWGID> nagios
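Before restarting the services, it may be worth confirming the find/chown pass left nothing behind under the old IDs. A small sketch (the path and the old UID are placeholders for your environment):

```shell
# Count files still owned by the old numeric UID after the remap;
# expect 0 once the find/chown pass above has completed.
leftover_count() {
    find "$1" -user "$2" 2>/dev/null | wc -l
}

# e.g.: leftover_count /backups 497   (497 being the old nagios UID)
```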
After the above has been run, you should be good to start all of the processes once more:
Code: Select all
service logstash start
service elasticsearch start
service crond start
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: backups not functioning after update to 1.4.0
I think the issue here is that I am not explaining myself properly.
Let's say I am building a third NLS node. I run the installer. It does not prompt for a UID or GID so they can match the other nodes in the cluster; the installer grabs whatever UID is next in line to be used. This all but ensures the new node's UID will differ from the others'.
That is why I was asking if we could create the users ahead of time, before running the installer.
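If the installer honors a pre-existing account (worth confirming with Nagios support before relying on it, since this thread doesn't settle that), the Linux admin team could create the groups and user with fixed IDs before running the installer. The numeric IDs below are arbitrary examples; pick values that are free on every node and match node01:

```
groupadd -g 9000 nagios
groupadd -g 9001 nagcmd
useradd -u 9000 -g nagios -G nagcmd -m nagios
```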
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: backups not functioning after update to 1.4.0
Oh, and I forgot to mention something that probably makes a difference.
I do not have root on these servers. We have a Linux admin team that manages all our Linux hosts; I have to hand your installation documentation over to them so they can build the server.
Now refer back to my last post and see if that fits together better.