
Unable to perform restore after 5.6.13

Posted: Mon Apr 13, 2020 3:15 pm
by rferebee
Good afternoon,

I updated our XI environment to 5.6.13 last Thursday, and since then we have been unable to perform restores using system backups.

Every day we create a backup of our Production server, copy it to our Failover server, and then restore the Prod backup on the Failover server so it picks up any changes made the prior day. Since updating to 5.6.13, the SSH transfer of the Prod backup still works, but when the Failover system tries to perform the restore, it fails and all of the databases go offline.

Here's the script we're using to perform the restore:

Code:

root@nagiosxi-lv:/home/nagios/scripts> cat failover_restore
#!/bin/bash
#
#copy managebackups data
#cp /usr/local/nagiosxi/html/includes/components/scheduledbackups/managebackups.php /home/nagios/scripts/managebackups_html.php
#cp /usr/local/nagiosxi/tmp/nagiosxi/nagiosxi/basedir/html/includes/components/scheduledbackups/managebackups.php /home/nagios/scripts/managebackups_tmp.php
#
#executes the restore program, uses the ls command to get the name of the backup file and attaches that to the path
/usr/local/nagiosxi/scripts/restore_xi.sh /store/backups/nagiosxi/restore/`ls /store/backups/nagiosxi/restore`
#
#restore managebackups data
#cp /home/nagios/scripts/managebackups_html.php /usr/local/nagiosxi/html/includes/components/scheduledbackups/managebackups.php
#cp /home/nagios/scripts/managebackups_tmp.php /usr/local/nagiosxi/tmp/nagiosxi/nagiosxi/basedir/html/includes/components/scheduledbackups/managebackups.php

Can you help me figure out what's changed? Thank you.

Re: Unable to perform restore after 5.6.13

Posted: Mon Apr 13, 2020 4:34 pm
by ssax
What is the output of this command?

Code:

ls /store/backups/nagiosxi/restore

That will likely not do what you want if there's more than one file in there.
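A more defensive version of that line, as a minimal sketch: it uses a glob instead of parsing `ls` output and refuses to continue unless exactly one archive is present. The helper name `pick_backup` and the `nagiosxi.*.tar.gz` pattern (based on the file name rferebee posts below) are assumptions, not part of Nagios XI.

```shell
#!/bin/bash
# Hypothetical helper for the failover restore: return exactly one backup
# archive from the restore directory instead of relying on `ls` output.
pick_backup() {
  local dir="$1"
  local files=()
  # nullglob makes a pattern with no matches expand to nothing
  # (instead of the literal pattern); it is switched off again afterwards.
  shopt -s nullglob
  files=("$dir"/nagiosxi.*.tar.gz)
  shopt -u nullglob
  if (( ${#files[@]} != 1 )); then
    echo "expected 1 backup in $dir, found ${#files[@]}" >&2
    return 1
  fi
  printf '%s\n' "${files[0]}"
}

# Usage in failover_restore (paths taken from the original script):
# backup=$(pick_backup /store/backups/nagiosxi/restore) || exit 1
# /usr/local/nagiosxi/scripts/restore_xi.sh "$backup"
```

Failing loudly on zero or multiple archives is a deliberate choice: restoring the wrong day's backup onto the Failover server is worse than skipping one night.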

Also, I'm not sure what you're trying to do with that managebackups.php file. It doesn't contain your backup locations or other settings; those are stored in the DB.

Re: Unable to perform restore after 5.6.13

Posted: Mon Apr 13, 2020 4:52 pm
by rferebee
The command returns just one file, as it normally does.

Code:

root@nagiosxi-lv:/root> ls /store/backups/nagiosxi/restore
nagiosxi.1586768343.tar.gz

I don't know what the managebackups.php part does either, but I can say that we've been using the same failover_restore script for almost 2 years, and it worked right up until I updated to 5.6.13.

Anything else I can look at? Thank you.

Re: Unable to perform restore after 5.6.13

Posted: Mon Apr 13, 2020 4:59 pm
by ssax
If you run the script manually, what errors are returned?

Re: Unable to perform restore after 5.6.13

Posted: Tue Apr 14, 2020 9:18 am
by rferebee
This is what happens when I run the restore manually:

Code:

root@nagiosxi-lv:/root> sh /home/nagios/scripts/failover_restore
TS=1586873667
Extracting backup to /store/backups/nagiosxi/1586873667-restore...
In /store/backups/nagiosxi/1586873667-restore/nagiosxi.1586854750...
Backup files look okay.  Preparing to restore...
Shutting down services...
Warning: npcd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Restoring directories to /...
Restoring Nagios Core...
Restoring Nagios XI...
Restoring NRDP backups...
Restoring MRTG...
Restoring SNMP configuration files...
Restoring SNMP MIBs...
Restoring Nagvis backups...
Restoring nagios home dir...
Restoring MySQL databases...
Restoring Nagios XI PostgresQL database...
ERROR:  role "nagiosxi" already exists

Re: Unable to perform restore after 5.6.13

Posted: Tue Apr 14, 2020 11:43 am
by ssax
Please send this file:

Code:

/usr/local/nagiosxi/scripts/restore_xi.sh

Re: Unable to perform restore after 5.6.13

Posted: Tue Apr 14, 2020 11:53 am
by rferebee
PM sent with requested file.

Another strange thing: I can no longer log in to the web GUI since the host tried to restore this morning. Each time I try, the login page just refreshes with no errors. I tried a local account as well as multiple AD accounts.

Re: Unable to perform restore after 5.6.13

Posted: Tue Apr 14, 2020 11:57 am
by ssax
Please try changing the very first line of this file:

Code:

/usr/local/nagiosxi/scripts/restore_xi.sh

From this:

Code:

#!/bin/bash -e

To this:

Code:

#!/bin/bash

Then try again.
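For context, the `-e` in that first line turns on bash's errexit option: the script exits as soon as a command returns a non-zero status (with exceptions such as commands tested by `if`, or on the left of `&&`/`||`). The `role "nagiosxi" already exists` message comes from PostgreSQL; assuming the command that printed it returned a non-zero status, `-e` would stop the restore partway through. A minimal, self-contained demonstration, where the "steps" are hypothetical stand-ins and not the real restore_xi.sh:

```shell
#!/bin/bash
# Minimal demo of bash's errexit option (the "-e" in "#!/bin/bash -e").
# The steps below are hypothetical stand-ins, not the real restore_xi.sh.

run_steps() {
  # $1 is either "set -e" or ":" (a no-op), prepended to the commands
  # run in a child shell, so this parent shell keeps going either way.
  bash -c "$1"'
    echo "step 1: restore MySQL"
    false   # a command that exits with status 1, like a failed SQL step
    echo "step 2: restore the rest"
  '
}

run_steps ":"        # prints both steps: the failure is ignored
run_steps "set -e" || echo "(child shell exited with status $?)"
# With "set -e" only step 1 is printed: the child shell exits at "false".
```

Without `-e`, the script carries on past the failed step, which is why removing it lets the rest of the restore run; the error itself is still printed.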


Did the restore fail?

You can troubleshoot login issues via:

Code:

tail -Fn0 /var/log/httpd/error_log /var/log/httpd/ssl_error_log

Re: Unable to perform restore after 5.6.13

Posted: Tue Apr 14, 2020 1:39 pm
by rferebee
OK, there was another issue occurring while I was trying to restore: the system ran out of disk space at some point. I think that was what prevented me from logging in to the web UI.
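Since a full disk can also break a restore, a pre-flight free-space check before running restore_xi.sh may help. This is a minimal sketch; the helper name `has_free_space` and the 3x size multiplier in the usage comment are assumptions, not Nagios XI figures. It only checks the filesystem holding the given directory (the restore output above extracts to /store/backups/nagiosxi), not the filesystems the databases live on.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeed only if the filesystem holding DIR
# has at least NEEDED_KB kilobytes available.
has_free_space() {
  local dir="$1" needed_kb="$2" avail_kb
  # df -P prints POSIX format (one line per filesystem), -k uses 1K blocks;
  # column 4 of the data line (line 2) is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  (( avail_kb >= needed_kb ))
}

# Usage sketch (the 3x multiplier is an assumption):
# size_kb=$(du -k "$backup" | cut -f1)
# has_free_space /store/backups/nagiosxi $(( size_kb * 3 )) || { echo "low disk" >&2; exit 1; }
```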

I removed the '-e' per your suggestion and the restore worked (after clearing up disk space, of course). So I'm not 100% sure whether it was the full disk or the change to the script. However, since I restored Prod onto my backup server, the script was overwritten with the original version from Prod.

Should I make that change on my Prod box so the script doesn't revert every day? I honestly don't know what removing the -e does.

Re: Unable to perform restore after 5.6.13

Posted: Wed Apr 15, 2020 10:34 am
by ssax
If it works, I'd just leave it.

If it doesn't work, update prod and let me know so that I can let the devs know.
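If you do end up applying the change on Prod, here is a minimal sketch of making the same first-line edit with GNU sed, assuming line 1 of the script reads exactly `#!/bin/bash -e`. The function name `drop_errexit_shebang` is hypothetical; it keeps a `.bak` copy so the original can be put back.

```shell
#!/bin/bash
# Hypothetical helper: change line 1 of a script from "#!/bin/bash -e" to
# "#!/bin/bash", leaving every other line untouched. Keeps a .bak copy.
drop_errexit_shebang() {
  local file="$1"
  # The "1" address limits the substitution to the first line, and the
  # anchored pattern means nothing changes unless it matches exactly.
  sed -i.bak '1s|^#!/bin/bash -e$|#!/bin/bash|' "$file"
}

# Usage on the Prod box (as root):
# drop_errexit_shebang /usr/local/nagiosxi/scripts/restore_xi.sh
```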