Re: Unable to perform restore after 5.6.13
Posted: Thu Apr 16, 2020 11:22 am
by rferebee
I made the change to the file on my prod box so it wouldn't keep overwriting the file on the failover box. The restore ran this morning, and now I'm unable to access my failover environment via the web UI.
The error I'm seeing on the service checks for that host is: Error: Could not parse XML from http://10.231.86.58/nagiosxi/ ()
I'm not sure what happened or even where to start looking.
Posted: Thu Apr 16, 2020 12:10 pm
by rferebee
OK, I suspect that after this morning's restore, my failover host thought it was my prod host. Muted service checks on the prod host were sending out notifications, and I think they were actually coming from the failover host.
To stop this as quickly as possible, I reverted the changes to the restore_xi.sh files on both hosts and then restored a backup from two days ago, taken before the change, to my failover host. Everything seems to be back to normal now.
Another thing I noticed: when I run the failover_restore.sh script manually, it uses a lot of drive space. I went from more than 40% free space to only 6% free. I don't know what the heck changed with this last update, but my environment is not happy.
Posted: Thu Apr 16, 2020 3:16 pm
by ssax
The restore file is big. The restore extracts it (uncompressed, and then some of its contents are extracted further), and MySQL temporary files are created to restore the DB. I usually recommend keeping free space of at least 4x the size of the restore file for a proper restore.
That -e in the restore script causes issues; the devs have reverted that change.
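The 4x rule of thumb above is easy to check before kicking off a restore. Below is a minimal Python sketch, assuming the factor of 4 is only ssax's recommendation rather than a documented limit; the names `has_room` and `check_restore_space` are hypothetical, and you pass in the backup file and the directory the restore works in.

```python
import os
import shutil

# Rule of thumb from this thread (an assumption, not a documented limit): keep at
# least 4x the restore file's size free, since the archive is extracted, parts of it
# are extracted again, and MySQL writes temporary files during the DB restore.
SPACE_FACTOR = 4
GIB = 2**30


def has_room(backup_bytes: int, free_bytes: int, factor: int = SPACE_FACTOR) -> bool:
    """Return True if free_bytes is at least factor times backup_bytes."""
    return free_bytes >= factor * backup_bytes


def check_restore_space(backup_path: str, work_dir: str) -> bool:
    """Compare a backup file's size with the free space on the filesystem holding work_dir."""
    backup_bytes = os.path.getsize(backup_path)
    free_bytes = shutil.disk_usage(work_dir).free
    ok = has_room(backup_bytes, free_bytes)
    print(
        f"backup {backup_bytes / GIB:.2f} GiB, free {free_bytes / GIB:.2f} GiB, "
        f"wanted {SPACE_FACTOR * backup_bytes / GIB:.2f} GiB: "
        f"{'OK' if ok else 'too little space'}"
    )
    return ok
```

For example, `check_restore_space("/path/to/backup.tar.gz", "/tmp")` (a hypothetical backup path) checks the filesystem holding /tmp, which is where rferebee later found leftover restore files. The pure `has_room` helper is kept separate so the arithmetic can be checked without touching the disk.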
Posted: Thu Apr 16, 2020 3:53 pm
by rferebee
Do the extracted files get removed once the restore is complete? If that's supposed to happen, it isn't happening in my environment for some reason. The restore fills up the drive and then leaves the redundant data behind.
Posted: Thu Apr 16, 2020 4:47 pm
by ssax
Yes, they should be cleaned up automatically, as long as the restore doesn't fail.
Check here:
You should see entries named XXXXXXX-restore. They can be safely deleted; if your restores failed, they were likely left behind.
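The directory ssax points to is missing from the post, so this minimal Python sketch takes the directory to scan as a parameter. It assumes leftovers are recognisable only by a name ending in -restore (the XXXXXXX-restore pattern above); the function names are hypothetical. It reports each match with its size and deletes nothing.

```python
from pathlib import Path

# Assumption: leftover restore working directories are recognisable only by a name
# ending in "-restore" (the XXXXXXX-restore pattern mentioned in the post).
SUFFIX = "-restore"


def is_restore_leftover(name: str) -> bool:
    """Return True if an entry name matches the XXXXXXX-restore pattern."""
    return name.endswith(SUFFIX) and len(name) > len(SUFFIX)


def tree_size_bytes(path: Path) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    return sum(
        f.lstat().st_size
        for f in path.rglob("*")
        if f.is_file() and not f.is_symlink()
    )


def find_leftovers(base: str) -> list[tuple[Path, int]]:
    """List (directory, size in bytes) for each *-restore directory directly under base.

    Nothing is deleted; review the list before removing anything.
    """
    root = Path(base)
    if not root.is_dir():
        return []
    return [
        (p, tree_size_bytes(p))
        for p in sorted(root.iterdir())
        if p.is_dir() and not p.is_symlink() and is_restore_leftover(p.name)
    ]
```

Once you've confirmed an entry is a stale restore directory, `shutil.rmtree(path)` removes it. Keeping the search and the deletion separate means a wrong guess about the naming pattern costs nothing.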
Posted: Thu Apr 16, 2020 6:00 pm
by rferebee
I'm seeing a nagiosxi directory in /tmp for some reason. It seems to contain all the files that would typically be put there during a restore. I think that's what's taking up all the space.
Posted: Fri Apr 17, 2020 10:40 am
by ssax
Likely. Feel free to remove any non-backup files in /store/backups/nagiosxi.
Posted: Fri Apr 17, 2020 5:55 pm
by rferebee
I'd like to escalate this issue for next week. I need someone from Nagios Support to connect with me and figure out what's going on.
My restores haven't worked at all since updating to 5.6.13. After a restore, all the daemons crash and I have to reboot the host manually. More importantly, the restore isn't actually taking place.
Thank you! Have a great weekend.
Posted: Mon Apr 20, 2020 1:08 pm
by benjaminsmith
Hi @rferebee,
Please open a ticket for this issue to get a remote session booked and reference this forum topic.
Thank you.
Benjamin
Posted: Wed May 13, 2020 9:22 am
by rferebee
This thread can be locked. The issue is resolved. Thank you.