We were able to resolve the log rotate issue, but our attempts to upgrade to 5.7 failed, and it appears that backup problems or DB corruption is the cause. Here is some of the info we've gathered so far. Any help you can provide with troubleshooting would be appreciated.
A manual backup starts and we see activity in top, but once it finishes no backup files are present. The output below is from the backup logs. There are more of the same lines, but this should show all the data available.
05-14-2017 00:00:03 DEBUG: Running scheduled local backup ...
05-14-2017 00:00:03 INFO: Creating a local backup: nagiosxi.1494734403
05-14-2017 00:00:03 DEBUG: Sending create local backup command to CmdSubsystem
05-14-2017 00:08:41 INFO: Too many backups! Limit is 4. Removing: nagiosxi.1492315202.tar.gz before proceeding with backup.
Error backing up MySQL database 'nagios' - check the password in this script!
Error backing up MySQL database 'nagios' - check the password in this script!
Error backing up MySQL database 'nagios' - check the password in this script!
Error backing up MySQL database 'nagios' - check the password in this script!
Running the database repair script fails with the output below. This is just a sample; many other MIBs are listed, but it gives you the idea.
Cannot adopt OID in JUNIPER-SRX5000-SPU-MONITORING-MIB: jnxJsSPUMonitoringMaxCPSession ::= { jnxJsSPUMonitoringObjectsEntry 9 }
Cannot adopt OID in JUNIPER-SRX5000-SPU-MONITORING-MIB: jnxJsSPUMonitoringNodeIndex ::= { jnxJsSPUMonitoringObjectsEntry 10 }
Cannot adopt OID in JUNIPER-SRX5000-SPU-MONITORING-MIB: jnxJsSPUMonitoringNodeDescr ::= { jnxJsSPUMonitoringObjectsEntry 11 }
Cannot adopt OID in JUNIPER-SRX5000-SPU-MONITORING-MIB: jnxJsSPUMonitoringFlowSessIPv4 ::= { jnxJsSPUMonitoringObjectsEntry 12 }
Cannot adopt OID in JUNIPER-SRX5000-SPU-MONITORING-MIB: jnxJsSPUMonitoringFlowSessIPv6 ::= { jnxJsSPUMonitoringObjectsEntry 13 }
Cannot adopt OID in JUNIPER-SRX5000-SPU-MONITORING-MIB: jnxJsSPUMonitoringCPSessIPv4 ::= { jnxJsSPUMonitoringObjectsEntry 14 }
Cannot adopt OID in JUNIPER-SRX5000-SPU-MONITORING-MIB: jnxJsSPUMonitoringCPSessIPv6 ::= { jnxJsSPUMonitoringObjectsEntry 15 }
/usr/local/nagiosxi/scripts/repair_databases.lock already exists. Perhaps a repair is already in process ..aborting
Here is what we find in the UI
Thank you in advance for your help.
Unable to backup server, DB issues
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Unable to backup server, DB issues
It thinks a database repair is already in progress.
Can you show the output of the following?
Code:
ls -l /usr/local/nagiosxi/scripts/repair_databases.lock
df -h
Re: Unable to backup server, DB issues
Here is the output you requested.
-rwxr-xr-x 1 nagios nagios 0 Oct 20 2015 /usr/local/nagiosxi/scripts/repair_databases.lock
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.8G 0 3.8G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 218M 3.6G 6% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/sda4 295G 30G 265G 11% /
/dev/sda2 1014M 201M 814M 20% /boot
/dev/sda1 200M 12M 189M 6% /boot/efi
tmpfs 779M 0 779M 0% /run/user/1000
tmpfs 779M 0 779M 0% /run/user/0
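The listing above shows a zero-byte lock file dated Oct 20 2015, far older than any repair that could still be running. A minimal sketch of that age check follows. The helper name is_stale_lock and the one-day default threshold are assumptions made for illustration and are not part of Nagios XI. It relies on the -mmin and -maxdepth options of GNU or BSD find. It only looks at the file's age, so it is still worth confirming that no repair is actually running before deleting anything.

```shell
#!/bin/sh
# Sketch only: report whether a lock file looks stale based on its age.
# Assumptions (not from the thread): the helper name is_stale_lock and
# the default threshold of 1440 minutes (one day).

is_stale_lock() {
    lock_file=$1
    max_age_minutes=${2:-1440}   # hypothetical default: one day

    # No lock file at all: nothing to clean up.
    [ -e "$lock_file" ] || return 1

    # find prints the path only if the file was last modified more than
    # max_age_minutes ago; -maxdepth 0 restricts it to the file itself.
    old=$(find "$lock_file" -maxdepth 0 -mmin +"$max_age_minutes" 2>/dev/null)

    # Succeed (return 0) only when find reported the file as old.
    [ -n "$old" ]
}

# Example call; prints a message only if the lock exists and is old.
if is_stale_lock /usr/local/nagiosxi/scripts/repair_databases.lock; then
    echo "Lock file is older than one day and is probably stale."
fi
```

The function returns success only for an existing file older than the threshold, so a lock created by a repair that started a few minutes ago would not be flagged.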
scottwilkerson
Re: Unable to backup server, DB issues
Ahh, this is a really old lock file. Let's just remove it:
Code:
rm -f /usr/local/nagiosxi/scripts/repair_databases.lock
Then try to proceed with your backup.
Re: Unable to backup server, DB issues
The backup failed again after the removal of the lock file but I ran the DB repair script again and it was able to complete. After doing that the backup completed successfully. I went back to try and run the update again from the UI and it is just showing "Update in progress. Please wait. Update may take a few minutes.". It continues to show the same thing even after a reboot.
scottwilkerson
Re: Unable to backup server, DB issues
dshearon wrote: The backup failed again after the removal of the lock file but I ran the DB repair script again and it was able to complete. After doing that the backup completed successfully. I went back to try and run the update again from the UI and it is just showing "Update in progress. Please wait. Update may take a few minutes.". It continues to show the same thing even after a reboot.
In some environments a manual upgrade is required:
https://assets.nagios.com/downloads/nag ... ctions.pdf
I would suggest doing this. Once it is complete, you can reset the upgrade status page by following these instructions:
https://support.nagios.com/kb/article/n ... e-851.html
Re: Unable to backup server, DB issues
I ran the upgrade script manually after posting that and everything appeared to work fine. I think we are back in business so you can lock the thread. Thank you for your help!
scottwilkerson
Re: Unable to backup server, DB issues
dshearon wrote: I ran the upgrade script manually after posting that and everything appeared to work fine. I think we are back in business so you can lock the thread. Thank you for your help!
Awesome!
Locking thread