Nagios XI database failure - restore not working
Posted: Thu Jan 03, 2013 4:50 pm
Hello,
We're running Nagios XI version 2011R3.3.
It failed at some point in the past week. I've been ill with pneumonia, so I can't tell exactly when, but the symptoms are that Nagios is running and all my hosts/services are defined in Nagios Core, yet XI doesn't know of anything to monitor. I figured it was a database problem.
I attempted a restore from December 20, a date when I know it was working, using the following command:
# /usr/local/nagiosxi/scripts/restore_xi.sh /store/backups/nagiosxi/1356009682.tar.gz
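As a sanity check that this is the right archive: I'm assuming the number in the backup file name is a Unix epoch timestamp (I haven't confirmed that the backup script names files this way). Under that assumption, a small sketch like the following, using GNU date, decodes it. The function name epoch_to_utc is just something I made up for illustration.

```shell
# Convert a Unix epoch timestamp to a UTC date string.
# Uses GNU date syntax (-d @SECONDS), as shipped with CentOS.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Timestamp taken from the backup file name 1356009682.tar.gz
epoch_to_utc 1356009682
```

This should come out as a time on December 20, 2012 (UTC), which matches the date I intended to restore from.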
The restore hangs while attempting to restore the MySQL databases:
TS=1357248362
Extracting backup to /store/backups/nagiosxi/1357248362-restore...
In /store/backups/nagiosxi/1357248362-restore/1356009682...
Backup files look okay. Preparing to restore...
Shutting down services...
Stopping nagios: ..........
Warning - nagios did not exit in a timely manner
Stopping ndo2db: done.
NPCD Stopped.
Restoring directories to /...
Restoring Nagios Core...
rm: cannot remove `/usr/local/nagios': Device or resource busy
Restoring Nagios XI...
Restoring NagiosQL...
Restoring NagiosQL backups...
Restoring MySQL databases...
Further, I've found multiple instances of the dbmaint utility running (I'm not sure whether this is relevant):
nagios 2693 0.0 0.0 2944 956 ? Ss 09:05 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 2699 0.0 0.3 34824 14908 ? S 09:05 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 3228 0.0 0.0 2944 952 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 3230 0.0 0.3 34824 14804 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 4550 0.0 0.0 2944 948 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 4558 0.0 0.3 34824 14896 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 5493 0.0 0.0 2944 952 ? Ss 01:35 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 5502 0.0 0.3 34824 14800 ? S 01:35 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 6204 0.0 0.0 2944 952 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 6207 0.0 0.3 34824 14804 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 6577 0.0 0.0 2944 948 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 6584 0.0 0.3 34824 14800 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 6894 0.0 0.0 2944 1028 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 6899 0.0 0.3 34824 14896 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 7329 0.0 0.0 2944 948 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 7336 0.0 0.3 34824 14804 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 9046 0.0 0.0 2944 948 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 9049 0.0 0.3 34824 14928 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 10313 0.0 0.0 2944 1032 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 10317 0.0 0.3 34824 14800 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 11515 0.0 0.0 2944 952 ? Ss 02:10 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 11523 0.0 0.3 34824 14800 ? S 02:10 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 13248 0.0 0.0 2944 944 ? Ss 10:25 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 13253 0.0 0.3 34824 14784 ? S 10:25 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 14769 0.0 0.0 2944 956 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
nagios 14778 0.0 0.3 34824 14924 ? S Jan02 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php
nagios 15071 0.0 0.0 2944 948 ? Ss Jan02 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
...
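To put a number on the listing above, something along these lines could count the leftover dbmaint.php workers. This is only a rough sketch: it assumes `ps aux`-style output where the command starts in the 11th column, and the helper name count_dbmaint is hypothetical.

```shell
# Count dbmaint.php PHP worker processes in `ps aux`-style input read from stdin.
# Only the php processes themselves are counted, not the "/bin/sh -c" cron wrappers.
count_dbmaint() {
    awk '$11 == "/usr/bin/php" && /dbmaint\.php/ { n++ } END { print n + 0 }'
}

# Usage (run on the Nagios XI host):
#   ps aux | count_dbmaint
```

Filtering on the 11th column keeps each cron invocation from being counted twice, since every run shows up once as the shell wrapper and once as the php process.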
The machine is running CentOS 6.0 (Linux version 2.6.32-71.el6.i686 ([email protected]) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Fri Nov 12 04:17:17 GMT 2010), running in 32-bit mode.
It is a stand-alone Nagios installation.
Although a long-time Unix user, I have virtually no Linux experience and need help debugging this.
Thanks.