
Database Connection Error

Posted: Mon Feb 16, 2015 5:59 pm
by grenley
Hi.

Our SA did a hard reboot of two of our XI servers and we are getting intermittent database connection issues.
The message(s) go away for a bit and then return.
I know this issue is covered in many forum topics.
I've tried the recommended remedy (bring down mysql and run repair_databases.sh as root) and it seemed to work, but then some time later the messages returned.
Beneath the messages, which look like this...
Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.

Run the following from the CLI as root to attempt to repair the DB

/opt/app/nagios/nagiosxi/scripts/repair_databases.sh
The connected agents seem to be processing and updating.

We've bounced XI along with all related components, but no luck.

A corrupted record, perhaps?
What can we do to troubleshoot and resolve?

Thanks,
Rick

Re: Database Connection Error

Posted: Mon Feb 16, 2015 6:04 pm
by abrist
grenley wrote: We've bounced XI along with all related components, but no luck.
If this means rebooting, make sure to shut down ndo2db and mysqld before doing so. An unsafe reboot is the most common way to cause crashed tables.
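
A minimal sketch of that stop order, assuming a SysV-style `service` command and the two service names mentioned in this thread (`ndo2db`, `mysqld`). `stop_xi_db_services` and `DRY_RUN` are hypothetical names used only for illustration; by default the script prints what it would run and stops nothing.

```shell
#!/bin/sh
# Sketch: stop the processes that write to the MySQL tables before a reboot.
# Assumptions: SysV-style `service` command; service names may differ on your system.
# DRY_RUN (hypothetical variable) defaults to 1, so nothing is stopped unless DRY_RUN=0.

stop_xi_db_services() {
    # Stop the writer (ndo2db) first, then the database itself (mysqld).
    for svc in ndo2db mysqld; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: service $svc stop"
        else
            service "$svc" stop
        fi
    done
}

stop_xi_db_services
```

Run it as root with `DRY_RUN=0` once the printed commands look right for your install, then reboot.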

Re: Database Connection Error

Posted: Mon Feb 16, 2015 6:51 pm
by grenley
Lesson learned.
What's the most common way to fix crashed tables? (besides the repair_databases.sh script)
Are you saying to bring mysqld and ndo2db down and do a soft reboot?

Re: Database Connection Error

Posted: Tue Feb 17, 2015 9:11 am
by scottwilkerson
You can also log in to the MySQL DB and run a repair on an individual table:

Code:

mysql -pnagiosxi nagios
REPAIR TABLE table_name;

Re: Database Connection Error

Posted: Tue Feb 17, 2015 11:27 am
by grenley
How do we know what table name needs to be repaired?

Re: Database Connection Error

Posted: Tue Feb 17, 2015 11:31 am
by abrist
The mysqld log should contain related errors:

Code:

tail -25 /var/log/mysqld.log
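
If the log is long, the crashed table names can be pulled out of it directly. This is a rough sketch, assuming the log contains the usual MyISAM message of the form `Table './nagios/some_table' is marked as crashed and should be repaired`; `crashed_tables` is a hypothetical helper name, and the log path is the one from the command above.

```shell
#!/bin/sh
# Sketch: list tables that mysqld has reported as crashed.
# Assumption: the log contains messages of the form
#   Table './nagios/some_table' is marked as crashed and should be repaired

crashed_tables() {
    # $1: path to the mysqld log
    # Prints unique "database/table" names, one per line.
    grep "is marked as crashed" "$1" \
        | sed -n "s/.*Table '\.\/\([^']*\)' is marked as crashed.*/\1/p" \
        | sort -u
}

LOG=${LOG:-/var/log/mysqld.log}
if [ -r "$LOG" ]; then
    crashed_tables "$LOG"
else
    echo "cannot read $LOG (run as root or adjust LOG)" >&2
fi
```

Each line of output names a database and a table, which is what `REPAIR TABLE` needs.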
grenley wrote: What's the most common way to fix crashed tables? (besides the repair_databases.sh script)
Is there a reason why you don't want to run our repair script?

Re: Database Connection Error

Posted: Tue Feb 17, 2015 11:37 am
by grenley
Perhaps you missed this in my original question...
I've tried the recommended remedy (bring down mysql and run repair_databases.sh as root) and it seemed to work, but then some time later the messages returned.

Re: Database Connection Error

Posted: Tue Feb 17, 2015 11:39 am
by abrist
Ah, my apologies. Our script just runs the repair command suggested by Scott above for each table.

Let us know if you continue to have issues with crashed tables - be careful when you bounce the server!
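
For anyone curious what "the repair command for each table" looks like in practice, here is a rough sketch. It is not the contents of repair_databases.sh; it only builds one `REPAIR TABLE` statement per table name read from standard input. `repair_statements` is a hypothetical helper name, and the two table names in the example are used purely for illustration.

```shell
#!/bin/sh
# Sketch: build one REPAIR TABLE statement per table name read from stdin.
# This is NOT the contents of repair_databases.sh; it only illustrates the idea.

repair_statements() {
    # Reads table names, one per line; skips blank lines.
    while IFS= read -r tbl; do
        [ -n "$tbl" ] || continue
        printf 'REPAIR TABLE `%s`;\n' "$tbl"
    done
}

# Example with two table names used purely for illustration:
printf '%s\n' nagios_servicechecks nagios_hostchecks | repair_statements
```

The generated statements could then be piped into the `mysql -pnagiosxi nagios` client shown earlier in the thread, assuming those credentials apply to your install.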

Re: Database Connection Error

Posted: Tue Feb 17, 2015 11:46 am
by grenley
Yes. We are continuing to have problems. The Event Log shows errors on systems that don't even show up in the Hosts list.
When we try to delete these systems from CCM, we get another DB error and then it shows that there are no hosts. There are a couple of dozen hosts.
The mysql db is essentially useless.
What can we do?

Re: Database Connection Error

Posted: Tue Feb 17, 2015 11:54 am
by scottwilkerson
Let's start by seeing which tables you are still having trouble with, as there are several DBs in Nagios XI.

Code:

tail -f /var/log/mysqld.log