Database Connection Error

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
grenley
Posts: 96
Joined: Tue May 13, 2014 6:06 pm

Database Connection Error

Post by grenley »

Hi.

Our SA did a hard reboot of two of our XI servers, and since then we have been getting intermittent database connection errors.
The messages go away for a while and then return.
I know this issue is covered in many forum topics.
I've tried the recommended remedy (bring down mysql and run repair_databases.sh as root), and it seemed to work, but some time later the messages returned.
The messages look like this:
Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.

Run the following from the CLI as root to attempt to repair the DB

/opt/app/nagios/nagiosxi/scripts/repair_databases.sh
Beneath the messages, the connected agents seem to be processing and updating.

We've bounced XI along with all related components, but no luck.

A corrupted record, perhaps?
What can we do to troubleshoot and resolve this?

Thanks,
Rick
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Database Connection Error

Post by abrist »

grenley wrote: We've bounced XI along with all related components, but no luck.
If this means rebooting, make sure to shut down ndo2db and mysqld first; an unsafe reboot is the most common way to end up with crashed tables.
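A minimal sketch of that shutdown order as a shell function. It assumes SysV-style init scripts named `ndo2db` and `mysqld` driven by the `service` command, as on CentOS/RHEL 6; hosts using systemd would call `systemctl stop` instead. The function name `safe_reboot` is hypothetical and not part of Nagios XI.

```shell
# Hypothetical helper (not part of Nagios XI): stop the daemon that writes
# to MySQL, then MySQL itself, and only then reboot, so tables are closed
# cleanly. Assumes SysV-style init scripts named "ndo2db" and "mysqld".
safe_reboot() {
    # ndo2db first: it is the daemon writing Nagios data into MySQL
    service ndo2db stop || return 1
    # then MySQL, which closes its tables cleanly on a normal shutdown
    service mysqld stop || return 1
    # reboot only after both services stopped successfully
    shutdown -r now
}
```

If either stop command fails, the function returns without rebooting, so the problem can be looked at before the host goes down.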
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: Database Connection Error

Post by grenley »

Lesson learned.
What's the most common way to fix crashed tables? (besides the repair_databases.sh script)
Are you saying to bring mysqld and ndo2db down and do a soft reboot?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Database Connection Error

Post by scottwilkerson »

You can also log in to the MySQL database and run:

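# from a root shell, open the "nagios" database (-p is followed directly by the password)
mysql -pnagiosxi nagios
-- then, at the mysql> prompt, repair the affected table:
REPAIR TABLE table_name;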
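When more than one table is affected, a minimal sketch like the following can generate the statements instead of typing each one by hand. The helper name `repair_statements` is hypothetical, and it assumes names of the form `database.table` with no extra dots or backticks inside them.

```shell
# Hypothetical helper: print one REPAIR TABLE statement per "db.table"
# argument, backtick-quoting both parts. Assumes the names contain no
# dots or backticks of their own.
repair_statements() {
    local arg db table
    for arg in "$@"; do
        # "${arg#*.}" equals "$arg" only when there is no dot at all
        if [ "$arg" = "${arg#*.}" ]; then
            echo "expected db.table, got: $arg" >&2
            return 1
        fi
        db="${arg%%.*}"      # text before the first dot
        table="${arg#*.}"    # text after the first dot
        printf 'REPAIR TABLE `%s`.`%s`;\n' "$db" "$table"
    done
}
```

For example, `repair_statements nagios.nagios_logentries | mysql -pnagiosxi` would feed the generated statement to the client; the table name there is only an illustration.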
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: Database Connection Error

Post by grenley »

How do we know which table needs to be repaired?

Re: Database Connection Error

Post by abrist »

The mysqld log should contain related errors:

tail -25 /var/log/mysqld.log
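A minimal sketch for pulling the affected table names out of that log. It assumes crash reports use MySQL's usual wording, `Table './database/table' is marked as crashed ...` (the exact message can differ between MySQL versions), and the helper name `crashed_tables` is hypothetical.

```shell
# Hypothetical helper: list each crashed table reported in a mysqld log
# once, as "database.table". Relies on MySQL's
# "Table './db/table' is marked as crashed ..." message wording.
crashed_tables() {
    local log="${1:-/var/log/mysqld.log}"
    # -n with the p flag prints only lines where the substitution matched;
    # -E enables extended regexes for the two capture groups
    sed -n -E "s|.*Table '\./([^/']+)/([^']+)' is marked as crashed.*|\1.\2|p" "$log" \
        | LC_ALL=C sort -u    # one line per table, byte-order sorted
}
```

Running `crashed_tables /var/log/mysqld.log` would print one `database.table` name per line, which maps directly onto a `REPAIR TABLE` statement.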
grenley wrote: What's the most common way to fix crashed tables? (besides the repair_databases.sh script)
Is there a reason why you don't want to run our repair script?

Re: Database Connection Error

Post by grenley »

Perhaps you missed this in my original question:
I've tried the recommended remedy (bring down mysql and run repair_databases.sh as root), and it seemed to work, but some time later the messages returned.

Re: Database Connection Error

Post by abrist »

Ah, my apologies. Our script just runs the repair command suggested by Scott above for each table.
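Conceptually, that per-table repair looks something like this minimal sketch. It is an illustration, not the contents of repair_databases.sh; the helper name `repair_all_tables` is hypothetical, and it assumes the `mysql` client can log in without prompting (for example through a `[client]` section in `~/.my.cnf`).

```shell
# Hypothetical sketch, NOT the actual repair_databases.sh: run REPAIR TABLE
# on every table of one database. Assumes the mysql client can
# authenticate without prompting (e.g. via ~/.my.cnf).
repair_all_tables() {
    local db="$1" table
    # -N drops the column-name header, -B gives plain one-per-line output
    mysql -N -B -e 'SHOW TABLES' "$db" | while IFS= read -r table; do
        echo "Repairing $db.$table"
        # </dev/null keeps this call from consuming the table list on stdin
        mysql -N -B -e "REPAIR TABLE \`$table\`" "$db" </dev/null
    done
}
```

Calling `repair_all_tables nagios` would walk every table in the `nagios` database. `REPAIR TABLE` only applies to storage engines that support it, such as MyISAM.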

Let us know if you continue to have issues with crashed tables - be careful when you bounce the server!

Re: Database Connection Error

Post by grenley »

Yes, we are still having problems. The Event Log shows errors for systems that don't even appear in the Hosts list.
When we try to delete these systems in the CCM, we get another database error, and then it shows no hosts at all, even though there are a couple of dozen hosts.
The MySQL database is essentially useless.
What can we do?

Re: Database Connection Error

Post by scottwilkerson »

Let's start by seeing which tables you are still having trouble with, as there are several databases in Nagios XI:

tail -f /var/log/mysqld.log
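To see at a glance which database the crashes belong to (for example `nagios` versus `nagiosql`), a minimal sketch can tally crash reports per database name. It assumes MySQL's usual `Table './database/table' is marked as crashed ...` wording, and the helper name `crashes_per_db` is hypothetical. Because it only prints once its input ends, feed it a finished excerpt such as `tail -n 500 /var/log/mysqld.log` rather than a live `tail -f`.

```shell
# Hypothetical helper: count "marked as crashed" reports per database from
# mysqld log lines on stdin, printing "database count" pairs.
crashes_per_db() {
    sed -n -E "s|.*Table '\./([^/']+)/[^']+' is marked as crashed.*|\1|p" \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2, $1 }'   # reorder uniq's "count db" to "db count"
}
```

For example, `tail -n 500 /var/log/mysqld.log | crashes_per_db` would print one line per affected database with its number of crash reports.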