Unable to Access Nagios XI Interface - Database Connection Error

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
smoren
Posts: 59
Joined: Tue Sep 29, 2015 7:24 am

Unable to Access Nagios XI Interface - Database Connection Error

Post by smoren »

Hello,

I'm experiencing an unusual issue with our Nagios XI system. When navigating to the login page of Nagios XI, I encounter the following error:
Database Error
A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.
Run the following from the CLI as root to attempt to repair the DB:

/usr/local/nagiosxi/scripts/repair_databases.sh
I ran the suggested repair script, but it did not resolve the issue.

Notably, I can connect to the database from the command line using the mysql command, and the Nagios Core interface is functioning correctly. Monitoring and notifications appear to be working as expected; however, we are unable to access the Nagios XI interface.

Here are the system details:

OS: Red Hat Enterprise Linux 7.9
Database (local): 5.5.68-MariaDB
Nagios XI Version: 5.11.2

Any guidance on how to resolve this issue would be greatly appreciated.
Thanks.

Rene
danderson
Posts: 234
Joined: Wed Aug 09, 2023 10:05 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by danderson »

Thanks for reaching out @smoren,

Can you still connect to the mysql database from the command line using the nagiosxi user and the passwords you get from /usr/local/nagiosxi/scripts/get_mysql_passwords.sh?
smoren
Posts: 59
Joined: Tue Sep 29, 2015 7:24 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by smoren »

Unfortunately, I do not have that script on my Nagios server instance. Instead, I tried connecting with the usernames and passwords defined in /usr/local/nagiosxi/html/config.inc.php. All three sets of credentials ($cfg['db_info'] for nagiosxi, ndoutils, and nagiosql) worked.

I'm not sure if it helps, but I checked several log files in /usr/local/nagiosxi/var/. Many of them contain this message:

Code: Select all

Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (13)
Additionally, I confirmed that the file /var/lib/mysql/mysql.sock exists:

Code: Select all

srwxr-xr-x+ 1 mysql mysql 0 Jun 11 22:31 /var/lib/mysql/mysql.sock
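
For context, the (13) in that log message is the operating system errno for "Permission denied" (EACCES), and on Linux connecting to a Unix socket requires write permission on the socket file. A minimal sketch for checking the "other" permission class on a path follows; the function name other_can_write is hypothetical, it relies on GNU stat, and it looks only at the mode bits, not at named ACL entries.

```shell
# Return 0 if the "other" permission class has the write bit set on PATH.
# Sketch only: reads the octal mode via GNU stat and ignores named ACL entries.
other_can_write() {
    mode=$(stat -c '%a' "$1") || return 2
    # The last octal digit is the "other" class; 2, 3, 6 and 7 include write.
    case "$mode" in
        *[2367]) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, using the socket path from the log message:
#   other_can_write /var/lib/mysql/mysql.sock && echo "writable by other" || echo "not writable by other"
# A more direct check as the web server user (assumes sudo and an apache account):
#   sudo -u apache test -w /var/lib/mysql/mysql.sock && echo ok
```

With the srwxr-xr-x+ mode shown above, the function would report that "other" cannot write, which fits the errno 13 failures seen by processes not running as mysql.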
danderson
Posts: 234
Joined: Wed Aug 09, 2023 10:05 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by danderson »

In $cfg['db_info'], is the dbserver value for everything "localhost"?

Do you have any idea what might have triggered this error? Was it after an upgrade?
smoren
Posts: 59
Joined: Tue Sep 29, 2015 7:24 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by smoren »

It was not set for nagiosxi. The value was empty. I changed it to localhost and restarted the services, but it did not help.

There was no upgrade or any other maintenance. The only potential issue I can think of (though I have no proof so far) is that some automatic action may have temporarily filled up the root partition (which must have recovered automatically as well). Backups are on a separate partition.

However, I noticed that the permissions on the file /var/lib/mysql/mysql.sock seem to be unusual. After restarting the server (or just MariaDB), the file is created with permissions: srwxr-xr-x+. This is when I get the error mentioned in the original post.

If I execute the following command, everything starts working again, but only until the next restart, when the file is recreated with the 'wrong' permissions.

Code: Select all

chmod go+w /var/lib/mysql/mysql.sock
If you think this could really be the root cause, how can I set those permissions permanently?
Thanks.
jsimon
Posts: 318
Joined: Wed Aug 23, 2023 11:27 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by jsimon »

I believe the Linux ACL (Access Control List) is what determines default permissions for a file on creation, if it isn't specified in a mysql config somewhere, and I can't find that personally. I would recommend taking a snapshot of your system before trying the below steps. I would also note that this should not have resulted from our installation instructions. I would also recommend consulting whoever is in charge of setting your servers up as this may be a result of deliberate choices made. Proceed with caution!

Code: Select all

getfacl /var/lib/mysql
Try running the above (if your mysql.sock file is located elsewhere, change the file location in the command accordingly) and see what the output is. If it matches the permissions the file is being made with, I think the next step is to modify the ACL for that directory. I'm providing the next step in case this is true. If not, the setting may be controlled by a mysql config that I haven't located yet, so let's skip that in that case.

Code: Select all

setfacl -d -m group::rwx /var/lib/mysql
Repeat the above with "group" replaced with "other" to set default permissions for the /var/lib/mysql directory. Try restarting mysql and see if the file is created with the permissions we're hoping for.
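
As a rough way to read the getfacl output for this case, the sketch below pulls out the default "other" entry, which getfacl prints as a line starting with default:other::. The helper name default_other_perms is hypothetical, and the interpretation in the comments is an assumption about this particular server, not a confirmed diagnosis.

```shell
# Print the permission string of the default "other" ACL entry
# from getfacl output supplied on stdin (prints nothing if there is none).
default_other_perms() {
    sed -n 's/^default:other::\([rwx-]*\).*/\1/p'
}

# Example on the server:
#   getfacl /var/lib/mysql 2>/dev/null | default_other_perms
# No output suggests the directory has no default ACL at all.
# Output such as r-x would be consistent with the socket being
# created without write permission for "other".
```

The same pattern with group in place of other covers the group entry that the setfacl command above modifies.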

Let us know how this goes, or if you have any related questions!
danderson
Posts: 234
Joined: Wed Aug 09, 2023 10:05 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by danderson »

If you want to keep the ACL in place instead of effectively disabling it, you can add the nagios user and the apache user to the ACL:

Code: Select all

setfacl -d -m nagios:rwx /var/lib/mysql/
setfacl -d -m apache:rwx /var/lib/mysql/
systemctl restart mariadb
smoren
Posts: 59
Joined: Tue Sep 29, 2015 7:24 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by smoren »

Hello everyone,

Thanks for the tips about ACL. I configured it based on your suggestions, and it works as expected.

Unfortunately, I have encountered another problem. :-( After restarting the whole server, I can access the Nagios XI login screen and log in successfully. However, it seems to have trouble communicating with the database. For example, on the Service Status page, the "Last check" is not updating for any service, there are no new events in the Event log (XI interface), and no new notifications on the Notification page (again, XI interface). When I checked the equivalent pages in the Nagios Core interface, everything works as expected.

I executed the following command:

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh
This helped. The Nagios XI interface showed new events in the event log, a list of notifications, the last check was correctly updated, etc. So, I was happy...until the next restart.

I also checked the data directly in the database in these tables (after the reboot, before running repair_databases.sh):
- nagios.nagios_notifications - no new data
- nagiosxi.xi_events - new data is present
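
One way to make this check repeatable is to compare row counts a few minutes apart. The following is only a sketch: the helper name count_sql is hypothetical, and the usage lines assume the mysql client can log in without prompting (for example via ~/.my.cnf).

```shell
# Build a row-count query for a schema-qualified table name.
count_sql() {
    printf 'SELECT COUNT(*) FROM `%s`.`%s`;\n' "$1" "$2"
}

# Example on the server (-N drops the column header, -B gives plain output):
#   before=$(mysql -N -B -e "$(count_sql nagios nagios_notifications)")
#   sleep 300
#   after=$(mysql -N -B -e "$(count_sql nagios nagios_notifications)")
#   [ "$after" -gt "$before" ] && echo "table is receiving new rows"
```

A quiet system may legitimately produce no notifications in a short window, so an unchanged count is a hint rather than proof that writes are failing.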

Do you have any ideas? I'd like to avoid running repair_databases.sh after every reboot of the server. :)

Thanks.
danderson
Posts: 234
Joined: Wed Aug 09, 2023 10:05 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by danderson »

I'm wondering if this is because something is breaking your databases.

I also noticed that the repair_databases script sets the sort buffer size to 256M, so I'm curious to see what the default is for you. I was able to find and set sort_buffer_size=256M in /etc/my.cnf.d/mysql-server.cnf, but the location may be different for your OS. Default options are read from the following files in the given order: /etc/my.cnf, /etc/mysql/my.cnf, ~/.my.cnf. Stop and restart mysql after making those changes.
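
To see whether an option such as sort_buffer_size is already set in one of those files, a small parser can help. This is a simplified sketch: the function name mysqld_option is hypothetical, it reads only the [mysqld] section of a single file, and it does not follow !include or !includedir directives.

```shell
# Print the value of option KEY from the [mysqld] section of FILE.
# Usage: mysqld_option FILE KEY
mysqld_option() {
    awk -v key="$2" '
        # A section header switches the "inside [mysqld]" flag on or off.
        /^[ \t]*\[/ { in_sec = ($0 ~ /^[ \t]*\[mysqld\]/); next }
        in_sec {
            n = index($0, "=")
            if (n == 0) next
            k = substr($0, 1, n - 1)
            v = substr($0, n + 1)
            gsub(/[ \t]/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            # Option files accept dashes and underscores interchangeably.
            gsub(/-/, "_", k)
            if (k == key) print v
        }
    ' "$1"
}

# Example:
#   mysqld_option /etc/my.cnf sort_buffer_size
# The value the running server actually uses can be read with:
#   mysql -e "SHOW VARIABLES LIKE 'sort_buffer_size'"
```

If the function prints nothing for every option file but the server still reports a value, that value is most likely the built-in default.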

Let me know if this helps.
jsimon
Posts: 318
Joined: Wed Aug 23, 2023 11:27 am

Re: Unable to Access Nagios XI Interface - Database Connection Error

Post by jsimon »

In addition to @danderson's reply, I would suggest adding this to your /etc/my.cnf under the [mysqld] section (if it isn't already there):

Code: Select all

max_allowed_packet=512M
max_connections=1000
open_files_limit=4096
Make sure to restart your various mariadb and related services after making this change:

Code: Select all

systemctl stop crond
systemctl stop npcd
systemctl stop nagios 
systemctl restart mariadb || systemctl restart mysqld
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl start httpd
I am also curious to know if you are rebooting your VM with the command line or via powering down the VM directly from your VM manager. If you are doing the latter, try seeing if using

Code: Select all

reboot
or

Code: Select all

init 6
from the CLI has a different outcome.