Database Error

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Database Error

Post by SavaSC »

I updated the OS on our Nagios instances this morning. One of them came back fine. The other one started throwing a DB error. Here it is:
Database Error
A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.
Run the following from the CLI as root to attempt to repair the DB:
/usr/local/nagiosxi/scripts/repair_databases.sh
I have run the repair multiple times. I have done a DB sweep multiple times. I have even rebooted the box a couple of times. Still shows an error.

When I run service postgresql status, I get the following:
[root@APNagiosXI ~]# service postgresql status
Redirecting to /bin/systemctl status postgresql.service
● postgresql.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-01-08 10:29:55 EST; 10min ago
Process: 10538 ExecStop=/usr/bin/pg_ctl stop -D ${PGDATA} -s -m fast (code=exited, status=0/SUCCESS)
Process: 14916 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS)
Process: 14911 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 14920 (postgres)
CGroup: /system.slice/postgresql.service
├─14920 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
├─14921 postgres: logger process
├─14923 postgres: checkpointer process
├─14924 postgres: writer process
├─14925 postgres: wal writer process
├─14926 postgres: autovacuum launcher process
├─14927 postgres: stats collector process
├─14928 postgres: nagiosxi nagiosxi 127.0.0.1(59052) idle
├─14931 postgres: nagiosxi nagiosxi 127.0.0.1(59054) idle
├─14934 postgres: nagiosxi nagiosxi 127.0.0.1(59056) idle
├─14937 postgres: nagiosxi nagiosxi 127.0.0.1(59058) idle
├─14944 postgres: nagiosxi nagiosxi 127.0.0.1(59064) idle
├─14950 postgres: nagiosxi nagiosxi 127.0.0.1(59068) idle
├─14953 postgres: nagiosxi nagiosxi 127.0.0.1(59070) idle
├─15067 postgres: nagiosxi nagiosxi 127.0.0.1(59096) idle
├─15645 postgres: nagiosxi nagiosxi 127.0.0.1(59104) idle
├─15822 postgres: nagiosxi nagiosxi 127.0.0.1(59110) idle
├─18668 postgres: nagiosxi nagiosxi 127.0.0.1(59470) idle
├─18671 postgres: nagiosxi nagiosxi 127.0.0.1(59472) idle
├─18672 postgres: nagiosxi nagiosxi 127.0.0.1(59474) idle
├─18673 postgres: nagiosxi nagiosxi 127.0.0.1(59476) idle
├─18676 postgres: nagiosxi nagiosxi 127.0.0.1(59480) idle
└─18699 postgres: nagiosxi nagiosxi 127.0.0.1(59488) idle

Jan 08 10:29:54 APNagiosXI.ltcsvc.com systemd[1]: Starting PostgreSQL database server...
Jan 08 10:29:55 APNagiosXI.ltcsvc.com systemd[1]: Started PostgreSQL database server.
Any ideas on what I can do to fix this?

Thanks.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Database Error

Post by npolovenko »

Hello, @SavaSC.
I know you've already done troubleshooting on your own but please follow this tutorial in case you've missed something: https://support.nagios.com/kb/article.php?id=25
Also, I'd like to see the system profile in order to go over some major log files.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and attach it to your next post. Or you could upload it to the cloud storage of your choice and share a link with me in PM.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Database Error

Post by SavaSC »

That was the page I did my initial troubleshooting from. I'll go through it again, just to make sure.

Here is the profile from that box.
profile (2).zip
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Database Error

Post by npolovenko »

@SavaSC First, please run and share the output of these commands with us:

service ndo2db start
service ndo2db status
After that, please run the following command to repair the database:

mysqlcheck -r -f -u root -pnagiosxi --all-databases
Please share the following log file with us:

/var/lib/pgsql/data/pg_log/postgresql-Mon.log
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Database Error

Post by SavaSC »

Here is the output.
[root@APNagiosXI ~]# service ndo2db start
Starting ndo2db (via systemctl): [ OK ]
[root@APNagiosXI ~]# service ndo2db status
ndo2db is not running but subsystem locked
Here is the requested log file.
postgresql-Mon.log
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Database Error

Post by dwhitfield »

Regarding the instructions below, if you do not have killall, you can install it via the following command:
# yum install psmisc

If psmisc is not in your repos, then instead you can check to make sure nagios is not running with
# ps -aef | grep nagios

If that document does not resolve your issue, please run the following commands in order and report any errors. You ***must*** use mariadb instead of mysqld in the commands below, ***if*** you have mariadb.
# service nagios stop
# service ndo2db stop
# service mysqld stop
# service crond stop
# service httpd stop
# killall -9 nagios
# killall -9 ndo2db
# rm -f /usr/local/nagios/var/rw/nagios.cmd
# rm -f /usr/local/nagios/var/nagios.lock
# rm -f /usr/local/nagios/var/ndo.sock
# rm -f /usr/local/nagios/var/ndo2db.lock
# rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
# for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
# service mysqld start
# service ndo2db start
# service nagios start
# service httpd start
# service crond start
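
Since the mysqld/mariadb choice depends on what is installed, here is a minimal sketch of one way to pick the unit name before running the stop/start commands above. The function name pick_db_service is hypothetical (it is not part of Nagios XI), and it assumes the unit list comes from systemctl list-unit-files in its usual "name.service state" layout.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Nagios XI: reads
# `systemctl list-unit-files` style text on stdin and prints the
# database service name to use (mariadb is preferred over mysqld).
pick_db_service() {
    units=$(cat)
    for name in mariadb mysqld; do
        # Match a line that starts with "<name>.service" followed by whitespace.
        if printf '%s\n' "$units" | grep -q "^${name}\.service[[:space:]]"; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1   # neither unit was listed
}
```

On a live box this could be used as follows (as root): db=$(systemctl list-unit-files --type=service --no-legend | pick_db_service) && service "$db" stop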
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Database Error

Post by SavaSC »

Here are the results
[root@APNagiosXI ~]# service nagios stop
Stopping nagios (via systemctl): [ OK ]
[root@APNagiosXI ~]# service ndo2db stop
Stopping ndo2db (via systemctl): [ OK ]
[root@APNagiosXI ~]# service mysqld stop
Redirecting to /bin/systemctl stop mysqld.service
Failed to stop mysqld.service: Unit mysqld.service not loaded.
[root@APNagiosXI ~]# service crond stop
Redirecting to /bin/systemctl stop crond.service
[root@APNagiosXI ~]# service httpd stop
Redirecting to /bin/systemctl stop httpd.service
[root@APNagiosXI ~]# killall -9 nagios
nagios: no process found
[root@APNagiosXI ~]# killall -9 ndo2db
ndo2db: no process found
[root@APNagiosXI ~]# rm -f /usr/local/nagios/var/rw/nagios.cmd
[root@APNagiosXI ~]# rm -f /usr/local/nagios/var/nagios.lock
[root@APNagiosXI ~]# rm -f /usr/local/nagios/var/ndo.sock
[root@APNagiosXI ~]# rm -f /usr/local/nagios/var/ndo2db.lock
[root@APNagiosXI ~]# rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
[root@APNagiosXI ~]# for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
[root@APNagiosXI ~]# service mysqld start
Redirecting to /bin/systemctl start mysqld.service
Failed to start mysqld.service: Unit not found.
[root@APNagiosXI ~]# service ndo2db start
Starting ndo2db (via systemctl): [ OK ]
[root@APNagiosXI ~]# service nagos start
Redirecting to /bin/systemctl start nagos.service
Failed to start nagos.service: Unit not found.
[root@APNagiosXI ~]# service nagios start
Starting nagios (via systemctl): [ OK ]
[root@APNagiosXI ~]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@APNagiosXI ~]# service crond start
Redirecting to /bin/systemctl start crond.service
[root@APNagiosXI ~]#
It seems to be saying that there isn't a mysqld service.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Database Error

Post by dwhitfield »

Your assessment is accurate. The good news is that I have likely already given you the resolution.
dwhitfield wrote: You ***must*** use mariadb instead of mysqld in the commands below, ***if*** you have mariadb.
More importantly, getting rid of the ndo2db lock and sock file was what I really wanted to do, which appears to have worked. Does ndo2db still give the weird status?
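
For anyone hitting the same "not running but subsystem locked" status, here is a rough sketch of how leftover lock and socket files could be spotted before removing them. The helper names proc_running and stale_files are hypothetical, and the check assumes a Linux /proc filesystem; it only lists files, it does not delete anything.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nagios XI.

# Succeeds if some process has exactly this command name.
# Assumes Linux: reads the name of every process from /proc/<pid>/comm.
proc_running() {
    grep -qxF -- "$1" /proc/[0-9]*/comm 2>/dev/null
}

# Usage: stale_files <process-name> <file>...
# Prints each listed file that still exists while the process is NOT running,
# i.e. a lock or socket file that was probably left behind.
stale_files() {
    name=$1
    shift
    if proc_running "$name"; then
        return 0   # process is alive, so its files are not stale
    fi
    for f in "$@"; do
        # -e is true for regular files and sockets alike
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

With the paths from this thread, that would be: stale_files ndo2db /usr/local/nagios/var/ndo2db.lock /usr/local/nagios/var/ndo.sock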
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Database Error

Post by SavaSC »

Sorry for the delay in getting back to you. In a meeting all morning.

That looks to have done it. No errors.

Thank you for your help.

You may close this thread.