DB Errors - Nagios down

jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

DB Errors - Nagios down

Post by jbennett »

Code: Select all

Message: A database connection error has been detected, we are attempting to repair the server, if the repair does not resolve the issue, please contact Nagios support.

Run the following from the CLI as root to attempt to repair the DB

/usr/local/nagiosxi/scripts/repair_databases.sh
I came in this morning to a Nagios server that appears unable to connect to the DB. As per the instructions on the page, I ran the repair tool (also described here: http://assets.nagios.com/downloads/nagi ... tabase.pdf).

I don't get any errors and it completes just fine.

I have verified that MySQL is running:

Code: Select all

# /etc/init.d/mysqld status
mysqld (pid 25360) is running...
When I check the error log for MySQL, I don't find any errors around the time that Nagios appears to have sent out its last alert:

Code: Select all

141008 15:16:10 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.95'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
141013  9:49:29 [Note] /usr/libexec/mysqld: Normal shutdown
I'm not out of disk space, apparently:

Code: Select all

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00_ROOT
                       48G   36G  9.1G  80% /
/dev/mapper/VolGroup00-LogVol00
                      3.0G   95M  2.7G   4% /tmp
/dev/mapper/VolGroup00-LogVol00_VAR
                      5.7G  4.2G  1.3G  77% /var
/dev/hda1             190M   40M  141M  23% /boot
tmpfs                 5.9G     0  5.9G   0% /dev/shm
tmpfs                 125M   57M   69M  46% /var/nagiosramdisk
And inodes don't seem to be a problem:

Code: Select all

# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00_ROOT
                     12799776  218121 12581655    2% /
/dev/mapper/VolGroup00-LogVol00
                      793600     741  792859    1% /tmp
/dev/mapper/VolGroup00-LogVol00_VAR
                     1540096   90193 1449903    6% /var
/dev/hda1              50200      50   50150    1% /boot
tmpfs                1538707       1 1538706    1% /dev/shm
tmpfs                1538707      14 1538693    1% /var/nagiosramdisk
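
The two checks above can be scripted. A minimal sketch, assuming `df -P` (the POSIX output format, which keeps each filesystem on one line and so avoids the wrapped device names seen above); `over_threshold` is a hypothetical helper name and the 90% limit is only an example:

```shell
#!/bin/sh
# Sketch: flag filesystems at or above a usage threshold.
# Reads `df -P`-style output on stdin; over_threshold is a hypothetical helper name.
over_threshold() {
    limit="$1"
    awk -v limit="$limit" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)               # strip the trailing percent sign
        if (pct + 0 >= limit) print $6, pct "%"
    }'
}

# Usage (block usage, then inode usage; -i is supported by GNU df):
#   df -P  | over_threshold 90
#   df -Pi | over_threshold 90
```

No output means nothing is at or above the limit. Mount points containing spaces would need more careful parsing than this sketch does.
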
When I go to verify configs, everything appears to be fine with only warnings about 'failure_prediction_enabled' present.

I've checked nagios.log and don't see anything unusual on initial inspection.

Any help would be greatly appreciated as our system is not currently running.

EDIT: Upon checking the httpd log, I find the following:

Code: Select all

[Mon Oct 13 10:48:11 2014] [error] [client 10.100.39.62] PHP Warning:  pg_pconnect() [<a href='function.pg-pconnect'>function.pg-pconnect</a>]: Unable to connect to PostgreSQL server: FATAL:  database is not accepting commands to avoid wraparound data loss in database "postgres"\nHINT:  Stop the postmaster and use a standalone backend to vacuum database "postgres". in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 682, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Mon Oct 13 10:48:11 2014] [error] [client 10.100.39.62] PHP Notice:  Undefined variable: result in /usr/local/nagiosxi/html/includes/db.inc.php on line 241, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
I have no idea where to go from here though!
lgroschen
Posts: 384
Joined: Wed Nov 27, 2013 1:17 pm

Re: DB Errors - Nagios down

Post by lgroschen »

jbennett,
First try following this guide in the Nagios XI FAQ to see if you can Vacuum the DB and fix the problem:

http://support.nagios.com/wiki/index.ph ... .22_in_log

If that doesn't fix your problem try re-installing postgres:

Code: Select all

yum reinstall -y postgresql postgresql-devel
Lastly, if you have a backup of your system see if you can restore back to the most recent one and check to see if the problem is resolved.


Report back here if you have any success. Any logs from after running these steps would also help with further troubleshooting.

/Luke
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: DB Errors - Nagios down

Post by abrist »

This issue is actually with postgres. Please follow the steps in the FAQ below to fix the issue with a vacuum:
http://support.nagios.com/wiki/index.ph ... .22_in_log
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: DB Errors - Nagios down

Post by jbennett »

As soon as I try the first step:

Code: Select all

# psql nagiosxi nagiosxi
psql: FATAL:  database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT:  Stop the postmaster and use a standalone backend to vacuum database "postgres".
If I stop postgresql and then try to connect, I get the following:

Code: Select all

# /etc/init.d/postgresql stop
Stopping postgresql service:                               [  OK  ]
# psql nagiosxi nagiosxi
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: DB Errors - Nagios down

Post by jbennett »

OK - after some more digging, I came across this:

http://serverfault.com/questions/292713/postgres-vacuum

Specifically, the second answer.

So I have done the following:

Code: Select all

# su postgres
bash-3.2$ postgres -D /var/lib/pgsql/data/ postgres
WARNING:  database "postgres" must be vacuumed within 1000000 transactions
HINT:  To avoid a database shutdown, execute a full-database VACUUM in "postgres".

PostgreSQL stand-alone backend 8.1.23
backend> VACUUM
backend> VACUUM ANALYZE
backend> VACUUM FULL

Code: Select all

bash-3.2$ postgres -D /var/lib/pgsql/data/ nagiosxi
WARNING:  database "postgres" must be vacuumed within 1000000 transactions
HINT:  To avoid a database shutdown, execute a full-database VACUUM in "nagiosxi".

PostgreSQL stand-alone backend 8.1.23
backend> VACUUM
backend> VACUUM ANALYZE
backend> VACUUM FULL

Code: Select all

bash-3.2$ postgres -D /var/lib/pgsql/data/ template1
WARNING:  database "postgres" must be vacuumed within 1000000 transactions
HINT:  To avoid a database shutdown, execute a full-database VACUUM in "template1".

PostgreSQL stand-alone backend 8.1.23
backend> VACUUM
backend> VACUUM ANALYZE
backend> VACUUM FULL
I am now able to access my Nagios install. Do I continue to run these commands on these databases until I no longer get the warnings?

For example:

Code: Select all

WARNING:  database "template1" must be vacuumed within 999483 transactions
In this process I did get the following warning:

Code: Select all

NOTICE:  number of page slots needed (29024) exceeds max_fsm_pages (20000)
HINT:  Consider increasing the configuration parameter "max_fsm_pages" to a value over 29024.
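
One way to judge whether further vacuum runs are needed is to watch the number in the warning: if it keeps shrinking between runs, the headroom is still being used up. A rough sketch that pulls the database name and remaining count out of such warnings; `xid_headroom` is a hypothetical helper name, and the input is assumed to be text in the format of the warnings shown above:

```shell
#!/bin/sh
# Sketch: extract "<database> <remaining transactions>" from PostgreSQL
# wraparound warnings read on stdin. xid_headroom is a hypothetical helper name.
xid_headroom() {
    sed -n 's/.*database "\([^"]*\)" must be vacuumed within \([0-9][0-9]*\) transactions.*/\1 \2/p'
}

# Usage (paste the warnings in, or point it at a saved log file):
#   xid_headroom < saved_warnings.txt
```

Lines that do not match the warning format are ignored.
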
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: DB Errors - Nagios down

Post by lmiltchev »

Execute vacuum in standalone mode one more time by running the following commands (one by one):

Code: Select all

service postgresql stop
su postgres
echo "VACUUM FULL;" > /tmp/fix.sql
postgres -D /var/lib/pgsql/data nagiosxi < /tmp/fix.sql
postgres -D /var/lib/pgsql/data postgres < /tmp/fix.sql
postgres -D /var/lib/pgsql/data template1 < /tmp/fix.sql
exit
service postgresql start
Let me know if this helped.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: DB Errors - Nagios down

Post by jbennett »

I've run the commands suggested. I'm assuming that you were having it output VACUUM FULL to /tmp/fix.sql for a reason, but I'm not sure I follow.
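
For what it's worth, the standalone backend reads SQL from standard input, so the file is just a way to hand it the statement without typing at the backend> prompt; a pipe does the same job. A hedged sketch, assuming the data directory and database names used in this thread, that it is run as the postgres user with the postgresql service stopped, and with `vacuum_standalone` and `DRY_RUN` as hypothetical names. With the default `DRY_RUN=1` it only prints the commands it would run:

```shell
#!/bin/sh
# Sketch: feed VACUUM FULL to the standalone backend through a pipe
# instead of a temporary file.
# Assumptions: run as the postgres user with the postgresql service stopped;
# DRY_RUN and vacuum_standalone are hypothetical names. With DRY_RUN=1
# (the default) the commands are printed, not executed.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/pgsql/data}"
DRY_RUN="${DRY_RUN:-1}"

vacuum_standalone() {
    db="$1"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "echo 'VACUUM FULL;' | postgres -D $PGDATA_DIR $db"
    else
        echo 'VACUUM FULL;' | postgres -D "$PGDATA_DIR" "$db"
    fi
}

# Database names as used earlier in the thread.
for db in nagiosxi postgres template1; do
    vacuum_standalone "$db"
done
```

The `postgres -D <datadir> <dbname>` form matches the 8.1 commands shown above; newer PostgreSQL releases are expected to need `--single` as well, so check the documentation for the installed version before setting `DRY_RUN=0`.
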
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: DB Errors - Nagios down

Post by abrist »

You will most likely see some warnings - though it may still be working fine. As long as the transaction wraparound errors are gone, you should be ok.

PS: I think I will use your notes to update the FAQ at some point. Thanks kindly.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: DB Errors - Nagios down

Post by jbennett »

I came in this morning to a server that was down again.

I cannot start postgres.

Maillog shows the following:

Code: Select all

Oct 14 08:27:53 [server] sendmail[3314]: rejecting connections on daemon MTA: load average: 258
tail -f /var/lib/pgsql/data/pg_log/postgresql-Mon.log reports the following:

Code: Select all

FATAL:  connection limit exceeded for non-superusers
(I increased max_connections from 200 to 300 and rebooted - no dice)

Code: Select all

LOG:  unexpected EOF on client connection
(??)

Code: Select all

ERROR:  relation "xi_notifications" does not exist
(This error is discussed in the following thread: http://support.nagios.com/forum/viewtop ... 03&p=19738

Does that indicate that, even though I'm running 2014R1.5, I'm still hitting some old table issues?)

I tried increasing "max_fsm_pages" from 20000 to 40000 and then starting postgresql again, but no dice.
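
With several different messages in the log, it can help to see which one dominates. A small sketch that tallies log lines by message, assuming lines begin with the severity as in the excerpts above (that depends on the `log_line_prefix` setting); `tally_pg_log` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: count identical PostgreSQL log messages, most frequent first.
# Assumes each message line starts with its severity (no log_line_prefix);
# tally_pg_log is a hypothetical helper name.
tally_pg_log() {
    awk '/^(FATAL|ERROR|PANIC|WARNING|LOG):/ { count[$0]++ }
         END { for (msg in count) print count[msg], msg }' | sort -rn
}

# Usage (log path as quoted earlier in this post):
#   tally_pg_log < /var/lib/pgsql/data/pg_log/postgresql-Mon.log
```

A log dominated by "connection limit exceeded" would point at clients piling up rather than at the database files themselves, though that is only a starting point for diagnosis.
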
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: DB Errors - Nagios down

Post by jbennett »

Code: Select all

# tail /var/log/httpd/error_log
[Tue Oct 14 10:49:22 2014] [error] [client 10.100.39.24] PHP Warning:  Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 395, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:22 2014] [error] [client 10.100.39.24] PHP Warning:  pg_pconnect() [<a href='function.pg-pconnect'>function.pg-pconnect</a>]: Unable to connect to PostgreSQL server: could not connect to server: Connection refused\n\tIs the server running on host "localhost" and accepting\n\tTCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 682, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:22 2014] [error] [client 10.100.39.24] PHP Notice:  Undefined variable: result in /usr/local/nagiosxi/html/includes/db.inc.php on line 241, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:23 2014] [error] [client 10.100.39.24] PHP Warning:  pg_pconnect() [<a href='function.pg-pconnect'>function.pg-pconnect</a>]: Unable to connect to PostgreSQL server: could not connect to server: Connection refused\n\tIs the server running on host "localhost" and accepting\n\tTCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 682, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:23 2014] [error] [client 10.100.39.24] PHP Notice:  Undefined variable: result in /usr/local/nagiosxi/html/includes/db.inc.php on line 241, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:23 2014] [error] [client 10.100.39.24] PHP Warning:  pg_pconnect() [<a href='function.pg-pconnect'>function.pg-pconnect</a>]: Unable to connect to PostgreSQL server: could not connect to server: Connection refused\n\tIs the server running on host "localhost" and accepting\n\tTCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 682, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:23 2014] [error] [client 10.100.39.24] PHP Notice:  Undefined variable: result in /usr/local/nagiosxi/html/includes/db.inc.php on line 241, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:23 2014] [error] [client 10.100.39.24] PHP Warning:  Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 395, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:23 2014] [error] [client 10.100.39.24] PHP Warning:  pg_pconnect() [<a href='function.pg-pconnect'>function.pg-pconnect</a>]: Unable to connect to PostgreSQL server: could not connect to server: Connection refused\n\tIs the server running on host "localhost" and accepting\n\tTCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 682, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php
[Tue Oct 14 10:49:23 2014] [error] [client 10.100.39.24] PHP Notice:  Undefined variable: result in /usr/local/nagiosxi/html/includes/db.inc.php on line 241, referer: http://lnttavmnag1/nagiosxi/includes/components/nocscreen/noc.php