Page 1 of 2

PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 7:44 am
by Gavin
Hi,

We're seeing a *lot* of deadlock errors in PSQL. These are written to the GUI, and when tailing the postgresql log file, we're seeing loads of these:

Code: Select all

DETAIL:  Process 11410 waits for ShareLock on transaction 7985885; blocked by process 14847.
        Process 14847 waits for ShareLock on transaction 7985803; blocked by process 11410.
        Process 11410: UPDATE xi_options SET value='1359117361' WHERE name='ccm_last_update'
        Process 14847: UPDATE xi_options SET value='1359117381' WHERE name='ccm_last_update'
HINT:  See server log for query details.
STATEMENT:  UPDATE xi_options SET value='1359117361' WHERE name='ccm_last_update'
ERROR:  deadlock detected
DETAIL:  Process 15764 waits for ShareLock on transaction 7985896; blocked by process 14898.
        Process 14898 waits for ShareLock on transaction 7985991; blocked by process 15764.
        Process 15764: UPDATE xi_options SET value='1359117408' WHERE name='ccm_last_update'
        Process 14898: UPDATE xi_options SET value='1359117385' WHERE name='ccm_last_update'
HINT:  See server log for query details.
STATEMENT:  UPDATE xi_options SET value='1359117408' WHERE name='ccm_last_update'
ERROR:  deadlock detected
DETAIL:  Process 14786 waits for ShareLock on transaction 7986027; blocked by process 11448.
        Process 11448 waits for ShareLock on transaction 7986030; blocked by process 14786.
        Process 14786: UPDATE xi_options SET value='1359117411' WHERE name='ccm_last_update'
        Process 11448: UPDATE xi_options SET value='1359117411' WHERE name='ccm_last_update'
HINT:  See server log for query details.
STATEMENT:  UPDATE xi_options SET value='1359117411' WHERE name='ccm_last_update'
I have done a manual diff of a backup from the 24th at 6am, and it isn't hugely different. Since then, Nagios has been upgraded from R3 to R4. The '/var/lib/pgsql/pgstartup.log' log doesn't show any errors.

We're also frequently seeing 'cannot connect to database' errors when using CCM pages. I've done a repair of the MySQL database, but this seems to be more of a PostgreSQL issue with the nagiosxi database.

This is massively visible to our business, as the errors (all referring to ccm_last_update / deadlocks) are appearing all over the web interface too.

Can you please advise?

Many thanks,

Gavin

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 10:18 am
by mguthrie
First, lets make sure postgres has a fresh vacuum run on it:
http://support.nagios.com/wiki/index.ph ... .22_in_log

XI uses persistent connections to connect to postgresql. Is it possible that the max connection limit is being exceeded on your system?

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 10:36 am
by Gavin
What might cause the connection limit to be hit? PostgreSQL is configured (on the same server) as per the nagios xi installer, so all settings should be default.

I tried to perform a vaccum, but got a permissions error:

Code: Select all

nagiosxi=> VACUUM;
WARNING:  pgstat wait timeout
WARNING:  skipping "pg_authid" --- only superuser can vacuum it
WARNING:  skipping "pg_database" --- only superuser can vacuum it
WARNING:  skipping "pg_tablespace" --- only superuser can vacuum it
WARNING:  skipping "pg_pltemplate" --- only superuser can vacuum it
WARNING:  skipping "pg_shdepend" --- only superuser can vacuum it
WARNING:  skipping "pg_shdescription" --- only superuser can vacuum it
WARNING:  skipping "pg_auth_members" --- only superuser can vacuum it
VACUUM
nagiosxi=>  VACUUM ANALYZE;
WARNING:  skipping "pg_authid" --- only superuser can vacuum it
WARNING:  skipping "pg_database" --- only superuser can vacuum it
WARNING:  skipping "pg_tablespace" --- only superuser can vacuum it
WARNING:  skipping "pg_pltemplate" --- only superuser can vacuum it
WARNING:  skipping "pg_shdepend" --- only superuser can vacuum it
WARNING:  skipping "pg_shdescription" --- only superuser can vacuum it
WARNING:  skipping "pg_auth_members" --- only superuser can vacuum it
VACUUM
nagiosxi=> 
The above was with me running as 'root', logging into psql using nagiosxi/nagiosxi. I'm not terrible au fait with PostgreSQL - am I doing something wrong here?

Thanks,

Gavin

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 10:45 am
by Gavin
I've been educated on PostgreSQL now, and have logged into the database as the 'postgres' user. I can now run VACCUM ok.

Should the nagiosxi user have permission to do this? Could it be that the db scripts are running but can't actually do anything?

Thanks,

Gavin

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 10:48 am
by Gavin
Sorry for the triple post...

I've vacuumed and restarted postgresql, the error is still appearing.

Thanks,

Gavin

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 10:50 am
by mguthrie
Typically I do it for both nagiosxi and postgres users just to be safe.

Try this to see how many open connections you have:

Code: Select all

psql nagiosxi nagiosxi
select count(*) from pg_stat_activity ;
\q

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 11:01 am
by Gavin
Should 'nagiosxi' be able to VACUUM? pg_stat_activity was at 42 when I first ran it, its now hovering around 30.

Notably, postgresql-Fri.log does contain quite a lot of 'WARNING: pgstat wait timeout'...

Thanks,

Gavin

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 11:15 am
by mguthrie
Check your max_connections variable in the following file. Try increasing that and see if it resolves the issue.

/var/lib/pgsql/data/postgresql.conf

If not, also try running the following:

Code: Select all

touch /usr/local/nagiosxi/etc/components/ccm_config.inc.php
chown apache:nagios /usr/local/nagiosxi/etc/components/ccm_config.inc.php

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 11:21 am
by Gavin
I've changed the limit from 100 to 1000 and I've touched and chown'd that file.

I haven't seen that error again *yet*, but I'll continue to monitor.

Thanks,

Gavin

Re: PostgreSQL Deadlocks

Posted: Fri Jan 25, 2013 11:36 am
by mguthrie
Ok, I'll take a look at the call that was generating the error and see if we can eliminate that as a potential problem.