URGENT: PostgreSQL server unavailable, production server down!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Post by benhank »

HELP!
For some reason PostgreSQL has crashed on my prod and secondary servers!
I am getting this error:

Code: Select all

psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
I tried the DB repair script and tried a restore; nothing is working! We have no active monitoring until this is resolved!
I really need some help, fellas!
Proudly running:
NagiosXI 5.4.12 2 node Prod Env 2500 hosts, 13,000 services
Nagiosxi 5.5.7(test env) 2500 hosts, 13,000 services
Nagios Logserver 2 node Prod Env 500 objects sending
Nagios Network Analyser
Nagios Fusion
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

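The error message means that psql could not find the PostgreSQL socket, which usually means the database server is not running. One common cause is that the server's hard drive has filled up and the Postgres database cannot write temporary files.
To check the free space on the server, you can run this command: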

Code: Select all

df -h
If it shows that the drive is full, you can use the following commands to see what is taking up the space.

Command to find the 10 largest directories by size:

Code: Select all

find / -type d -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
Command to find the 10 largest files by size:

Code: Select all

find / -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
Try to free up some space on the server.
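
As a sketch only, here is a lighter-weight variant of the directory search above: it walks the tree once with `du` instead of calling `du` on every directory that `find` returns. The function name `largest_dirs` is made up for this example, and it assumes a `du` that supports `-x` and `-k`, as GNU coreutils on CentOS does.

```shell
# Hypothetical helper (not a Nagios tool): list the N largest directories
# under a starting point, largest last.
# Usage: largest_dirs <start_dir> [count]
largest_dirs() {
    start_dir="${1:-/}"
    count="${2:-10}"
    # -x stays on one filesystem, so network mounts are not walked.
    # -k reports sizes in kilobytes so that sort -n can order them.
    du -x -k "$start_dir" 2>/dev/null | sort -n | tail -n "$count"
}
```

Running `largest_dirs / 10` prints sizes in KB followed by the path. The sizes are cumulative, so parent directories appear alongside the subdirectories that fill them.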

Files that can be deleted on a Nagios system to free up space:

Generic Files
For generic CentOS and Red Hat installs, most log files are stored in the following folder and its subfolders.

Code: Select all

/var/log
Most files in that folder are set up to be compressed and deleted automatically by the logrotate application, which handles rotation, compression, and removal of log files.
But if the server is totally out of space, the files that end in a date code can be deleted to free up space.

Some files in that folder do not get rotated and, over time, can grow very large.
The MySQL and MariaDB log files, for example, do not get rotated on some systems.
If you do not need to keep them, they can be truncated.
The MySQL log file is typically located here:

Code: Select all

/var/log/mysqld.log
The MariaDB log file is typically located here:

Code: Select all

/var/log/mariadb/mariadb.log
To truncate those files, redirect empty output into each file:

Code: Select all

> /var/log/mysqld.log
> /var/log/mariadb/mariadb.log
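
As a sketch, the same truncation can be wrapped in a small function that first checks that the file exists, so a mistyped path does not create a new empty file. `truncate_log` is a name invented here, not a Nagios or MySQL tool. Truncating in place, rather than deleting, matters because a deleted log that the database still holds open keeps using disk space until the service is restarted.

```shell
# Hypothetical helper: empty a log file in place, refusing paths that are
# not existing regular files.
# Usage: truncate_log <path>
truncate_log() {
    log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo "skip: $log_file is not a regular file" >&2
        return 1
    fi
    # ':' is the shell no-op; redirecting its empty output truncates the
    # file to zero bytes without removing or recreating it.
    : > "$log_file"
}
```

For example, `truncate_log /var/log/mysqld.log` would empty the MySQL log if it is present and print a message otherwise.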
If the image is a Nagios-supplied VM, you can increase the drive space by following this article:
https://support.nagios.com/kb/article/n ... e-266.html
Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by benhank »

Code: Select all

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_lkensherlockp-lv_root
                       58G   23G   32G  42% /
tmpfs                  12G     0   12G   0% /dev/shm
/dev/sda1             477M  162M  291M  36% /boot
/dev/mapper/vg_lkensherlockp-lv_home
                       65G   14G   48G  23% /home
isilon.healthone.org:/ifs/data/monitoring/sherlock
                      300G   41G  260G  14% /scratch
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Post by npolovenko »

Hello, @benhank. What is the output of this command?

Code: Select all

service postgresql restart
Make sure that your Nagios server hasn't run out of space.

Please run through the Postgres vacuum steps in this tutorial:
https://support.nagios.com/kb/article.php?id=25

If the problem is still present after the vacuuming, please send in your system profile.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and upload it to a cloud storage service of your choice. You can share a link with me in a personal message.
After you upload the profile, please post something in this thread to bring it up in the support queue.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Post by tgriep »

It could be that the filesystem has run out of inodes. Run this to see if that is the issue:

Code: Select all

df -i
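
A minimal sketch for reading that output at a glance: the helper below takes `df -P` style output on standard input and prints only the mount points at or above a usage threshold. The name `flag_full_mounts` and the default of 90 percent are assumptions made for this example. The `-P` flag keeps each filesystem on a single line, which avoids the wrapped device names seen in the `df -h` listing earlier in this thread.

```shell
# Hypothetical helper: print "<mount point> <use%>" for every filesystem
# whose use percentage (column 5 of `df -P` output) meets the threshold.
# Usage: df -Pi | flag_full_mounts [threshold]
flag_full_mounts() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%$/, "", pct)        # "97%" -> "97"
            # Ignore entries such as "-" that report no percentage.
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
                print $6 " " $5
        }'
}
```

`df -Pi | flag_full_mounts 90` checks inode usage, and `df -P | flag_full_mounts 90` applies the same test to disk blocks. Mount points containing spaces would be cut at the first space; that case is not handled here.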
If that is OK, then check this log file for any errors and post them here.

Code: Select all

/var/lib/pgsql/data/pg_log/postgresql-Tue.log
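
To pull only the relevant lines out of that log, something like the following sketch may help. `pg_log_errors` is a hypothetical name; it filters for the ERROR, FATAL, and PANIC severity labels that PostgreSQL writes and shows the most recent matches.

```shell
# Hypothetical helper: print the most recent severe entries from a
# PostgreSQL log file.
# Usage: pg_log_errors <log_file> [max_lines]
pg_log_errors() {
    log_file="$1"
    max_lines="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Keep only lines carrying a severe label, then show the newest ones.
    grep -E '(ERROR|FATAL|PANIC):' "$log_file" | tail -n "$max_lines"
}
```

For example: `pg_log_errors /var/lib/pgsql/data/pg_log/postgresql-Tue.log`. The file name above appears to follow a day-of-week pattern, so on another day the matching file would presumably be `postgresql-$(LC_ALL=C date +%a).log`; that is an assumption about how this server's logging is configured.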

Post by benhank »

Code: Select all

Stopping postgresql service:                               [FAILED]
Starting postgresql service:                               [FAILED]

I've cleaned out a few large files, but it still won't start.

Post by benhank »

So far, all of my log files are in the KB range.

Post by benhank »

Code: Select all

 echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Post by benhank »

My iptables are turned off too.

Post by benhank »

OK, here is the shocking twist:
This issue happened on two separate servers at the same time.
I have backups of my PostgreSQL DB.
How do I uninstall and then reinstall PostgreSQL, and then restore the correct permissions so I can reimport the PostgreSQL DB?
MySQL is working fine. I think this may be (unless you guys say otherwise) the best and fastest way to get this working again.
Locked