Database Crash and Corrupt

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Database Crash and Corrupt

Post by mikew »

I am working on a Nagios XI 2012R2.5 server that was running fine until last night, when the / partition filled up and the database crashed. I do not know where the several GB of data came from; I still have to figure that out. In the meantime, no one can log in.

I checked the MySQL nagios database and it was corrupt, but I fixed that with mysqlcheck and the --repair option. The nagiosql database came out OK. (I used the script in XI to repair the MySQL databases.) However, I suspect the PostgreSQL database is the problem, since no one can log in. What is the best way to fix that database?

I do have backups; I just wanted to try fixing it first.

I am now seeing other issues:
* the permissions on /usr/local/nagios/var/rw/nagios.cmd are:
-rw-rw-r root:nagcmd

The nagios.log shows Nagios is bailing out because the permissions on this file are wrong, so the named pipe is not working.

So I know the permissions are wrong. I have deleted all lock files in the /usr/local/nagios/var/ directory and restarted Nagios, but the incorrect permissions come back. I have also restarted the server, which did not help.

System is a CentOS 6.x 64-bit
32 vCPU and 12 GB of RAM
Mike Weber

Nagios Training/Consulting
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Database Crash and Corrupt

Post by abrist »

Are you still out of space, or did you increase the provisioned disk size? If it is still full, we need to clear up space before we do anything else.
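To answer that quickly, a sketch like the following can show how full the partition is and which top-level entries hold the most data. The function names `usage_pct` and `largest_entries` are made up for illustration and are not part of Nagios XI; the sketch assumes the GNU `df`, `du`, and `sort` that ship with CentOS.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Nagios XI tooling.

# Print the percentage of space used on the filesystem holding $1 (default /).
usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List the N largest entries directly under a directory, sizes in KB.
# -x keeps du on one filesystem, -s prints one total per entry.
largest_entries() {
    dir=${1:-/}
    count=${2:-10}
    du -xsk "${dir%/}"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `usage_pct /` followed by `largest_entries / 10`, then `largest_entries /var 10` to drill down into whichever directory stands out.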
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: Database Crash and Corrupt

Post by mikew »

It has space now.

grep nag /etc/group
nagios:x:501:nagios,apache
nagcmd:x:502:nagios,apache

Each time nagios is started the permissions on nagios.cmd go back to

-rw-rw-r root:nagcmd

Which of course means it will bail and not work.
Last edited by mikew on Wed Dec 18, 2013 5:05 pm, edited 1 time in total.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Database Crash and Corrupt

Post by abrist »

The command pipe permissions should be:

prw-rw---- 1 nagios nagcmd 0 Dec 18 13:10 /usr/local/nagios/var/rw/nagios.cmd
To recreate the pipe with the correct ownership and permissions:

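# Stop Nagios so it does not recreate or hold the file while it is replaced
service nagios stop
cd /usr/local/nagios/var/rw/
# mknod fails if nagios.cmd already exists, so remove the stale file first
rm -f nagios.cmd
mknod nagios.cmd p
chown nagios:nagcmd nagios.cmd
chmod 660 nagios.cmd
ls -la
service nagios start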
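
To confirm the result without reading `ls` output by eye, a check along these lines may help. Note that in an `ls -l` listing a named pipe starts with `p`, while a leading `-` (as in the listing reported earlier in the thread) indicates a regular file. The function name `is_command_pipe` is hypothetical, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: succeeds if $1 is a named pipe (FIFO) with mode 660.
# Optional $2 is an expected owner:group pair, e.g. nagios:nagcmd.
is_command_pipe() {
    path=$1
    expected_owner=${2:-}

    # -p is true only for FIFOs, not for regular files.
    [ -p "$path" ] || { echo "$path is not a named pipe" >&2; return 1; }

    mode=$(stat -c '%a' "$path") || return 1
    [ "$mode" = "660" ] || { echo "$path has mode $mode, expected 660" >&2; return 1; }

    if [ -n "$expected_owner" ]; then
        owner=$(stat -c '%U:%G' "$path") || return 1
        [ "$owner" = "$expected_owner" ] || {
            echo "$path is owned by $owner, expected $expected_owner" >&2
            return 1
        }
    fi
    return 0
}
```

A possible call would be `is_command_pipe /usr/local/nagios/var/rw/nagios.cmd nagios:nagcmd && echo OK`.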
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: Database Crash and Corrupt

Post by mikew »

As soon as I remove the file it is recreated by some process, even after a killall -9 nagios.

I now have nagios running with the right permissions, but I had to write a script to remove the file and then set the permissions.

However, still no one can log in to XI. I am able to log in to Nagios Core, just not XI.
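
The script itself was not posted. A minimal sketch of what such a remove-and-reset helper might look like is below; the function name `recreate_command_pipe` is hypothetical, and it uses `mkfifo`, which creates the same kind of file as `mknod ... p`.

```shell
#!/bin/sh
# Sketch only: replace whatever sits at $1 with a FIFO of mode 660.
# Optional $2 is an owner:group pair such as nagios:nagcmd (chown needs root).
recreate_command_pipe() {
    path=$1
    owner=${2:-}

    # Remove the stale file, whether it is a regular file or an old pipe.
    rm -f "$path" || return 1

    # mkfifo applies the umask, so set the mode explicitly afterwards.
    mkfifo "$path" || return 1
    chmod 660 "$path" || return 1

    if [ -n "$owner" ]; then
        chown "$owner" "$path" || return 1
    fi
}
```

Since the file was being recreated by a running process, it is probably safest to stop Nagios before calling something like `recreate_command_pipe /usr/local/nagios/var/rw/nagios.cmd nagios:nagcmd` and to start it again afterwards.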
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Database Crash and Corrupt

Post by abrist »

Do you have a umask set on the system? Or is SELinux running?

umask
getenforce
Additionally, did you reboot the server after clearing space? I ask because that is often the easiest way to make sure your services come up. I am available tomorrow to elevate this to a remote session if need be.
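
For context on why the umask matters: the mode a newly created file or pipe ends up with is the requested mode with the umask bits removed. The sketch below demonstrates this for a FIFO in a scratch directory; the function name `fifo_mode_under_umask` is made up for illustration, and it says nothing about how Nagios itself creates its command file.

```shell
#!/bin/sh
# Sketch only: create a FIFO under the given umask and print its octal mode.
# The parenthesised body runs in a subshell, so the caller's umask is untouched.
fifo_mode_under_umask() (
    umask "$1" || exit 1
    dir=$(mktemp -d) || exit 1
    # mkfifo requests rw for everyone (0666); the umask strips bits from that.
    mkfifo "$dir/test.cmd" || exit 1
    stat -c '%a' "$dir/test.cmd"
    rm -rf "$dir"
)
```

With a umask of 0022 this prints 644, and with 0007 it prints 660, since 0666 with the umask bits cleared gives those values.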
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: Database Crash and Corrupt

Post by mikew »

No SELinux. This is a system that has been running for 6 weeks with no problems. I think the crash has damaged the PostgreSQL database.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Database Crash and Corrupt

Post by abrist »

You can attempt to run a vacuum on it:
http://support.nagios.com/wiki/index.ph ... .22_in_log
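
The link above is truncated, so the exact commands in that article are not reproduced here. As a rough sketch of what vacuuming a PostgreSQL database from the shell can look like, assuming the database and role are both named `nagiosxi` (an assumption, adjust to your install) and using hypothetical helper names `run` and `vacuum_xi_db`:

```shell
#!/bin/sh
# Sketch only: database and role names are assumptions, not confirmed by the thread.

# Print the command when DRY_RUN=1 (for review), otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run the three vacuum variants in turn, stopping at the first failure.
# Note: VACUUM FULL takes an exclusive lock on each table while it runs.
vacuum_xi_db() {
    db=${1:-nagiosxi}
    user=${2:-nagiosxi}
    for stmt in "VACUUM;" "VACUUM ANALYZE;" "VACUUM FULL;"; do
        run psql -U "$user" -d "$db" -c "$stmt" || return 1
    done
}
```

Setting `DRY_RUN=1` before calling `vacuum_xi_db` prints the `psql` commands instead of running them, which is a cheap way to check the names first.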
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: Database Crash and Corrupt

Post by mikew »

So I did those commands....no joy.
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: Database Crash and Corrupt

Post by mikew »

I was able to get back in by using this script:

/usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password=xxxxxx

However, my fear is that there are more permission issues here, so I will do some checking and report back if I find anything more.
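
One possible way to do that checking is to list everything under a directory whose owner or group differs from what is expected. This is only a sketch: the function name `unexpected_owner` is hypothetical, and the expected owner and group vary by path (the command pipe above is nagios:nagcmd, for instance), so the output is a list to review rather than a list of errors.

```shell
#!/bin/sh
# Sketch only: print every path under $1 whose owner is not $2
# or whose group is not $3. Names or numeric IDs are both accepted by find.
unexpected_owner() {
    dir=$1
    user=$2
    group=$3
    find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}
```

For example, `unexpected_owner /usr/local/nagios/var nagios nagios` would print anything in that tree not owned by nagios:nagios.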

Note: When this happened no account would work.