Database Crash and Corrupt
I am working on a Nagios XI 2012R2.5 server that was running fine until last night, when the / partition filled up and the database crashed. I do not know where the several GB of data came from; I still have to figure that out. For now, no one can log in.
I checked the MySQL database nagios and it was corrupt, but I fixed that with mysqlcheck and the --repair option. The nagiosql database came out OK. (I used the script in XI to fix the MySQL databases.) However, I suspect that the PostgreSQL database is the problem, as no one can log in. What is the best way to fix that database?
I do have backups; I just wanted to try fixing it first.
I am now seeing other issues:
* the permissions on /usr/local/nagios/var/rw/nagios.cmd are:
-rw-rw-r root:nagcmd
The nagios.log shows Nagios is bailing out because the permissions are wrong on this file, so the named pipe is not working.
So I know the permissions are wrong. I have deleted all lock files in the /usr/local/nagios/var/ directory and restarted Nagios, but the permissions come back incorrect. I have also restarted the server...no help.
System is a CentOS 6.x 64-bit
32 vCPU and 12 GB of RAM
Mike Weber
Nagios Training/Consulting
Re: Database Crash and Corrupt
Are you still out of space, or did you increase the provisioned disk size? If it is still full, we need to clear up space before we do anything else.
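If it helps to track down what filled the partition, here is a minimal sketch for listing the largest directories on one filesystem. The function name largest_dirs is made up for illustration; it only wraps du, sort and head, and it assumes a GNU/Linux userland like the CentOS box described above.

```shell
#!/bin/sh
# largest_dirs is a hypothetical helper, not part of Nagios XI.
# It lists the N largest directories under a starting path.
# Usage: largest_dirs <path> [count]
largest_dirs() {
    start=${1:-/}
    count=${2:-10}
    # -x: stay on the filesystem that holds <path>; -k: report sizes in KiB.
    # Errors from unreadable directories are discarded.
    du -xk "$start" 2>/dev/null | sort -rn | head -n "$count"
}
```

Running df -h / first shows how full the partition is; largest_dirs / 15 (as root) then shows where the space went.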
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Database Crash and Corrupt
It has space now.
grep nag /etc/group
nagios:x:501:nagios,apache
nagcmd:x:502:nagios,apache
Each time nagios is started the permissions on nagios.cmd go back to
-rw-rw-r root:nagcmd
Which of course means it will bail and not work.
Last edited by mikew on Wed Dec 18, 2013 5:05 pm, edited 1 time in total.
Mike Weber
Re: Database Crash and Corrupt
The command pipe permissions should be:
prw-rw---- 1 nagios nagcmd 0 Dec 18 13:10 /usr/local/nagios/var/rw/nagios.cmd
To change the permissions:
cd /usr/local/nagios/var/rw/
mknod /usr/local/nagios/var/rw/nagios.cmd p
chown nagios:nagcmd nagios.cmd
chmod 660 nagios.cmd
ls -la
service nagios restart
Former Nagios employee
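To confirm the result without reading ls output by eye, a small check along these lines may help. This is only a sketch: pipe_status is a hypothetical helper name, and stat -c is the GNU coreutils form, which is assumed to be available.

```shell
#!/bin/sh
# pipe_status is a hypothetical helper, not part of Nagios.
# Prints "fifo <mode> <owner>:<group>" when the path is a named pipe,
# and "not-fifo" for anything else (regular file, missing file, ...).
pipe_status() {
    path=$1
    if [ -p "$path" ]; then
        # %a = octal mode, %U = owner name, %G = group name (GNU stat)
        echo "fifo $(stat -c '%a %U:%G' "$path")"
    else
        echo "not-fifo"
    fi
}
```

With the permissions described in this thread, pipe_status /usr/local/nagios/var/rw/nagios.cmd should report a fifo with mode 660 owned by nagios:nagcmd. A result of not-fifo means a regular file is sitting where the pipe should be.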
Re: Database Crash and Corrupt
As soon as I remove the file, it is recreated by some process. I have done a killall -9 nagios.
So I now have nagios running with the right permissions, but I had to write a script to remove the file and then set the permissions.
BUT no one can log in still.
I am now able to log in to Nagios Core but not XI.
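The script mentioned in this post was not shared, so the following is only a guess at its general shape, not the actual script. It assumes Nagios is stopped first so nothing recreates the file between steps, and it uses mkfifo, which creates the same kind of named pipe as the mknod ... p command given earlier in the thread. The name recreate_fifo is made up for illustration.

```shell
#!/bin/sh
# recreate_fifo is a hypothetical helper, not the poster's script.
# It replaces whatever sits at <path> with a named pipe of mode 660.
# Usage: recreate_fifo <path> [owner:group]
recreate_fifo() {
    path=$1
    owner=${2:-}
    rm -f "$path" || return 1      # drop the wrong regular file or stale pipe
    mkfifo "$path" || return 1     # same effect as: mknod "$path" p
    chmod 660 "$path" || return 1  # rw for owner and group only
    # chown normally needs root; it is skipped when no owner is given
    if [ -n "$owner" ]; then
        chown "$owner" "$path" || return 1
    fi
}
```

On the server in question this would be run as root, for example service nagios stop, then recreate_fifo /usr/local/nagios/var/rw/nagios.cmd nagios:nagcmd, then service nagios start.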
Mike Weber
Re: Database Crash and Corrupt
Do you have a umask set on the system? Or selinux running?
Additionally, did you reboot the server after clearing space? I ask because that is often the easiest way to make sure your services come up. I am available tomorrow to elevate this to a remote session if need be.
umask
getenforce
Former Nagios employee
Re: Database Crash and Corrupt
No SELinux. This is a system that has been running for 6 weeks with no problems. I think the crash has damaged the PostgreSQL database.
Mike Weber
Re: Database Crash and Corrupt
You can attempt to run a vacuum on it:
http://support.nagios.com/wiki/index.ph ... .22_in_log
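The linked page is not reproduced here, so the following is only a rough sketch of what a routine PostgreSQL vacuum looks like from the shell. The database name and role, both nagiosxi, are assumptions and should be checked against the local install. The hypothetical helper vacuum_commands only prints the psql commands and executes nothing.

```shell
#!/bin/sh
# vacuum_commands is a hypothetical helper; it prints commands only.
# Usage: vacuum_commands [database] [user]
vacuum_commands() {
    db=${1:-nagiosxi}      # assumed database name
    user=${2:-nagiosxi}    # assumed role name
    # VACUUM reclaims dead rows; VACUUM ANALYZE also refreshes planner stats.
    for stmt in "VACUUM;" "VACUUM ANALYZE;"; do
        printf 'psql -U %s -d %s -c "%s"\n' "$user" "$db" "$stmt"
    done
}
```

Review the printed lines before running them by hand. A vacuum tidies up a working database; it is not a repair tool for damaged data files.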
Former Nagios employee
Re: Database Crash and Corrupt
So I did those commands....no joy.
Mike Weber
Re: Database Crash and Corrupt
I was able to get back in by using this script:
/usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password=xxxxxx
However, my fear is that there are more permission issues here so I will do some checking and report back if I find anything more.
Note: When this happened no account would work.
Mike Weber