Re: Nagios performance trouble
Posted: Wed Sep 19, 2012 7:54 pm
by hhlodge
So I've been on a RAM disk for everything but perf data, and things were great all day, but now the load is up. vmstat doesn't show any blocked processes, but I see this in /var/log/mysqld.log:
Code:
120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:47:44 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_externalcommands' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:03 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
120919 20:50:54 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_statehistory' is marked as crashed and last (automatic?) repair failed
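To see at a glance which tables the log is complaining about, here is a minimal Python sketch that extracts the database/table names from lines in the format shown above and turns them into `REPAIR TABLE` statements. It assumes the tables are MyISAM (the repair output later in this thread shows MyISAM tables), it is not the official Nagios repair procedure, and the function names are hypothetical.

```python
import re

# Matches the mysqld error lines quoted above, e.g.
# "... Table './nagios/nagios_statehistory' is marked as crashed ..."
CRASHED_RE = re.compile(r"Table '\./([^/']+)/([^']+)' is marked as crashed")


def crashed_tables(log_lines):
    """Return sorted, de-duplicated (database, table) pairs flagged as crashed."""
    found = set()
    for line in log_lines:
        match = CRASHED_RE.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return sorted(found)


def repair_statements(tables):
    """Build one REPAIR TABLE statement per table (valid for MyISAM tables)."""
    return [f"REPAIR TABLE `{db}`.`{table}`;" for db, table in tables]
```

On the server this could be fed the log directly, e.g. `repair_statements(crashed_tables(open("/var/log/mysqld.log")))`. De-duplicating matters here because mysqld repeats the same error many times per table.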
Re: Nagios performance trouble
Posted: Thu Sep 20, 2012 9:24 am
by mguthrie
Try running our repair procedure on the database:
http://assets.nagios.com/downloads/nagi ... tabase.pdf
Also, let's make sure Postgres is OK as well:
Code:
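# connect to the nagiosxi database as the nagiosxi user
psql nagiosxi nagiosxi
vacuum;
vacuum analyze;
-- VACUUM FULL rewrites each table and holds an exclusive lock on it while running
vacuum full;
\q
# then repeat for the postgres database
psql postgres postgres
vacuum;
vacuum analyze;
vacuum full;
\q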
Re: Nagios performance trouble
Posted: Thu Sep 20, 2012 3:04 pm
by hhlodge
This morning I noticed I had a failed drive in my /usr/local RAID set. I checked the RAID yesterday via the remote iLO console when I rebooted, and it reported OK, but the drive may have been failing all along, causing this, before it finally let go. I've replaced the drive, run the suggested commands, and things look good all around now. I'll follow up if things go south again. Thanks for all the help.
Re: Nagios performance trouble
Posted: Thu Sep 20, 2012 3:31 pm
by slansing
Good to hear you found the bad apple. Hopefully that's all it was.
Re: Nagios performance trouble
Posted: Mon Sep 24, 2012 1:10 pm
by hhlodge
All good after 3 days.
Re: Nagios performance trouble
Posted: Thu Nov 01, 2012 8:20 am
by hhlodge
I continue to have load issues. Last night I hit a load average of 19, and I continue to see blocked processes. I have run the Postgres vacuum procedure a couple of times, but it seems to be for naught. I'm at a loss for what to do. I wanted to move this from physical hardware to a VM, but I don't dare with this kind of performance issue. Any thoughts on whether upgrading from 2011R3.2 to 2012R1.1 might set things right?
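A load average only means something relative to the number of CPUs, and on Linux it counts tasks in uninterruptible sleep (typically blocked on disk I/O) as well as runnable ones, so blocked processes push it up even when the CPUs are mostly idle. A minimal sketch, assuming a Linux host where /proc/loadavg is available; the function names are hypothetical:

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg contents."""
    # /proc/loadavg looks like "19.02 15.30 10.10 3/512 12345": three averages,
    # runnable/total scheduling entities, and the most recently created PID.
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_per_cpu(load, cpus):
    """Divide a load average by the CPU count; sustained values above 1.0 mean
    more tasks are runnable or stuck waiting on I/O than there are CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus
```

For example, on a hypothetical 8-CPU host a load of 19 works out to about 2.4 per CPU; on a live system the inputs would be `open("/proc/loadavg").read()` and `os.cpu_count()`.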
Re: Nagios performance trouble
Posted: Thu Nov 01, 2012 9:17 am
by mguthrie
With quite a while between this post and the previous issue, let's start from the top. What shows up as the top CPU-consuming processes when running:
Check /var/log/mysqld.log and make sure there aren't any corrupted tables.
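One hedged way to answer the CPU question non-interactively is to rank command names by summed %CPU from `ps -eo pcpu,comm` output (procps syntax). Summing per command groups the many httpd workers together. Note that `ps` reports %CPU averaged over each process's lifetime rather than the live view `top` gives, so treat this as a rough ranking; the function name is hypothetical.

```python
from collections import defaultdict


def top_cpu_commands(ps_text, n=5):
    """Sum %CPU per command name from `ps -eo pcpu,comm` output; return the top n."""
    totals = defaultdict(float)
    for line in ps_text.splitlines()[1:]:  # skip the "%CPU COMMAND" header
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        pcpu, comm = parts
        totals[comm.strip()] += float(pcpu)
    # Highest combined CPU first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

On the server the input would come from something like `subprocess.run(["ps", "-eo", "pcpu,comm"], capture_output=True, text=True).stdout`.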
Re: Nagios performance trouble
Posted: Thu Nov 01, 2012 9:29 am
by hhlodge
mysqld for the most part with httpd and php coming in behind it.
Re: Nagios performance trouble
Posted: Thu Nov 01, 2012 4:27 pm
by mguthrie
I would recommend restarting Apache and then running the MySQL DB repair procedure as well.
Are performance graphs updating ok?
Do you see any red dots from the Admin page on the subsystem components?
Re: Nagios performance trouble
Posted: Fri Nov 02, 2012 8:24 am
by hhlodge
I stopped and started httpd and then ran the repair, but it could not start mysqld afterward:
Code:
recovering (with sort) MyISAM-table 'nagios_timeperiod_timeranges.MYI'
Data records: 166
- Fixing index 1
- Fixing index 2
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL: [FAILED]
There were no errors in the repair process before this. So I tried the suggested next step, but that didn't work either:
Code:
[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[root@psm-itmon ~]# service mysqld start
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL: [FAILED]
[root@psm-itmon ~]# tail -20 /var/log/mysqld.log
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
[root@psm-itmon ~]# mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
[root@psm-itmon ~]# ps -ef | grep mys
root 17258 1 0 08:26 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql 17308 17258 0 08:26 pts/0 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root 18000 1 0 08:27 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/usr/local/var/lib/mysql --socket=/usr/local/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql 18050 18000 0 08:27 pts/0 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/usr/local/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/usr/local/var/lib/mysql/mysql.sock
root 18668 23518 0 08:28 pts/0 00:00:00 grep mys
So I rebooted, and I'm immediately getting blocked processes and a steadily rising load, and the web interface is dreadfully slow. I never see red dots on the Admin page. A lot of WMI checks are also timing out now. The graphs don't seem to have any gaps in data.
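Since vmstat was used earlier in the thread to look for blocked processes, here is a minimal sketch that pulls the `b` column (processes in uninterruptible sleep, usually waiting on I/O) out of standard procps `vmstat` output so it can be tracked over time. It assumes the usual two header lines; the function name is hypothetical.

```python
def blocked_counts(vmstat_text):
    """Return the 'b' column (processes in uninterruptible sleep) for each sample row."""
    lines = [line for line in vmstat_text.splitlines() if line.strip()]
    header = lines[1].split()  # second line holds the column names: r b swpd free ...
    b_idx = header.index("b")
    return [int(row.split()[b_idx]) for row in lines[2:]]
```

For instance, the output of `vmstat 2 5` (five samples, two seconds apart) can be passed straight in; consistently non-zero values alongside a high `wa` column point toward disk I/O rather than CPU.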