Re: Nagios performance trouble

Posted: Fri Nov 02, 2012 12:55 pm
by scottwilkerson
After rebooting, do you still have errors in the MySQL log?

Code:

tail -20 /var/log/mysqld.log

Re: Nagios performance trouble

Posted: Fri Nov 02, 2012 12:58 pm
by hhlodge

Code:

# tail -20 /var/log/mysqld.log
121102  8:40:00 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
121102  8:40:00 [ERROR] Do you already have another mysqld server running on port: 3306 ?
121102  8:40:00 [ERROR] Aborting

121102  8:40:00  InnoDB: Starting shutdown...
121102  8:46:18  InnoDB: Shutdown completed; log sequence number 0 43655
121102  8:46:18 [Note] /usr/libexec/mysqld: Shutdown complete

121102 08:46:18  mysqld ended

121102  8:46:34 [Note] /usr/libexec/mysqld: Normal shutdown

121102  8:46:36 [Note] /usr/libexec/mysqld: Shutdown complete

121102 08:46:36  mysqld ended
                                                                    <<<<<<<<<<<< reboot
121102 08:49:44  mysqld started
121102  8:49:44  InnoDB: Started; log sequence number 0 43655
121102  8:49:44 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77'  socket: '/usr/local/var/lib/mysql/mysql.sock'  port: 3306  Source distribution

Re: Nagios performance trouble

Posted: Fri Nov 02, 2012 1:09 pm
by mguthrie
Right now things still point to a problem with mysqld. Have you made any further attempts to complete the repair procedure? If the repair run hasn't finished yet, I would try to complete it.

Re: Nagios performance trouble

Posted: Fri Nov 02, 2012 1:12 pm
by scottwilkerson
Let's do a running tail on nagios.log and npcd.log to see if anything is showing up there:

Code:

tail -f /usr/local/nagios/var/nagios.log
and

Code:

tail -f /usr/local/nagios/var/npcd.log
Finally, can you run the following?

Code:

echo "show processlist;"|mysql -pnagiosxi|wc -l
cat /etc/my.cnf|grep max
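
A note on reading the first command's result: when its output is piped, the mysql client prints a row of column names before the data, so the line count is one higher than the number of connections. A minimal sketch that skips that header row follows; `count_connections` is a hypothetical helper name, not part of MySQL or Nagios XI.

```shell
# Count data rows in tab-separated mysql batch output read from stdin.
# The first line is the column-name header, so it is not counted.
count_connections() {
    awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Usage (same password as used in this thread):
#   echo "show processlist;" | mysql -pnagiosxi | count_connections
# Passing -N (--skip-column-names) to mysql should also drop the header,
# in which case a plain "wc -l" gives the connection count directly.
```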

Re: Nagios performance trouble

Posted: Fri Nov 02, 2012 3:31 pm
by hhlodge
I'll do the other repairs and the tails, but here's the output you asked for.

Code:

[root@psm-itmon log]# echo "show processlist;"|mysql -pnagiosxi|wc -l
56
[root@psm-itmon log]# cat /etc/my.cnf|grep max
[root@psm-itmon log]#                   <<< no matches

Re: Nagios performance trouble

Posted: Fri Nov 02, 2012 5:55 pm
by scottwilkerson
OK, this shows that we are hitting the max connections limit (default 50) for mysqld.

Add the following to /etc/my.cnf, just below [mysqld]:

Code:

max_connections=200
then

Code:

service mysqld restart
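
Because the setting only takes effect when it sits under the [mysqld] section, it can be worth checking the file before restarting. The following is a rough sketch, assuming plain `key=value` lines and no included config files; `mysqld_max_connections` is a hypothetical helper name.

```shell
# Print the max_connections value found inside the [mysqld] section of a
# my.cnf-style file. Prints nothing if the setting is missing or sits
# under a different section such as [client] or [mysqld_safe].
mysqld_max_connections() {
    awk -F= '
        /^[ \t]*\[/ {
            # A new section header: remember whether it is exactly [mysqld].
            in_mysqld = ($0 ~ /^[ \t]*\[mysqld\][ \t]*$/)
            next
        }
        in_mysqld && $1 ~ /^[ \t]*max_connections[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            print $2
        }
    ' "$1"
}

# Usage:
#   mysqld_max_connections /etc/my.cnf
# After the restart, the live value can be read from the server with:
#   mysql -pnagiosxi -e "SHOW VARIABLES LIKE 'max_connections';"
```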

Re: Nagios performance trouble

Posted: Mon Nov 05, 2012 8:54 am
by hhlodge
This definitely seems to have helped. I still got some load alerts, but only for loads of 4 to 6, nothing like before. The sar data shows 95% idle over the past 2 days, and I have no blocked processes. Hopefully that will be the end of it. I do see a lot of these as I tail nagios.log:

Code:

[1351889661] SERVICE ALERT: localhost;Total Processes;WARNING;SOFT;2;PROCS WARNING: 300 processes with STATE = RSZDT
Is that excessive or should I just up the threshold?

Code:

[1351889792] Warning: Global service event handler command '/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_event.php --handler-type=service --host="localhost" --service="Total Processes" --hostaddress="127.0.0.1" --hoststate=UP --hoststateid=0 --hosteventid=0 --hostproblemid=0 --servicestate=OK --servicestateid=0 --lastservicestate=WARNING --lastservicestateid=1 --servicestatetype=SOFT --currentattempt=3 --maxattempts=4 --serviceeventid=260372 --serviceproblemid=0 --serviceoutput="PROCS OK: 176 processes with STATE = RSZDT" --longserviceoutput=""' timed out after 30 seconds
Is this the execution of the command itself timing out?
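
The bracketed numbers at the start of the nagios.log lines above are Unix epoch timestamps, which are hard to line up with sar data by eye. Here is a small sketch that rewrites them as UTC dates, assuming GNU date (for the `-d @SECONDS` form); `nagios_log_dates` is a hypothetical helper name.

```shell
# Rewrite the leading [epoch] stamp on each line read from stdin as a
# UTC date. Lines that do not start with "[digits]" pass through unchanged.
nagios_log_dates() {
    while IFS= read -r line; do
        case $line in
            \[[0-9]*\]*)
                epoch=${line#\[}        # drop the opening bracket
                epoch=${epoch%%\]*}     # keep only the text before "]"
                rest=${line#*\] }       # the message after "] "
                printf '[%s] %s\n' \
                    "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Usage:
#   tail -f /usr/local/nagios/var/nagios.log | nagios_log_dates
```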

Re: Nagios performance trouble

Posted: Mon Nov 05, 2012 11:25 am
by mguthrie
Can you post the output from the following:

Code:

ps aux | grep eventman

Re: Nagios performance trouble

Posted: Tue Nov 06, 2012 9:20 am
by hhlodge

Code:

# ps aux | grep eventman
nagios   26698  0.0  0.0   8724   964 ?        Ss   09:19   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios   26699  0.9  0.3 181148 25448 ?        S    09:19   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
root     26922  0.0  0.0  61240   784 pts/0    S+   09:19   0:00 grep eventman

Re: Nagios performance trouble

Posted: Tue Nov 06, 2012 2:28 pm
by scottwilkerson
How many CPUs does your Nagios server have?

How many host/service checks do you run?

The new numbers you are posting may be perfectly normal on a multi-core machine running a fair number of checks.

You may want to change the warning/critical thresholds for the local host if you have a multi-CPU machine, as the default thresholds are based on a single-core machine in a light environment.
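
One way to act on the advice above, sketched under assumptions, is to scale the load thresholds by the number of cores. The per-core multipliers and the helper name `scaled_load_thresholds` are illustrative assumptions, not Nagios XI defaults. The output uses the `-w`/`-c` triplet form (1, 5 and 15 minute averages) that the check_load plugin accepts.

```shell
# Print warning and critical load triplets scaled by a core count.
# The per-core base values (1.5/1.2/1.0 and 3.0/2.5/2.0) are assumptions
# chosen for illustration; tune them for your own environment.
scaled_load_thresholds() {
    awk -v c="$1" 'BEGIN {
        printf "-w %.1f,%.1f,%.1f -c %.1f,%.1f,%.1f\n",
               1.5 * c, 1.2 * c, 1.0 * c,
               3.0 * c, 2.5 * c, 2.0 * c
    }'
}

# Usage on Linux, counting cores from /proc/cpuinfo:
#   scaled_load_thresholds "$(grep -c ^processor /proc/cpuinfo)"
```

The resulting values would then go into the arguments of the local host's load check; the same idea applies to raising the Total Processes warning level if 300 processes turns out to be normal for the machine.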