Nagios performance trouble

scottwilkerson · Post by **scottwilkerson** » Fri Nov 02, 2012 12:55 pm

After rebooting, do you still have errors in the MySQL log?

tail -20 /var/log/mysqld.log

hhlodge · Post by **hhlodge** » Fri Nov 02, 2012 12:58 pm

Code: Select all

# tail -20 /var/log/mysqld.log
121102  8:40:00 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
121102  8:40:00 [ERROR] Do you already have another mysqld server running on port: 3306 ?
121102  8:40:00 [ERROR] Aborting

121102  8:40:00  InnoDB: Starting shutdown...
121102  8:46:18  InnoDB: Shutdown completed; log sequence number 0 43655
121102  8:46:18 [Note] /usr/libexec/mysqld: Shutdown complete

121102 08:46:18  mysqld ended

121102  8:46:34 [Note] /usr/libexec/mysqld: Normal shutdown

121102  8:46:36 [Note] /usr/libexec/mysqld: Shutdown complete

121102 08:46:36  mysqld ended
                                                                    <<<<<<<<<<<< reboot
121102 08:49:44  mysqld started
121102  8:49:44  InnoDB: Started; log sequence number 0 43655
121102  8:49:44 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77'  socket: '/usr/local/var/lib/mysql/mysql.sock'  port: 3306  Source distribution

mguthrie · Post by **mguthrie** » Fri Nov 02, 2012 1:09 pm

Right now things do still point to the problem surrounding mysqld. Have you made any further attempts to complete the repair procedure? If so, I would try to complete that if the repair run hasn't completed yet.

scottwilkerson · Post by **scottwilkerson** » Fri Nov 02, 2012 1:12 pm

Let's do a running tail on nagios.log and npcd.log to see if anything is showing up there.

Code: Select all

tail -f /usr/local/nagios/var/nagios.log

and

Code: Select all

tail -f /usr/local/nagios/var/npcd.log

Finally, can you run the following?

Code: Select all

echo "show processlist;"|mysql -pnagiosxi|wc -l
cat /etc/my.cnf|grep max
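When reading the first number, keep in mind that `wc -l` also counts the header row that `mysql` prints above the process list, so the connection count is one less than the line count. A minimal sketch of that arithmetic follows; the helper name `conn_usage` and the limit passed to it are illustrative assumptions, not part of Nagios XI or MySQL.

```shell
# usage: conn_usage <line count from wc -l> <max_connections>
# Hypothetical helper: turns the wc -l result into a connection count
# and an integer percentage of the configured limit.
conn_usage() {
    lines=$1
    max=$2
    conns=$((lines - 1))            # drop the header row of "show processlist"
    pct=$((conns * 100 / max))      # integer percentage, rounded down
    echo "$conns of $max connections in use (${pct}%)"
}
```

For example, `conn_usage 56 200` reports 55 connections against an assumed limit of 200.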

hhlodge · Post by **hhlodge** » Fri Nov 02, 2012 3:31 pm

I'll do the other repairs and the tails, but here's the output you asked for.

Code: Select all

[root@psm-itmon log]# echo "show processlist;"|mysql -pnagiosxi|wc -l
56
[root@psm-itmon log]# cat /etc/my.cnf|grep max
[root@psm-itmon log]#                   <<< no matches

scottwilkerson · Post by **scottwilkerson** » Fri Nov 02, 2012 5:55 pm

OK, this shows that we are hitting the max connections limit (default 50) for mysqld.

Add the following to /etc/my.cnf, just below [mysqld]:

Code: Select all

max_connections=200

then

Code: Select all

service mysqld restart
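The setting only takes effect for the server if it sits inside the [mysqld] section, so it can be worth checking the file itself. The following is a minimal sketch, assuming a simple `key=value` layout; the function name `mycnf_max_connections` is hypothetical, and it only recognizes the underscore spelling of the option.

```shell
# usage: mycnf_max_connections <path to my.cnf>
# Hypothetical helper: prints the max_connections value found in the
# [mysqld] section of the given file, or nothing if it is not set there.
mycnf_max_connections() {
    awk -F= '
        # Track whether the current section header is [mysqld].
        /^[[:space:]]*\[/ { in_mysqld = ($0 ~ /^[[:space:]]*\[mysqld\]/) }
        # Inside [mysqld], print the value of max_connections without spaces.
        in_mysqld && $1 ~ /^[[:space:]]*max_connections[[:space:]]*$/ {
            gsub(/[[:space:]]/, "", $2)
            print $2
        }
    ' "$1"
}
```

Running `mycnf_max_connections /etc/my.cnf` should print 200 once the line has been added in the right place, and nothing if it ended up under a different section.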

hhlodge · Post by **hhlodge** » Mon Nov 05, 2012 8:54 am

This definitely seems to have helped. I still got some load alerts, but only for loads between 4 and 6, nothing like previously. sar data shows 95% idle over the past 2 days and I have no blocked processes. Hopefully that will be the end of it. I do see a lot of these as I tail nagios.log.

Code: Select all

[1351889661] SERVICE ALERT: localhost;Total Processes;WARNING;SOFT;2;PROCS WARNING: 300 processes with STATE = RSZDT

Is that excessive or should I just up the threshold?

Code: Select all

[1351889792] Warning: Global service event handler command '/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_event.php --handler-type=service --host="localhost" --service="Total Processes" --hostaddress="127.0.0.1" --hoststate=UP --hoststateid=0 --hosteventid=0 --hostproblemid=0 --servicestate=OK --servicestateid=0 --lastservicestate=WARNING --lastservicestateid=1 --servicestatetype=SOFT --currentattempt=3 --maxattempts=4 --serviceeventid=260372 --serviceproblemid=0 --serviceoutput="PROCS OK: 176 processes with STATE = RSZDT" --longserviceoutput=""' timed out after 30 seconds

Is this the execution of the command itself timing out?

mguthrie · Post by **mguthrie** » Mon Nov 05, 2012 11:25 am

Can you post the output from the following:

Code: Select all

ps aux | grep eventman

hhlodge · Post by **hhlodge** » Tue Nov 06, 2012 9:20 am

Code: Select all

# ps aux | grep eventman
nagios   26698  0.0  0.0   8724   964 ?        Ss   09:19   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios   26699  0.9  0.3 181148 25448 ?        S    09:19   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
root     26922  0.0  0.0  61240   784 pts/0    S+   09:19   0:00 grep eventman

scottwilkerson · Post by **scottwilkerson** » Tue Nov 06, 2012 2:28 pm

How many CPUs does your Nagios server have?

How many host/service checks do you run?

The new numbers you are posting may be perfectly normal on a multi-core machine running a fair number of checks.

You may want to change the warning/critical thresholds for the local host if you have a multi-CPU machine, as the default thresholds are based on a one-core machine in a light environment.
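One simple way to pick new load thresholds is to scale single-core values by the number of cores. The sketch below does that and prints them in the `-w`/`-c` form that `check_load` takes; the function name `scaled_load_args` is hypothetical, and the per-core baselines (5,4,3 warning and 10,6,4 critical) are assumptions to be replaced with whatever your localhost check currently uses.

```shell
# usage: scaled_load_args <number of cores>
# Hypothetical helper: multiplies assumed single-core load thresholds
# by the core count and prints check_load-style -w/-c arguments.
scaled_load_args() {
    awk -v n="$1" 'BEGIN {
        # Assumed per-core baselines: warning 5,4,3 and critical 10,6,4
        printf "-w %.1f,%.1f,%.1f -c %.1f,%.1f,%.1f\n",
               5 * n, 4 * n, 3 * n, 10 * n, 6 * n, 4 * n
    }'
}
```

On Linux the core count can be read with `grep -c ^processor /proc/cpuinfo`, so `scaled_load_args "$(grep -c ^processor /proc/cpuinfo)"` gives a starting point that you can then tune to your environment.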
