Nagios performance trouble

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios performance trouble

Post by scottwilkerson »

After rebooting, do you still have errors in the MySQL log?

Code: Select all

tail -20 /var/log/mysqld.log
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

Code: Select all

# tail -20 /var/log/mysqld.log
121102  8:40:00 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
121102  8:40:00 [ERROR] Do you already have another mysqld server running on port: 3306 ?
121102  8:40:00 [ERROR] Aborting

121102  8:40:00  InnoDB: Starting shutdown...
121102  8:46:18  InnoDB: Shutdown completed; log sequence number 0 43655
121102  8:46:18 [Note] /usr/libexec/mysqld: Shutdown complete

121102 08:46:18  mysqld ended

121102  8:46:34 [Note] /usr/libexec/mysqld: Normal shutdown

121102  8:46:36 [Note] /usr/libexec/mysqld: Shutdown complete

121102 08:46:36  mysqld ended
                                                                    <<<<<<<<<<<< reboot
121102 08:49:44  mysqld started
121102  8:49:44  InnoDB: Started; log sequence number 0 43655
121102  8:49:44 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77'  socket: '/usr/local/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
- Kyle
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Nagios performance trouble

Post by mguthrie »

Right now things still point to a problem with mysqld. Have you made any further attempts to complete the repair procedure? If the repair run hasn't finished yet, I would complete that first.
scottwilkerson

Re: Nagios performance trouble

Post by scottwilkerson »

Let's do a running tail on nagios.log and npcd.log to see if anything shows up there:

Code: Select all

tail -f /usr/local/nagios/var/nagios.log
and

Code: Select all

tail -f /usr/local/nagios/var/npcd.log
Finally, can you run the following:

Code: Select all

echo "show processlist;"|mysql -pnagiosxi|wc -l
cat /etc/my.cnf|grep max
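
A note on reading the first command's output: when `mysql` is fed through a pipe it prints a header row before the connection rows, so the `wc -l` figure is one higher than the number of connections. A minimal sketch of a helper that strips the header and reports headroom follows; the function names `count_connections` and `connection_headroom` are made up for illustration and are not part of Nagios XI or MySQL.

```shell
#!/bin/sh
# Count connection rows in `SHOW PROCESSLIST` batch output read from stdin.
# The first line is the column header, so it is skipped; blank lines are ignored.
count_connections() {
    awk 'NR > 1 && NF { n++ } END { print n + 0 }'
}

# Report usage against a limit.
# Usage: connection_headroom CURRENT LIMIT
connection_headroom() {
    current=$1
    limit=$2
    echo "$current of $limit connections in use, $((limit - current)) free"
}

# Example against a live server (not run here; password as used in this thread):
#   echo "show processlist;" | mysql -pnagiosxi | count_connections
#   mysql -pnagiosxi -N -e "SHOW VARIABLES LIKE 'max_connections';"
```

The second commented command asks the running server for its effective limit, which is useful when `/etc/my.cnf` does not set one explicitly.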
hhlodge

Re: Nagios performance trouble

Post by hhlodge »

I'll do the other repairs and the tails, but here's the output you asked for.

Code: Select all

[root@psm-itmon log]# echo "show processlist;"|mysql -pnagiosxi|wc -l
56
[root@psm-itmon log]# cat /etc/my.cnf|grep max
[root@psm-itmon log]#                   <<< no matches
- Kyle
scottwilkerson

Re: Nagios performance trouble

Post by scottwilkerson »

OK, this shows me that we are hitting the max connections limit (default 50) for mysqld.

Add the following to /etc/my.cnf, just below [mysqld]:

Code: Select all

max_connections=200
then

Code: Select all

service mysqld restart
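
Before restarting, it can help to confirm that the line really landed under `[mysqld]` and not under another section such as `[mysqld_safe]`, where it would be ignored by the server. Here is a rough sketch of such a check; `get_mysqld_option` is a hypothetical helper name, and it only matches the exact key spelling you pass in.

```shell
#!/bin/sh
# Print the value of KEY from the [mysqld] section of an ini-style FILE.
# If the key appears more than once in that section, the last value wins.
# Usage: get_mysqld_option FILE KEY
get_mysqld_option() {
    awk -F= -v key="$2" '
        # A section header switches the "inside [mysqld]" flag on or off.
        /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[mysqld\]/); next }
        in_section {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) {
                v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)
                val = v; found = 1
            }
        }
        END { if (found) print val }
    ' "$1"
}

# Example (not run here):
#   get_mysqld_option /etc/my.cnf max_connections
# After the restart, the running value can be checked with:
#   mysql -pnagiosxi -e "SHOW VARIABLES LIKE 'max_connections';"
```

An empty result from the helper means the key is not set in `[mysqld]`, which matches the empty `grep max` output earlier in this thread.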
hhlodge

Re: Nagios performance trouble

Post by hhlodge »

This definitely seems to have helped. I still got some load alerts, but only for loads between 4 and 6, nothing like before. sar data shows 95% idle over the past 2 days and I have no blocked processes. Hopefully that will be the end of it. I do see a lot of these as I tail nagios.log:

Code: Select all

[1351889661] SERVICE ALERT: localhost;Total Processes;WARNING;SOFT;2;PROCS WARNING: 300 processes with STATE = RSZDT
Is that excessive or should I just up the threshold?

Code: Select all

[1351889792] Warning: Global service event handler command '/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_event.php --handler-type=service --host="localhost" --service="Total Processes" --hostaddress="127.0.0.1" --hoststate=UP --hoststateid=0 --hosteventid=0 --hostproblemid=0 --servicestate=OK --servicestateid=0 --lastservicestate=WARNING --lastservicestateid=1 --servicestatetype=SOFT --currentattempt=3 --maxattempts=4 --serviceeventid=260372 --serviceproblemid=0 --serviceoutput="PROCS OK: 176 processes with STATE = RSZDT" --longserviceoutput=""' timed out after 30 seconds
Is this the execution of the command itself timing out?
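
To get a feel for how often this happens, the timeouts in nagios.log can be tallied. The following is only a sketch: the function names are hypothetical, and the pattern assumes the log lines look like the warning quoted above.

```shell
#!/bin/sh
# Count event handler timeout warnings in a Nagios log file.
# Usage: count_handler_timeouts LOGFILE
count_handler_timeouts() {
    grep -c "event handler command .* timed out after" "$1"
}

# Break the timeouts down by the --service="..." argument on each line.
# Usage: timeouts_by_service LOGFILE
timeouts_by_service() {
    grep "event handler command .* timed out after" "$1" |
        sed -n 's/.*--service="\([^"]*\)".*/\1/p' |
        sort | uniq -c
}

# Example (not run here):
#   count_handler_timeouts /usr/local/nagios/var/nagios.log
#   timeouts_by_service /usr/local/nagios/var/nagios.log
```

The "30 seconds" in the warning corresponds to the `event_handler_timeout` setting in nagios.cfg. A steadily growing count would suggest the handler script is still waiting on something, rather than a one-off delay.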
- Kyle
mguthrie

Re: Nagios performance trouble

Post by mguthrie »

Can you post the output from the following:

Code: Select all

ps aux | grep eventman
hhlodge

Re: Nagios performance trouble

Post by hhlodge »

Code: Select all

# ps aux | grep eventman
nagios   26698  0.0  0.0   8724   964 ?        Ss   09:19   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios   26699  0.9  0.3 181148 25448 ?        S    09:19   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
root     26922  0.0  0.0  61240   784 pts/0    S+   09:19   0:00 grep eventman
- Kyle
scottwilkerson

Re: Nagios performance trouble

Post by scottwilkerson »

How many CPUs does your Nagios server have?

How many host/service checks do you run?

The new numbers you are posting may be perfectly normal on a multi-core machine running a fair number of checks.

You may want to change the warning/critical thresholds for the localhost checks if you have a multi-CPU machine, as the default thresholds are based on a single-core machine in a light environment.
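
For the load check specifically, one common rule of thumb is to scale the single-core thresholds by the number of cores. The sketch below does that arithmetic; `scale_thresholds` is a hypothetical helper, the base values `5.0,4.0,3.0` and `10.0,6.0,4.0` are assumed examples and may not match your localhost definition, and linear scaling is a starting point to tune rather than a fixed rule.

```shell
#!/bin/sh
# Multiply a comma-separated list of load thresholds by a core count.
# Usage: scale_thresholds "W1,W5,W15" CORES
scale_thresholds() {
    awk -v t="$1" -v n="$2" 'BEGIN {
        c = split(t, a, ",")
        for (i = 1; i <= c; i++)
            printf "%s%.1f", (i > 1 ? "," : ""), a[i] * n
        print ""
    }'
}

# Example on the Nagios server itself (not run here):
#   cores=$(grep -c '^processor' /proc/cpuinfo)
#   scale_thresholds "5.0,4.0,3.0" "$cores"     # candidate warning values
#   scale_thresholds "10.0,6.0,4.0" "$cores"    # candidate critical values
```

The Total Processes warning is a separate matter: it counts processes rather than load, so its threshold is better set from what the box normally runs than from the core count.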