Performance issues

Posted: Tue Dec 10, 2013 11:48 am
by westernuniv
I had some performance issues as described in the topic http://support.nagios.com/forum/viewtop ... 4&start=10 but it is now locked so I can't add an update.

I applied the tuning that was recommended and upgraded to 2012R2.5 and things seemed to be running smoothly until today (16 days after changes were made). My CPU is spiking (which in turn is causing other checks to time out) and memory is steadily increasing. The box has 96GB of memory and 74GB is in use.

Watching htop, the CPU goes through the roof when JMX queries are run (pinning all 24 cores at over 100%). To me it sounds like there is a memory leak somewhere, since the problem is gradual and only clears on reboot.

httpd seems to spike as well but not as much.

Nagios XI: 2012R2.5
OS: CentOS 6.4 (Final)
PHP Version: 5.3.3

Re: Performance issues

Posted: Tue Dec 10, 2013 11:58 am
by abrist
How many checks are you running per 5 minutes?
What is the average check latency/duration for those JMX checks?
Have you added any checks in the last 16 days? If so, what kinds of checks?

Re: Performance issues

Posted: Tue Dec 10, 2013 12:39 pm
by westernuniv
Every 5 minutes:
Host Checks: 420
Service Checks: 2123

The average check latency for JMX is around 0.18 seconds.
The average duration is all over the map, anywhere from 0.1 seconds to 46 seconds (I did a random sampling of 20-30 JMX checks).

Within the past 16 days, approximately 50 checks have been added. The majority are NRPE checks to other hosts, plus 2-3 JMX checks, an HTTP POST (to test site auth), and some SSH checks.

Re: Performance issues

Posted: Tue Dec 10, 2013 12:43 pm
by slansing
"96GB of memory and 74GB is in use"
Is anything else installed on this system besides Nagios XI and its dependencies? Can you determine through top or other sources what is eating this memory up?

Re: Performance issues

Posted: Tue Dec 10, 2013 1:09 pm
by westernuniv
Nope, just Nagios and its related dependencies. The things that are using the most memory (but only averaging 0.1 - 0.2% per process) are mysql and httpd.

Re: Performance issues

Posted: Tue Dec 10, 2013 1:28 pm
by lmiltchev
What is the iowait (wa) that is shown when you run top?

Re: Performance issues

Posted: Tue Dec 10, 2013 1:45 pm
by westernuniv

top - 13:45:01 up 16 days,  5:29,  2 users,  load average: 2.36, 3.17, 3.63
Tasks: 686 total,   1 running, 685 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.6%us,  0.7%sy,  0.0%ni, 96.6%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  99059156k total, 98209912k used,   849244k free,   608536k buffers
Swap: 16777208k total,    14656k used, 16762552k free, 22465972k cached
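For anyone scripting this check rather than reading it off the screen, here is a minimal Python sketch that pulls the iowait (wa) figure out of a `Cpu(s):` summary line like the one above. The function name `parse_cpu_line` is made up for illustration, and the pattern assumes the older procps layout shown here (`0.1%wa`); newer versions of top print the fields differently, so treat it as a starting point only.

```python
import re

def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' summary line into a {state: percent} dict."""
    # Each field looks like '96.6%id' or '0.1%wa' in this (older) top layout.
    pairs = re.findall(r"([\d.]+)%(\w+)", line)
    return dict((name, float(value)) for value, name in pairs)

# The summary line from the top output above.
line = "Cpu(s):  2.6%us,  0.7%sy,  0.0%ni, 96.6%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st"
cpu = parse_cpu_line(line)
print("iowait: %.1f%%, idle: %.1f%%" % (cpu["wa"], cpu["id"]))
```

For this snapshot it reports 0.1% iowait and 96.6% idle, the same values visible in the line itself.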

Re: Performance issues

Posted: Tue Dec 10, 2013 1:49 pm
by abrist
How much of that 74GB is used for disk caching?

free -m

Re: Performance issues

Posted: Tue Dec 10, 2013 1:52 pm
by westernuniv

# free -m
             total       used       free     shared    buffers     cached
Mem:         96737      95923        813          0        594      21940
-/+ buffers/cache:      73389      23348
Swap:        16383         14      16369
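As a worked check of the numbers above: the `-/+ buffers/cache` row is derived from the `Mem:` row by subtracting buffers and cached from used, and adding them to free. A small Python sketch of that arithmetic follows; the function name `buffers_cache_row` is hypothetical and only there to label the calculation.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return (used, free) in MB with buffers and page cache excluded,
    mirroring the '-/+ buffers/cache' row of the free output above."""
    app_used = used - buffers - cached   # memory held by processes
    app_free = free + buffers + cached   # memory that could be reclaimed
    return app_used, app_free

# Values from the "Mem:" row above, in MB.
app_used, app_free = buffers_cache_row(used=95923, free=813, buffers=594, cached=21940)
print(app_used, app_free)
```

This gives 73389 MB used, matching the row above (roughly the 74GB mentioned earlier), and 23347 MB free, within 1 MB of the reported 23348, presumably because free rounds each column separately.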

Re: Performance issues

Posted: Tue Dec 10, 2013 2:03 pm
by abrist
Ahhh, the 74GB is without the disk buffers. Can you post the output of the following in code wraps:

ps -aux | sort -k 3
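The idea behind the command above is to list processes ordered by the %CPU column (column 3 of `ps aux`). Note that `sort -k 3` without `-n` compares that field as text, so 9.8 would sort after 10.2. A rough Python sketch that compares the column numerically instead is shown below. The function names are invented for illustration, and it assumes Python 2.7 or newer (for `subprocess.check_output`) and that `ps` is on the PATH.

```python
import subprocess

def top_cpu_processes(ps_output, limit=10):
    """Return the header plus the `limit` busiest lines of `ps aux` output."""
    lines = ps_output.strip().splitlines()
    header, rows = lines[0], lines[1:]
    # Column 3 of `ps aux` is %CPU; compare it as a number, not as text.
    rows.sort(key=lambda row: float(row.split()[2]), reverse=True)
    return [header] + rows[:limit]

def busiest_now(limit=10):
    """Run `ps aux` on the local machine and return its busiest processes."""
    output = subprocess.check_output(["ps", "aux"]).decode("utf-8", "replace")
    return top_cpu_processes(output, limit)
```

Running `print("\n".join(busiest_now()))` on the Nagios box would then show the heaviest CPU consumers first, with the column header kept at the top.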