Suffering from heavy load problems

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
niebais
Posts: 349
Joined: Tue Apr 13, 2010 2:15 pm

Post by niebais »

I need to open a new case about this issue. We have 290 hosts and 680 services on our server, a 32-bit CentOS 5 system with 4 processors and 12 GB of RAM. The load average rarely, if ever, drops below 4.45. In the past couple of days it has also jumped to 10 and then to 20, which slowed the system down so much that some alerts weren't coming back within 10 seconds. Pulling up the Nagios console was extremely slow as well. Looking at top, I noticed a large number of httpd processes running constantly, so I pulled them up on the Apache server-status page to find out what was going on:

0-0 3289 0/210/210 _ 59.71 0 399 0.0 0.36 0.36 10.35.42.242 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
1-0 18732 0/90/199 _ 25.53 2 346 0.0 0.18 0.39 10.35.42.143 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
2-0 - 0/0/202 . 4.84 23 0 0.0 0.00 0.28 ::1 ourserver.nuskin.net OPTIONS * HTTP/1.0
3-0 3292 0/213/213 W 62.26 0 0 0.0 0.48 0.48 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
4-0 24678 0/45/191 _ 12.67 0 493 0.0 0.04 0.30 10.35.42.242 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
5-0 27685 0/25/212 W 6.41 0 0 0.0 0.04 0.35 10.35.42.98 ourserver.nuskin.net GET /server-status HTTP/1.1
6-0 3295 0/210/210 _ 60.95 1 386 0.0 0.34 0.34 10.35.42.143 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
7-0 3296 0/213/213 _ 59.40 0 742 0.0 0.33 0.33 10.35.42.242 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
8-0 29254 0/13/186 _ 3.57 2 292 0.0 0.02 0.29 10.35.42.143 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
9-0 23099 0/55/192 W 16.24 0 0 0.0 0.07 0.33 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
10-0 16624 0/106/202 _ 28.38 0 569 0.0 0.22 0.40 10.35.42.242 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
11-0 3401 0/209/209 _ 58.83 2 604 0.0 0.33 0.33 10.35.42.189 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
12-0 30270 0/5/155 _ 1.38 1 365 0.0 0.00 0.26 10.35.43.245 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
13-0 26543 0/33/201 L 9.42 0 0 0.0 0.05 0.30 10.35.42.242 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
14-0 3413 0/206/206 W 57.95 0 0 0.0 0.40 0.40 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
15-0 4602 0/197/197 W 55.76 0 0 0.0 0.38 0.38 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
16-0 21254 0/68/161 W 19.18 0 0 0.0 0.14 0.28 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
17-0 26557 0/32/120 _ 9.54 1 311 0.0 0.06 0.27 10.35.42.143 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
18-0 5355 0/190/190 _ 54.01 1 314 0.0 0.30 0.30 10.35.43.245 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
19-0 11133 0/145/145 W 40.12 0 0 0.0 0.22 0.22 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
20-0 - 0/0/41 . 5.87 79 0 0.0 0.00 0.06 ::1 ourserver.nuskin.net OPTIONS * HTTP/1.0
21-0 - 0/0/109 . 30.55 125 0 0.0 0.00 0.17 ::1 ourserver.nuskin.net OPTIONS * HTTP/1.0
22-0 13603 0/125/125 _ 36.02 0 303 0.0 0.24 0.24 10.35.42.242 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%


This page is being hit constantly:
/nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
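
One way to quantify "being hit constantly" is to tally the server-status rows by client and by request path. The following is a minimal sketch, not part of the original post. It assumes the rows follow Apache's usual worker-table column order (Srv, PID, Acc, M, CPU, SS, Req, Conn, Child, Slot, Client, VHost, Request); the `summarize` function and the `SAMPLE` constant are hypothetical names, and the sample rows are copied from the output above with the request column truncated as in the source.

```python
from collections import Counter

# A few rows copied from the server-status output above.
# The request column is truncated in the source and is left that way here.
SAMPLE = """\
0-0 3289 0/210/210 _ 59.71 0 399 0.0 0.36 0.36 10.35.42.242 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
2-0 - 0/0/202 . 4.84 23 0 0.0 0.00 0.28 ::1 ourserver.nuskin.net OPTIONS * HTTP/1.0
3-0 3292 0/213/213 W 62.26 0 0 0.0 0.48 0.48 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
5-0 27685 0/25/212 W 6.41 0 0 0.0 0.04 0.35 10.35.42.98 ourserver.nuskin.net GET /server-status HTTP/1.1
9-0 23099 0/55/192 W 16.24 0 0 0.0 0.07 0.33 10.35.42.200 ourserver.nuskin.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
"""


def summarize(text):
    """Count server-status rows per client IP and per request path.

    Assumed column order (an assumption, not stated in the post):
    Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Method URL [Protocol]
    """
    by_client = Counter()
    by_path = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # not a worker row
        mode = fields[3]
        if mode == ".":
            continue  # open slot with no current process
        client = fields[10]
        path = fields[13].split("?", 1)[0]  # drop the query string
        by_client[client] += 1
        by_path[path] += 1
    return by_client, by_path


by_client, by_path = summarize(SAMPLE)
print("Requests per path:", by_path.most_common())
print("Requests per client:", by_client.most_common())
```

Run against the full server-status page, a tally like this would show whether a handful of clients account for most of the ajaxhelper.php traffic.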

I can't tell what this particular function is, but I'm trying to figure out why it is taking up so much of our server. If I stop the Apache server, the load drops to somewhere between about 1.8 and 3.0. I would like to figure out how to optimize or even cache this particular call so that the server isn't tied up with it.

I've tried some HTTP optimizations, but nothing has had much effect yet.

What can I do to address this issue?
We've thought about using the DNX system or even Fusion when it comes out, but I need some help in the meantime.
mmestnik
Posts: 972
Joined: Mon Feb 15, 2010 2:23 pm

Re: Suffering from heavy load problems

Post by mmestnik »

Fusion is out and available. This is the Ajax handler, and each page in XI has a number of them running. An Ajax call is made roughly every two seconds to refresh the data you see in Nagios XI, there is often more than one instance per page, and on top of that I'd imagine each user has more than one page or tab active.
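
To see how quickly that multiplies, here is a rough back-of-the-envelope sketch. The user, tab, and instance counts are made-up illustrative values, not measurements from this server, and `estimated_requests_per_second` is a hypothetical helper; only the roughly two-second refresh interval comes from the description above.

```python
def estimated_requests_per_second(users, tabs_per_user, ajax_instances_per_page,
                                  refresh_seconds=2.0):
    """Estimate the steady-state ajaxhelper.php request rate.

    Each open page runs several Ajax instances, and each instance
    polls the server once per refresh interval.
    """
    polling_instances = users * tabs_per_user * ajax_instances_per_page
    return polling_instances / refresh_seconds


# Hypothetical numbers: 5 users, 2 tabs each, 3 Ajax instances per page.
rate = estimated_requests_per_second(users=5, tabs_per_user=2,
                                     ajax_instances_per_page=3)
print(f"Approximate request rate: {rate:.1f} requests/second")
```

Under these assumed numbers, a handful of users is enough to keep a pool of httpd workers busy continuously, which would be consistent with the server-status output in the first post.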

I'm afraid caching these would defeat their purpose, which is to provide updated content. We are looking into solutions, but haven't adopted any, mainly because of bugs or missing features in PHP.
niebais
Posts: 349
Joined: Tue Apr 13, 2010 2:15 pm

Re: Suffering from heavy load problems

Post by niebais »

I'll let you know how DNX works. We'll probably try Fusion out in the near future as well.
niebais
Posts: 349
Joined: Tue Apr 13, 2010 2:15 pm

Re: Suffering from heavy load problems

Post by niebais »

As an FYI, we ended up adding some EMC disks to the following directories:
/var/lib
/usr/local

This solved a huge portion of our load problem. I don't suspect that MySQL (located in /var/lib) was causing any of the slowness; rather, I suspect that there was constant file updating happening somewhere in the Nagios directories (/usr/local/nagios, /usr/local/nagiosxi).
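
For anyone wanting to check the same suspicion, a minimal sketch along these lines lists files that were modified within the last minute under those two directories. The `recently_modified` helper and the 60-second window are assumptions for illustration. It only shows which files are changing, not how much data is being written.

```python
import os
import time


def recently_modified(root, window_seconds=60.0):
    """Return paths under root modified within the last window_seconds,
    most recently modified first."""
    cutoff = time.time() - window_seconds
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                hits.append((mtime, path))
    hits.sort(reverse=True)
    return [path for _mtime, path in hits]


if __name__ == "__main__":
    # Directories named in the post; missing directories produce no output.
    for root in ("/usr/local/nagios", "/usr/local/nagiosxi"):
        for path in recently_modified(root):
            print(path)
```

Running it a few times in a row and seeing the same files appear each time would point to the files that are being rewritten constantly.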