Re: Very Slow populating the dashboard
Posted: Mon Nov 01, 2010 1:54 pm
by mguthrie
With that kind of lag time, it's possible that system load is causing the delay. You have a fairly large number of services. Are you running most of these as active checks processed by your Nagios XI server? To improve performance, you might want to look into passive checks and distributed monitoring options. This would take some of the load off your XI server so that you could view services in a more timely manner.
http://support.nagios.com/wiki/index.ph ... itoring.3F
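For reference, passive results reach Nagios Core as external commands written to its command file. Below is a minimal Python sketch of building and submitting a PROCESS_SERVICE_CHECK_RESULT line. The function names are illustrative, and the command file path is the usual source-install default, which is an assumption: check the command_file setting in your nagios.cfg. From a remote host you would normally use an add-on such as NSCA or NRDP rather than writing to the pipe directly.

```python
import time

# Assumption: common default location; verify command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_service_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows plugin conventions: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())
    # A newline ends an external command, so flatten multi-line output.
    safe_output = output.replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{safe_output}\n")


def submit_command(line, command_file=COMMAND_FILE):
    """Append one command line to the Nagios command file (a named pipe)."""
    with open(command_file, "a") as pipe:
        pipe.write(line)
```

Usage would look like `submit_command(format_passive_service_result("web01", "HTTP", 0, "HTTP OK"))`, where "web01" and "HTTP" are placeholder names. The service must be configured to accept passive checks for the result to be used.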
Re: Very Slow populating the dashboard
Posted: Tue Nov 02, 2010 11:56 am
by nod2002
That is an option we are considering; however, the load issue looks to be caused by the front end. We have 2 identical servers with the same Nagios config installed. Both have about 400 hosts and 3500 service checks, all currently run locally. The load on the prod server is normally in the range 7-9, and on the DR server it is normally about 2.
The only difference between them is that several users access the web interface on the prod server, while the DR box has no users unless we fail over between servers.
The servers are HP DL380 G6 units with 36GB of RAM and 2x X5560 @ 2.80GHz quad-core CPUs; the disk setup is 8x 73GB in a RAID10 array.
Looking at top on the server, a large chunk of the CPU time does indeed seem to be used by both mysql and apache rather than by Nagios checks/processes.
Here is one snapshot.
top - 16:51:39 up 1 day, 10:34, 3 users, load average: 5.40, 5.53, 6.20
Tasks: 322 total, 3 running, 317 sleeping, 0 stopped, 2 zombie
Cpu(s): 20.2%us, 3.6%sy, 0.0%ni, 76.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 36912908k total, 22867872k used, 14045036k free, 1625164k buffers
Swap: 2097144k total, 0k used, 2097144k free, 16821496k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6136 mysql 15 0 429m 100m 4608 S 159.7 0.3 2999:05 mysqld
1561 apache 16 0 266m 25m 3960 S 28.6 0.1 0:01.76 httpd
25307 apache 16 0 268m 27m 4456 S 25.9 0.1 0:05.90 httpd
22169 apache 16 0 265m 26m 4392 S 23.9 0.1 0:07.37 httpd
31984 apache 15 0 263m 23m 4364 S 16.6 0.1 0:02.50 httpd
14834 apache 15 0 264m 24m 4456 R 14.6 0.1 0:14.39 httpd
5535 apache 15 0 265m 24m 4472 R 9.3 0.1 0:47.55 httpd
5011 apache 16 0 264m 24m 4392 S 6.6 0.1 0:14.80 httpd
22208 apache 16 0 268m 28m 4172 S 5.0 0.1 0:08.51 httpd
25437 apache 15 0 260m 21m 4384 S 5.0 0.1 0:05.38 httpd
14790 apache 15 0 264m 24m 4440 S 4.6 0.1 0:14.05 httpd
18225 apache 16 0 264m 24m 4220 S 4.6 0.1 0:07.98 httpd
25436 apache 16 0 260m 21m 4100 S 4.6 0.1 0:02.77 httpd
31930 apache 16 0 260m 21m 4172 S 4.6 0.1 0:02.05 httpd
31982 apache 15 0 264m 23m 4368 S 4.6 0.1 0:03.09 httpd
31983 apache 16 0 260m 21m 4076 S 4.6 0.1 0:02.93 httpd
14789 apache 16 0 263m 24m 4388 S 4.3 0.1 0:12.89 httpd
22205 apache 16 0 263m 23m 4172 S 4.3 0.1 0:07.01 httpd
31929 apache 16 0 261m 21m 4144 S 4.3 0.1 0:03.22 httpd
22133 apache 15 0 265m 24m 4456 S 4.0 0.1 0:10.25 httpd
22170 apache 15 0 261m 21m 4236 S 4.0 0.1 0:06.56 httpd
27564 apache 16 0 262m 22m 4456 S 4.0 0.1 0:26.36 httpd
22207 apache 16 0 264m 24m 4392 S 3.7 0.1 0:08.34 httpd
31981 apache 16 0 263m 24m 4316 S 3.7 0.1 0:02.40 httpd
1657 apache 16 0 260m 21m 4052 S 2.7 0.1 0:00.45 httpd
14832 apache 15 0 265m 24m 4456 S 2.3 0.1 0:11.65 httpd
1771 postgres 15 0 119m 3832 2624 S 1.7 0.0 0:00.07 postmaster
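To put a number on "a large chunk of the CPU time", a small Python sketch like the one below sums the %CPU column per command from `top -b -n 1` output. It assumes the default column layout shown in the snapshot (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND); a customized top layout would need different column indexes. The function name is illustrative.

```python
from collections import defaultdict


def cpu_by_command(top_lines):
    """Sum %CPU and count processes per COMMAND from top batch output.

    Assumes the default 12-column layout ending in COMMAND.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in top_lines:
        fields = line.split()
        # Process rows start with a numeric PID; this skips the summary
        # lines (top, Tasks, Cpu(s), Mem, Swap) and the column header.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        command = " ".join(fields[11:])
        totals[command] += float(fields[8])
        counts[command] += 1
    return dict(totals), dict(counts)
```

Applied to the snapshot above, the 25 httpd processes add up to roughly 200% and mysqld to roughly 160%, where top counts 100% as one fully busy CPU. That is consistent with the web front end and its database queries, rather than the checks, dominating the load.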
Re: Very Slow populating the dashboard
Posted: Tue Nov 02, 2010 2:37 pm
by mguthrie
I found another thread where someone noticed the same issue; see if it is related to your situation.
http://support.nagios.com/forum/viewtop ... f=16&t=849
It sounds like this has to do with the Ajax requests. In XI, we use Ajax requests to keep the data in the browser fresh without having to reload the page each time you want a current view. This happens about every 30 seconds. We may need to do some investigating and write documentation on system optimization for larger environments. Let us know if this seems to be the issue on your end.
Re: Very Slow populating the dashboard
Posted: Wed Nov 03, 2010 9:35 am
by digitallook
It looks like it could be the same problem.
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 16473 0/12/686 W 4.19 0 0 0.0 0.08 4.28 10.222.202.196 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
1-0 993 0/217/663 W 70.83 0 0 0.0 1.48 3.85 10.222.203.178 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
2-0 16874 0/13/669 _ 2.98 3 140 0.0 0.06 4.26 10.222.203.142 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
3-0 28362 0/114/679 W 38.01 0 0 0.0 0.64 4.16 10.222.203.178 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
4-0 9362 0/27/614 _ 12.29 4 2747 0.0 0.15 4.09 10.222.203.191 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
5-0 16875 0/16/609 _ 4.01 4 285 0.0 0.10 4.04 10.222.203.148 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
6-0 21966 0/66/647 _ 20.31 2 6484 0.0 0.37 3.86 10.222.203.191 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
7-0 18008 0/4/640 W 1.28 0 0 0.0 0.03 3.51 10.222.203.151 localnagiosxi GET /server-status HTTP/1.1
8-0 - 0/0/604 . 8.09 48 0 0.0 0.00 3.74 ::1 localnagiosxi OPTIONS * HTTP/1.0
9-0 4820 0/87/620 _ 29.27 3 2799 0.0 0.50 4.68 10.222.203.151 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
10-0 9912 0/25/613 _ 9.14 3 4409 0.0 0.18 3.54 10.222.203.191 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
11-0 32512 0/43/599 W 12.00 0 0 0.0 0.25 4.16 10.222.202.196 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
12-0 10316 0/23/571 _ 7.99 0 127 0.0 0.14 3.65 127.0.0.1 localnagiosxi POST /nagiosxi/backend/ HTTP/1.1
13-0 15994 0/78/628 _ 25.21 2 1794 0.0 0.60 3.82 10.222.203.191 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
14-0 32515 0/44/628 _ 13.08 2 2835 0.0 0.29 4.45 10.222.203.191 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
15-0 419 0/40/614 _ 10.70 1 4789 0.0 0.33 3.65 10.222.203.191 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
16-0 22157 0/108/558 _ 37.88 3 0 0.0 0.67 3.81 10.222.203.151 localnagiosxi GET /server-status HTTP/1.1
17-0 11093 0/16/473 _ 3.48 3 2513 0.0 0.15 2.48 10.222.203.151 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
18-0 11119 1/14/546 C 5.52 0 0 0.0 0.07 3.29 ::1 localnagiosxi OPTIONS * HTTP/1.0
19-0 - 0/0/503 . 17.99 61 0 0.0 0.00 3.57 ::1 localnagiosxi OPTIONS * HTTP/1.0
20-0 - 0/0/485 . 12.65 0 0 0.0 0.00 2.78 ::1 localnagiosxi OPTIONS * HTTP/1.0
21-0 11120 0/19/561 _ 5.16 2 300 0.0 0.09 2.91 10.222.203.178 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
22-0 16383 0/83/491 _ 25.86 0 297 0.0 0.51 3.18 10.222.202.175 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
23-0 1246 0/44/455 _ 12.65 3 3239 0.0 0.44 3.13 10.230.5.117 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
24-0 - 0/0/497 . 16.20 179 0 0.0 0.00 3.72 ::1 localnagiosxi OPTIONS * HTTP/1.0
25-0 - 0/0/612 . 36.08 49 0 0.0 0.00 4.01 ::1 localnagiosxi OPTIONS * HTTP/1.0
26-0 16384 0/66/522 _ 27.73 2 159 0.0 0.62 2.96 127.0.0.1 localnagiosxi POST /nagiosxi/backend/ HTTP/1.1
27-0 16385 0/76/347 _ 25.63 3 163 0.0 0.47 2.16 10.222.203.142 localnagiosxi GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
28-0 - 0/0/192 . 1.86 529 0 0.0 0.00 1.40 ::1 localnagiosxi OPTIONS * HTTP/1.0
29-0 - 0/0/227 . 16.33 1370 0 0.0 0.00 1.27 ::1 localnagiosxi OPTIONS * HTTP/1.0
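A quick way to read a scoreboard like this is to tally the M (mode) column and the request paths. In mod_status, W means sending a reply, _ waiting for a connection, C closing a connection and . an open slot with no process; for idle workers the Request column shows the last request served. The Python sketch below is illustrative and assumes rows copied from /server-status with ExtendedStatus On, in the column order shown above.

```python
import re
from collections import Counter

SLOT_RE = re.compile(r"^\d+-\d+$")  # rows start with a slot id such as "12-0"
METHODS = {"GET", "POST", "HEAD", "OPTIONS", "PUT", "DELETE"}


def summarize_scoreboard(lines):
    """Count worker modes and request paths in pasted server-status rows."""
    modes = Counter()
    paths = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not SLOT_RE.match(fields[0]):
            continue  # header or unrelated text
        modes[fields[3]] += 1
        for i, token in enumerate(fields[:-1]):
            if token in METHODS:
                # Drop the query string so all ajaxhelper.php calls group together.
                paths[fields[i + 1].split("?", 1)[0]] += 1
                break
    return modes, paths
```

In the table above, 18 of the 30 slots show /nagiosxi/ajaxhelper.php as their current or most recent request, which fits the Ajax polling explanation.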
Re: Very Slow populating the dashboard
Posted: Wed Nov 03, 2010 9:39 am
by mguthrie
Ok, we'll discuss this on our end and see if we can find some options that will help with system load in larger environments like this one.
Re: Very Slow populating the dashboard
Posted: Wed Nov 10, 2010 5:33 am
by digitallook
Hello.
Did you manage to come up with any sort of solution for the loading of the ajax pages?
Re: Very Slow populating the dashboard
Posted: Wed Nov 10, 2010 3:21 pm
by mguthrie
Here's a summary of the Ajax requests made by the dashlets, from our lead developer.
On systems with any of the following:
1. several host/service groups
2. several open windows/tabs
3. many dashlets on an open dashboard
...the browser makes a large number of Ajax requests. Most dashlets update every 60-90 seconds. A few update every 10 seconds or less, but these are rare.
If you have 50 dashlets spread across your various browser windows/tabs, approximately 1 update per second is occurring. This may cause some load and delays in the browser (which can slow Ajax requests such as adding things to a dashboard) and can slow the server, especially if multiple users are accessing the interface.
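The arithmetic behind the "1 update per second" figure is simply dashlets divided by refresh interval, multiplied by the number of users with the interface open. A tiny Python sketch, assuming every dashlet polls exactly once per interval (real dashlets vary, as noted above):

```python
def requests_per_second(dashlets_per_user, refresh_seconds, users=1):
    """Average Ajax requests/second if each dashlet polls once per interval."""
    if refresh_seconds <= 0:
        raise ValueError("refresh_seconds must be positive")
    return users * dashlets_per_user / refresh_seconds
```

For 50 dashlets at a 60-90 second refresh this gives roughly 0.56-0.83 requests per second per user. With 5 users (a hypothetical figure) at 60 seconds it is about 4.2 requests per second, each of which may hit PHP and the database. Doubling the interval halves the rate, which is the effect a longer dashlet refresh interval would have.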
Below is a quote from another customer with a larger monitoring setup, describing a possible solution. We haven't confirmed this as a solution, but I thought you might be interested. He was reporting the same issues with heavier loads related to the Ajax requests.
As an FYI, we ended up adding some EMC disks to the following directories:
/var/lib
/usr/local
This solved a huge portion of our load problem. I don't suspect that mysql (located in /var/lib) was causing any of the slowness, but I suspect there was constant file updating happening somewhere in the Nagios directories (/usr/local/nagios, /usr/local/nagiosxi).
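If you want to test that suspicion on your own servers before adding disks, a minimal Python sketch like the one below lists files under a directory tree whose modification time falls within the last N seconds. Running it a few times against /usr/local/nagios and /usr/local/nagiosxi should show which files are being rewritten constantly. The function name and the 60-second window are arbitrary choices for illustration.

```python
import os
import time


def recently_modified(root, window_seconds=60, now=None):
    """Return (path, age_seconds) for files under root modified within the window."""
    if now is None:
        now = time.time()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                age = now - os.stat(path).st_mtime
            except OSError:
                # Temp and spool files can vanish between listing and stat.
                continue
            if age <= window_seconds:
                hits.append((path, age))
    # Most recently modified first.
    return sorted(hits, key=lambda item: item[1])
```

For example, `for path, age in recently_modified("/usr/local/nagios")[:20]: print(f"{age:6.1f}s {path}")` prints the 20 most recently touched files.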
We'd be interested in your feedback on this. We're considering a config setting or tweak of some sort that would reduce how often dashlets refresh, to improve performance. Any thoughts regarding this?
Also, some users have had great success improving performance on heavily loaded systems by using DNX with XI. Here's the documentation on it if you're interested.
http://library.nagios.com/library/produ ... ith-nagios