Slow Performance of Network Analyzer

This support forum board is for support questions relating to Nagios Network Analyzer, our network traffic and bandwidth analysis solution.
OptimusB
Posts: 146
Joined: Mon Oct 27, 2014 10:08 pm
Location: Canada

Post by OptimusB »

I am experiencing some odd behavior in Network Analyzer. Navigating between pages is slow because each page takes a long time to load.

If I click on a particular source, the bandwidth graphs come up, but the Top 5 Talkers section takes a while to load and sometimes does not load completely. Once I am in the source view and click Sources in the top menu, the page takes over a minute to load. The Sources page then comes up blank; after clicking through other menus and coming back to Sources, everything is there again.

Overall it seems very sluggish. I've bumped the resources on the VM to 4 vCPUs and 4GB of RAM, but the box doesn't look starved of any resources. Has anyone else experienced this? We only have a few sources collecting.
lgroschen
Posts: 384
Joined: Wed Nov 27, 2013 1:17 pm

Post by lgroschen »

Can you post some error logs here? Tail them so we can see whether anything is generating errors:

Code:

tail -f /var/log/httpd/error_log
tail -f /var/log/httpd/access_log
Then navigate around Network Analyzer to reproduce the slowness you were seeing, so that any errors show up in the logs.
/Luke

Post by OptimusB »

Looks like error logs are fine...

Code:

[root@kdcbchngona01 ~]# tail -f /var/log/httpd/error_log
[Fri Jan 23 09:16:34 2015] [notice] caught SIGTERM, shutting down
[Fri Jan 23 09:17:12 2015] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Jan 23 09:17:12 2015] [notice] Digest: generating secret for digest authentication ...
[Fri Jan 23 09:17:12 2015] [notice] Digest: done
[Fri Jan 23 09:17:13 2015] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 configured -- resuming normal operations
[Fri Jan 23 11:21:23 2015] [notice] caught SIGTERM, shutting down
[Fri Jan 23 11:22:00 2015] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Jan 23 11:22:00 2015] [notice] Digest: generating secret for digest authentication ...
[Fri Jan 23 11:22:00 2015] [notice] Digest: done
[Fri Jan 23 11:22:00 2015] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 configured -- resuming normal operations
The delayed requests don't show any errors in the access log, but the timestamps give an idea of what I am seeing:

Code:

10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "GET /nagiosna/index.php/sources/3 HTTP/1.1" 200 21259 "http://10.242.13.76/nagiosna/index.php/sources" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "POST /nagiosna/index.php/api/system/source_status HTTP/1.1" 200 190 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "POST /nagiosna/index.php/api/reports/execute_anonymous HTTP/1.1" 200 1802 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "POST /nagiosna/index.php/api/graphs/execute HTTP/1.1" 200 8291 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "POST /nagiosna/index.php/api/reports/execute_anonymous HTTP/1.1" 200 1802 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "POST /nagiosna/index.php/api/reports/execute_anonymous HTTP/1.1" 200 1802 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "POST /nagiosna/index.php/api/reports/execute_anonymous HTTP/1.1" 200 1802 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:24 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.11.39 HTTP/1.1" 200 26 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:21 -0800] "POST /nagiosna/index.php/api/views/get_views HTTP/1.1" 200 2 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:31 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.10.3 HTTP/1.1" 200 25 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:54 -0800] "POST /nagiosna/index.php/api/system/source_status HTTP/1.1" 200 844 "http://10.242.13.76/nagiosna/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:24 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.11.43 HTTP/1.1" 200 26 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:54 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.11.43 HTTP/1.1" 200 26 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:34 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.11.10 HTTP/1.1" 200 26 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:48:14 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.11.39 HTTP/1.1" 200 26 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:48:54 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.10.3 HTTP/1.1" 200 25 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:48:14 -0800] "GET /nagiosna/index.php/sources HTTP/1.1" 200 13751 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:47:27 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/142.52.245.190 HTTP/1.1" 200 29 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:49:54 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.11.39 HTTP/1.1" 200 26 "http://10.242.13.76/nagiosna/index.php/sources/3" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
I noticed that the timestamps seem a bit off: the entries are not in chronological order.

Here is the output of tail -f /var/log/httpd/access_log | grep failure:

Code:

10.242.13.10 - - [23/Jan/2015:11:53:26 -0800] "GET /nagiosna/index.php/api/graphs/failures?sid=1&begindate=-60+minutes&enddate=-5+minutes HTTP/1.1" 200 197 "http://10.242.13.76/nagiosna/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:53:26 -0800] "GET /nagiosna/index.php/api/graphs/failures?sid=3&begindate=-60+minutes&enddate=-5+minutes HTTP/1.1" 200 197 "http://10.242.13.76/nagiosna/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:53:26 -0800] "GET /nagiosna/index.php/api/graphs/failures?sid=2&begindate=-60+minutes&enddate=-5+minutes HTTP/1.1" 200 197 "http://10.242.13.76/nagiosna/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
10.242.13.10 - - [23/Jan/2015:11:53:27 -0800] "GET /nagiosna/index.php/api/graphs/failures?sid=4&begindate=-60+minutes&enddate=-5+minutes HTTP/1.1" 200 197 "http://10.242.13.76/nagiosna/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
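A note on the out-of-order timestamps in the earlier access log excerpt: Apache's %t field records when a request was received, while the line itself is written once the request completes. An entry that appears after a newer one therefore probably belongs to a request that took a long time to finish. The following is a minimal Python sketch of that idea, not anything shipped with Network Analyzer; the function name find_late_entries is made up for illustration, and the two sample lines are shortened copies of entries from the excerpt above (referer and user agent dropped).

```python
import re
from datetime import datetime

# Matches the bracketed timestamp of an Apache access log line,
# e.g. [23/Jan/2015:11:47:21 -0800]
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def find_late_entries(lines):
    """Return (lag_seconds, line) for entries written after a newer request.

    The lag is only a lower bound on how long the request took.
    """
    late = []
    newest = None
    for line in lines:
        match = TS_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if newest is not None and ts < newest:
            late.append((int((newest - ts).total_seconds()), line))
        else:
            newest = ts
    return late


# Shortened sample lines taken from the excerpt in this thread.
sample = [
    '10.242.13.10 - - [23/Jan/2015:11:47:54 -0800] "POST /nagiosna/index.php/api/system/source_status HTTP/1.1" 200 844',
    '10.242.13.10 - - [23/Jan/2015:11:47:24 -0800] "GET /nagiosna/index.php/api/system/reverse_lookup/10.62.11.43 HTTP/1.1" 200 26',
]

for lag, line in find_late_entries(sample):
    request = line.split('"')[1]
    print(f"took at least about {lag}s: {request}")
```

On the full excerpt, the entries flagged this way are mostly the reverse_lookup calls, which fits a request that is blocked waiting on something external.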

Post by lgroschen »

That failure grep is showing the proper 200 return codes for the aberrant behavior graphs, so those are good. Have you tried rebooting the VM to see whether it makes any difference?

Also, what are the VM specs? You said memory is 4GB; what about HDD, CPU, etc.?
/Luke

Post by OptimusB »

I have tried several reboots with no success. Actually, after each reboot the sources are left in a stopped state and I have to start them up manually. Is that normal?

The specs of the VM:
4vCPU
4GB RAM
50GB HDD

Storage is on NFS and Network is on 10Gb interfaces. I have log server running in the same environment and that seems to work just fine.
A review of the system resources shows that it is barely using any: 3GB+ of RAM free, 0 swap used, and low CPU usage.

Post by OptimusB »

It almost seems like it is waiting on network, or hanging up somewhere. Does the code reach out to the internet at all, or perform some sort of lookup? Our server environment is blocked from reaching the internet; could that be hanging something up?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

OptimusB wrote: It almost seems like it is waiting on network, or hanging up somewhere.
It is; it is waiting on the NFS.
OptimusB wrote: Storage is on NFS and Network is on 10Gb interfaces. I have log server running in the same environment and that seems to work just fine.
This is definitely going to be a problem. Network Analyzer has to pull in HUGE files to process the data, and in the setup you described all of these files have to be sent across the 10Gb connection before it can start processing them.

I would highly recommend moving to local storage for Network Analyzer, preferably on SSDs; it is a VERY I/O-intensive application.
Former Nagios employee

Post by OptimusB »

We are using the downloaded appliance, with an NFS datastore for the VM infrastructure. I don't see any high network traffic. The data is local to the VM in a VMDK, so when I access Nagios NA via the web URL, all the data is already there. An initial look at disk activity shows that it is also fairly low.

A bit more testing....

If I log in and navigate through the menus without clicking on Sources, it seems to be snappy.
Once I click on Sources and then click on a source for details, it shows the screen with bandwidth, and the top talkers slowly populate.
From here, no matter what I click, the delay kicks in.
(When the page finally loads, it is fast to navigate to all pages again, until I click on the sources details.)

I am wondering whether there are any issues with the data, or perhaps, as you said, it is taking time to process the data...
Currently it is showing 1.8GB and 323MB for the two sources that I have configured.

So I narrowed it down to reverse DNS lookups. I reconfigured my DNS settings and it is working a bit snappier now. Looks like it is heavily relying on DNS queries when you get Source details? Top 5 Talkers still takes a while to load, but it is reacting much better than before.
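For anyone wanting to check the same thing on their own box, here is a rough Python sketch that times reverse (PTR) lookups through the operating system's resolver. It is only a diagnostic aid run from the Network Analyzer host, not a reproduction of how Network Analyzer performs its lookups; the function names and the 1-second threshold are assumptions chosen for illustration.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Time one reverse (PTR) lookup; return (hostname or None, seconds)."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # no PTR record, or the resolver could not be reached.
        hostname = None
    return hostname, time.monotonic() - start


def slow_lookups(ips, threshold=1.0, resolver=socket.gethostbyaddr):
    """Return (ip, hostname, seconds) for lookups slower than threshold."""
    results = []
    for ip in ips:
        hostname, elapsed = time_reverse_lookup(ip, resolver)
        if elapsed > threshold:
            results.append((ip, hostname, round(elapsed, 2)))
    return results
```

Calling slow_lookups(["10.62.11.39", "10.62.10.3"]) with addresses taken from the access log would list any lookup that takes longer than a second. Lookups that consistently take several seconds before failing usually point at an unreachable or misconfigured DNS server in /etc/resolv.conf.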
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

OptimusB wrote:Looks like it is heavily relying on DNS queries when you get Source details?
Yep.
OptimusB wrote:Currently it is showing 1.8GB
How long is the retention window set to? At a certain point, the amount of data retained will slow down any operation that requires parsing the historical data.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."

Post by OptimusB »

Currently there's about 4 days' worth of data. With our scope, we are planning to retain up to 5 weeks of data. Some of our issues stretch for weeks, and it would be nice to have that detail available for troubleshooting. As far as impact goes, it sounds like this might cause performance issues once we have a lot more sources in place?
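As a back-of-the-envelope estimate of what 5 weeks of retention could mean for data size, the figures mentioned earlier (1.8GB and 323MB after about 4 days) can be extrapolated linearly. This is a rough Python sketch under loud assumptions: both sources have been collecting for the full 4 days, traffic stays roughly constant, and 1GB is treated as 1000MB. The function name project_size_mb is made up for illustration.

```python
def project_size_mb(current_mb, days_collected, retention_days):
    """Linearly extrapolate stored flow data to a longer retention window."""
    if days_collected <= 0:
        raise ValueError("days_collected must be positive")
    return current_mb / days_collected * retention_days


# Figures from this thread: 1.8GB and 323MB after about 4 days; 5 weeks = 35 days.
for current_mb in (1800, 323):
    projected = project_size_mb(current_mb, days_collected=4, retention_days=35)
    print(f"{current_mb} MB now -> about {projected:.0f} MB at 35 days")
```

Under those assumptions the larger source would grow to roughly 15.75GB and the smaller one to roughly 2.8GB at the full retention window, before counting any additional sources.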