Re: Network Analyzer Slow
Posted: Mon May 11, 2015 1:04 pm
by jomann
We recommend that NNA be installed on a bare metal server, not a VM - unless the VM has access to physical hard drives on the machine it is currently running on. The reason is that nfcapd/sfcapd create .flow files, which are just binary files written out to a directory on the disk - these are then read by nfdump to give you the ability to query. The top 5 talkers, the last 30 min bar charts, and the actual saving of data are all I/O operations. The slower the I/O, the slower the whole NNA system will go. We've seen in some people's environments that when they are connected to a shared disk system (a NAS for example) they don't get nearly enough read/write throughput to comfortably run a query over gigabytes of data in a timely manner, because they are reading/writing at less than 2MB/s.
The problem with queries running slow may also have to do with I/O, but more so because queries have to read ALL the file data for the selected time period in order to get the results back.
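To put rough numbers on this, here is a minimal sketch that estimates how long it takes just to read the flow files for a query. The function name is made up for illustration, and it assumes sequential read throughput is the only limit, which will not hold exactly in practice.

```shell
#!/bin/sh
# Hypothetical helper: seconds needed to read SIZE_MB of flow data at
# RATE_MB_S, assuming throughput is the only limit (integer math, rounds up).
estimate_read_seconds() {
    size_mb=$1
    rate_mb_s=$2
    echo $(( (size_mb + rate_mb_s - 1) / rate_mb_s ))
}

estimate_read_seconds 1024 2     # 1 GB at 2 MB/s   -> 512
estimate_read_seconds 1024 100   # 1 GB at 100 MB/s -> 11
```

At 2MB/s, a query over 1 GB of flow data spends more than eight minutes on reads alone, before nfdump does any processing.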
What are you using for the shared disk on the virtual machine? Have you tried it with something capable of higher speed I/O? Can you test the speed of the transfer rate of your NAS (if applicable) and see if that is slowing it down?
The calculation message is wrong. But this doesn't affect your actual query/reports. Sometimes it has a hard time with checking if the actual hours/days are longer than one another. It's a bug that's being fixed in the next release.
Re: Network Analyzer Slow
Posted: Mon May 11, 2015 1:55 pm
by CFT6Server
Putting this on a physical server will not be possible at this point. However, our virtual environment is an enterprise setup on Cisco UCS infrastructure, hosting servers that require much higher I/O than what I believe NNA is requesting. We are using NetApp storage with SSD caching and the VM storage is on NFS. Even though it is shared, it is capable of running at high speeds.
I have been trying to gauge the disk performance of the VM, but I am not having any luck. Running sysstat tools only shows writes but not reads? I am kind of confused by that. Looking at the VM performance, we are getting a maximum of 2ms latency for disks. I am trying to get actual readings to confirm that it is reading the files and that it is having disk performance issues.
I ran sar and iostat, both did not produce any read statistics.
This is the dd test on the VM.
Code:
# dd if=/dev/zero of=/root/testfile bs=1G count=1 conv=fdatasync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.41284 s, 445 MB/s
# dd if=/dev/zero of=/root/testfile bs=64k count=32000 conv=fdatasync
32000+0 records in
32000+0 records out
2097152000 bytes (2.1 GB) copied, 5.41672 s, 387 MB/s
# dd if=/root/testfile of=/dev/null
4096000+0 records in
4096000+0 records out
2097152000 bytes (2.1 GB) copied, 2.04559 s, 1.0 GB/s
# dd if=/root/testfile of=/dev/null bs=64k count=32000
32000+0 records in
32000+0 records out
2097152000 bytes (2.1 GB) copied, 0.303178 s, 6.9 GB/s
# dd if=/root/testfile of=/dev/null bs=8k count=32000
32000+0 records in
32000+0 records out
262144000 bytes (262 MB) copied, 0.0540918 s, 4.8 GB/s
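A note on the read figures above: results in the GB/s range on a file that was just written are most likely served from the Linux page cache rather than from storage. A minimal sketch for repeating the read test without the guest's cache follows. The function name is hypothetical; `iflag=direct` is a GNU dd option that opens the input with O_DIRECT, and not every filesystem supports it. It only bypasses the cache inside the VM, not any caching on the array.

```shell
#!/bin/sh
# Hypothetical helper: re-read a test file while bypassing the Linux page cache.
# iflag=direct asks GNU dd to open the input with O_DIRECT, so the figure
# reflects the storage path rather than RAM inside the guest.
direct_read_test() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    dd if="$file" of=/dev/null bs=64k iflag=direct
}

# Example (path taken from the test above):
# direct_read_test /root/testfile
```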
Some additional testing
Code:
# hdparm -tT /dev/sda1
/dev/sda1:
Timing cached reads: 15948 MB in 2.00 seconds = 7983.65 MB/sec
Timing buffered disk reads: 236 MB in 3.01 seconds = 78.50 MB/sec
FYI - I've verified that the iostat reads are showing stats during the tests, but I still cannot see any read stats during the queries.
Re: Network Analyzer Slow
Posted: Mon May 11, 2015 2:29 pm
by jdalrymple
Just some additional help from the storage end of things...
With NFS on NetApp - make good and sure your disk is aligned. If not it's 2 reads per 4k block. Don't ignore this step. Incidentally if that 1 VM is aligned and the other 23,999 VMs on your array are not you will potentially suffer.
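As a rough sketch of what "aligned" means inside the guest: a partition is 4k-aligned when its starting offset is a multiple of 4 KiB. The helper below is hypothetical and assumes 512-byte logical sectors, so 4 KiB equals 8 sectors. It checks only the guest partition offset, not the alignment of the layers underneath.

```shell
#!/bin/sh
# Hypothetical helper: is a partition's starting sector on a 4 KiB boundary?
# Assumes 512-byte logical sectors, so 4 KiB = 8 sectors.
is_4k_aligned() {
    start_sector=$1
    [ $(( start_sector % 8 )) -eq 0 ]
}

# On Linux the start sector is exposed in sysfs, e.g. /sys/block/sda/sda1/start
# (the device name here is only an example).
for s in 63 2048; do
    if is_4k_aligned "$s"; then
        echo "start sector $s: aligned"
    else
        echo "start sector $s: misaligned"
    fi
done
```

Sector 63, the old DOS-style default, comes out misaligned; sector 2048, the usual default on newer installs, is aligned.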
Ask your SAN administrators to watch the behavior of the VM using OnCommand. They'll have to isolate the VM somehow, perhaps on its own host, perhaps on its own datastore, but then you'd get the best metrics - those offered by the disks themselves.
Lastly - regarding NetApp's SSD Cache (Flash Cache), it's read-only which is good for you, however it's pretty simplistic in its management capabilities. If your SAP or ERP is on the same array, the dbs they're beating up are more likely to own that cache than your NNA stream. This ultimately results in NNA getting data at rest speeds. If you're talking flash-pool, that's a different story, but it's also not a cache so I'll assume you're not.
Re: Network Analyzer Slow
Posted: Mon May 11, 2015 3:51 pm
by CFT6Server
Thanks, I will check with our SAN admins. This cluster and the VMs on it aren't very demanding. It is used for management servers only. I don't think it is contending with a large number of VMs. I agree that isolating it will give us better metrics, but that would not reflect real-life results since a VM will never be on its own. Based on what we are seeing or getting from the current VM, is that sufficient? This is what I think should be gauged. However, while running NNA, I really don't see the disk activity that you are speaking of, so perhaps I am looking at the wrong metrics?
If it is working away trying to read the files, a process or disk activity would show I assume? But I am not seeing that. How can I better get a sense of what NNA is doing? Any suggestions on that front?
Re: Network Analyzer Slow
Posted: Tue May 12, 2015 11:05 am
by jdalrymple
CFT6Server wrote: If it is working away trying to read the files, a process or disk activity would show I assume? But I am not seeing that. How can I better get a sense of what NNA is doing? Any suggestions on that front?
You might want to install iotop and see if it lends any useful data. I don't expect it will - the one thing we may see would be if we actually are thrashing your disks or not. Note you have to run iotop as root.
Code:
[root@localhost ~]# yum -y install iotop
Re: Network Analyzer Slow
Posted: Tue May 12, 2015 1:25 pm
by CFT6Server
I am still not seeing any disk reads even with iotop which is very strange. When I run the queries, I just can't seem to gauge the read performance. I've tried iotop, iostat, sar. When I am using tools like dd and seeker, it shows the reads.... Like you said, it should at least show if something is thrashing away at the disk, which it isn't....
As I need to be able to provide some reports and usage out of NNA, let's take a step back and review what is gathered so far.
1. The queries take a long time due to the size of the flow data and how NNA is currently programmed to read all the data.
2. In trying to review disk usage, various tools have been used to capture read/write performance, but none of them show any reads during the queries.
3. If read performance is the bottleneck, then we should be able to see this in read performance or latency somehow?
4. Seems that once the query is stuck, it will just keep spinning. There is no timeout. I've left this running for a while and it was still spinning away.
Some theories
Even though it is true that NNA has to parse through a lot of files, the processes (monitored via top) stop once the chord diagram finishes, yet the query results continue to spin. I think there is a limit somewhere that it is hitting? Perhaps the size of the data or the number of returned results, or perhaps both? After a certain point, the queries just do not finish... ever; only the chord diagram does.
So even with faster disks, which might improve how fast the chord diagram gets produced, I am not entirely convinced that this will help with the query results, unless we can monitor and find the bottleneck. Even during the high CPU usage, I still could not observe disk read activity.
Is there any other debug built in to actually follow what NNA is doing under the hood? Sorry I apologize in advance for any wrong assumptions made here, as they are based on what I know regarding NNA so far.
Re: Network Analyzer Slow
Posted: Tue May 12, 2015 2:17 pm
by CFT6Server
I finally got it to show the reads. This shows up when I click on the Network Traffic tab within XI. In contrast, when I click on "open this query in Network Analyzer", it does not show any read activity.
I can see the disk reads, and they look to top out at about 27-30MB/s. Interestingly enough, after capturing this the first time, it does not show up no matter what I run.
Re: Network Analyzer Slow
Posted: Tue May 12, 2015 4:38 pm
by jdalrymple
CFT6Server wrote: Interestingly enough, after capturing this the first time, it does not show up no matter what I run.
This makes it sound like some caching is happening somewhere in-system. When the reads don't show up, is the UI load still slow though? That would certainly take the blame off the disks (something a former SAN admin loves to do).
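One way to test the caching theory, as a rough sketch: Linux keeps per-process I/O counters in /proc/<pid>/io. There, rchar counts the bytes a process asked for through read calls whether or not physical I/O was needed, while read_bytes counts the bytes that really had to be fetched from the storage layer. If rchar climbs during a query while read_bytes barely moves, the data is coming from the page cache. The helper name below is made up, and the assumption that the query runs as a process named nfdump comes from the description earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print one counter from a /proc/<pid>/io style file.
# rchar      = bytes the process requested via read calls (cache hits included)
# read_bytes = bytes the kernel actually had to fetch from the storage layer
io_counter() {
    io_file=$1
    field=$2
    awk -v f="$field:" '$1 == f { print $2 }' "$io_file"
}

# Example, assuming a query process named nfdump is running (needs root or
# the same user as the process):
# pid=$(pgrep -o nfdump)
# io_counter "/proc/$pid/io" rchar
# io_counter "/proc/$pid/io" read_bytes
```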
Re: Network Analyzer Slow
Posted: Tue May 12, 2015 4:59 pm
by CFT6Server
This could be true, since I am definitely not seeing the disk activity anymore. Perhaps the data it requires is still in memory? But that doesn't explain why it still takes a long time to process the queries. I will try to do more testing with the disk activity and monitor the resources today.
Re: Network Analyzer Slow
Posted: Wed May 13, 2015 1:39 pm
by lmiltchev
CFT6Server wrote: I will try to do more testing with the disk activity and monitor the resources today.
Sounds good! Let us know if you are still having the same issue.