Network Analyzer Slow
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Network Analyzer Slow
So we have not gotten anywhere on this and I am at a loss. Basically the incoming netflow data covers all packets and that is not changing.
As a last bit of brainstorming to make this usable, is there some way that I can drop or limit the packets being recorded by NNA? Say, drop x amount of flow data coming in?
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Network Analyzer Slow
I'd still REALLY like to see the IO metrics from OnCommand or maybe even a 'sysstat -x 1' output on your filer at the time that you experience the slowness.
In addition, I wouldn't totally abandon the thought that there are some DNS issues. Does this box have good DNS servers to look at (internet is unnecessary, but DNS forwarders would be good)? Maybe just some tcpdumping port 53 when you're experiencing the issue also?
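A minimal sketch of that port 53 capture, assuming tcpdump is installed and you can run it as root. The function name `watch_dns` and the default packet count are only illustrative choices. Start it, kick off a slow query in NNA, and see whether any lookups scroll by.

```shell
# Capture DNS traffic while reproducing the slowness.
# -nn    : print addresses and ports numerically (no name lookups by tcpdump)
# -i any : listen on all interfaces (Linux)
# -c N   : stop after N packets so the capture cannot run forever
watch_dns() {
  tcpdump -nn -i any -c "${1:-50}" port 53
}

# Usage (as root): watch_dns 100
```

If nothing shows up during a slow query, DNS is probably not where the time is going.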
Re: Network Analyzer Slow
To supplement jdalrymple's post, I'm interested in seeing how many hostnames are currently stored in your NagiosNA database.
Code: Select all
echo "select count(*) from nagiosna.nagiosna_hostname_cache;" | mysql
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Network Analyzer Slow
Isolating this VM and doing a query on the filer might be challenging with our current setup. I can ask our storage admins to get output of some stats for where the VMDK is sitting. The VMDK is sitting on an NFS volume presented to the VMware environment; from NNA's perspective this is just a local disk, and we should be seeing some IO stats, but I am not.
Our Resolve Hostname settings are unchecked
Also using tcpdump to watch for DNS (port 53) shows no activity when running queries or clicking on the NNA tab in XI.
so for the hostname query...
Code: Select all
# echo "select count(*) from nagiosna.nagiosna_hostname_cache;" | mysql
count(*)
10
I am able to capture the disk activity when I run queries. (images attached)
Looks like it reads for a while, then there is no further activity. top then shows that nfdump is still running for a while before the data shows up.
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Network Analyzer Slow
Adding to the DNS confirmation. I've enabled hostname lookup and it seems to be running fine. DNS resolution works and does not add additional latency. So having DNS checked or unchecked doesn't seem to affect it.
I've screen captured a query for src/dst IP for all sources in the last 2 days as a stress test. This should try to crunch through a lot of data. Here's a more detailed look with additional nmon details.
I think in the end it is just crunching through a lot of data like you said. We only added a few sources, but have more to add. After running it for a while, some of the queries from the NNA tab in XI run OK and don't take as long (from caching perhaps?). But larger queries at 24 hours+ do take a while.
Re: Network Analyzer Slow
CFT6Server,
Based on your testing I believe we can conclude that DNS resolution isn't slowing your NNA server down in any notable way.
I was speaking with a coworker about this case, and we're wondering if you have any sources that are accepting flows from multiple hosts. I read through the thread and I don't believe that this question has been answered yet.
Is there any chance that you're sending more than one flow to the source set up to accept your switch flows? If there is more than one flow incoming, could you separate the flows into individual sources? The hope is that separating the sources out could help with performance.
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Network Analyzer Slow
Hope I'm understanding you correctly. Each source is configured to collect from a single source. I have source groups configured which include each of the sources, but that's it.
Here are the sources
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Network Analyzer Slow
CFT6Server,
The more I read this and think about it the more I'm leaning away from it being a disk issue. I really would like to see how NetApp feels about the resources it's consuming but alas it may just not happen.
I go back to page 1 though, and you posted some top screenshots where you were beating up all of your CPUs heavily. I also noted that we asked you to throw memory at it which you did but it didn't have an amazing effect. At risk of sounding lame by asking you to "just throw resources" at a problem I'd like it if you could add more cores, at least 2, preferably 4. I'd like to see how that affected the behavior if at all.
At the end of the day there is no doubt that your performance issues are from the big files you're parsing, we know that. I'm just not convinced that your disks are your bottleneck so CPU would be the next logical answer. It's obvious nfdump is CPU intensive - let's see if it's so much so that it's causing your issue.
Lastly - one other thought we've had - what would the possibility be of splitting up your flow data based on interfaces or groups of interfaces within the switch, to break down the nfcap files more? That would obviously be a bit micro-managy, but it might be worth trying if it's not too difficult on the switch end. Obviously it's trivial within NNA.
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Network Analyzer Slow
Thanks jdalrymple for the response. I am starting to think the same and noticed that nfdump has basically pinned the CPU. It looks like it is a single-threaded process, so throwing more cores at it might allow multiple queries to run at once, but each query is still limited to 1 core. I have been watching this, and you can confirm the same. There have been suggestions to make nfdump a multithreaded process, but I can't find any updates on this.
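One way to check the single-threaded observation on Linux is to read the `Threads:` field of the process status file. This is only a rough sketch: `threads_from_status` is a made-up helper name, and it assumes an nfdump query is actually running when you call it.

```shell
# Print the thread count from a Linux /proc/<pid>/status file read on stdin.
threads_from_status() {
  awk '/^Threads:/ { print $2 }'
}

# Usage while a query is running (pgrep -n picks the newest matching process):
# threads_from_status < "/proc/$(pgrep -n nfdump)/status"
```

A value of 1 would be consistent with each query being limited to one core.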
I am open to suggestions to help increase the performance. I can throw additional vCPU or RAM at this to see if there are any improvements. Current configuration is 4 cores and 8GB of RAM.
I have been talking to our network team and they are reluctant to make changes to their configuration. I have suggested some ideas but am still working on convincing them. One thought I had was to configure the netflow exports to do random sampling. This would help cut down the noise and hopefully allow queries to run faster, and give us better retention. Unfortunately this might introduce additional configuration they are not willing to entertain at this point. We have netflow exports sent to other products and we are currently doing full netflow exports. But I can also pose your suggestion to see if it can help with this situation.
One other thought is - would there be any way to "intercept" or filter the netflow data before they are written to the folder? (trying to create some sort of random sampling on NNA's end and discard some data so the total amount of data is more manageable)
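To make the idea concrete, here is what 1-in-N sampling looks like on plain text records. This is purely an illustration of the logic: `sample_one_in_n` is a hypothetical helper, it is deterministic (every Nth record) rather than random, and it works on text lines, not on the binary capture files NNA writes, so it is not something that can be dropped into the collection path as-is.

```shell
# Keep one record in every N from stdin (deterministic 1-in-N sampling).
# Hypothetical helper: operates on text lines, one record per line.
sample_one_in_n() {
  awk -v n="$1" 'NR % n == 0'
}

# Usage: keep every 5th line of some text export of flow records.
# some_text_export | sample_one_in_n 5 > sampled.txt
```

Any byte or packet totals computed from the sampled data would need to be scaled up by roughly N to estimate the real figures.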
Side note - Do you know if there is any caching that happens, so that once we run a query once, it doesn't have to reparse all the files again on a separate query?
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Network Analyzer Slow
Hi CFT6Server
Re: CPU - I won't lie, I don't expect adding vCPUs to totally solve your problem. It MIGHT help though. It seems that the nfcap files are processed in a serial fashion. The things I don't know, and would really like to put together a lab to experiment with:
1) If I run a small query is there a way to prevent nfdump from having to reread all of the nfcap files
2) How can I increase the cache lifetime for queries I run frequently (homepage queries)
3) How much time is spent on die and how much is spent waiting for IO
This is me talking out loud, I don't expect you to have the answers, these are the things I want to work on in my own head to try to help you solve your problem. In the meantime I suggested the vCPU upgrade if for no other reason than to just see if there is ANY difference.
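For point 3, one rough way to get a number on Linux is to snapshot the aggregate `cpu` line of /proc/stat before and after a slow query and compare the deltas. A sketch under those assumptions; `cpu_io_split` is a made-up helper name, and the result covers the whole box, not just nfdump, so it is best run while little else is going on.

```shell
# Print the share of CPU time spent busy (user+nice+system) versus waiting
# on IO between two snapshots of the aggregate "cpu" line in /proc/stat.
# $1 = snapshot taken before the query, $2 = snapshot taken after it.
cpu_io_split() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) before[i] = $i; next }
    {
      # Fields 2..9 are user nice system idle iowait irq softirq steal.
      total = 0
      for (i = 2; i <= 9; i++) { delta[i] = $i - before[i]; total += delta[i] }
      if (total == 0) { print "busy=0.0% iowait=0.0%"; exit }
      busy = delta[2] + delta[3] + delta[4]
      printf "busy=%.1f%% iowait=%.1f%%\n", 100 * busy / total, 100 * delta[6] / total
    }'
}

# Usage:
# before=$(grep '^cpu ' /proc/stat)
# ... run the slow query and wait for it to finish ...
# after=$(grep '^cpu ' /proc/stat)
# cpu_io_split "$before" "$after"
```

A high busy share with low iowait would point at CPU rather than disk as the bottleneck.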
Re: you wanting to intercept and prune your data - is that an acceptable solution for you? For many it wouldn't be, and I don't want that to be the solution. The idea behind NNA is to give you a complete view of your network. To go a step further, it would be nice for us to give you that complete view in a timely fashion.
Re: caching - there is no doubt a cache. As nfdump serially processes those nfcap files it filters out all the unneeded data and stores what's left in a temporary location. This process is not documented well and, even worse, adjusting the properties of that cache (size, lifetime, etc.) is not documented at all from what I can tell.
Sorry for not having a better answer at the moment. Let's start with vCPUs, and in turn I'll continue researching my 3 thinking points above.