NFS tuning or LogServer tuning to help CPU usage

Post by **chud** » Thu Oct 03, 2019 9:26 am

Hello folks.

My Nagios Log Server is storing logs in an NFS share on our Isilon. I know local storage is preferable, but this was what was best for us in terms of available space.

Anyway, our Log Server (which has 4 CPUs) keeps alerting in Nagios XI, and after some troubleshooting it looks like accessing the NFS share may be one reason for the high CPU usage.

One additional piece of info: the CPU alerts became more of a problem after we added our Cisco Firepower and it started sending logs. We have dialed this device back to sending logs for errors only, but the CPU alerts on the Log Server continue.

Any advice on tuning to help with CPU usage?

Post by **mbellerue** » Thu Oct 03, 2019 1:16 pm

NFS storage tuning is outside of what we can help with. I can point you to a helpful doc on the subject, but everyone's storage setup is unique to their environment.

The Linux Documentation Project: Optimizing NFS Performance
https://www.tldp.org/HOWTO/NFS-HOWTO/performance.html

Post by **chud** » Thu Oct 03, 2019 2:29 pm

How about Log Server or ELK stack tuning?

Post by **mbellerue** » Thu Oct 03, 2019 2:59 pm

We do have a doc on general performance tuning. I apologize, I should have posted this earlier.
https://assets.nagios.com/downloads/nag ... hrough.pdf

For ELK stack, that's a little more complicated. You can certainly search and find a number of articles related to performance tuning, but do be careful before implementing any changes.

Post by **chud** » Thu Oct 03, 2019 4:45 pm

My main question regarding the ELK stack is: if I increase the log server's RAM or CPU, what should I adjust in either Log Server or ELK to take advantage of the additional resources?
For example: Logstash workers, Elasticsearch Java heap size, etc.?

Post by **scottwilkerson** » Fri Oct 04, 2019 6:42 am

chud wrote:My main question regarding the ELK stack is if I increase the log server's RAM or CPU, what should I adjust in either LogServer or in ELK to take advantage of the additional resources?
For example: Logstash workers, ElasticSearch Java heap size, etc ?

You just need to restart Elasticsearch.

We have a script that calculates the best heap size and sets it when Elasticsearch starts.

Post by **chud** » Fri Oct 04, 2019 8:51 am

scottwilkerson wrote: We have a script that calculates the best heap size and sets it when Elasticsearch starts.

Can you provide the location of that script so that I can take a look at it?

Post by **scottwilkerson** » Fri Oct 04, 2019 9:01 am

On a CentOS/RHEL system it is

/etc/sysconfig/elasticsearch

On Ubuntu/Debian

/etc/default/elasticsearch

The line you are looking for is

ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
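
To see what that line evaluates to, here is a minimal sketch that reproduces the same arithmetic (half of total RAM in megabytes, with an `m` suffix). The function name `heap_for_mem` is hypothetical and only for illustration; it is not part of Log Server.

```shell
#!/bin/sh
# heap_for_mem is a hypothetical helper, not part of Nagios Log Server.
# Given total RAM in MB, print the value the ES_HEAP_SIZE line would yield:
# half of total RAM, with an "m" (megabytes) suffix.
heap_for_mem() {
    echo "$(( $1 / 2 ))m"
}

# What-if values: heap before and after doubling RAM from 8 GB to 16 GB.
echo "8192 MB RAM  -> ES_HEAP_SIZE=$(heap_for_mem 8192)"
echo "16384 MB RAM -> ES_HEAP_SIZE=$(heap_for_mem 16384)"

# Value for this machine, parsed from free(1) the same way as the original line.
if command -v free >/dev/null 2>&1; then
    total_mb=$(free -m | awk '/^Mem:/{print $2}')
    echo "This host (${total_mb} MB) -> ES_HEAP_SIZE=$(heap_for_mem "$total_mb")"
fi
```

Because the value is computed from the RAM present when the service starts, added memory is only picked up after Elasticsearch is restarted.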

Post by **chud** » Tue Nov 05, 2019 10:45 am

As mentioned previously, our Nagios Log Server is alerting in Nagios XI because of all the traffic from our Cisco Firepower and the various servers that are sending logs.
However, at this point we don't even have all of our servers, routers, and switches sending to Log Server yet.

So the question is, what do you do for Log Server to help it handle the traffic?

Is it just a matter of adding more CPU and increasing the NIC capacity on the server itself?
Or can you balance the traffic if you have a cluster?
My understanding is that if you have a 2-node cluster, the second log server is just a mirror of the first - so it is a cluster from a data redundancy standpoint, but not from a performance standpoint - there is no added performance benefit.
If we go to a 3-node (or more) cluster, is there a performance advantage (sort of like putting multiple web servers behind a load balancer)?

Post by **scottwilkerson** » Tue Nov 05, 2019 11:43 am

chud wrote: Is it just a matter of adding more CPU and increasing the NIC capacity on the server itself?

This will help a little.

chud wrote:Or can you balance the traffic if you have a cluster?

Yes, this would be the preferred method. You can send logs to any of the instances in the cluster, which spreads out the load caused by log ingestion.

chud wrote:My understanding is that if you have a 2-node cluster, the second log server is just a mirror of the first - so it is a cluster from a data redundancy standpoint, but not from a performance standpoint - there is no added performance benefit.

This is incorrect; you can use all instances to spread the ingestion load. Additionally, while you are correct that a 2-node cluster gives you a replica, you are incorrect in thinking that it doesn't help performance.

Each node can be used for log ingestion, and all of the nodes share the load when log data is queried, no matter which node you are logged into.
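
One way to picture spreading ingestion across instances is to assign each log source to a cluster node in round-robin order. The sketch below only prints the assignment; the node hostnames, the source names, and the `pick_node` helper are all hypothetical, and how each device is actually pointed at a node depends on its own syslog configuration.

```shell
#!/bin/sh
# Hypothetical cluster instances; replace with your own Log Server nodes.
NODES="logserver1.example.com logserver2.example.com"

# pick_node N: print the node at position (N mod number_of_nodes).
# Hypothetical helper for illustration only.
pick_node() {
    idx=$1
    set -- $NODES            # split the node list into positional parameters
    shift $(( idx % $# ))    # skip ahead to the chosen node
    echo "$1"
}

# Assign hypothetical log sources to nodes in round-robin order.
i=0
for src in firepower web01 web02 router01; do
    echo "$src -> $(pick_node "$i")"
    i=$(( i + 1 ))
done
```

With two nodes, consecutive sources alternate between them, so no single instance receives all of the ingestion traffic.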
