Interface slow down

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
stecino
Posts: 248
Joined: Thu Mar 14, 2013 4:42 pm

Interface slow down

Post by stecino »

Hello,

My interface has slowed down a great deal. This is my current setup. Each node has 4 vCPUs and 8GB RAM.

IP           Hostname  Port  Load (1m, 5m, 15m)  CPU %  Memory Used  Memory Free  Storage Total  Storage Available  Elasticsearch                  Logstash                  Actions
10.xx.x.247  xxx2nls2  9300  1.31, 0.83, 0.77    45%    23%          76%          639.8GB        536.3GB            [Elasticsearch is running...]  [Logstash is running...]  -
10.xx.x.246  xxx2nls1  9300  0.39, 0.22, 0.32    14%    25%          74%          639.8GB        532.8GB            [Elasticsearch is running...]  [Logstash is running...]  -
10.yy.y.246  yyy2nls1  9300  0.18, 0.34, 0.37    9%     23%          76%          639.8GB        532.7GB            [Elasticsearch is running...]  [Logstash is running...]  -
10.yy.y.147  yyy2nls2  9300  2.13, 1.65, 1.71    19%    26%          73%          639.8GB        534.8GB            [Elasticsearch is running...]  [Logstash is running...]  -


590,538,616 Documents
135.7GB Primary Size
265.4GB Total Size
4 Data Instances
272 Total Shards
28 Indices


4 Total Instances
0 Client
4 Master/Data
16 Processors
0% Process CPU
5.00 GB Memory Used
0 bytes Swap
2,559.20 GB Total Storage
2,272.31 GB Free Storage
8.39 GB Data Read
8.73 GB Data Written
17.12 GB I/O Size

This is what top shows:

1213 nagios 20 0 37.1g 2.1g 934m S 194.2 26.9 4051:37 /usr/bin/java -Xms256m -Xmx1g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyO
28899 nagios 39 19 2966m 392m 11m S 104.1 4.9 1634:49 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75

One of the processes is Logstash; the other is Elasticsearch.

Has anyone run into similar issues?
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Interface slow down

Post by cmerchant »

How many servers are you collecting logs from?

How many documents are you collecting per day/hour/minute?

How long do you retain your log data?

Have you considered filtering your inbound data to the Log Server cluster?

Are all of your nodes on the same VMware server?

Your CPU % level would indicate you're I/O bound; it could be disk or network.
stecino
Posts: 248
Joined: Thu Mar 14, 2013 4:42 pm

Re: Interface slow down

Post by stecino »

At the moment I have 80 log sources. This includes server-level logs as well as application logs.
I am collecting anywhere between 45 and 90 million documents a day.
So far I have retention set to 30 days.
What kind of filtering would you propose?
All my nodes have identical resources.
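
For anyone sizing a similar cluster, the figures in this thread can be turned into a rough storage projection. This is only a back-of-the-envelope sketch in Python: it assumes the sizes shown on the status page use 1 GB = 1024**3 bytes, that the average document size stays constant, and that the replica overhead stays at the ratio currently observed (total size divided by primary size). The function names are made up for illustration.

```python
# Figures quoted earlier in this thread.
DOCS_TOTAL = 590_538_616   # documents currently indexed
PRIMARY_GB = 135.7         # primary size, replicas excluded
TOTAL_GB = 265.4           # total size, replicas included
RETENTION_DAYS = 30        # retention setting mentioned in the post


def bytes_per_doc():
    """Average primary bytes per document (assumes 1 GB = 1024**3 bytes)."""
    return PRIMARY_GB * 1024**3 / DOCS_TOTAL


def projected_gb(docs_per_day, days=RETENTION_DAYS):
    """Return (primary_gb, total_gb) once `days` of data are retained.

    Linear extrapolation only: real index size also depends on field
    mappings, compression, and how the replica count is configured.
    """
    primary = docs_per_day * days * bytes_per_doc() / 1024**3
    replica_factor = TOTAL_GB / PRIMARY_GB  # ratio observed on this cluster
    return primary, primary * replica_factor


if __name__ == "__main__":
    print(f"average document size: {bytes_per_doc():.0f} bytes")
    for rate in (45_000_000, 90_000_000):
        primary, total = projected_gb(rate)
        print(f"{rate:,} docs/day -> {primary:.0f} GB primary, {total:.0f} GB total")
```

Comparing the projected totals against the 2,559.20 GB of total storage reported above gives a quick sense of how much headroom a 30 day retention leaves at each ingest rate.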
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Interface slow down

Post by abrist »

stecino wrote:What kind of filtering would you propose?
It all depends on what you need. Take a look at the logs coming from a source. Are there any lines you feel you do not need?
As you may be I/O bound, have you thought about spinning up a second instance or increasing the speed of your disk subsystem?
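
One way to find lines you may not need is to group a sample of a source's log by message shape and see which shapes dominate. The Python sketch below is only an illustration of that idea, not a Nagios Log Server feature: the sample lines are invented, and `normalize` and `top_patterns` are hypothetical helper names. It replaces every run of digits with a placeholder so that messages differing only in PIDs, ports, or addresses count as the same pattern.

```python
import re
from collections import Counter


def normalize(line):
    """Collapse every run of digits so similar messages share one pattern."""
    return re.sub(r"\d+", "<N>", line.strip())


def top_patterns(lines, n=5):
    """Return the n most frequent normalized patterns with their counts."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented sample lines, standing in for a real log file.
    sample = [
        "sshd[1201]: Accepted publickey for deploy from 10.0.0.5 port 50122",
        "sshd[1377]: Accepted publickey for deploy from 10.0.0.9 port 50310",
        "sshd[1490]: Accepted publickey for deploy from 10.0.0.5 port 50788",
        "cron[88]: (root) CMD (run-parts /etc/cron.hourly)",
        "kernel: eth0: link up",
    ]
    for pattern, count in top_patterns(sample):
        print(f"{count:>6}  {pattern}")
```

Patterns that account for a large share of the volume but that you never search for are the natural candidates for dropping before they reach the cluster.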
Former Nagios employee
stecino
Posts: 248
Joined: Thu Mar 14, 2013 4:42 pm

Re: Interface slow down

Post by stecino »

I updated setup-linux.sh to use UDP as opposed to TCP. I also made sure that each log source was sending to a cluster node on the same network.
This addressed the issue.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Interface slow down

Post by tmcdonald »

As long as you understand and accept the risks associated with UDP (delivery is not guaranteed, so log messages can be lost without any error on the sender), that will reduce the overhead somewhat.
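
To make the UDP trade-off concrete, here is a small Python sketch using only the standard `socket` module. It is not how setup-linux.sh or rsyslog sends logs; the message text and the `send_udp` and `demo` names are made up for the example. It sends one datagram to a local listener, then sends another to the same port after the listener has gone away. In both cases the sender sees the same return value, which is the point: with UDP the sender gets no signal that a message was lost.

```python
import socket


def send_udp(message, host, port):
    """Send one datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message, (host, port))


def demo():
    """Send once to a live listener and once to a port with no listener."""
    message = b"<14>demo: hypothetical log line"

    # Local receiver on an ephemeral port chosen by the OS.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.settimeout(2)
    receiver.bind(("127.0.0.1", 0))
    port = receiver.getsockname()[1]

    sent_with_listener = send_udp(message, "127.0.0.1", port)
    received, _ = receiver.recvfrom(1024)
    receiver.close()

    # Nobody is listening now, yet the send still reports success.
    sent_without_listener = send_udp(message, "127.0.0.1", port)
    return received, sent_with_listener, sent_without_listener


if __name__ == "__main__":
    received, ok, lost = demo()
    print("received by listener:", received)
    print("bytes reported sent (listener up):  ", ok)
    print("bytes reported sent (listener gone):", lost)
```

TCP would surface the second case as a connection error, at the cost of connection setup and acknowledgements, which is the overhead being traded away here.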
Former Nagios employee