increase performance of nxlog / logstash
Hi,
one of our servers delivers most of our logfiles.
I wanted to add an additional high-load logfile (high event count and long lines).
When I add that logfile, I notice that the throughput from this server goes down rapidly; the logs are no longer processed in realtime.
The server which runs the logserver only went up to 15% CPU usage, so there should be room for more.
For log forwarding we are using nxlog.
I assume the bottleneck is either in nxlog, which cannot forward fast enough, or in logstash, which cannot accept fast enough.
How can I tune up the worker threads of nxlog and logstash? I suppose that would help.
This particular nxlog instance is running on Windows 2012, if it matters.
Thanks,
Andreas
Re: increase performance of nxlog / logstash
A bit strange. I tried removing another high-load logfile, but the problem stays. There are no errors in the logs of nxlog or logstash.
Re: increase performance of nxlog / logstash
Generally the recommendation would be to add additional NLS instances and configure some sort of load balancing.
If there's nothing crazy happening in /var/log/logstash/logstash.log, I'm pretty sure the bottleneck is in nxlog in this case. Logstash is pretty good about throwing exceptions when it's under extreme stress. If you can provide your logstash and nxlog configurations we *might* be able to offer suggestions on better optimization of those configurations.
_asp_ wrote: How can I tune up the worker threads of nxlog and logstash?
You could certainly tune up the Java system variables by modifying /etc/init.d/logstash, but that's not something we will provide much input/advice on, as a matter of policy.
Former Nagios employee
https://www.mcapra.com/
Re: increase performance of nxlog / logstash
Here is my nxlog configuration for this particular server.
If I enable ttp (the 9th logfile), throughput goes down rapidly and falls far behind realtime.
Code:
## See the nxlog reference manual at
## http://nxlog.org/nxlog-docs/en/nxlog-reference-manual.html
## Please set the ROOT to the folder your nxlog was installed into,
## otherwise it will not start.
#define ROOT C:\Program Files\nxlog
define ROOT C:\Program Files (x86)\nxlog
define CERT %ROOT%\cert
Moduledir %ROOT%\modules
#CacheDir %ROOT%\data
#Pidfile %ROOT%\data\nxlog.pid
#SpoolDir %ROOT%\data
#LogFile %ROOT%\data\nxlog.log
CacheDir e:\nxlog\data
Pidfile g:\nxlog\data\nxlog.pid
SpoolDir e:\nxlog\data
LogFile g:\nxlog\nxlog.log
# Include fileop while debugging, also enable in the output module below
<Extension fileop>
Module xm_fileop
</Extension>
<Extension json>
Module xm_json
</Extension>
<Extension syslog>
Module xm_syslog
</Extension>
<Input internal>
Module im_internal
</Input>
# Watch your own files
<Input nxlog>
Module im_file
#File '%ROOT%\data\nxlog.log'
File 'g:\nxlog\nxlog.log'
SavePos TRUE
Exec $Message = $raw_event;
</Input>
<Input tuxedo-ulog>
Module im_file
File "g:\\amest\\logfile\\ULOG\\ULOG." + strftime(now(), "%m%d%y")
SavePos TRUE
# build the filename
Exec if (file_name() =~/(\d\d)(\d\d)(\d\d)$/) \
{\
$logTime ="20"+$3 + "-" + $1 + "-" +$2; \
if ($raw_event =~/^(\d\d)(\d\d)(\d\d)/ )\
{\
$logTime = $logTime + " " + $1+":" + $2 + ":" + $3;\
}\
};
# drop trace lines
Exec if ($raw_event =~/.*TRACE\:.*/) drop();
Exec $Component = 'Tuxedo';
Exec $LogFile = 'ULOG';
</Input>
<Input tuxedo-useractionlog>
Module im_file
File "g:\\amest\\logfile\\useractionlog\\useractions-" + strftime(now(), "%Y%m%d") + ".log"
SavePos TRUE
Exec $Component = 'Tuxedo';
Exec $LogFile = 'useractionlog';
</Input>
<Input tuxedo_tmadmin_usercount>
Module im_file
File "g:/Input_for_Kibana/prod/Tuxedo/tmadmin_usercount-" + strftime(now(), "%Y%m%d") + ".log"
SavePos TRUE
Exec $Component = 'Tuxedo';
Exec $LogFile = 'tmadmin_usercount';
</Input>
<Input rdautoorders>
Module im_file
File "g:/amest/logfile/rdAutoOrders/rdautoorders-" + strftime(now(), "%Y%m%d") + ".log"
SavePos TRUE
Exec $Component = 'Tuxedo';
Exec $LogFile = 'rdautoorders';
</Input>
<Input tapdispatcher>
Module im_file
File "g:/amest/logfile/tapDispatcher/tap-"+ strftime(now(), "%Y%m%d") + ".log"
SavePos true
Exec $Component = 'Tuxedo';
Exec $LogFile = 'tapdispatcher';
</Input>
<Input ttp>
Module im_file
File "g:/amest/logfile/ttp/ttp-"+ strftime(now(), "%Y%m%d") + ".log"
SavePos true
Exec $Component = 'Tuxedo';
Exec $LogFile = 'ttp';
</Input>
<Input topbeat>
Module im_file
File "g:/Input_For_Kibana/topbeat/topbeat.log"
SavePos true
Exec $Component = 'SystemStats';
Exec $LogFile = 'topbeat';
Exec $osType = 'win';
</Input>
<Input smsc-http-recv>
Module im_file
File "g:/amest/logfile/smsc-http-recv/smsc-http-recv-"+ strftime(now(), "%Y%m%d") + ".log"
SavePos true
Exec $Component = 'Tuxedo';
Exec $LogFile = 'smsc-http-recv';
</Input>
<Input smsc-http-send>
Module im_file
File "g:/amest/logfile/smsc-http-send/smsc-http-send-"+ strftime(now(), "%Y%m%d") + ".log"
SavePos true
Exec $Component = 'Tuxedo';
Exec $LogFile = 'smsc-http-send';
</Input>
<Output out>
# connection stuff
Module om_tcp
Host xxx.xxx.xxxx.xxxx
Port xxxx
# server stuff like name, stage, etc.
Exec $HostName = 'xxx';
Exec $Stage = 'Production';
Exec $ServerType = 'yyyy';
# copy the raw event to the message field
Exec $message = $raw_event;
# convert raw event including all fields to json
Exec $raw_event = to_json();
# Uncomment for debug output
#Exec file_write('g:\nxlog\nxlog_debug_output.log', $raw_event + "\n");
</Output>
<Route 1>
Path internal, tuxedo-ulog, tuxedo-useractionlog, tuxedo_tmadmin_usercount, topbeat, smsc-http-recv, smsc-http-send, rdautoorders, tapdispatcher => out
</Route>
Re: increase performance of nxlog / logstash
When adding the ttp log in nxlog, top goes up like this (the 1-minute load average was around 0.5 before):
Code:
top - 10:37:57 up 4 days, 23:25, 3 users, load average: 1.40, 0.90, 0.71
Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.4%us, 0.5%sy, 14.6%ni, 77.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 20462820k total, 19616600k used, 846220k free, 263096k buffers
Swap: 8388320k total, 56356k used, 8331964k free, 6540136k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23398 nagios 39 19 5298m 575m 14m S 195.5 2.9 8:29.53 java
61938 nagios 20 0 25.5g 10g 242m S 91.4 54.5 1538:17 java
But I still don't see any bottleneck here on the NLS.
As I wrote before, I double-checked again: everything from this server is delayed when nxlog has to handle the ttp log.
I don't see high disk queues or high I/O on the log drive of the server running nxlog.
All the other servers with their own nxlog instances are running as smoothly as before.
Re: increase performance of nxlog / logstash
I started a second nxlog instance on the server in foreground mode.
The problem stayed the same.
Re: increase performance of nxlog / logstash
I "played" around a bit more.
The problem seems to be specific to logstash and my filter.
I set up an additional logstash instance (2.4.1, plain ELK) on the NLS server, using an additional nxlog instance to ship the log, with the following config. With only the metrics filter I got a throughput of about 60 events per second; I don't know yet whether that is slower than realtime or whether only 60 lines per second are generated.
Code:
input
{
tcp
{
type => 'import_json'
tags => 'import_json'
port => 5000
codec => json
}
}
filter {
metrics {
meter => "LogFile"
add_tag => "metric"
flush_interval => 10
}
}
output {
if "metric" in [tags] {
stdout {
#codec => line {
# format => "1m rate: %{LogFile.rate_1m} ( %{LogFile.count} )"
#}
codec => "rubydebug"
}
}
}
output
{
null{}
}
The interesting part is what happened when I enabled my filter: the metrics went down, starting at 16 events per second, and the throughput kept falling the longer it ran. Interestingly, the load of the server goes up to 22 when running the separate logstash instance (8 worker threads).
So I assume I need some advice on optimizing my filter code; I would be glad if you could point out some pitfalls. Here is the filter:
Code:
if ([LogFile] == 'ttp' or [type] == 'ttp')
{
# drop if it is a comment line
if [message] =~ /^#.*/
{
drop
{
}
}
else
{
# parse the message
grok
{
match => ['message','^(?<msisdn>\+\d+)\s*(-)?\s+0x(?<ttpSeqIdHex>[\d\w]{4})\s*\[.*\]\s*\w+\s*(?<logTime>\d{2}\.\d{2}\.\d{4} \d{2}\:\d{2}\:\d{2})\s*(?<sidHex>[\d\w]+)\s*TTP DATA\:(?<payload>(?<ttpProtocolVersionHex>[\d\w]{2}) [\w\d]{2} (?<ttpLengthHex>[\w\d]{2} [\w\d]{2}).*) \(.*']
}
# convert hex to integer
ruby
{
code =>
"
#convert ttplength
hexString = event['ttpLengthHex']
number = hexString[0..1] + hexString[3..4]
event['ttpPacketLength'] = number.hex
# convert seq id
event['ttpSeqIdInt'] = event['ttpSeqIdHex'].to_s.hex
# convert protocol version
event['ttpProtocolVersionInt'] = event['ttpProtocolVersionHex'].to_s.hex
# convert SID
event['sidInt'] = event['sidHex'].to_s.hex
event['sidHex'] = '0x' + event['sidHex'].to_s
"
}
# check for sid = 0x82 (check as integer because it should be faster)
if [sidInt] == 130
{
# parse payload details (tap protocol for tbm2 only)
# parse payload details (tap protocol for tbm2 only)
if [payload] =~ /.*02 07 02 .. .. .. .. 28 00 00.*/
{
# is position message
mutate { add_field => { 'messageType' => 'Position' } }
}
else if [payload] =~ /.*02 07 02 .. .. .. .. 28 00 20.*/
{
# is maintenance data
mutate { add_field => { 'messageType' => 'Maintenance' } }
}
else if [payload] =~ /.*02 07 02 .. .. .. .. 28 00 40.*/
{
# is driving time message
mutate { add_field => { 'messageType' => 'DrivingTime' } }
# process details of driving time messages
grok
{
match => ['payload','.*02 07 02 .. .. .. .. 28 00 40 (?<dtcoDetails>..).*']
}
# parse the details of the dtco message
if [dtcoDetails] == '01'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_cyclic' } }
}
else if [dtcoDetails] == '02'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_on_card_in' } }
}
else if [dtcoDetails] == '03'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_on_card_out' } }
}
else if [dtcoDetails] == '04'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_on_task_change' } }
}
else if [dtcoDetails] == '0d'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_on_ignition_off' } }
}
else if [dtcoDetails] == '0e'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_on_warn_driver_1' } }
}
else if [dtcoDetails] == '12'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_on_change_followUpTime_1_to_2' } }
}
else if [dtcoDetails] == '13'
{
mutate { add_field => { 'messageTypeDetails' => 'tco_data_on_motor_on' } }
}
}
else if [payload] =~ /.*02 07 02 .. .. .. .. 28 00 50.*/
{
mutate { add_field => { 'messageType' => 'TourHistory' } }
}
else if [payload] =~ /.*02 07 02 .. .. .. .. 28 00 60.*/
{
mutate { add_field => { 'messageType' => 'VOA' } }
}
else if [payload] =~ /.*02 07 02 .. .. .. .. 28 00 70.*/
{
mutate { add_field => { 'messageType' => 'Geofencing' } }
}
}
# convert extracted timestamp logTime to elasticSearch's timestamp field
date
{
match => ['logTime', 'dd.MM.YYYY HH:mm:ss']
timezone => 'UTC'
remove_field => ['logTime']
}
# mutate part
mutate
{
# delete fields no longer used
remove_field => ['dtcoDetails']
remove_field => ['payload']
}
}
}
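One robustness note on the ruby block in the filter above: if the grok fails to match, fields like ttpLengthHex are absent, and the string slicing then raises an exception on nil for every such event, which adds even more overhead. Below is a nil-guarded sketch of the same conversion (same field names; written as a plain method over a Hash so it can be run standalone, not as the literal filter code):

```ruby
# Nil-safe variant of the hex-to-integer conversion from the ruby filter.
# In Logstash 2.x the filter operates on `event`; a Hash behaves the same
# for [] / []= access, so the logic can be exercised outside Logstash.
def convert_hex_fields(event)
  hex = event['ttpLengthHex']
  if hex && hex.length >= 5
    # "01 2c" -> "012c" -> 300
    event['ttpPacketLength'] = (hex[0..1] + hex[3..4]).hex
  end
  # plain hex-string -> integer conversions, skipped when the field is missing
  { 'ttpSeqIdHex'           => 'ttpSeqIdInt',
    'ttpProtocolVersionHex' => 'ttpProtocolVersionInt',
    'sidHex'                => 'sidInt' }.each do |src, dst|
    event[dst] = event[src].to_s.hex if event[src]
  end
  event['sidHex'] = '0x' + event['sidHex'].to_s if event['sidHex']
  event
end
```

With the guards in place, an event whose grok failed simply passes through unchanged instead of raising.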
I assume I am not allowed to post sample log lines here.
But each line is between 300 and 16k characters long, and the payload field alone can be up to 8k characters.
Re: increase performance of nxlog / logstash
_asp_ wrote: I assume I am not allowed to post sample log lines here.
That would depend more on your organization's policy for that sort of thing. If you can provide some sample logs I can take a closer look for sure. No promises on results, though.
I doubt that would fit in a forum post, but you could put it on a service like pastebin and share it that way.
_asp_ wrote: But each line is between 300 and 16k characters long. The payload field is up to 8k characters.
This information, combined with the logstash filter presented, definitely leads me to believe the bottleneck is logstash.
If payload has a maximum size of 8k characters, I can pretty much guarantee that all the regex matches against that field are a large part of what's causing the bottleneck. Stuff like if [payload] =~ /.*02 07 02 .. .. .. .. 28 00 60.*/ has an awful lot of overhead, since the message needs to be evaluated in its entirety by matches like this.
If all you're examining is hexadecimal data, you could probably write a pretty lean Ruby filter to leverage some bitwise operators and tag the messages that way:
https://calleerlandsson.com/rubys-bitwise-operators/
The added benefit of using bitwise operators would also be that you could effectively knock the message's size down at each evaluation step, but that depends on how the data is structured.
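As a rough standalone sketch of that idea (the payload field name and byte positions come from the regexes in the filter; the exact wire format is an assumption), the hex dump could be split into bytes once and classified with plain comparisons plus a bitwise mask, instead of running six unanchored regexes over an up-to-8k field:

```ruby
# Classify a TTP payload by inspecting bytes directly instead of regex-scanning
# the whole field once per message type. Assumes the payload is a
# space-separated hex dump like "02 07 02 .. .. .. .. 28 00 40 ...".
MESSAGE_TYPES = {
  0x00 => 'Position',
  0x20 => 'Maintenance',
  0x40 => 'DrivingTime',
  0x50 => 'TourHistory',
  0x60 => 'VOA',
  0x70 => 'Geofencing',
}.freeze

def classify(payload)
  bytes = payload.split(' ').map(&:hex)
  # locate the "02 07 02" prefix that the filter's regexes anchor on
  idx = bytes.each_cons(3).find_index { |a, b, c| a == 0x02 && b == 0x07 && c == 0x02 }
  # the regexes expect four wildcard bytes, then "28 00", then the type byte
  return nil unless idx && bytes[idx + 7] == 0x28 && bytes[idx + 8] == 0x00
  # mask with a bitwise AND to keep the high nibble; the listed types all
  # have a zero low nibble, so this matches the regex-based decisions above
  MESSAGE_TYPES[bytes[idx + 9] & 0xF0]
end
```

This walks the field once, and the remaining bytes could be masked further to pull out the DrivingTime detail codes without a second grok.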
The grok filter applied to the message also has a lot of overhead:
Code:
grok
{
match => ['message','^(?<msisdn>\+\d+)\s*(-)?\s+0x(?<ttpSeqIdHex>[\d\w]{4})\s*\[.*\]\s*\w+\s*(?<logTime>\d{2}\.\d{2}\.\d{4} \d{2}\:\d{2}\:\d{2})\s*(?<sidHex>[\d\w]+)\s*TTP DATA\:(?<payload>(?<ttpProtocolVersionHex>[\d\w]{2}) [\w\d]{2} (?<ttpLengthHex>[\w\d]{2} [\w\d]{2}).*) \(.*']
}
Former Nagios employee
https://www.mcapra.com/
Re: increase performance of nxlog / logstash
Thanks for the answer.
I will try improving the regexes first: I will try to get rid of some of the .* parts and reduce the number of steps the regex engine needs. I assume that is where the most potential is.
Once I have done that, I will keep you updated on the results.
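For what it's worth, here is a small standalone benchmark sketch of that plan (synthetic sample line, not real log data): it compares the .*-wrapped regex from the filter, the same pattern without the wrapping .*, and a plain substring test that could serve as a cheap pre-filter.

```ruby
require 'benchmark'

# Synthetic line of roughly payload size (~8k characters) with the
# interesting marker near the end, the worst case for a scan.
line = ('ff ' * 2600) + '02 07 02 11 22 33 44 28 00 60 ' + ('ee ' * 100)

wrapped = /.*02 07 02 .. .. .. .. 28 00 60.*/ # as currently used in the filter
plain   = /02 07 02 .. .. .. .. 28 00 60/     # same match decision, no wrapping .*

n = 1_000
Benchmark.bm(10) do |b|
  b.report('wrapped')  { n.times { line =~ wrapped } }
  b.report('plain')    { n.times { line =~ plain } }
  # include? only checks the fixed tail (it cannot express the ".." wildcards),
  # but can cheaply rule out most non-matching lines before any regex runs
  b.report('include?') { n.times { line.include?('28 00 60') } }
end
```

All three make the same match decision on this line; actual timings will vary with the Ruby/JRuby version Logstash embeds, so it is worth measuring on the real data.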
Re: increase performance of nxlog / logstash
The performance increased from 16 events/s to 160 events/s.
But what I don't understand is the following:
my CPU is only at under 30% load (4 CPUs with hyperthreading) and logstash has 8 worker threads. Is there a way to increase the multithreading per logfile? There is no need for logstash to keep this log serialized, since there are no multiline events in this log and no multiline filter at all in logstash; multilining is currently handled by nxlog.
The disk (SSD) does not seem to be a bottleneck either (disk queue is 0.5, active time at 25%).