multiline filter

Posted: Thu Oct 01, 2015 11:24 am
by Envera IT
I'm trying to get multiline filters working, but the filter keeps crashing Logstash. These events come in via a log4j input, so I think I'm correct in applying a filter for this instead of using the multiline input. The first event has the host name of the device sending the log, and the second event has the details of the log itself; in this case an HDD temp alarm. These always come in as an unknown event type because the DVRs send events to a server of ours that processes transactions; that server doesn't need to know what to do with these, so it just logs them. We're trying to get those logs parsed into a usable format for reporting in NLS. I think I'm doing something wrong with the filter syntax, which is crashing Logstash. Any help would be appreciated.

The filter matches the second event and should join it to the first. The regexp seems to be OK when I test it.

Code:

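if [type] == "mytype" {
  multiline {
    # Matches the "Received an unknown event type" line (the second event).
    pattern => "(\W|^)Received\san\sunknown\sevent\stype(\W|$)"
    # negate => true inverts the match: lines that do NOT match are joined.
    negate => true
    what => "previous"
  }
}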

First Event

Code:

{
  "_index": "logstash-2015.10.01",
  "_type": "mytype",
  "_id": "txKLBkFGS6GPGfg_dhfVtw",
  "_score": null,
  "_source": {
    "message": "Unknown event from some thing",
    "@version": "1",
    "@timestamp": "2015-10-01T16:08:57.032Z",
    "type": "mytype",
    "host": "10.0.1.141:54633",
    "path": "com.mycompany.east.driver.dvr.mydvrtype.comm.DvrEventHandler",
    "priority": "ERROR",
    "logger_name": "com.mycompany.east.driver.dvr.mydvrtype.comm.DvrEventHandler",
    "thread": "Incoming Data Listener",
    "class": "com.mycompany.east.driver.dvr.mydvrtype.comm.DvrEventHandler",
    "file": "DvrEventHandler.java:157",
    "method": "processFT2Event",
    "bundle.id": "84",
    "bundle.version": "2.1.3",
    "bundle.name": "com.mycompany.east.driver.dvr.mydvrtype"
  },
  "sort": [
    1443715737032
  ]
}

Second Event

Code:

{
  "_index": "logstash-2015.10.01",
  "_type": "mytype",
  "_id": "3zUNQeTQRUyP_NLIACzFbQ",
  "_score": null,
  "_source": {
    "message": "Received an unknown event type - SMART HDD ALERT TEMP. Raw event = S4007601-10-15 12:08:09 R Z000 0102 SYST 001 SYSTEM              |SMART HDD ALERT TEMP|?|               0|0000000000000000",
    "@version": "1",
    "@timestamp": "2015-10-01T16:08:57.006Z",
    "type": "mytype",
    "host": "10.0.1.141:54633",
    "path": "com.mycompany.east.driver.dvr.mydvrtype.ft2.events.FT2DvrEvent",
    "priority": "ERROR",
    "logger_name": "com.mycompany.east.driver.dvr.mydvrtype.ft2.events.FT2DvrEvent",
    "thread": "Incoming Data Listener",
    "class": "com.mycompany.east.driver.dvr.mydvrtype.ft2.events.FT2DvrEvent",
    "file": "FT2DvrEvent.java:152",
    "method": "parseAlarmDescription",
    "bundle.id": "84",
    "bundle.version": "2.1.3",
    "bundle.name": "com.mycompany.east.driver.dvr.mydvrtype"
  },
  "sort": [
    1443715737006
  ]
}

***edited out a name that is not important.
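
For anyone who wants to repeat the regexp test outside Logstash, here is a minimal sketch in Python. It assumes Python's `re` module treats this particular pattern the same way Logstash's regex engine does, which should hold for single-line messages like these. The `matches` helper and the variable names are made up for illustration, and the second message is only the start of the real one.

```python
import re

# The pattern from the filter above; Python's re syntax accepts it unchanged.
PATTERN = re.compile(r"(\W|^)Received\san\sunknown\sevent\stype(\W|$)")

# "message" fields taken from the two sample events.
# The second one is only the beginning of the full message.
first_message = "Unknown event from some thing"
second_message = "Received an unknown event type - SMART HDD ALERT TEMP."


def matches(message: str) -> bool:
    """Return True if the pattern is found anywhere in the message."""
    return PATTERN.search(message) is not None


print(matches(first_message))   # False
print(matches(second_message))  # True
```

So only the second event is a pattern match; the first event never matches on its own.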

Re: multiline filter

Posted: Thu Oct 01, 2015 12:12 pm
by jolson
Quote: "These events come in via a log4j input, so I think I'm correct in applying a filter for this instead of using the multiline input. The first event has the host name of the device sending the log, and the second event has the details of the log itself; in this case an HDD temp alarm. These always come in as an unknown event type because the DVRs send events to a server of ours that processes transactions; that server doesn't need to know what to do with these, so it just logs them. We're trying to get those logs parsed into a usable format for reporting in NLS. I think I'm doing something wrong with the filter syntax, which is crashing Logstash. Any help would be appreciated."

It's quite likely that this crash is due to multiple Logstash workers. The last time I checked, if the Logstash process is using multiple workers (as it does in Nagios Log Server for performance purposes), the multiline filter can cause Logstash to crash. This is very likely what is happening here, and you have a few options:

* Reduce the number of workers manually from 4 to 1 on all of your instances. This will keep Logstash single-threaded, but can cause other complications (such as Logstash slowing down).

* Use multiline on the input instead, since the multiline codec runs on the input and is not affected by the number of filter workers.

* Concatenate your log files together on the client-side instead of concatenating them on the Log Server side. (It sounds like this might not be an option based on your post)

Is there an option that you would prefer to use? I'm happy to help you along whatever path you choose.

Re: multiline filter

Posted: Thu Oct 01, 2015 1:09 pm
by Envera IT
You are correct, I did see an error about workers when I was troubleshooting this last week. I'd like to try using the multiline input codec. The logs are coming in from a Java application, and the input I'm using is log4j.

https://www.elastic.co/guide/en/logstas ... iline.html is what I'm reading now.


This is the input I'm using to accept the logs from the host currently.

Code:

log4j {
    type => 'mytype'
    port => 5522
}

Does this look right? I don't want to use the negate option, as I only want to match logs that match the regexp pattern. I haven't tried testing it yet; I figure you might tell me I'm totally off here or have a better way of doing it. I'm unsure if I'm matching the type at the right spot in the input.

Code:

log4j {
  type => "mytype"
  codec => multiline {
    pattern => "(\W|^)Received\san\sunknown\sevent\stype(\W|$)"
    what => previous
  }
}

Re: multiline filter

Posted: Thu Oct 01, 2015 3:08 pm
by jolson
Your input as configured will do the following:

Any log that matches the regex (\W|^)Received\san\sunknown\sevent\stype(\W|$) is appended to the previous log. Here is an example of how the multiline codec groups logs.

Example input:

Code:

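log4j {
  type => "mytype"
  codec => multiline {
    pattern => "Some error.*"
    # negate => true: lines that do NOT match the pattern join the previous
    # event, which is what produces the grouping shown below.
    negate => true
    what => "previous"
  }
}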

Example logs:

Some error about java
java stack trace
stack trace stuff
more stack trace
all of these logs are sent individually
Some error about httpd
Some error about crond
example crond error text


The logs would be grouped appropriately:

Code:

Some error about java
java stack trace
stack trace stuff
more stack trace
all of these logs are sent individually

Code:

Some error about httpd

Code:

Some error about crond
example crond error text

Assuming your regex matches the logs that you're accepting on the log4j input, you shouldn't have a problem. The multiline syntax can be tricky, so let me know if you have questions. If you have any full example logs you can send my way, I'd be happy to write up a regular expression and example input that would be appropriate.
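
To make the grouping rule concrete, here is a rough Python model of `what => previous`, with and without `negate`. It is a simplified sketch and not the codec's actual implementation: it ignores buffering and flushing, and `group_multiline` is a made-up name. In this model, the three groups listed above come out when the match is inverted (`negate => true`). Without `negate`, the "Some error" lines themselves are the ones that get attached to whatever came before them.

```python
import re
from typing import List


def group_multiline(lines: List[str], pattern: str, negate: bool = False) -> List[str]:
    """Model of `what => previous`: a line that counts as a match is appended
    to the event before it; any other line starts a new event."""
    regex = re.compile(pattern)
    events: List[List[str]] = []
    for line in lines:
        matched = regex.search(line) is not None
        if negate:
            matched = not matched  # negate => true inverts the match
        if matched and events:
            events[-1].append(line)  # belongs to the previous event
        else:
            events.append([line])    # starts a new event
    return ["\n".join(event) for event in events]


logs = [
    "Some error about java",
    "java stack trace",
    "stack trace stuff",
    "more stack trace",
    "all of these logs are sent individually",
    "Some error about httpd",
    "Some error about crond",
    "example crond error text",
]

# Inverted match: every "Some error" line starts a new event.
for event in group_multiline(logs, r"Some error.*", negate=True):
    print(event)
    print("---")
```

Calling `group_multiline(logs, r"Some error.*")` without `negate` is a quick way to see how different the result is when the matching lines are the ones being appended.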

Re: multiline filter

Posted: Sun Oct 04, 2015 5:00 pm
by Envera IT
Thanks @Jolson,

I tried using the input, but it's not working as intended. The log4j portion is still parsing out the fields as if nothing changed.

I took a look at the original log, but unfortunately it differs from the logs I'm capturing. Interestingly, the developers of our application apparently write the logs to memory and dump them to file once a day. What they're sending to the syslog server is more verbose than what they're writing to the log file. I don't know enough about log4j to give much insight, but it looks like they're only writing a single category to the local file.

Regardless, would a few raw events and the current input in NLS be enough to work off of?

Please give detailed comments like you've been doing or nudge me in the direction I should be looking; I want to know what I'm doing wrong. Let me know what you need.

Input

Code:

log4j {
  type => "mytype"
  port => 5522
  codec => multiline {
    pattern => "(\W|^)Received\san\sunknown\sevent\stype(\W|$)"
    what => previous
  }
}

First Event

Code:

{
  "_index": "logstash-2015.10.04",
  "_type": "mytype",
  "_id": "_cTsFfB9SuS6iax9z_tcAg",
  "_score": null,
  "_source": {
    "message": "Unknown event from some thing",
    "@version": "1",
    "@timestamp": "2015-10-04T21:41:02.484Z",
    "type": "mytype",
    "host": "10.0.1.141:54638",
    "path": "com.mycompany.east.driver.dvr.somecompany.comm.DvrEventHandler",
    "priority": "ERROR",
    "logger_name": "com.mycompany.east.driver.dvr.somecompany.comm.DvrEventHandler",
    "thread": "Incoming Data Listener",
    "class": "com.mycompany.east.driver.dvr.somecompany.comm.DvrEventHandler",
    "file": "DvrEventHandler.java:157",
    "method": "processFT2Event",
    "bundle.id": "84",
    "bundle.version": "2.1.3",
    "bundle.name": "com.mycompany.east.driver.dvr.somecompany"
  },
  "highlight": {
    "type": [
      "@start-highlight@mytype@end-highlight@"
    ],
    "type.raw": [
      "@start-highlight@mytype@end-highlight@"
    ]
  },
  "sort": [
    1443994862484
  ]
}

Second Event

Code:

{
  "_index": "logstash-2015.10.04",
  "_type": "mytype",
  "_id": "wtbNms3ERIeJBujkhpou6A",
  "_score": null,
  "_source": {
    "message": "Received an unknown event type - EXIT ERROR. Raw event = S4007604-10-15 17:41:02 - Z000 0010 SYST 001 SYSTEM              |EXIT ERROR          |?|               0|0000000000000000",
    "@version": "1",
    "@timestamp": "2015-10-04T21:41:02.458Z",
    "type": "mytype",
    "host": "10.0.1.141:54638",
    "path": "com.mycompany.east.driver.dvr.somecompany.ft2.events.FT2DvrEvent",
    "priority": "ERROR",
    "logger_name": "com.mycompany.east.driver.dvr.somecompany.ft2.events.FT2DvrEvent",
    "thread": "Incoming Data Listener",
    "class": "com.mycompany.east.driver.dvr.somecompany.ft2.events.FT2DvrEvent",
    "file": "FT2DvrEvent.java:152",
    "method": "parseAlarmDescription",
    "bundle.id": "84",
    "bundle.version": "2.1.3",
    "bundle.name": "com.mycompany.east.driver.dvr.somecompany"
  },
  "highlight": {
    "type": [
      "@start-highlight@mytype@end-highlight@"
    ],
    "type.raw": [
      "@start-highlight@mytype@end-highlight@"
    ]
  },
  "sort": [
    1443994862458
  ]
}

Re: multiline filter

Posted: Mon Oct 05, 2015 2:37 pm
by jolson
The input you're defining looks proper. I took this to my lab box to verify that what you're doing with the multiline input codec is appropriate, and it is. Before I get ahead of myself, I'd like to link to the excellent documentation regarding the multiline codec.

I built a simpler input, and it wound up looking like this:

Code:

tcp {
  type => "mytype"
  port => 5522
  codec => multiline {
    pattern => "Received\san\sunknown\sevent\stype.*"
    what => "previous"
  }
}

Now, I can't see a reason why your regex would not be working, but it might be worth trying mine out to see if it works for you. When it comes to Logstash, I like to keep things as simple as possible.

After I set the input up, I took a dashboard and filtered it by the 'mytype' tag. I generated a couple of logs from a different lab device and sent them over:

[Screenshot: 2015-10-05 14_30_54-mRemoteNG - jesse.png]

You can see that the multiline input codec works as expected. One issue that I encountered is that Logstash keeps track of the multiline data per TCP stream. This means that if there is a separate TCP handshake for each individual log sent over the wire, those logs will be counted as separate and multiline will not apply properly. Ensure that all of your logs are sent in the _same_ TCP stream.

While I don't have experience with log4j, it's worth noting that it doesn't look much different than a tcp input. As a test, you could try sending your logs over the wire via TCP using the input I specified above to see if your results are any different than mine.

Using the above input, any event matching Received\san\sunknown\sevent\stype.* will be added to the *previous* log that was collected on the same TCP stream. My guess is that either your regex isn't working properly for some reason (try mine out) or that your client device is using a separate TCP stream for each log that it sends over.
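
The per-stream behavior can be sketched in a few lines of Python. This is only a model of what is described above, not Logstash code: the `conn-1` and `conn-2` stream IDs and the `group_per_stream` function are made up for illustration, and the second message is shortened.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

PATTERN = re.compile(r"Received\san\sunknown\sevent\stype.*")


def group_per_stream(arrivals: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Join matching lines to the previous event, but only within one stream.

    `arrivals` is a list of (stream_id, message) pairs in arrival order.
    """
    events: Dict[str, List[str]] = defaultdict(list)
    for stream_id, message in arrivals:
        stream_events = events[stream_id]
        if PATTERN.search(message) and stream_events:
            stream_events[-1] += "\n" + message  # append to the previous event
        else:
            stream_events.append(message)        # start a new event
    return dict(events)


first = "Unknown event from some thing"
second = "Received an unknown event type - EXIT ERROR."

same_stream = group_per_stream([("conn-1", first), ("conn-1", second)])
split_streams = group_per_stream([("conn-1", first), ("conn-2", second)])

print(same_stream)    # one joined event on conn-1
print(split_streams)  # two separate events, one per connection
```

When both messages share a stream they end up in one event. When each message arrives on its own stream there is no previous event to join, so they stay separate.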

Let me know, thanks!

Jesse

Re: multiline filter

Posted: Tue Oct 06, 2015 12:20 pm
by Envera IT
Changing the input to TCP didn't work; it garbled the text.

Regardless, I took a packet capture of the server sending logs to NLS, and these are coming in on different TCP streams. I'm talking with our developers to see what they can do on their side of things.

Thanks for the help, I'll update if I get a resolution out of them.

Re: multiline filter

Posted: Tue Oct 06, 2015 12:28 pm
by jolson
Looking forward to your response!