
shipper.py problems

Posted: Tue Feb 07, 2017 11:14 am
by bennyboy
Hi,

I'm trying to use shipper.py to push some logs so I can analyze them.

Code: Select all

cat as_sq_prd_408_01_SystemOut_17.02.01_22.09.54.log | python ../shipper.py | nc nagioslogserver 2057
I have two problems. The first is this error:

Code: Select all

 cat as_sq_prd_408_01_SystemOut_17.02.01_22.09.54.log  | python shipper.py | nc nagioslogserver 2057
'utf8' codec can't decode bytes in position 131-133: invalid data
Traceback (most recent call last):
  File "shipper.py", line 242, in ?
    main()
  File "shipper.py", line 237, in main
    process_stream(sys.stdin, message)
  File "shipper.py", line 217, in process_stream
    print json.dumps(message)
  File "/usr/lib/python2.4/site-packages/simplejson/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.4/site-packages/simplejson/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.4/site-packages/simplejson/encoder.py", line 260, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 131-133: invalid data
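For context, this failure mode is easy to sketch: the log apparently contains bytes that are not valid UTF-8 (for example Latin-1 accented characters), and the strict decode inside the JSON encoding raises. A minimal reproduction in modern Python (the traceback above is Python 2.4, but the decoding issue is the same; the byte string below is my illustrative guess at the kind of input involved):

```python
# Minimal sketch of the failure mode: Latin-1 bytes ("souhaités")
# are not valid UTF-8, so a strict decode raises UnicodeDecodeError.
import json

raw = b"l'achat souhait\xe9s"

try:
    raw.decode("utf-8")          # strict decode: this is what fails
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A tolerant decode substitutes the undecodable bytes instead of
# raising, so json.dumps can proceed:
text = raw.decode("utf-8", errors="replace")
payload = json.dumps({"message": text})
```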
The second problem is about multiline support. The log file contains a couple of Java stack traces, and each stack trace is not kept in one message: it shows up as multiple log entries in Nagios Log Server, which is not really useful for us.
Is it possible to use JSON with multiline support, or some other option, to support that type of log entry?

Thank you!

Re: shipper.py problems

Posted: Tue Feb 07, 2017 11:25 am
by mcapra
Do you know the charset used in the as_sq_prd_408_01_SystemOut_17.02.01_22.09.54.log file? Can you share the outputs of the following commands (adjust /path/to for your system's logical path to the file):

Code: Select all

file -bi /path/to/as_sq_prd_408_01_SystemOut_17.02.01_22.09.54.log
locale
bennyboy wrote:Is it possible to use JSON with multiline support, or some other option, to support that type of log entry?
Yup! Although, if it's an awful lot of lines (100+), the multiline codec built into Logstash can fail. If you were receiving the JSON on a generic TCP input:

Code: Select all

## Input Rule
tcp {
	codec => multiline
	{
		pattern => '^\{'
		negate => true
		what => previous                
	}
	type => 'multiline_json'
	port => '2345'
}

## Filter Rule
if [type] == 'multiline_json' {
json {
    source => message
    remove_field => message
  }
}
If you run into issues with that input/filter set, please post a sample of your JSON here.
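For anyone unsure what that codec configuration actually does: with negate => true and what => previous, any line that does NOT match the pattern is folded into the preceding event. A rough plain-Python illustration of that grouping rule (illustrative only, not Logstash code):

```python
# Illustrative re-implementation of the multiline grouping rule:
# lines not matching the pattern are appended to the previous event.
import re

def group_multiline(lines, pattern=r"^\{"):
    events = []
    for line in lines:
        if re.match(pattern, line) or not events:
            events.append(line)        # line starts a new event
        else:
            events[-1] += "\n" + line  # fold into previous event
    return events
```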

Re: shipper.py problems

Posted: Tue Feb 07, 2017 12:07 pm
by bennyboy
The result of file -bi as_sq_prd_408_01_SystemOut_17.02.01_22.09.54.log is text/plain; charset=us-ascii.
The locale of the server where I run shipper.py:

Code: Select all

LANG=en_CA.UTF-8
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=
Thank you for the JSON configuration :) I will try it.

I ran cat as_sq_prd_408_01_SystemOut_17.02.01_22.09.54.log | python shipper.py
and I can see all the JSON entries shipper.py generates. I think the multiline problem is in shipper.py itself: I see one message for each line of my stack trace, not one message for the whole stack trace.

I cannot publish the log content here because it is not public data, but I can explain what I found in it.

The lines that trigger 'utf8' codec can't decode bytes in position 131-133: invalid data contain single quotes and some French words like l'achat souhaités qu'un acheté le(s) produit(s). I think it would be useful to escape those strings with a patch in shipper.py.
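A patch along those lines could be as simple as a decode-with-fallback step before the JSON encoding. A sketch (the helper name and the Latin-1 fallback are my assumptions, not shipper.py's actual code; Latin-1 maps every byte to a character, so the fallback never fails):

```python
# Hypothetical helper for shipper.py: try UTF-8 first, fall back
# to Latin-1 so accented French bytes never raise UnicodeDecodeError.
def safe_decode(line, primary="utf-8", fallback="latin-1"):
    try:
        return line.decode(primary)
    except UnicodeDecodeError:
        return line.decode(fallback)
```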

Re: shipper.py problems

Posted: Tue Feb 07, 2017 2:52 pm
by mcapra
It's probably a case of shipper.py not correctly handling the special characters. The script by itself is not terribly sophisticated.

Another option would be to configure an agent (rsyslog, syslog-ng, etc.) to be responsible for shipping that file's contents to Nagios Log Server. In my opinion this is a much more robust option than shipper.py.

If it's really just a one-off file that you don't care to continually monitor, another option would be to copy the entire file to the Nagios Log Server machine and accept it as a file input. The input configuration might look like this:

Code: Select all

file {
        codec => multiline
        {
            pattern => '^\{'
            negate => true
            what => previous                
        }
        path => ["/tmp/file.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
    }
Though it's hard to offer specific advice without seeing the file myself. PMing or emailing the log is an option too if you don't mind sharing it with the techs but would rather keep it away from the general public.

Re: shipper.py problems

Posted: Tue Feb 07, 2017 3:05 pm
by bennyboy
We evaluated the rsyslog solution and we already use it on Linux. The problem is that these specific logs reside on an AIX server. Those logs rotate a lot, and if I understand correctly I need a recent version of rsyslog, like 8.23 or 8.24, to make sure it's stable. rsyslog just patched their code to be compatible with AIX (https://github.com/rsyslog/rsyslog/pull/1247) and those changes are available in 8.24, but IBM has not published the package at this time. We would have to compile it ourselves, and we don't want to manage that risk on our side. I will continue to evaluate it.

I want to use shipper.py to add a couple of logs manually, to demonstrate Nagios Log Server to our development team so we can use it as our log server.

Re: shipper.py problems

Posted: Tue Feb 07, 2017 3:28 pm
by mcapra
I've been unable to replicate this so far using the previously provided text. Do you know if the system's locale was changed after Python was set up? That can upset the Python interpreter in some situations when it comes to certain characters.

Re: shipper.py problems

Posted: Tue Feb 07, 2017 3:32 pm
by bennyboy
I used the recode utility on Fedora and ran recode utf8 myfile. After that little command I was able to process the file, but the result makes no sense in Nagios Log Server because cat ships it line by line. I have to find a way to use rsyslog on AIX.
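For reference, what recode utf8 does here can be approximated in a couple of lines of Python, assuming the source bytes are Latin-1 (which fits the accented French characters in this thread; adjust the charset if the real one differs):

```python
# Rough equivalent of `recode utf8 myfile`: reinterpret the raw
# bytes as Latin-1, then re-encode them as UTF-8.
def reencode_latin1_to_utf8(data):
    return data.decode("latin-1").encode("utf-8")
```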

Re: shipper.py problems

Posted: Wed Feb 08, 2017 10:35 am
by mcapra
Do you know the answer to this question?
mcapra wrote:Do you know if the system's locale was changed after Python was set up?
Also, part of the problem is that shipper.py is trying to send everything as JSON. If you pass already valid JSON through a function intended to escape input for JSON, bad things are likely to happen.

Using the following source file:

Code: Select all

{
	"prop1": "l'achat souhaités qu'un acheté le(s) produit(s)",
	"prop2": "Something in english!"
}
{
	"prop3": "l'achat souhaités qu'un acheté le(s) produit(s)",
	"prop4": "Something in english!"
}
{
	"prop5": "l'achat souhaités qu'un acheté le(s) produit(s)",
	"prop6": "Something in english!"
}
{
	"prop7": "l'achat souhaités qu'un acheté le(s) produit(s)",
	"prop8": "Something in english!"
}
The following input rule:

Code: Select all

tcp {
        codec => multiline
        {
            pattern => '^\{'
            negate => true
            what => previous                
        }
        port => 2534
        type => 'multiline_json'
    }
The following filter rule:

Code: Select all

if [type] == 'multiline_json' {
json {
    source => message
    remove_field => message
  }
}
And the following command from the source machine:

Code: Select all

cat /tmp/send.json |  nc 192.168.67.4 2534
I was able to receive the events in a correctly formatted fashion. One example entry:
(screenshot attachment: 2017_02_08_09_33_16_Dashboard_Nagios_Log_Server.png)
Though again, it's worth mentioning that the logstash-codec-multiline plugin sometimes fails when it gets particularly large messages. Stuffing the entire cat of a very large log file into a single nc stream might cause such problems.
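One way to avoid that is to split the file into its top-level JSON objects and send each one as its own newline-terminated TCP write, instead of cat-ing the whole file through nc. A hedged sketch (the host and port are placeholders taken from this thread, and the splitting relies on json.JSONDecoder.raw_decode; this is not part of shipper.py):

```python
# Sketch: split a file of concatenated JSON objects and ship each one
# as a separate newline-terminated message over TCP.
import json
import socket

def split_json_objects(text):
    """Yield each top-level JSON object found in `text`."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # skip whitespace between objects
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

def ship_file(path, host="nagioslogserver", port=2534):
    with open(path) as f:
        text = f.read()
    with socket.create_connection((host, port)) as sock:
        for obj in split_json_objects(text):
            sock.sendall((json.dumps(obj) + "\n").encode("utf-8"))
```

Each object then arrives as one small message, so the multiline codec never has to reassemble an enormous payload.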