check_elasticsearch.py

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

check_elasticsearch.py

Post by rocheryderm »

OK...

Trying to make heads or tails of the check_elasticsearch.py check as suggested here: https://support.nagios.com/forum/viewto ... 37&t=50451

One form of the command allows for use of saved "filters" -- does anyone know how to save a filter so as to be able to reference it with this command?

The Python code is working, but results are dodgy because of my inability to figure out how to create a saved filter, AND because I lack the Lucene-Fu to cobble together a query. Strange errors....

Here's an example of my failures:

Code:

[root@rbbusnagios1t libexec]# /usr/local/nagios/libexec/check_elasticsearch.py --host "http://rbbusnls1t:9200/" --index "filebeat-2018.10.10" --query "message:'ARCHIVE action failed' AND @timestamp:'[[now-24h TO now]]'" --warning 1 --critical 2

Error: Exception: TransportError(400, u'{"count":0,"_shards":{"total":5,"successful":0,"failed":5,"failures":[{"index":"filebeat-2018.10.10","shard":0,"status":400,"reason":"BroadcastShardOperationFailedException[[filebeat-2018.10.10][0] ]; nested: QueryParsingException[[filebeat-2018.10.10] Failed to parse]; nested: IllegalArgumentException[Invalid format: \\"\'\\"]; "},{"index":"filebeat-2018.10.10","shard":1,"status":400,"reason":"BroadcastShardOperationFailedException[[filebeat-2018.10.10][1] ]; nested: QueryParsingException[[filebeat-2018.10.10] Failed to parse]; nested: IllegalArgumentException[Invalid format: \\"\'\\"]; "},{"index":"filebeat-2018.10.10","shard":2,"status":400,"reason":"BroadcastShardOperationFailedException[[filebeat-2018.10.10][2] ]; nested: QueryParsingException[[filebeat-2018.10.10] Failed to parse]; nested: IllegalArgumentException[Invalid format: \\"\'\\"]; "},{"index":"filebeat-2018.10.10","shard":3,"status":400,"reason":"BroadcastShardOperationFailedException[[filebeat-2018.10.10][3] ]; nested: QueryParsingException[[filebeat-2018.10.10] Failed to parse]; nested: IllegalArgumentException[Invalid format: \\"\'\\"]; "},{"index":"filebeat-2018.10.10","shard":4,"status":400,"reason":"BroadcastShardOperationFailedException[[filebeat-2018.10.10][4] ]; nested: QueryParsingException[[filebeat-2018.10.10] Failed to parse]; nested: IllegalArgumentException[Invalid format: \\"\'\\"]; "}]}}')
[root@rbbusnagios1t libexec]#
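
For anyone reading along: in Lucene query-string syntax a phrase is delimited by double quotes, and a range such as [now-24h TO now] is written without quotes around it. The "Invalid format" error above looks consistent with the single quotes and doubled brackets around the range, though that is a guess on my part. Below is a minimal Python sketch for assembling such a query string. The helper name build_query and its parameters are hypothetical and are not part of check_elasticsearch.py.

```python
def build_query(field, phrase, time_field="@timestamp", window="24h"):
    """Assemble a Lucene query string: a quoted phrase plus a relative time range.

    Hypothetical helper for illustration; not part of check_elasticsearch.py.
    """
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    # The range brackets are left unquoted.
    return '%s:"%s" AND %s:[now-%s TO now]' % (field, escaped, time_field, window)


query = build_query("message", "ARCHIVE action failed")
print(query)
```

When the result is passed on a shell command line, wrapping the whole string in single quotes keeps the inner double quotes intact.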
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: check_elasticsearch.py

Post by rocheryderm »

Well... I made some progress, but something's still not right and now I'm sure it's because I'm an Elasticsearch newbie.

THIS "works" (but it's a fail because I know there should be a single hit)

Code:

# /usr/local/nagios/libexec/check_elasticsearch.py --host "http://nls1t:9200/" --index "filebeat-*" --query "message:'ARCHIVE action failed' AND timestamp_field: [now-24h TO now]" --warning 1 --critical 2
OK - Total hits: 0 | hits=0

If I use @timestamp instead of timestamp_field, I get this kind of error:

Code:

...SearchParseException[[filebeat-2018.10.11][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"sort": [{"timestamp": {"order": "desc"}}], "query": {"query_string": {"query": "message:\'ARCHIVE action failed\' AND @timestamp: [now-24h TO now]"}}}]]]; nested: SearchParseException[[filebeat-2018.10.11][4]: from[-1],size[-1]: Parse Failure [No mapping found for [timestamp] in order to sort on]]; }]')

I suspect I need to find a way to tell the Elasticsearch instance in NLS that the @timestamp field in the filebeat indices is the "time filter field name", but I can't figure out how. Any thoughts?
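
The parse failure above quotes the request body that was sent: a query_string query plus a descending sort on a field named timestamp, which Elasticsearch says has no mapping. The sketch below rebuilds that body shape in Python with the sort field as a parameter, just to show where the field name enters. The names build_search_body and sort_field are hypothetical, and whether the check exposes an option for the sort field is not established here.

```python
import json


def build_search_body(query, sort_field="@timestamp"):
    """Return a search body shaped like the one quoted in the error message.

    Hypothetical helper: the structure is copied from the error text, and the
    sort field is made a parameter purely for illustration.
    """
    return {
        "sort": [{sort_field: {"order": "desc"}}],
        "query": {"query_string": {"query": query}},
    }


body = build_search_body(
    "message:'ARCHIVE action failed' AND @timestamp: [now-24h TO now]"
)
print(json.dumps(body, indent=2))
```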
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: check_elasticsearch.py

Post by rocheryderm »

And solved this problem: I noticed that there were no templates for the beats indices I am creating, so this particular problem went away as soon as I uploaded the "es2x" index template JSON files to NLS.

I realize that I'm solving this on my own, but am happy to document my findings here in case they are useful to anyone. By all means, if you know how to solve this, please let me know - I'd rather not spend my time reinventing the wheel.

But now there's a new bug to hunt: the query fails in a new, unexpected way, and I am really puzzled because the "message" field is definitely there:

Code:

# /usr/local/nagios/libexec/check_elasticsearch.py --host "http://nls1t:9200/" --index "filebeat-*" --query "message:'ARCHIVE action failed' AND @timestamp: [now-24h TO now]" --warning 1 --critical 2
Error: msgkey message does not exist. These msgkeys are available:
type
fields
tags
beat
input_type
@timestamp
source
host
offset
message
@version

"message" is right there, listed as an existing msgkey. Help!

Mike
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

Re: check_elasticsearch.py

Post by rocheryderm »

and... I did it again...

Code:

# ~/check_elasticsearch.py --host "http://nls1t:9200/" --index "filebeat-*" --query "+message='ARCHIVE action failed' AND +@timestamp: [now-24h TO now]" --warning 1 --critical 2 --srckey message

WARNING - Total hits: 1 - Last message from: 20181011 22:21:32; [  2564]; [    WARN]; [ARC]; ARCHIVE action failed for 71575 of 71670 documents. [ 20181011 22:21:32; [  2564]; [    WARN]; [ARC]; ARCHIVE action failed for 71575 of 71670 documents. ] | hits=1
[root@rbbusnagios1t libexec]#

The response is a little ugly, but this is happy news. And I spent a few hours learning Python object, class, array and string handling this morning, so not a complete loss...

I don't understand why I had to add the srckey argument, and the documentation for this service check is sparse, but this is at least one example of a working query against the Elasticsearch database in NLS.

So... in summary... this works as long as you:
  • make sure the index templates for your data (like Elastic Beats) have been uploaded to Elasticsearch (and make sure they are compatible with a "version 2" Elasticsearch instance)
  • make sure to specify the srckey argument
Thanks again to @mcapra for pointing me in this direction on that other post.
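
For reference, here is a minimal Python sketch that assembles the working invocation above as an argument list, which avoids shell-quoting trouble when the query contains quotes and brackets. It uses only the flags that appear in this thread. The script path is the libexec location from the earlier posts, and the helper names build_check_command and run_check are hypothetical.

```python
import subprocess

# Path taken from the earlier posts; adjust to wherever the script lives.
CHECK = "/usr/local/nagios/libexec/check_elasticsearch.py"


def build_check_command(host, index, query, warning, critical, srckey="message"):
    """Build the argv list for the check, using only flags seen in this thread."""
    return [
        CHECK,
        "--host", host,
        "--index", index,
        "--query", query,
        "--warning", str(warning),
        "--critical", str(critical),
        "--srckey", srckey,
    ]


def run_check(argv):
    """Run the check and return (exit code, combined output)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


cmd = build_check_command(
    "http://nls1t:9200/",
    "filebeat-*",
    "+message='ARCHIVE action failed' AND +@timestamp: [now-24h TO now]",
    warning=1,
    critical=2,
)
print(cmd)
```

Calling run_check(cmd) on the Nagios host would then return the plugin's exit code and output; it is not called here because it needs the script and a reachable NLS instance.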
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: check_elasticsearch.py

Post by cdienger »

Glad to hear you were able to find a solution!