NagiosXI -> NagiosLog specific type of query broken
Posted: Tue Aug 09, 2016 10:26 am
by gsl_ops_practice
Hello,
We are having a strange issue between NagiosXI (2014R2.7) and NagiosLog (1.4.0). Multiple monitors are set up to query NagiosLog and all return valid data, but when we try to create a new monitor for a new NagiosLog query we get "UNKNOWN: Could not get data from Nagios Log Server". Your help resolving this would be appreciated.
Query that produces the error as stated above:
Code:
check_xi_service_nagioslogserver!--url='http://1.1.1.1/nagioslogserver/' --apikey='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' --minutes='2' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
Query that produces a valid response:
Code:
check_xi_service_nagioslogserver!--url='http://1.1.1.1/nagioslogserver/' --apikey='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470755467932,"to":1470755767932}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
The only difference between the original queries in NagiosLog is the "Response = 200" condition.
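One way to check that claim is to split each one-line query on commas and diff the pieces. This is a minimal sketch using shortened stand-in queries; `split_clauses` and `query_diff` are hypothetical helper names, not part of any Nagios tooling.

```shell
# Split a one-line JSON query on commas so diff can show which clause changed.
split_clauses() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Diff two one-line queries clause by clause.
# Returns diff's exit code: 0 when identical, 1 when they differ.
query_diff() {
  a=$(mktemp) && b=$(mktemp) || return 2
  split_clauses "$1" > "$a"
  split_clauses "$2" > "$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return $rc
}

# Shortened stand-ins for the two queries above (assumption: the real ones
# would be pasted here in full).
without='{"must":[{"query":"*apache_access*"}]}'
with='{"must":[{"query":"*apache_access*"},{"query":"response = 200"}]}'
query_diff "$without" "$with" || true
```

Splitting on commas is crude, since it also splits inside string values, but for these queries it is enough to isolate the extra fquery clause.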
Thanks,
Alex
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Tue Aug 09, 2016 11:53 am
by rkennedy
Attachment: nls-query-diff.PNG
I've been running diff all morning trying to track down where the JSON is failing. I took out the added "response = 200" clause, and it works -
Code:
[nagios@localhost libexec]$ php check_nagioslogserver.php --url='http://192.168.3.190/nagioslogserver/' --apikey='53abc9d8fefd7d8d46e70b3a853acf6c10ffd637' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
So, we know that's the problematic part. I tried removing the =, and that fixed it.
Code:
[nagios@localhost libexec]$ php check_nagioslogserver.php --url='http://192.168.3.190/nagioslogserver/' --apikey='53abc9d8fefd7d8d46e70b3a853acf6c10ffd637' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
OK: 0 matching entries found |logs=0;500;1000
[nagios@localhost libexec]$ php check_nagioslogserver.php --url='http://192.168.3.190/nagioslogserver/' --apikey='53abc9d8fefd7d8d46e70b3a853acf6c10ffd637' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
UNKNOWN: Could not get data from Nagios Log Server
I have a remote session coming up shortly, so I'll take a look at this later today. I just wanted to leave my findings for now; I'll follow up after some more testing on my end.
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Tue Aug 09, 2016 12:07 pm
by gsl_ops_practice
Thanks for your feedback so far, please let me know how it goes.
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Tue Aug 09, 2016 1:44 pm
by rkennedy
The part the plugin is failing on is here -
Code:
if (!is_object($result)) {
    echo_and_exit("UNKNOWN3: Server returned invalid output", 3);
} else if (!empty($result->error)) {
    echo_and_exit($result->message, 3);
} else {
    echo_and_exit($result->output, 3);
}
Specifically, this part -
Code:
echo_and_exit($result->output, 3);
Could you run the following commands on one of your NLS machines, and post back the output? I'd like to manually execute the two different queries and see what difference appears.
Code:
curl -XGET 127.0.0.1:9200/_search -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
and -
Code:
curl -XGET 127.0.0.1:9200/_search -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}'
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Tue Aug 09, 2016 2:31 pm
by gsl_ops_practice
Hello,
For the record, this query with the Response = 200 done manually in the NagiosLog GUI is successful.
The first query succeeded, but the response was very long (a full screen), so I am pasting only the first part:
Code:
{"took":2319,"timed_out":false,"_shards":{"total":611,"successful":611,"failed":0},"hits":{"total":2047,"max_score":1.0,"hits":[{"_index":"logstash-2016.08.09","_type":"apache_access","_id":"AVZvyHcme9XLoLAfJG82","_score":1.0,"_source":{"message":"111.111.111.111- - [09/Aug/2016:14:50:15 +0000] \"POST /serviceurl/webservice HTTP/1.1\" 200 1190 \
The second query produced an error about 60 pages long. I will paste the first part here:
Code:
{"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[o2hWlLTFS6yzUfnOPie3xg][kibana-int][0]: SearchParseException[[kibana-int][0]: query[filtered(ConstantScore(*:*))->BooleanFilter(+cache(@timestamp:[1470754104641 TO 1470754404641]) +cache(QueryWrapperFilter(_all:*apache_access*)) +cache(QueryWrapperFilter(_all:response _all:200)) -cache(QueryWrapperFilter(_all:*staging*)) -cache(QueryWrapperFilter(_all:*10.30*)))],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"*\"}}]
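The parsed filter in that error, `_all:response _all:200`, suggests the `=` was dropped and the two words were searched separately across all fields. In Lucene query-string syntax a field match is normally written `field:value`, so `response:200` may be closer to the intent. This is a minimal sketch; `fquery_clause` is a hypothetical helper, and this variant was not tested in the thread.

```shell
# Hypothetical helper: wrap a Lucene query string in the fquery clause shape
# used by the queries in this thread. It does not escape double quotes, so it
# only suits simple query strings.
fquery_clause() {
  printf '{"fquery":{"query":{"query_string":{"query":"%s"}},"_cache":true}}' "$1"
}

# Clause that could replace the "response = 200" one (assumption: the field
# is named "response", as in the _source shown in the search results).
fquery_clause 'response:200'
echo
```

Whether this form also avoids the plugin's "Could not get data" error would still need to be checked against the server.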
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Wed Aug 10, 2016 12:45 pm
by rkennedy
Odd. Do you still have free disk space and RAM on each member? Please also post the output of: curl -XGET localhost:9200/_cluster/health/?pretty
Was that second query just full of errors, or did it have any real information?
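To answer that question quickly for a very long response, the body can be classified by its leading key rather than read in full. This is a minimal sketch; `es_response_kind` is a hypothetical helper, and it assumes an error body starts with an "error" key, as in the output posted earlier.

```shell
# Classify an Elasticsearch response body as an error, a result set, or neither.
# Whitespace is stripped first so ?pretty output is handled the same way.
es_response_kind() {
  compact=$(printf '%s' "$1" | tr -d ' \n\t')
  case "$compact" in
    '{"error"'*) echo error ;;
    *'"hits"'*)  echo hits ;;
    *)           echo unknown ;;
  esac
}

# Usage (assumption: the response was saved to a file called response.json):
#   es_response_kind "$(cat response.json)"
```

This is a string match, not a JSON parse, so it is only a first-pass check.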
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Wed Aug 10, 2016 12:58 pm
by gsl_ops_practice
No, that query did not have any real information. Please find output of your query below:
Code:
{"cluster_name":"XXXXXXXXXXXXXXXXXXXXXXX","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":611,"active_shards":1222,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Thu Aug 11, 2016 10:05 am
by rkennedy
I actually missed a bracket in the command that I posted, could you run it again? I've added one more } to it. This should show us if Elastic's API is returning the result at all or not.
Code:
curl -XGET 127.0.0.1:9200/_search?pretty -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
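Since the earlier command was one closing brace short, a rough brace count before sending a long one-line body can catch that kind of slip. This is a minimal sketch; `brace_balance` is a hypothetical helper, and it counts braces naively, including any inside string values.

```shell
# Print the number of unmatched opening braces in a JSON string.
# 0 means the braces balance; a positive number means closing braces are missing.
brace_balance() {
  opens=$(printf '%s' "$1" | tr -cd '{' | wc -c)
  closes=$(printf '%s' "$1" | tr -cd '}' | wc -c)
  echo $((opens - closes))
}

# Usage (assumption: the query body is held in a shell variable named body):
#   brace_balance "$body"
```

A result of 0 does not prove the JSON is valid, only that the brace counts match.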
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Thu Aug 11, 2016 12:42 pm
by gsl_ops_practice
I just ran this query and the output is below. Please note that there are 9 result blocks and I am pasting only the first one. Because I had to remove sensitive data, the spacing and line breaks are a bit off.
Code:
{
  "took" : 1467,
  "timed_out" : false,
  "_shards" : {
    "total" : 616,
    "successful" : 616,
    "failed" : 0
  },
  "hits" : {
    "total" : 2352,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2016.08.09",
      "_type" : "apache_access",
      "_id" : "AVZvydh7e9XLoLAfJHwc",
      "_score" : 1.0,
      "_source":{"message":"111.111.111.111- - [09/Aug/2016:14:51:50 +0000] \"POST /serviceurl/webapp HTTP/1.1\" 200 2662 \"-\" \"Java/1.5.0\" inbytes=2099 outbytes=2976\n","@version":"1","@timestamp":"2016-08-09T14:51:50.000Z","type":"apache_access","host":"1.1.1.1","priority":133,"timestamp":["Aug 9 14:51:53","09/Aug/2016:14:51:50 +0000"],"logsource":"webserver","program":"apache_access","severity":5,"facility":16,"facility_label":"local0","severity_label":"Notice","clientip":"111.111.111.111","ident":"-","auth":"-","verb":"POST","request":"/serviceurl/webapp","httpversion":"1.1","response":200,"bytes":2662,"referrer":"\"-\"","agent":"\"Java/1.5.0\"","inbytes":2099,"outbytes":2976}
    }
Re: NagiosXI -> NagiosLog specific type of query broken
Posted: Thu Aug 11, 2016 3:30 pm
by rkennedy
Thanks for that output. Just to make sure, is that the result you expect from the XI interface and the same result you're seeing when running the query through NLS?
I have a feeling there is a bug when it's utilizing NLS's API vs Elasticsearch's, as the Elasticsearch API seems to be returning results.