NagiosXI -> NagiosLog specific type of query broken

gsl_ops_practice
Posts: 151
Joined: Thu Apr 09, 2015 9:14 pm

NagiosXI -> NagiosLog specific type of query broken

Post by gsl_ops_practice »

Hello,

We are having a strange issue between NagiosXI (2014R2.7) and the NagiosLog (1.4.0) server. Multiple monitors are set up to query NagiosLog and all return valid data, but when we try to create a new monitor for a new NagiosLog query we get "UNKNOWN: Could not get data from Nagios Log Server". Your help resolving this would be appreciated.

Query that produces the error as stated above:

Code: Select all

check_xi_service_nagioslogserver!--url='http://1.1.1.1/nagioslogserver/' --apikey='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' --minutes='2' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
Query that produces a valid response:

Code: Select all

check_xi_service_nagioslogserver!--url='http://1.1.1.1/nagioslogserver/' --apikey='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470755467932,"to":1470755767932}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
The only difference between the original queries in NagiosLog is the "Response = 200" condition.

Thanks,
Alex
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: NagiosXI -> NagiosLog specific type of query broken

Post by rkennedy »

[Attachment: nls-query-diff.PNG]
I've been running diff all morning trying to track down where the JSON is failing. I took out the added "response = 200" clause, and it works:

Code: Select all

[nagios@localhost libexec]$ php check_nagioslogserver.php --url='http://192.168.3.190/nagioslogserver/' --apikey='53abc9d8fefd7d8d46e70b3a853acf6c10ffd637' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
So, we know that's the problematic part. I tried removing the =, and that fixed it.

Code: Select all

[nagios@localhost libexec]$ php check_nagioslogserver.php --url='http://192.168.3.190/nagioslogserver/' --apikey='53abc9d8fefd7d8d46e70b3a853acf6c10ffd637' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
OK: 0 matching entries found |logs=0;500;1000

[nagios@localhost libexec]$ php check_nagioslogserver.php --url='http://192.168.3.190/nagioslogserver/' --apikey='53abc9d8fefd7d8d46e70b3a853acf6c10ffd637' --minutes='5' --warn='500' --crit='1000' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
UNKNOWN: Could not get data from Nagios Log Server
I have a remote session coming up shortly, so I'll take a look at this a little later today. I just wanted to leave my findings for now. I'll follow up afterwards and do some more testing on my end.
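One way to narrow this down further is to generate the query JSON from a small script, so the extra clause can be swapped without hand-editing the long string. The sketch below is only an illustration: `build_query` and `fquery` are hypothetical helper names, and the field-qualified form `response:200` (ordinary Lucene query_string syntax) assumes the indexed documents have a field called `response`. It has not been tested against the plugin.

```python
import json


def fquery(text):
    """Wrap a query_string expression in the cached fquery filter used above."""
    return {"fquery": {"query": {"query_string": {"query": text}}, "_cache": True}}


def build_query(start_ms, end_ms, extra_clauses=()):
    """Build the same filtered query as the check, plus optional extra 'must' clauses."""
    must = [
        {"range": {"@timestamp": {"from": start_ms, "to": end_ms}}},
        fquery("*apache_access*"),
    ]
    must.extend(fquery(clause) for clause in extra_clauses)
    must_not = [fquery("*staging*"), fquery("*10.30*")]
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"should": [{"query_string": {"query": "*"}}]}},
                "filter": {"bool": {"must": must, "must_not": must_not}},
            }
        }
    }


# Compact separators give the same single-line form the plugin is called with.
failing = json.dumps(
    build_query(1470754104641, 1470754404641, ["response = 200"]),
    separators=(",", ":"),
)
# Assumption: a field named "response" exists; this variant is untested here.
candidate = json.dumps(
    build_query(1470754104641, 1470754404641, ["response:200"]),
    separators=(",", ":"),
)

print(failing)
print(candidate)
```

Either printed string can be pasted into the --query argument of the check to compare the results.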
Former Nagios Employee
gsl_ops_practice
Posts: 151
Joined: Thu Apr 09, 2015 9:14 pm

Re: NagiosXI -> NagiosLog specific type of query broken

Post by gsl_ops_practice »

Thanks for your feedback so far, please let me know how it goes.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: NagiosXI -> NagiosLog specific type of query broken

Post by rkennedy »

The part the plugin is failing on is here -

Code: Select all

        if (!is_object($result)) {
                echo_and_exit("UNKNOWN3: Server returned invalid output", 3);
        } else if (!empty($result->error)) {
                echo_and_exit($result->message, 3);
        } else {
                echo_and_exit($result->output, 3);
        }
Specifically, this part -

Code: Select all

                echo_and_exit($result->output, 3);
Could you run the following commands on one of your NLS machines, and post back the output? I'd like to manually execute the two different queries and see what difference appears.

Code: Select all

curl -XGET 127.0.0.1:9200/_search -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
and -

Code: Select all

curl -XGET 127.0.0.1:9200/_search -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}'
Former Nagios Employee
gsl_ops_practice
Posts: 151
Joined: Thu Apr 09, 2015 9:14 pm

Re: NagiosXI -> NagiosLog specific type of query broken

Post by gsl_ops_practice »

Hello,

For the record, running this query with the Response = 200 condition manually in the NagiosLog GUI is successful.

The first query returned a successful response, but it was very long (one complete screen), so I am pasting only the first part:

Code: Select all

{"took":2319,"timed_out":false,"_shards":{"total":611,"successful":611,"failed":0},"hits":{"total":2047,"max_score":1.0,"hits":[{"_index":"logstash-2016.08.09","_type":"apache_access","_id":"AVZvyHcme9XLoLAfJG82","_score":1.0,"_source":{"message":"111.111.111.111- - [09/Aug/2016:14:50:15 +0000] \"POST /serviceurl/webservice HTTP/1.1\" 200 1190 \
The second query produced an error 60 pages long; I will paste the first bit here:

Code: Select all

{"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[o2hWlLTFS6yzUfnOPie3xg][kibana-int][0]: SearchParseException[[kibana-int][0]: query[filtered(ConstantScore(*:*))->BooleanFilter(+cache(@timestamp:[1470754104641 TO 1470754404641]) +cache(QueryWrapperFilter(_all:*apache_access*)) +cache(QueryWrapperFilter(_all:response _all:200)) -cache(QueryWrapperFilter(_all:*staging*)) -cache(QueryWrapperFilter(_all:*10.30*)))],from[-1],size[-1]: Parse Failure [Failed to parse source [{\"query\":{\"filtered\":{\"query\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"*\"}}]
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: NagiosXI -> NagiosLog specific type of query broken

Post by rkennedy »

Odd. Do you still have free disk space and RAM on each member? What is the output of curl -XGET localhost:9200/_cluster/health/?pretty?

Was that second query just full of errors, or did it have any real information?
Former Nagios Employee
gsl_ops_practice
Posts: 151
Joined: Thu Apr 09, 2015 9:14 pm

Re: NagiosXI -> NagiosLog specific type of query broken

Post by gsl_ops_practice »

No, that query did not have any real information. Please find output of your query below:

Code: Select all

cluster_name":"XXXXXXXXXXXXXXXXXXXXXXX","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":611,"active_shards":1222,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}
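For anyone repeating this check, the health output can also be inspected with a short script. This is a rough sketch only: `cluster_problems` is a hypothetical helper, and the sample document reuses the field values from the output above with a placeholder cluster name. Cluster health does not report free disk space or memory, so this covers only the shard side of the question.

```python
import json


def cluster_problems(health):
    """Return a list of human-readable problems found in a _cluster/health document."""
    problems = []
    if health.get("status") != "green":
        problems.append("status is %s" % health.get("status"))
    if health.get("timed_out"):
        problems.append("health request timed out")
    for key in ("unassigned_shards", "initializing_shards", "relocating_shards"):
        if health.get(key, 0) > 0:
            problems.append("%s = %d" % (key, health[key]))
    return problems


# Sample values copied from the output above; "example" is a placeholder name.
sample = json.loads(
    '{"cluster_name":"example","status":"green","timed_out":false,'
    '"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":611,'
    '"active_shards":1222,"relocating_shards":0,"initializing_shards":0,'
    '"unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}'
)

print(cluster_problems(sample) or "no problems reported")
```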
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: NagiosXI -> NagiosLog specific type of query broken

Post by rkennedy »

gsl_ops_practice wrote:No, that query did not have any real information. Please find output of your query below:

Code: Select all

cluster_name":"XXXXXXXXXXXXXXXXXXXXXXX","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":611,"active_shards":1222,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}
I actually missed a closing brace in the command that I posted. Could you run it again? I've added one more } to it. This should show us whether Elastic's API is returning the result at all.

Code: Select all

curl -XGET 127.0.0.1:9200/_search?pretty -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470754104641,"to":1470754404641}}},{"fquery":{"query":{"query_string":{"query":"*apache_access*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"response = 200"}},"_cache":true}}],"must_not":[{"fquery":{"query":{"query_string":{"query":"*staging*"}},"_cache":true}},{"fquery":{"query":{"query_string":{"query":"*10.30*"}},"_cache":true}}]}}}}}'
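A dropped closing brace like the one in the earlier command can be caught locally before the request goes to Elasticsearch. Here is a minimal sketch using Python's standard json module; `check_json` is a hypothetical helper name, and the short query strings are stand-ins for the full query above.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, description)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as exc:
        return False, "%s at character %d" % (exc.msg, exc.pos)


# Shortened stand-ins for the real query body.
good = '{"query":{"query_string":{"query":"response = 200"}}}'
bad = good[:-1]  # same body with the final closing brace missing

print(check_json(good))
print(check_json(bad))
```

Running the check on the body passed to curl -d separates a malformed request from a genuine query failure.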
Former Nagios Employee
gsl_ops_practice
Posts: 151
Joined: Thu Apr 09, 2015 9:14 pm

Re: NagiosXI -> NagiosLog specific type of query broken

Post by gsl_ops_practice »

I just ran this query and this is the output. Please note there are 9 result blocks and I am pasting only the first one. Because I had to remove sensitive data, the spacing and line breaks are a bit off.

Code: Select all

{
  "took" : 1467,
  "timed_out" : false,
  "_shards" : {
    "total" : 616,
    "successful" : 616,
    "failed" : 0
  },
  "hits" : {
    "total" : 2352,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2016.08.09",
      "_type" : "apache_access",
      "_id" : "AVZvydh7e9XLoLAfJHwc",
      "_score" : 1.0,
      "_source":{"message":"111.111.111.111- - [09/Aug/2016:14:51:50 +0000] \"POST /serviceurl/webapp HTTP/1.1\" 200 2662 \"-\" \"Java/1.5.0\" inbytes=2099 outbytes=2976\n","@version":"1","@timestamp":"2016-08-09T14:51:50.000Z","type":"apache_access","host":"1.1.1.1","priority":133,"timestamp":["Aug  9 14:51:53","09/Aug/2016:14:51:50 +0000"],"logsource":"webserver","program":"apache_access","severity":5,"facility":16,"facility_label":"local0","severity_label":"Notice","clientip":"111.111.111.111","ident":"-","auth":"-","verb":"POST","request":"/serviceurl/webapp","httpversion":"1.1","response":200,"bytes":2662,"referrer":"\"-\"","agent":"\"Java/1.5.0\"","inbytes":2099,"outbytes":2976}
    }
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: NagiosXI -> NagiosLog specific type of query broken

Post by rkennedy »

Thanks for that output. Just to make sure, is that the result you expect from the XI interface and the same result you're seeing when running the query through NLS?

I have a feeling there is a bug when it's utilizing NLS's API vs Elasticsearch's, as the Elasticsearch API seems to be returning results.
Former Nagios Employee