Nagios Log Server query problem on Nagios XI

Posted: Mon Aug 22, 2016 6:05 am
by comfone
Hi All
We are using Nagios XI Server Version 5.2.3 (VM appliance).
I have configured a Nagios Log Server query.
Once a day the query returns 0 matching entries, even though there are matching entries in NLS (see attached pic).

The query used is:

check_xi_service_nagioslogserver!--url='http://IP.IP.IP.IP/nagioslogserver/' --apikey='a-key' --minutes='150' --warn=' ' --crit='1:' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470647331246,"to":1470648231246}}},{"fquery":{"query":{"query_string":{"query":"type:(\"SSG-STATISTICS\")"}},"_cache":true}},{"terms":{"logsource":["servername"]}},{"terms":{"Data.raw":["OK"]}}]}}}}}'!!!!!!!

Does anybody have the same or a similar problem?

Thank you.

Re: Nagios Log Server query problem on Nagios XI

Posted: Mon Aug 22, 2016 10:23 am
by rkennedy
It looks like you are only querying for the last 150 minutes, but your check appears to be running only every 24h. You'll want to adjust --minutes to 1440, or to whatever value matches how often your XI check is running.

Re: Nagios Log Server query problem on Nagios XI

Posted: Mon Aug 22, 2016 10:43 am
by eloyd
Your "from" and "to" timestamps are also odd. Make sure you don't have a specific time range in your query when you copy the URL.
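To illustrate why the literal "from" and "to" values look odd, here is a minimal Python sketch that decodes the epoch-millisecond bounds copied from the check command in the first post. The helper name describe_range is made up for this example. It only shows what the numbers mean; whether the plugin overwrites these bounds at run time is not settled at this point in the thread.

```python
import json
from datetime import datetime, timezone

# The "range" filter copied verbatim from the check command in the first post.
range_filter = json.loads(
    '{"range":{"@timestamp":{"from":1470647331246,"to":1470648231246}}}'
)

def describe_range(filt):
    """Return (start, end, width_in_minutes) for an @timestamp range filter.

    The bounds are interpreted as milliseconds since the Unix epoch.
    """
    bounds = filt["range"]["@timestamp"]
    start = datetime.fromtimestamp(bounds["from"] / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(bounds["to"] / 1000, tz=timezone.utc)
    width = (bounds["to"] - bounds["from"]) / 60000
    return start, end, width

start, end, width = describe_range(range_filter)
print("from :", start.isoformat())
print("to   :", end.isoformat())
print("width:", width, "minutes")
```

Decoded this way, the bounds describe a fixed 15-minute window on 8 August 2016 (UTC), two weeks before the thread started, rather than a window relative to "now".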

Re: Nagios Log Server query problem on Nagios XI

Posted: Mon Aug 22, 2016 4:46 pm
by tmcdonald
Keep us posted!

Re: Nagios Log Server query problem on Nagios XI

Posted: Wed Aug 24, 2016 1:24 am
by comfone
@rkennedy -> The check is not running every 24 hours. It's running every 5 minutes. (see attachment)
@eloyd -> I don't understand what you mean by the timestamps being odd. I want the query to look back over the last 3 hours, so I do need a range, don't I?

As you can see in the pic above, this seems to happen only once a day, always around the same time.
Any other ideas about what could lead to this problem?


--url='http://IP.IP.IP.IP/nagioslogserver/' --apikey='a-key' --minutes='150' --warn=' ' --crit='1:' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1470647331246,"to": 1470658131246}}},{"fquery":{"query":{"query_string":{"query":"type:(\"SSG-STATISTICS\")"}},"_cache":true}},{"terms":{"logsource":["servername"]}},{"terms":{"Data.raw":["OK"]}}]}}}}}'

Re: Nagios Log Server query problem on Nagios XI

Posted: Wed Aug 24, 2016 11:08 am
by mcapra
From the usage menu of the check_nagioslogserver.php plugin:

--minutes=<MINUTES>     The number of minutes to perform the query over

So if you wanted to perform the query over a period of 3 hours, you would just use --minutes='180'. It would be redundant and potentially confusing (for Elasticsearch) to try to pass it a separate timeframe via the query at that point. Say I wanted to run a query that checks how many log entries have the pid field equal to 16979 in the last 3 hours:

./check_nagioslogserver.php --url='http://192.168.67.3/nagioslogserver/' --apikey='08129fbb34b8a197dc2e8b5f93713b0be27d61ae' --minutes=180 --query='{"query":{"constant_score":{"filter":{"term":{"pid":16979}}}}}'
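For anyone assembling this kind of command by hand, the quoting of the JSON inside --query is easy to get wrong. The following is a rough Python sketch that builds the same command line from a dictionary, so the JSON and the shell quoting are generated rather than typed. The function name build_check_command and the placeholder API key are assumptions for illustration; only the flags already shown in this thread (--url, --apikey, --minutes, --query) are used.

```python
import json
import shlex

def build_check_command(url, apikey, minutes, query,
                        plugin="./check_nagioslogserver.php"):
    """Build a shell-safe command line for the plugin (illustrative helper)."""
    args = [
        plugin,
        f"--url={url}",
        f"--apikey={apikey}",
        f"--minutes={minutes}",
        # Compact JSON, then let shlex handle the shell quoting.
        "--query=" + json.dumps(query, separators=(",", ":")),
    ]
    return shlex.join(args)

# Same query as in the post above: entries whose pid field equals 16979.
query = {"query": {"constant_score": {"filter": {"term": {"pid": 16979}}}}}

cmd = build_check_command(
    "http://192.168.67.3/nagioslogserver/",
    "API_KEY_PLACEHOLDER",  # hypothetical value, substitute a real key
    180,
    query,
)
print(cmd)
```

Note that the time window here comes only from --minutes; the query itself carries no range filter.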

Re: Nagios Log Server query problem on Nagios XI

Posted: Wed Aug 24, 2016 3:14 pm
by comfone
I removed the range as suggested, but now the query returns 21 entries instead of the expected 2 or at most 3, as the log is only appended to hourly!

--url='http://IP.IP.IP.IP/nagioslogserver/' --apikey='a-key' --minutes='150' --warn=' ' --crit='1:' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"fquery":{"query":{"query_string":{"query":"type:(\"SSG-STATISTICS\")"}},"_cache":true}},{"terms":{"logsource":["csvdb009"]}},{"terms":{"Data.raw":["OK"]}}]}}}}}'

Before removing the range I was getting 3 entries (see picture above).
Moreover, the range parameter in the query is generated by the Nagios XI Configuration Wizard for Nagios Log Server!
And how do you explain that it happens only once a day?
Only once a day Nagios XI can't find any entries in NLS, while the rest of the time 3 entries are found as expected.
Could it be that a job is running on NLS at that time and that, instead of reporting that no response was received, the check just says 0 entries were found?
But I can't pinpoint which background job could cause that issue. Very frustrating problem!!!

Re: Nagios Log Server query problem on Nagios XI

Posted: Wed Aug 24, 2016 3:55 pm
by mcapra
Ah, a little gotcha I've discovered in the API. You must pass *some sort of timestamp* in the query in order to use the --minutes argument properly.

Revising the query once more, give this one a shot:

--url='http://IP.IP.IP.IP/nagioslogserver/' --apikey='a-key' --minutes='180' --warn=' ' --crit='1:' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"fquery":{"query":{"query_string":{"query":"type:(\"SSG-STATISTICS\")"}},"_cache":true}},{"terms":{"logsource":["csvdb009"]}},{"terms":{"Data.raw":["OK"]}},{"range":{"@timestamp":{"from":0,"to":0}}}]}}}}}'
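The change from the wizard-generated query to the revised one can be sketched in Python as follows: drop any hard-coded @timestamp range from the filter and append a single placeholder range with "from" and "to" set to 0. The helper name with_placeholder_range is invented for this example, and the claim that the plugin then fills in the window from --minutes rests only on the observation in this post, not on anything verified here.

```python
import json

def with_placeholder_range(query, field="@timestamp"):
    """Return a copy of `query` with exactly one placeholder range on `field`.

    Assumes the filtered/bool/must layout used by the queries in this thread.
    """
    result = json.loads(json.dumps(query))  # cheap deep copy of JSON data
    must = result["query"]["filtered"]["filter"]["bool"]["must"]
    # Remove any existing range filter on the timestamp field.
    must[:] = [f for f in must if not ("range" in f and field in f["range"])]
    # Append the placeholder suggested in the post above.
    must.append({"range": {field: {"from": 0, "to": 0}}})
    return result

# Query as generated by the wizard (fixed range taken from the first post).
wizard_query = {"query": {"filtered": {
    "query": {"bool": {"should": [{"query_string": {"query": "*"}}]}},
    "filter": {"bool": {"must": [
        {"range": {"@timestamp": {"from": 1470647331246,
                                  "to": 1470648231246}}},
        {"fquery": {"query": {"query_string": {
            "query": 'type:("SSG-STATISTICS")'}}, "_cache": True}},
        {"terms": {"logsource": ["servername"]}},
        {"terms": {"Data.raw": ["OK"]}},
    ]}},
}}}

fixed = with_placeholder_range(wizard_query)
print(json.dumps(fixed, separators=(",", ":")))
```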

Re: Nagios Log Server query problem on Nagios XI

Posted: Mon Aug 29, 2016 5:33 am
by comfone
I tried the query with the new range but still have the same problem.
--url='http://IP.IP.IP.IP/nagioslogserver/' --apikey='a-key' --minutes='180' --warn=' ' --crit='1:' --query='{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"fquery":{"query":{"query_string":{"query":"type:(\"SSG-STATISTICS\")"}},"_cache":true}},{"terms":{"logsource":["csvdb009"]}},{"terms":{"Data.raw":["OK"]}},{"range":{"@timestamp":{"from":0,"to":0}}}]}}}}}'

As mentioned above, I can't explain why it happens only once a day (see attached pic).

Re: Nagios Log Server query problem on Nagios XI

Posted: Mon Aug 29, 2016 11:12 am
by mcapra
My best guess is that Elasticsearch is doing some housekeeping (optimizing/closing/opening indexes) around that time, which could be causing the queries to fail in the interim.

Can you share your elasticsearch log? This is usually located at /var/log/elasticsearch/<cluster id>.log. Feel free to PM it to me if you have security concerns.

Is scheduling a regular period of downtime for the problem period (02:00 - 03:00) an option? This would prevent false positive notifications from being sent during this time.
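If downtime is acceptable, one way to script it is through the Nagios Core external command SCHEDULE_SVC_DOWNTIME. The Python sketch below only formats the command line for a single fixed one-hour window; the host name, service description, author and comment are hypothetical, and the helper svc_downtime_command is made up for this example. It schedules one occurrence, so a daily window would need it to be repeated (for example from cron) or configured as recurring downtime in the XI interface.

```python
from datetime import datetime

def svc_downtime_command(host, service, start, duration_s, author, comment,
                         now=None):
    """Format a SCHEDULE_SVC_DOWNTIME external command for fixed downtime.

    Field order: host;service;start;end;fixed;trigger_id;duration;author;comment
    """
    now = now or datetime.now()
    start_ts = int(start.timestamp())
    end_ts = start_ts + duration_s
    fields = [
        "SCHEDULE_SVC_DOWNTIME",
        host, service,
        str(start_ts), str(end_ts),
        "1",              # fixed downtime (starts and ends at the given times)
        "0",              # not triggered by another downtime entry
        str(duration_s),
        author, comment,
    ]
    return "[{}] {}".format(int(now.timestamp()), ";".join(fields))

# Hypothetical host/service names; the window covers 02:00 - 03:00 local time.
line = svc_downtime_command(
    "nls-host", "SSG statistics query",
    datetime(2016, 8, 30, 2, 0), 3600,
    "nagiosadmin", "Elasticsearch housekeeping window",
    now=datetime(2016, 8, 29, 12, 0),
)
print(line)
```

The resulting line would then be written to the Nagios external command file, whose location depends on the installation.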