Alerts buggy

Posted: Wed Nov 27, 2019 1:36 pm
by CameronWP
Hi:

I am running the latest version of the log server and continue to have strange issues with the alerting portion of the app. Here is an example of a query and the result I am interested in:
AlertIssues1.png
I saved it as a query and created an alert for it:
AlertIssues2.png
The first time it runs, I get this result:
AlertIssues3.png
Thanks!

Re: Alerts buggy

Posted: Wed Nov 27, 2019 3:38 pm
by cdienger
Edit the alert and click the "Advanced (Manage Query)" link to show the details of the query. Please provide this query as well as the query downloaded from Dashboards > Manage Queries (magnifying glass icon). These two should be the same. Back in the Advanced section of the alert, you can use the Load button to reload the query. Give that a go and see if it gets the dashboard and alert results to line up.
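
One way to check that the two queries really are the same is to save each to a file and diff them. This is only a rough sketch: the function name and the file names in the usage note are made up for illustration, and it assumes both queries were saved as plain text.

```shell
# Hypothetical helper: compare two saved query files.
# diff -b ignores differences in the amount of whitespace.
# Returns 0 if the files match, 1 if they differ (diff output shows where).
queries_match() {
    diff -b "$1" "$2"
}
```

For example, `queries_match alert_query.json dashboard_query.json && echo "queries match"` (both file names are placeholders).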

Re: Alerts buggy

Posted: Wed Nov 27, 2019 4:08 pm
by CameronWP
Thanks for the response. I did a side by side comparison of the queries and they are exactly the same but return two entirely different results.

Re: Alerts buggy

Posted: Mon Dec 02, 2019 2:52 pm
by cdienger
Please provide copies of both as well as a profile from the system. A profile can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

Code: Select all

/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may not be able to be uploaded through the PM system. This is usually due to the logs in the Logstash and/or Elasticsearch directories found in it. If it is too large, please open the profile, extract these directories/files and send them separately.
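
If the profile is too large, a quick listing of its biggest entries may help decide what to pull out. The sketch below is an illustration only: the function name is made up, and it assumes GNU tar, whose verbose listing puts the size in bytes in the third column.

```shell
# Hypothetical helper: list the N largest entries in a gzip-compressed tarball.
# Assumes GNU tar output, where field 3 of "tar -tzvf" is the size in bytes.
largest_in_tarball() {
    tarball=$1
    count=${2:-10}   # default to the 10 largest entries
    tar -tzvf "$tarball" | sort -k3,3nr | head -n "$count"
}
```

For example, `largest_in_tarball /tmp/system-profile.tar.gz 5` would show the five largest files in the profile.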

Re: Alerts buggy

Posted: Mon Dec 02, 2019 3:16 pm
by CameronWP
I have attached the requested files. Do you still want the log files as well?

Re: Alerts buggy

Posted: Mon Dec 02, 2019 4:42 pm
by cdienger
Is there any indication that the dashboard page isn't fully loading when you run this query? Try using the browser's dev tools (F12) to see if any errors come up when you load the page.

Are you able to remove most of the filters and then drill down to the data by adding them back one at a time?

Upping the memory allocated in php.ini may help the interface load if that is the problem - https://support.nagios.com/kb/article/n ... e-611.html

And I assume the behavior is happening for more than just a single query, but the queries provided in the last post differ from the query seen in the initial screenshot that was provided. I want to make sure we're looking at the 'right' data, so please confirm.

Re: Alerts buggy

Posted: Thu Dec 05, 2019 1:43 pm
by CameronWP
Hi:

That is a safe assumption. I am seeing this in queries that contain parentheses. If I rebuild the query with filters and keep the query itself as simple as possible, the issue goes away, but I don't think I can build out all the logic I want for some of my queries with filters alone. My PHP memory setting was already increased while troubleshooting a different issue, and everything appears to be loading fine (F12 showed no errors).

Re: Alerts buggy

Posted: Thu Dec 05, 2019 4:13 pm
by cdienger
Try putting a \ before the parentheses and see if that works for the dashboard.
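
As a rough illustration of that escaping, the sketch below puts a backslash in front of every parenthesis in a query string. The function name and the sample query are made up. Keep in mind that in Lucene-style query syntax an escaped parenthesis is treated as a literal character rather than as grouping, so this is a test to see how the dashboard behaves, not necessarily the final query.

```shell
# Hypothetical helper: prefix every "(" and ")" in a query string with a backslash.
escape_parens() {
    # In the sed replacement, "\\" is a literal backslash and "&" is the matched character.
    printf '%s\n' "$1" | sed 's/[()]/\\&/g'
}
```

For example, `escape_parens 'host:(web01 OR web02)'` prints `host:\(web01 OR web02\)`.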

I'd also be curious to see what is getting submitted on the back end when you run the dashboard without using the \ in the query. We can enable this kind of logging by editing /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml. Towards the bottom you'll see a section like:

Code: Select all

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
Uncomment and change two of the lines so it looks like:

Code: Select all

#index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 1ms
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 1ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
Then restart elasticsearch:

Code: Select all

service elasticsearch restart
Then reload the dashboard. You'll only want to load the dashboard once and then disable the logging again, since it can generate a lot of log data. I'd like to get a screenshot of the dashboard so we can see the query, and a copy of the UUID_index_search_slowlog.log that gets generated.

Re: Alerts buggy

Posted: Tue Dec 10, 2019 10:08 am
by CameronWP
Hi:

Please find the requested screenshot and log attached. Thanks!

Re: Alerts buggy

Posted: Tue Dec 10, 2019 4:41 pm
by cdienger
No obvious errors stick out, but it looks like it takes about 30 seconds to query the Elasticsearch backend and get the results. I'm not sure at this point whether this is significant, but I'd appreciate it if you could run through the same steps to get a new log so we can see if it is consistent. I'd also like to get a log taken while the alert is triggered to compare the two.
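
To compare timings between the dashboard log and the alert log, something like the following could pull out the query durations. It is a sketch that assumes each slowlog line contains a field of the form took_millis[N]; if the log on this system is formatted differently, the pattern would need adjusting. The function name and the file name in the usage note are placeholders.

```shell
# Hypothetical helper: print the took_millis value from each slowlog line.
# Assumes lines contain a field of the form took_millis[1234];
# lines without that field are skipped.
slowlog_millis() {
    sed -n 's/.*took_millis\[\([0-9][0-9]*\)\].*/\1/p' "$1"
}
```

For example, `slowlog_millis dashboard_slowlog.log | sort -n | tail -n 1` would print the slowest query time in milliseconds.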