Nagios Support Forum

Posted: **Wed Nov 27, 2019 1:36 pm**

Hi:

I am running the latest version of the log server and continue to have strange issues with the alerting portion of the app. Here is an example of a query and the result I am interested in:

AlertIssues1.png

I saved it as a query and created an alert for it:

AlertIssues2.png

The first time it runs, I get this result:

AlertIssues3.png

Thanks!

Posted: **Wed Nov 27, 2019 3:38 pm**

Edit the alert and click the "Advanced (Manage Query)" link to show the details of the query. Please provide this query as well as the query downloaded by going to Dashboards > Manage Queries (magnifying glass icon). These two should be the same. Back in the Advanced section of the alert, you can use the Load button to reload the query. Give that a go and see if it gets the dashboard and alert results to line up.
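For anyone checking whether the two queries really match, here is a minimal Python sketch. It assumes both the alert query and the downloaded dashboard query are JSON text; the function names `diff_paths` and `compare_queries` are made up for this example and are not part of Nagios Log Server.

```python
import json


def diff_paths(a, b, path="$"):
    """Return the JSON paths at which two parsed documents differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                # Key present on one side only.
                diffs.append(f"{path}.{key}")
            else:
                diffs.extend(diff_paths(a[key], b[key], f"{path}.{key}"))
        return diffs
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return [path]
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(diff_paths(x, y, f"{path}[{i}]"))
        return diffs
    # Scalars (or mismatched types): compare directly.
    return [] if a == b else [path]


def compare_queries(alert_text, dashboard_text):
    """Parse both query texts as JSON and list where they differ.

    Whitespace and key order are ignored because the comparison
    happens on the parsed structures, not on the raw text.
    """
    return diff_paths(json.loads(alert_text), json.loads(dashboard_text))
```

An empty list from `compare_queries` means the two queries are structurally identical, so any difference in results would have to come from somewhere other than the query text.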

Posted: **Wed Nov 27, 2019 4:08 pm**

Thanks for the response. I did a side-by-side comparison of the queries; they are exactly the same but return two entirely different results.

Posted: **Mon Dec 02, 2019 2:52 pm**

Please provide copies of both as well as a profile from the system. A profile can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

Code:

/usr/local/nagioslogserver/scripts/profile.sh

This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may be too big to upload through the PM system. This is usually due to the logs in the Logstash and/or Elasticsearch directories found in it. If it is too large, please open the profile, extract these directories/files and send them separately.
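To see which directories inside the profile take up the space before splitting it, something like the following Python sketch may help. It uses only the standard `tarfile` module; the function name `profile_sizes` is made up for this example, and the internal layout of the archive is an assumption, so adjust `depth` to match what you actually find.

```python
import tarfile
from collections import Counter


def profile_sizes(archive_path, depth=2):
    """Sum uncompressed file sizes per leading path prefix in a tar archive.

    `depth` is how many leading path components form the grouping key,
    e.g. depth=2 groups "top/logstash/a.log" under "top/logstash".
    """
    sizes = Counter()
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop empty and "." components such as a leading "./".
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            sizes["/".join(parts[:depth])] += member.size
    return sizes
```

Calling `profile_sizes("/tmp/system-profile.tar.gz").most_common(5)` would list the five largest groups, which should show whether the Logstash or Elasticsearch logs are the ones to send separately.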

Posted: **Mon Dec 02, 2019 3:16 pm**

I have attached the requested files. Do you still want the log files as well?

Posted: **Mon Dec 02, 2019 4:42 pm**

Is there any indication that the dashboard page isn't fully loading when you run this query? Try using the browser's dev tools (F12) to see if any errors come up when you load the page.

Are you able to remove most of the filters and then drill down to the data by adding them back one at a time?

Upping the memory allocated in php.ini may help the interface load if that is the problem - https://support.nagios.com/kb/article/n ... e-611.html

I assume the behavior is happening for more than just a single query, but the queries provided in the last post differ from the query seen in the initial screenshot. I want to make sure we're looking at the 'right' data, so please confirm.

Posted: **Thu Dec 05, 2019 1:43 pm**

Hi:

That is a safe assumption. I am seeing this in queries that contain parentheses. If I rebuild the query with filters and keep the query itself as simple as possible, the issue goes away, but I don't think I can build out all the logic I want for some of my queries with filters alone. My PHP memory setting was already raised while troubleshooting a different issue, and everything appears to be loading fine (F12 showed no errors).

Posted: **Thu Dec 05, 2019 4:13 pm**

Try putting a \ before the parentheses and see if that works for the dashboard.
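If there are many queries to adjust, the escaping can be scripted. This is only a sketch of the suggestion above: `escape_parens` is a made-up helper name, and the query in the usage note is a hypothetical example rather than one from this thread. Keep in mind that in Lucene query-string syntax an escaped parenthesis is matched as a literal character and no longer groups terms.

```python
import re

# Match "(" or ")" that is not already preceded by a backslash.
_UNESCAPED_PAREN = re.compile(r"(?<!\\)([()])")


def escape_parens(query: str) -> str:
    """Put a backslash before every parenthesis that is not already escaped."""
    return _UNESCAPED_PAREN.sub(r"\\\1", query)
```

For example, `escape_parens('message:"login failed (code 5)"')` gives a query in which both parentheses are preceded by a backslash, and running the function a second time leaves the result unchanged.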

I'd also be curious to see what is getting submitted on the back end when you run the dashboard without using the \ in the query. We can enable this kind of logging by editing /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml. Towards the bottom you'll see a section like:

Code:

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms

Uncomment and change two of the lines (query.info and fetch.info) so the section looks like:

Code:

#index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 1ms
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 1ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms

Then restart Elasticsearch:

Code:

service elasticsearch restart

Then reload the dashboard. You'll want to load the dashboard only once and then disable the logging again, since it can generate a lot of log data. I'd like to get a screenshot of the dashboard so we can see the query, and a copy of the UUID_index_search_slowlog.log that gets generated.
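The edit to elasticsearch.yml described above, and turning it off again afterwards, could be scripted roughly as follows. This is a sketch that works on the file's text only: the helper names are made up for this example, it does not restart Elasticsearch for you, and disabling simply comments the two lines out again instead of restoring the original 5s and 800ms values.

```python
TARGETS = (
    "index.search.slowlog.threshold.query.info",
    "index.search.slowlog.threshold.fetch.info",
)


def _key_of(line):
    """Return the setting name on a line, ignoring a leading '#' comment marker."""
    return line.strip().lstrip("#").strip().split(":", 1)[0].strip()


def enable_slowlog(config_text, threshold="1ms"):
    """Uncomment the two target settings and set them to `threshold`."""
    out = []
    for line in config_text.splitlines():
        key = _key_of(line)
        out.append(f"{key}: {threshold}" if key in TARGETS else line)
    return "\n".join(out) + "\n"


def disable_slowlog(config_text):
    """Comment the two target settings out again if they are active."""
    out = []
    for line in config_text.splitlines():
        active = not line.lstrip().startswith("#")
        if _key_of(line) in TARGETS and active:
            out.append("#" + line.lstrip())
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Read the file, pass its contents through `enable_slowlog`, write the result back, and restart Elasticsearch; after loading the dashboard once, do the same with `disable_slowlog`. Keeping a backup copy of elasticsearch.yml before either step is a sensible precaution.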

Posted: **Tue Dec 10, 2019 10:08 am**

Hi:

Please find the requested screenshot and log attached. Thanks!

Posted: **Tue Dec 10, 2019 4:41 pm**

No obvious errors stand out, but it looks like it takes about 30 seconds to query the Elasticsearch backend and get the results. I'm not sure yet whether this is significant, but I'd appreciate it if you could run through the same steps again to get a new log so we can see whether it is consistent. I'd also like to get a log taken while the alert is triggered so we can compare the two.
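To compare timings between the dashboard log and the alert log, a small Python sketch like this one could summarize each file. It assumes every slow log entry contains a `took_millis[N]` field; check that against your own log before relying on it. The function names are made up for this example.

```python
import re
from statistics import mean

# Assumption: each slow log entry carries a field of the form took_millis[1234].
TOOK_RE = re.compile(r"took_millis\[(\d+)\]")


def took_millis(log_text):
    """Return every took_millis value found in the log text, in order."""
    return [int(value) for value in TOOK_RE.findall(log_text)]


def summarize(log_text):
    """Return entry count, slowest and mean query time, or None if nothing matched."""
    values = took_millis(log_text)
    if not values:
        return None
    return {
        "entries": len(values),
        "max_ms": max(values),
        "mean_ms": mean(values),
    }
```

Running `summarize` on the text of the dashboard log and of the alert log gives two small dictionaries that can be compared directly, which would show whether the roughly 30 second query time is consistent across runs.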

Alerts buggy
