Very Sluggish Web Interface

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
NCATmax
Posts: 24
Joined: Mon Jan 14, 2019 10:22 am

Very Sluggish Web Interface

Post by NCATmax »

Hello,

I am not even sure where to start on this issue. I would be glad to provide any additional information that may be useful.

The biggest visible issue is that the whole web interface is very slow. It takes over 60 seconds to log in. It takes another 20-30 seconds to pull up the Dashboards page, and it takes even longer for data to actually show up in the graphs on the default dashboard. Any searches or queries are also extremely slow.

It also appears that as of a few hours ago, NLS is no longer recording any data.

There was an event a few weeks back where this server ran out of disk space. I am not sure if that is related, but I did want to mention that.

I see many errors while looking through the elasticsearch logs. I think some of them are related to some bad filters putting the wrong type of data into certain fields, but there are also some others that I do not recognize.

I have attached the elasticsearch log from today that contains many errors.


Thank you for any insight you may be able to provide,
Max Farrior
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very Sluggish Web Interface

Post by scottwilkerson »

The log you posted is reporting many out-of-memory errors.

How much memory do you have allocated to this server? You may need to increase it.
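Out-of-memory errors in the Elasticsearch log typically refer to the JVM heap rather than to the VM's total memory, so it can help to look at heap usage directly. The following is a minimal sketch, not an official NLS tool: it assumes Elasticsearch answers without authentication at `http://localhost:9200` and reads the `heap_used_in_bytes` and `heap_max_in_bytes` fields from the node stats API (`/_nodes/stats/jvm`). The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def heap_usage_percent(node_stats: dict) -> dict:
    """Map each node name to its JVM heap usage as a percentage of heap max."""
    usage = {}
    for node in node_stats.get("nodes", {}).values():
        mem = node["jvm"]["mem"]
        percent = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
        usage[node.get("name", "unknown")] = round(percent, 1)
    return usage


def fetch_node_stats(base_url: str = "http://localhost:9200") -> dict:
    """Fetch JVM node stats. The base_url default is an assumption."""
    with urlopen(f"{base_url}/_nodes/stats/jvm") as response:
        return json.load(response)
```

On the server itself, `heap_usage_percent(fetch_node_stats())` would return one entry per node. A value that stays near 100 would be consistent with the out-of-memory errors in the log.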
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
NCATmax
Posts: 24
Joined: Mon Jan 14, 2019 10:22 am

Re: Very Sluggish Web Interface

Post by NCATmax »

Thank you for the response.

The VM currently has 16 GB of memory.

I looked at the VM's memory usage in Nagios XI, and memory has been steadily increasing since around 8:30 am this morning. I believe a colleague was using it at that time; he was the one who discovered these issues. Is the steady increase normal? I would think it should stop increasing once the query finishes. (Maybe the query didn't finish because of memory issues?)

I will make a request for additional memory. However, we are currently running thin on memory, and my request for 16 GB received some resistance (the VM previously had 8 GB).

Is there a way to reduce memory usage? Does having complex Logstash filters have a significant effect on memory usage?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very Sluggish Web Interface

Post by scottwilkerson »

NCATmax wrote:(Maybe the query didn't finish because of memory issues?)
This is very likely, especially if they were running a search spanning a large amount of data.
NCATmax wrote:Is there a way to reduce memory usage?
The only real way to reduce this would be to close any indexes that aren't necessary for your searches, or to increase the number of instances in your cluster (although this will require a larger license).
NCATmax wrote:Does having complex Logstash filters have a significant effect on memory usage?
This does have an impact, but not a very significant one. The bigger impact comes from the amount of data you have in open indexes and the complexity of the queries you are running through the UI.
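To illustrate what closing unneeded indexes involves, here is a rough sketch that picks out daily indices older than a retention window and builds the matching Elasticsearch close-index URLs (`POST /<index>/_close`). It assumes the indices follow the `logstash-YYYY.MM.DD` naming pattern and that Elasticsearch listens on `http://localhost:9200`. The function names are hypothetical, and the sketch only builds URLs without sending any requests. NLS has its own setting for how many days of indices stay open, which is what this thread relies on.

```python
from datetime import date, timedelta


def indices_to_close(open_indices, keep_days, today):
    """Return logstash-YYYY.MM.DD index names that fall outside the keep window."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in open_indices:
        prefix, _, stamp = name.partition("-")
        if prefix != "logstash":
            continue  # leave indices that are not daily log indices alone
        try:
            year, month, day = (int(part) for part in stamp.split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # name does not follow the assumed date pattern
        if index_day <= cutoff:
            stale.append(name)
    return stale


def close_urls(index_names, base_url="http://localhost:9200"):
    """Build the URL to POST to for closing each index (base_url is an assumption)."""
    return [f"{base_url}/{name}/_close" for name in index_names]
```

With `keep_days=14` and today's date, the 14 most recent daily indices (including today's) are left open and anything older is returned for closing.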
NCATmax
Posts: 24
Joined: Mon Jan 14, 2019 10:22 am

Re: Very Sluggish Web Interface

Post by NCATmax »

I do believe my colleague was searching over the last 30 days.

Let's assume I am limited to 16 GB of memory. Is there any best practice advice about how to minimize memory issues?

For example, NLS is currently configured to keep 30 days of indices open. (Each index is approximately 20-25 GB.) Would it make sense to reduce this to say 14 days?

A few of the filters that are being used create many new fields. Does having many new fields increase index size? And does it require more memory to search these larger indices? (I am wondering if I should use simpler filters so we can keep more indices open for searching.)
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very Sluggish Web Interface

Post by scottwilkerson »

NCATmax wrote:Would it make sense to reduce this to say 14 days?
This would help a lot.
NCATmax wrote:A few of the filters that are being used create many new fields. Does having many new fields increase index size? And does it require more memory to search these larger indices? (I am wondering if I should use simpler filters so we can keep more indices open for searching.)
Yes to both: more fields increase the index size, and larger indices require more memory to search.
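For a sense of scale, the figures quoted in this thread (20-25 GB per daily index, 30 versus 14 days open) can be worked through in a few lines. This is only back-of-the-envelope arithmetic: the amount of open data is not the same as memory consumed, but it is the quantity identified above as the main driver. The function name is hypothetical.

```python
def open_data_gb(days_open, gb_per_index=(20, 25)):
    """Return the (low, high) estimate in GB of data held in open daily indices."""
    low, high = gb_per_index
    return days_open * low, days_open * high


for days in (30, 14):
    low, high = open_data_gb(days)
    print(f"{days} days open: roughly {low}-{high} GB of open index data")
```

Going from 30 to 14 open days cuts the open data from roughly 600-750 GB to roughly 280-350 GB, a bit more than half, on a VM with 16 GB of memory.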
NCATmax
Posts: 24
Joined: Mon Jan 14, 2019 10:22 am

Re: Very Sluggish Web Interface

Post by NCATmax »

I see. In the short term, I will reduce the number of open indices to 14. And in the medium term, I'll look at reducing the number of additional fields the filters create, as well as trying to get some more memory added.

Speaking generally, is 14 days of data still useful or within a typical range? I know this completely depends on how NLS is used, but I remember thinking that 30 days sounded a little small. I have no point of reference.

Is the lack of memory the cause of the issue in my first post? Would simply restarting the NLS services address the issue? (Logstash, Elasticsearch, Apache)
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very Sluggish Web Interface

Post by scottwilkerson »

NCATmax wrote:Speaking generally, is 14 days of data still useful or within a typical range? I know this completely depends on how NLS is used, but I remember thinking that 30 days sounded a little small. I have no point of reference.
As you mentioned, it all depends on your use case. We have users who keep only one day, and others who need ready access to a year's worth, although the latter usually have a large number of instances in their cluster, with 64 GB of RAM in each instance and fast disks.
NCATmax wrote:Is the lack of memory the cause of the issue in my first post? Would simply restarting the NLS services address the issue? (Logstash, Elasticsearch, Apache)
Yes, that was what the error was showing in the log, and yes, that is what I would recommend to resolve it.
NCATmax
Posts: 24
Joined: Mon Jan 14, 2019 10:22 am

Re: Very Sluggish Web Interface

Post by NCATmax »

I just wanted to follow up on this.

Restarting all of the NLS services fixed the issue.

To limit memory consumption, I have configured NLS to only keep 14 indices open at once. As a secondary measure, I am going to revisit the filters I had created and try to remove any new fields that aren't really necessary.

Thank you for your help in resolving this.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very Sluggish Web Interface

Post by scottwilkerson »

Glad to hear it is resolved.

Locking thread.