NLS web interface very slow

mejokj · Post by **mejokj** » Tue Feb 27, 2018 1:25 pm

Hi,

I am having an issue with a single-instance NLS 2.0.2 setup. The web interface is very slow, and under the command subsystem I see that the snapshots_maintenance system job has a last run time of "never". I tried running it manually from the web interface; it shows as running, but nothing really happens.

I can also see two java processes using 700% and 200% CPU in the top command.

We had this same issue while on the older version and that's why we upgraded to 2.0.2.

Going through the elasticsearch logs, I sometimes see the following, and then the web interface is not accessible:
[e8a605f9-0c8e-4f9c-8bac-db39f8cda6d3] All shards failed for phase: [query_fetch] org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][0] null

At the same time in logstash log:
{:timestamp=>"2018-02-27T22:22:29.422000+0400", :message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"http://localhost:9200\"]', but Elasticsearch appears to be unreachable or down!", :error_message=>"Connection refused (Connection refused)", :class=>"Manticore::SocketException", :level=>:error}

I have attached the profile.
Thanks.

scottwilkerson · Post by **scottwilkerson** » Tue Feb 27, 2018 4:00 pm

Did the high CPU and slow web interface start after you manually ran snapshots_maintenance?

This could be quite intensive on a system that has a lot of data, none of which has been previously backed up.

mejokj · Post by **mejokj** » Tue Feb 27, 2018 11:55 pm

Not really, it has been slow for weeks now. The snapshots_maintenance system job shows running but last run time is "Never".

Where can I find logs for this job, and how can I trigger it from the command line so that I can see what's going on?

scottwilkerson · Post by **scottwilkerson** » Wed Feb 28, 2018 10:21 am

You cannot invoke it from the command line.

However, I just looked closer at your profile and the machine is extremely overloaded.

Having 5.4 TB of open indexes on a single instance with 28GB of RAM is WAY underpowered.

Additionally, with that amount of data, you would need to store it on extremely fast disks.

My recommendation would be to close some of the indexes to get the load down to a reasonable level (a load average under 2 on this single system).

Then, from the Command Subsystem page, I would click "Reset All Jobs" to clear the running status of this job.

Finally, I strongly recommend adding an additional instance to the cluster to help with the load (this instance should use different disks than the existing instance).
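
To check progress against the "under 2" load target while closing indexes, a minimal Python sketch along these lines could be run on the instance. The names `LOAD_TARGET`, `load_ok`, and `report` are made up for illustration and are not part of NLS; `os.getloadavg()` is the standard library call and is only available on Unix-like systems.

```python
import os

LOAD_TARGET = 2.0  # the "under 2" figure from the recommendation above


def load_ok(load_avg, target=LOAD_TARGET):
    """Return True when a load average is below the target."""
    return load_avg < target


def report():
    """Print the 1, 5 and 15 minute load averages against the target."""
    for label, value in zip(("1m", "5m", "15m"), os.getloadavg()):
        status = "ok" if load_ok(value) else "too high"
        print(f"{label}: {value:.2f} ({status})")
```

Calling `report()` after each batch of closed indexes gives a quick view of whether the load is trending down; the 15 minute figure will lag behind the change.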

mejokj · Post by **mejokj** » Wed Feb 28, 2018 1:07 pm

Alright, what's the best way to close the indexes and how do I know which ones to close?

Thanks.

scottwilkerson · Post by **scottwilkerson** » Wed Feb 28, 2018 1:48 pm

mejokj wrote:Alright, what's the best way to close the indexes and how do I know which ones to close?

Thanks.

From Admin -> Indexes you can close an index by clicking close next to the index you want to close.

A "closed" index still exists on the hard drive, but it is not searchable and does not take resources from the elasticsearch process.

It can be re-opened and then it would use resources from elasticsearch and would be searchable.

Close the indexes you don't need to search immediately.

Once you get to a good place, the backup and maintenance section of Admin allows you to specify how old indexes should be before they are automatically closed.

For example, if you only ever search current logs, or logs from the last week, there is no point in keeping months' worth of logs open. Your system will love you for having as few indexes open as required.
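
As a rough illustration of the "close anything older than N days" idea, here is a hedged Python sketch. It assumes the daily indexes are named `logstash-YYYY.MM.DD` and that Elasticsearch listens at `http://localhost:9200` (the address shown in the logstash error earlier in the thread). The helper names are hypothetical. The Admin pages remain the route recommended in this thread; the close call is included only to show what closing amounts to at the Elasticsearch level.

```python
from datetime import date, datetime, timedelta
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # assumption: taken from the logstash error above


def indexes_to_close(names, keep_days, today):
    """Return the logstash-YYYY.MM.DD index names older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in names:
        try:
            day = datetime.strptime(name, "logstash-%Y.%m.%d").date()
        except ValueError:
            continue  # skip anything that is not a daily log index
        if day < cutoff:
            old.append(name)
    return sorted(old)


def close_index(name, base_url=ES_URL):
    """Ask Elasticsearch to close one index (POST /<index>/_close)."""
    req = Request(f"{base_url}/{name}/_close", data=b"", method="POST")
    with urlopen(req, timeout=30) as resp:
        return resp.status
```

For example, `indexes_to_close(names, keep_days=14, today=date.today())` returns the candidates without touching anything, so the list can be reviewed before any index is closed.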

mejokj · Post by **mejokj** » Wed Feb 28, 2018 3:39 pm

Thanks, I closed all but the latest 14 indexes and I could immediately feel a difference on web interface. Things are working much faster now.

However the load average is still at around 15. Is there anything further I could do?

scottwilkerson · Post by **scottwilkerson** » Wed Feb 28, 2018 4:49 pm

mejokj wrote:Thanks, I closed all but the latest 14 indexes and I could immediately feel a difference on web interface. Things are working much faster now.

However the load average is still at around 15. Is there anything further I could do?

More memory (up to 64GB), faster disks (non-shared SSDs preferred) and more instances are really all that is left.
