NLS web interface very slow

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

NLS web interface very slow

Post by mejokj »

Hi,

I am having an issue with an NLS 2.0.2 single-instance setup. The web interface is very slow, and under the Command Subsystem I see that the snapshots_maintenance system job has a last run time of "Never". I tried running it manually from the web interface; it shows as running, but nothing really happens.

I can also see two java processes using 700% and 200% CPU in top.

We had this same issue while on the older version and that's why we upgraded to 2.0.2.

Going through the Elasticsearch logs, I sometimes see the following, after which the web interface is not accessible:
[e8a605f9-0c8e-4f9c-8bac-db39f8cda6d3] All shards failed for phase: [query_fetch] org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][0] null

At the same time, in the logstash log:
{:timestamp=>"2018-02-27T22:22:29.422000+0400", :message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"http://localhost:9200\"]', but Elasticsearch appears to be unreachable or down!", :error_message=>"Connection refused (Connection refused)", :class=>"Manticore::SocketException", :level=>:error}
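
One way to confirm whether Elasticsearch is reachable at such a moment is to query its cluster health endpoint. The following is a minimal sketch, not part of NLS itself. It assumes Elasticsearch listens on localhost:9200 (the address in the logstash error above); the function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request

# Assumption: Elasticsearch listens here, as in the logstash error message.
ES_URL = "http://localhost:9200"


def summarize_health(body):
    """Turn a _cluster/health JSON response body into a one-line summary."""
    health = json.loads(body)
    return "status={} unassigned_shards={}".format(
        health.get("status", "unknown"),
        health.get("unassigned_shards", "unknown"),
    )


def check_elasticsearch(base_url=ES_URL, timeout=5.0):
    """Query the cluster health endpoint; report 'unreachable' on connection errors."""
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
            return summarize_health(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError) as exc:
        # Covers "Connection refused", timeouts and HTTP error responses.
        return "unreachable: {}".format(exc)
```

Running `print(check_elasticsearch())` on the NLS host while the web interface is down would show whether the node is refusing connections or is up with a red status and unassigned shards.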


I have attached profile.
Thanks.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NLS web interface very slow

Post by scottwilkerson »

Did the high CPU usage and slow web interface start after you manually ran snapshots_maintenance?

That job can be quite intensive on a system that has a lot of data, none of which has previously been backed up.
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

Re: NLS web interface very slow

Post by mejokj »

Not really, it has been slow for weeks now. The snapshots_maintenance system job shows as running, but its last run time is "Never".

Where can I find logs for this job, and how can I trigger it from the command line so that I can see what's going on?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NLS web interface very slow

Post by scottwilkerson »

You cannot invoke it from the command line.

However, I just looked more closely at your profile, and the machine is extremely overloaded.

Having 5.4 TB of open indexes on a single instance with 28GB of RAM is WAY underpowered.

Additionally, with that amount of data, you would need extremely fast disks to store it on.

My recommendation would be to close some of the indexes to get the load down to a reasonable level (a load average under 2 on this single system).

Then, from the Command Subsystem page, I would click "Reset All Jobs" to clear the running status of this job.

Finally, I strongly recommend adding an additional instance to the cluster to help with the load (this instance should use different disks than the existing instance).
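
To watch whether closing indexes brings the load down to that level, the load average can be read programmatically. This is a minimal sketch for a Linux/Unix host. The threshold of 2 comes from the recommendation above; the function names are hypothetical.

```python
import os

# Target taken from the recommendation above: a load average under 2.
TARGET_LOAD = 2.0


def load_is_reasonable(load_1min, target=TARGET_LOAD):
    """Return True when the 1-minute load average is below the target."""
    return load_1min < target


def current_load_report():
    """Read the 1-, 5- and 15-minute load averages (Unix only) and compare the first to the target."""
    one, five, fifteen = os.getloadavg()
    verdict = "ok" if load_is_reasonable(one) else "too high"
    return "load averages: {:.2f} {:.2f} {:.2f} ({})".format(one, five, fifteen, verdict)
```

Calling `print(current_load_report())` after each batch of closed indexes gives a quick before/after comparison; the same three numbers appear in the header of top and in uptime.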
mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

Re: NLS web interface very slow

Post by mejokj »

Alright, what's the best way to close the indexes and how do I know which ones to close?

Thanks.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NLS web interface very slow

Post by scottwilkerson »

mejokj wrote: Alright, what's the best way to close the indexes and how do I know which ones to close?

Thanks.
From Admin -> Indexes, you can close an index by clicking "close" next to the index you want to close.

A "closed" index still exists on the hard drive, but it is not searchable and does not take resources from the elasticsearch process.

It can be re-opened, at which point it would use resources from elasticsearch again and would be searchable.

Close the indexes you don't need to search immediately.

Once you get to a good place, the backup and maintenance section of Admin allows you to specify how old indexes should be before they are automatically closed.

For example, if you only ever search current logs, or logs from the last week, there is no point in keeping months' worth of logs open. Your system will love you for having as few indexes open as required.
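
The "close everything older than N days" rule can be sketched as follows. This is an illustration, not NLS's own maintenance code. It assumes daily indexes named logstash-YYYY.MM.DD and Elasticsearch on localhost:9200; it uses Elasticsearch's close index API (POST /<index>/_close). Whether closing indexes directly through the API rather than through Admin -> Indexes plays well with NLS is not confirmed in this thread, so treat `close_index` as an assumption. All function names are hypothetical.

```python
import datetime
import urllib.request

# Assumptions: daily indexes named "logstash-YYYY.MM.DD", Elasticsearch at localhost:9200.
ES_URL = "http://localhost:9200"
INDEX_PREFIX = "logstash-"


def indexes_to_close(names, keep_days, today):
    """Return the index names whose date is more than keep_days before today."""
    cutoff = today - datetime.timedelta(days=keep_days)
    old = []
    for name in names:
        if not name.startswith(INDEX_PREFIX):
            continue  # skip indexes that do not follow the daily naming pattern
        try:
            day = datetime.datetime.strptime(name[len(INDEX_PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # skip names whose suffix is not a date
        if day < cutoff:
            old.append(name)
    return sorted(old)


def close_index(name, base_url=ES_URL):
    """Send POST /<index>/_close to Elasticsearch and return the HTTP status code."""
    req = urllib.request.Request("{}/{}/_close".format(base_url, name), data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

For example, `indexes_to_close(names, 7, datetime.date.today())` lists the candidates without touching anything, so the selection can be reviewed before any index is actually closed.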
mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

Re: NLS web interface very slow

Post by mejokj »

Thanks, I closed all but the latest 14 indexes and could immediately feel a difference in the web interface. Things are working much faster now.

However, the load average is still at around 15. Is there anything further I could do?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NLS web interface very slow

Post by scottwilkerson »

mejokj wrote: Thanks, I closed all but the latest 14 indexes and I could immediately feel a difference on web interface. Things are working much faster now.

However the load average is still at around 15. Is there anything further I could do?
More memory (up to 64GB), faster disks (non-shared SSDs preferred), and more instances are really all that is left.