Hi,
I am having an issue with an NLS 2.0.2 single-instance setup. The web interface is very slow, and under the Command Subsystem I see that the snapshots_maintenance system job has a last run time of "Never". I tried running it manually from the web interface; it shows as running, but nothing really happens.
I can also see two java processes using 700% and 200% CPU in top.
We had this same issue while on the older version and that's why we upgraded to 2.0.2.
Going through the elasticsearch logs sometimes I see the following and then the web interface is not accessible:
[e8a605f9-0c8e-4f9c-8bac-db39f8cda6d3] All shards failed for phase: [query_fetch] org.elasticsearch.action.NoShardAvailableActionException: [nagioslogserver][0] null
At the same time in logstash log:
{:timestamp=>"2018-02-27T22:22:29.422000+0400", :message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"http://localhost:9200\"]', but Elasticsearch appears to be unreachable or down!", :error_message=>"Connection refused (Connection refused)", :class=>"Manticore::SocketException", :level=>:error}
I have attached the system profile.
Thanks.
NLS web interface very slow
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: NLS web interface very slow
Did the high CPU and slow web interface follow the manual execution of snapshots_maintenance?
That job can be quite intensive on a system that has a lot of data, none of which has been previously backed up.
Re: NLS web interface very slow
Not really, it has been slow for weeks now. The snapshots_maintenance system job shows running but last run time is "Never".
Where can I find logs for this job, and how can I trigger it from the command line so that I can see what's going on?
scottwilkerson
Re: NLS web interface very slow
You cannot invoke it from the command line.
However, I just looked closer at your profile and the machine is extremely overloaded.
Having 5.4 TB of open indexes on a single instance with 28GB of RAM is WAY underpowered.
Additionally, with that amount of data, the data would need to be stored on extremely fast disks.
My recommendation would be to close some of the indexes to get the load down to a reasonable level (a load average under 2 on this single system).
Then, from the Command Subsystem page, I would click "Reset All Jobs" to clear the running status of this job.
Finally, I strongly recommend adding an additional instance to the cluster to help with the load (this instance should use different disks than the existing instance).
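To check the load against the target of 2 mentioned above, something like the following sketch could be used. It assumes a Linux host with Python 3; the name `load_status` and the choice to judge by the 5-minute average are illustrative assumptions, not anything provided by NLS.

```python
import os


def load_status(load_avg, target=2.0):
    """Classify a (1min, 5min, 15min) load-average tuple against a target.

    Uses the 5-minute figure (an arbitrary choice for this sketch) so that
    a brief spike does not flip the result.
    """
    _one, five, _fifteen = load_avg
    return "ok" if five < target else "overloaded"


if __name__ == "__main__":
    # os.getloadavg() returns the same three numbers that top and uptime show.
    current = os.getloadavg()
    print(current, load_status(current))
```

Running it before and after closing indexes gives a rough idea of whether the change helped.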
Re: NLS web interface very slow
Alright, what's the best way to close the indexes and how do I know which ones to close?
Thanks.
scottwilkerson
Re: NLS web interface very slow
mejokj wrote: Alright, what's the best way to close the indexes and how do I know which ones to close?
Thanks.
From Admin -> Indexes, you can close an index by clicking "close" next to the index you want to close.
A "closed" index still exists on the hard drive, but it is not searchable and does not take resources from the elasticsearch process.
It can be re-opened, at which point it would use elasticsearch resources again and would be searchable.
Close the indexes you don't need to search right away.
Once you get to a good place, the Backup & Maintenance section of Admin allows you to specify how old an index should be before it is automatically closed.
For example, if you only ever search current logs, or logs from the last week, there is no point in keeping months' worth of logs open. Your system will love you for having as few indexes open as required.
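As a rough illustration of the "close anything older than N days" idea, here is a minimal Python sketch. It assumes daily indexes named `logstash-YYYY.MM.DD` and an Elasticsearch node at `http://localhost:9200` (the address from the logstash log earlier in the thread). The function names are hypothetical; `POST /<index>/_close` is the standard Elasticsearch close-index endpoint. The Admin pages remain the route recommended in this thread, so treat this only as a way to see the selection logic.

```python
from datetime import date, datetime, timedelta
import urllib.request


def indices_to_close(index_names, keep_days, today):
    """Return daily indexes dated more than keep_days before today, sorted."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        try:
            # Assumed naming scheme: logstash-YYYY.MM.DD
            day = datetime.strptime(name, "logstash-%Y.%m.%d").date()
        except ValueError:
            continue  # skip anything that is not a daily log index
        if day < cutoff:
            old.append(name)
    return sorted(old)


def close_index(name, base_url="http://localhost:9200"):
    """Ask Elasticsearch to close one index; returns the HTTP status code."""
    req = urllib.request.Request(f"{base_url}/{name}/_close", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


if __name__ == "__main__":
    names = ["logstash-2018.02.27", "logstash-2018.02.01", "nagioslogserver"]
    # Dry run: only print what would be closed, without calling close_index().
    print(indices_to_close(names, keep_days=14, today=date(2018, 2, 27)))
```

The dry run prints the candidates only; calling `close_index()` on each name would actually close them, and a closed index can be re-opened later as described above.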
Re: NLS web interface very slow
Thanks, I closed all but the latest 14 indexes and I could immediately feel a difference on web interface. Things are working much faster now.
However the load average is still at around 15. Is there anything further I could do?
scottwilkerson
Re: NLS web interface very slow
mejokj wrote: Thanks, I closed all but the latest 14 indexes and I could immediately feel a difference on web interface. Things are working much faster now.
However the load average is still at around 15. Is there anything further I could do?
More memory (up to 64GB), faster disks (non-shared SSDs preferred), and more instances are really all that is left.