Nagios Log Server listening port abruptly halts v2
Posted: Wed May 10, 2017 9:31 pm
by james.liew
Referring back to here:
https://support.nagios.com/forum/viewto ... 37&t=43502
The same server has once again halted listening on port 3515
Re: Nagios Log Server listening port abruptly halts v2
Posted: Wed May 10, 2017 9:37 pm
by james.liew
Log file attached
Re: Nagios Log Server listening port abruptly halts v2
Posted: Thu May 11, 2017 11:53 am
by mcapra
Here's something that sticks out:
Code:
{:timestamp=>"2017-05-11T02:26:06.903000+0200", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
{:timestamp=>"2017-05-11T02:26:06.922000+0200", :message=>"Failed to flush outgoing items", :outgoing_count=>9, :exception=>org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available etc etc ... }
Can you also share the Elasticsearch logs from this machine? Or all machines if there are multiple instances.
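That NoNodeAvailableException means the Logstash output could not reach any Elasticsearch node at all. While gathering the logs, a quick sanity check from the box itself can narrow things down. This is only a sketch, assuming the default ports 9200 (HTTP) and 9300 (transport) on localhost:

```shell
# Is the cluster reachable and healthy? (reports status and node count)
curl -s 'http://localhost:9200/_cluster/health?pretty'

# Which nodes have actually joined the cluster?
curl -s 'http://localhost:9200/_cat/nodes?v'

# Are the HTTP (9200) and transport (9300) ports still listening?
netstat -tlnp | grep -E ':(9200|9300) '
```

If the health call itself fails, the problem is the Elasticsearch process; if health is fine but 9300 is not listening, the transport layer Logstash uses is the suspect.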
Re: Nagios Log Server listening port abruptly halts v2
Posted: Thu May 11, 2017 4:04 pm
by cdienger
We're also seeing more memory-related issues, apparently while ES is trying to do a merge:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:793)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
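That particular OutOfMemoryError ("unable to create new native thread") usually points at an OS process/thread limit rather than heap exhaustion, so the limits are worth checking as well. A sketch; the user name and the example values in the comment are assumptions, adjust for your install:

```shell
# Max processes for the user running Elasticsearch (threads count against
# this limit). Nagios Log Server typically runs ES as the nagios user.
su -s /bin/sh nagios -c 'ulimit -u'

# System-wide thread ceiling
cat /proc/sys/kernel/threads-max

# If the per-user limit is low, it can be raised in
# /etc/security/limits.conf, e.g. (example values, tune to your box):
#   nagios soft nproc 4096
#   nagios hard nproc 8192
```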
Check the size of the primary indices by running:
curl -XGET 'http://localhost:9200/_cat/indices?v'
and looking at the pri.store.size column. Merging large indices may be too taxing for the system; you can try disabling optimization under Administration > System > Backup & Maintenance by setting "Optimize Indexes older than" to 0.
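To spot the biggest indices quickly, the _cat API can emit just the columns of interest in raw bytes so they sort numerically. A sketch, assuming ES on the default port; `h` (column selection) and `bytes` (unit) are standard _cat parameters:

```shell
# Largest primary indices first: index name and primary store size in bytes
curl -s 'http://localhost:9200/_cat/indices?h=index,pri.store.size&bytes=b' \
  | sort -k2,2 -n -r \
  | head -n 10
```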
Re: Nagios Log Server listening port abruptly halts v2
Posted: Fri May 12, 2017 3:52 am
by james.liew
mcapra wrote:Here's something that sticks out:
Code:
{:timestamp=>"2017-05-11T02:26:06.903000+0200", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
{:timestamp=>"2017-05-11T02:26:06.922000+0200", :message=>"Failed to flush outgoing items", :outgoing_count=>9, :exception=>org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available etc etc ... }
Can you also share the Elasticsearch logs from this machine? Or all machines if there are multiple instances.
I already shared both the Elasticsearch and Logstash logs above.
I will include our 2nd node's logs in this post.
Re: Nagios Log Server listening port abruptly halts v2
Posted: Fri May 12, 2017 3:57 am
by james.liew
cdienger wrote:We're also seeing more memory-related issues, apparently while ES is trying to do a merge:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:793)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Check the size of the primary indices by running:
curl -XGET 'http://localhost:9200/_cat/indices?v'
and looking at the pri.store.size column. Merging large indices may be too taxing for the system; you can try disabling optimization under Administration > System > Backup & Maintenance by setting "Optimize Indexes older than" to 0.
I took a screencap, attached. Not of everything, but what I could get from the top.
Re: Nagios Log Server listening port abruptly halts v2
Posted: Fri May 12, 2017 2:00 pm
by mcapra
Have you given this a shot?
cdienger wrote:Merging large indices may be too taxing for the system and you can try disabling optimization under Administration > System > Backup & Maintenance, by setting "Optimize Indexes older than" to 0.
Given that the last crash occurred during an optimization of indices, I think disabling this might be helpful. If you disable that job and the system is still unstable, could you provide fresh Elasticsearch logs?
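If it does stay unstable, the fresh logs would come from somewhere like the following. The paths are assumptions for a default-style Nagios Log Server layout; adjust them to wherever your install actually writes its logs:

```shell
# Recent tail of the Elasticsearch and Logstash logs (assumed locations)
tail -n 200 /var/log/elasticsearch/*.log
tail -n 200 /var/log/logstash/logstash.log
```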
Re: Nagios Log Server listening port abruptly halts v2
Posted: Sun May 14, 2017 8:33 pm
by james.liew
Is it okay to run without the index optimization?
Re: Nagios Log Server listening port abruptly halts v2
Posted: Mon May 15, 2017 2:00 pm
by cdienger
From what I've gathered, the main benefit of optimization is reducing the amount of time needed during restarts; it has little impact on search performance. It is okay to disable it.
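For context, the maintenance job uses Elasticsearch's optimize operation, which merges an index's Lucene segments into fewer, larger ones; fewer segments mainly means less per-shard work when the node starts up. You can inspect the current segment layout with the _cat API (a sketch, default port assumed):

```shell
# One row per Lucene segment: index, shard, segment name, size, doc counts
curl -s 'http://localhost:9200/_cat/segments?v' | head -n 20
```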
Re: Nagios Log Server listening port abruptly halts v2
Posted: Fri May 19, 2017 3:19 am
by james.liew
I changed it 3 days ago and am still monitoring.