Nagios Log Server listening port abruptly halts v2

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Nagios Log Server listening port abruptly halts v2

Post by james.liew »

Referring back to here: https://support.nagios.com/forum/viewto ... 37&t=43502

The same server has once again stopped listening on port 3515.
You do not have the required permissions to view the files attached to this post.
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

Log file attached
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios Log Server listening port abruptly halts v2

Post by mcapra »

Here's something that sticks out:

Code: Select all

{:timestamp=>"2017-05-11T02:26:06.903000+0200", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
{:timestamp=>"2017-05-11T02:26:06.922000+0200", :message=>"Failed to flush outgoing items", :outgoing_count=>9, :exception=>org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available etc etc ... }
Can you also share the Elasticsearch logs from this machine? Or all machines if there are multiple instances.
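That NoNodeAvailableException means Logstash's Elasticsearch output could not reach any node over the transport layer. A quick reachability probe (a sketch; assumes the default ports, 9200 for HTTP and 9300 for transport, and a Linux host with bash and coreutils available):

```shell
# Report whether a TCP port on a host is reachable, using bash's
# built-in /dev/tcp redirection; prints "open" or "closed".
probe_port() {
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

probe_port localhost 9200   # HTTP API
probe_port localhost 9300   # transport port, which Logstash connects to
```

If 9300 reports closed while Elasticsearch is supposedly running, the service has likely died or is bound to a different interface.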
Former Nagios employee
https://www.mcapra.com/
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios Log Server listening port abruptly halts v2

Post by cdienger »

We're also seeing more memory-related issues when it appears ES is trying to do a merge:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:793)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
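In my experience, "unable to create new native thread" usually means the OS per-user process/thread cap was hit rather than the Java heap filling up, so it is also worth comparing that cap against the threads in use (a sketch; assumes a Linux host, and note that ulimit reports the limit for the current shell's user, not necessarily the elasticsearch service user):

```shell
# Compare the per-user process/thread cap against current usage.
# "unable to create new native thread" fires when this cap is hit,
# even if the Java heap still has headroom.
echo "per-user limit (ulimit -u): $(ulimit -u)"
echo "kernel threads-max:         $(cat /proc/sys/kernel/threads-max)"
echo "threads in use:             $(ps -eLf | tail -n +2 | wc -l)"
```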


Check the size of the primary indices by running:

curl -XGET http://localhost:9200/_cat/indices?v

and looking at the pri.store.size column. Merging large indices may be too taxing for the system, so you can try disabling optimization under Administration > System > Backup & Maintenance by setting "Optimize Indexes older than" to 0.
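To spot the biggest offenders quickly, the two relevant columns can be pulled out and sorted; a sketch, assuming the `h` and `bytes` query parameters are supported by your Elasticsearch version and the default HTTP port 9200 on localhost:

```shell
# Sort the (index, pri.store.size) output of _cat/indices largest-first.
# bytes=b makes sizes plain byte counts so a numeric sort works. Feed it:
#   curl -s 'http://localhost:9200/_cat/indices?h=index,pri.store.size&bytes=b' | top_indices
top_indices() {
    sort -k2 -rn | head -n 10
}
```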
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

mcapra wrote:Here's something that sticks out:

Code: Select all

{:timestamp=>"2017-05-11T02:26:06.903000+0200", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
{:timestamp=>"2017-05-11T02:26:06.922000+0200", :message=>"Failed to flush outgoing items", :outgoing_count=>9, :exception=>org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available etc etc ... }
Can you also share the Elasticsearch logs from this machine? Or all machines if there are multiple instances.
I already shared both the Elasticsearch and Logstash logs above.

I will include the logs from our 2nd node in this post.
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

cdienger wrote:We're also seeing more memory-related issues when it appears ES is trying to do a merge:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:391)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:50)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1985)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1979)
at org.elasticsearch.index.engine.InternalEngine.maybeMerge(InternalEngine.java:793)
at org.elasticsearch.index.shard.IndexShard$EngineMerger$1.run(IndexShard.java:1237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)


Check the size of the primary indices by running:

curl -XGET http://localhost:9200/_cat/indices?v

and looking at the pri.store.size column. Merging large indices may be too taxing for the system, so you can try disabling optimization under Administration > System > Backup & Maintenance by setting "Optimize Indexes older than" to 0.
I took a screencap, attached. It doesn't show everything, but it's what I could capture from the top.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios Log Server listening port abruptly halts v2

Post by mcapra »

Have you given this a shot?
cdienger wrote:Merging large indices may be too taxing for the system and you can try disabling optimization under Administration > System > Backup & Maintenance, by setting "Optimize Indexes older than" to 0.
Given that the last crash occurred during an optimization of indices, I think disabling this might be helpful. If you disable that job and the system is still unstable, could you provide fresh Elasticsearch logs?
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

Is it okay to run without index optimization?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios Log Server listening port abruptly halts v2

Post by cdienger »

From what I've gathered, the main benefit of optimization is reducing the amount of time needed during restarts; it has little impact on search performance, so it is okay to disable it.
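For context, the optimize job force-merges each index's Lucene segments down to a handful, and fewer segments mainly means less work when shards are reopened. If you want to see how fragmented indices actually are after disabling the job, the segment counts can be tallied (a sketch; assumes the `_cat/segments` endpoint is available on your Elasticsearch version and the default port 9200):

```shell
# Tally Lucene segments per index, most fragmented first. Feed it:
#   curl -s 'http://localhost:9200/_cat/segments?h=index,segment' | segments_per_index
segments_per_index() {
    awk '{count[$1]++} END {for (i in count) print i, count[i]}' | sort -k2 -rn
}
```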
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

I changed it 3 days ago and am still monitoring as of now.
Locked