
Too many files open error

Posted: Mon Mar 20, 2017 3:01 pm
by CameronWP
Hello:

When I start my Log server, the elasticsearch log is filled with these errors:

[netty.channel.socket.nio.AbstractNioSelector] Failed to accept a connection.
java.io.IOException: Too many open files

I have tried the pool and connection file hotfix, raised the system file limit, and raised the Logstash file limit, all to no avail. Is there anything else I can try?

Thanks!

Re: Too many files open error

Posted: Mon Mar 20, 2017 3:06 pm
by mcapra
Can you try raising the MAX_OPEN_FILES setting in /etc/sysconfig/elasticsearch? Try doubling it, restarting the elasticsearch service, and seeing whether the error is still produced. You'll need to do this for each instance if you have more than one.
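For reference, that change might look like the following (a sketch only: the file path is the one named above, but the doubled value of 131070 is illustrative; use twice whatever your current value is):

```shell
# Check the current value in the file named above
grep '^MAX_OPEN_FILES' /etc/sysconfig/elasticsearch

# Double it -- 131070 is illustrative; substitute twice your current value
sudo sed -i 's/^MAX_OPEN_FILES=.*/MAX_OPEN_FILES=131070/' /etc/sysconfig/elasticsearch

# Restart so the new limit takes effect
sudo service elasticsearch restart
```

Repeat on each Elasticsearch instance before checking the logs again.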

Re: Too many files open error

Posted: Tue Mar 21, 2017 7:33 am
by CameronWP
I made that change and restarted, but I am still seeing the same errors:

[2017-03-21 08:29:04,213][WARN ][netty.channel.socket.nio.AbstractNioSelector] Failed to accept a connection.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Another error:

[2017-03-21 08:28:55,202][WARN ][indices.cluster ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [[logstash-2017.01.23][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2017.01.23][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2017.01.23][1] failed to open reader on writer
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:211)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:233)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)

Another:

[2017-03-21 08:28:55,208][WARN ][indices.cluster ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [[logstash-2017.02.12][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2017.02.12][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2017.02.12][1] failed to open reader on writer
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:211)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:233)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
... 3 more
Caused by: java.nio.file.FileSystemException: /NagLogs/data/46670a84-8052-4f7e-8810-e4bbd8dfdacf/nodes/0/indices/logstash-2017.02.12/1/index/_pdg.cfs: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
at org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:172)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:733)
at org.apache.lucene.store.CompoundFileDirectory.<init>(CompoundFileDirectory.java:104)
at org.apache.lucene.index.SegmentReader.readFieldInfos(SegmentReader.java:274)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:107)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:239)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:109)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:112)
at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:196)
... 10 more

Re: Too many files open error

Posted: Tue Mar 21, 2017 11:39 am
by mcapra
Can you share the outputs of:

Code:

# yum install lsof, if needed
lsof | grep logstash | wc -l
netstat -an | wc -l

ulimit -Sn

su nagios
ulimit -Sn
exit

Re: Too many files open error

Posted: Tue Mar 21, 2017 12:05 pm
by CameronWP
Hi:

Here are the results:
[screenshot attached: ulimit.JPG]
Thanks!

Re: Too many files open error

Posted: Tue Mar 21, 2017 3:30 pm
by mcapra
Hmm, how about the output of:

Code:

ulimit -Hn
I think we may still be going over this system's hard limit for open files. I just want to rule that out.

Re: Too many files open error

Posted: Wed Mar 22, 2017 7:14 am
by CameronWP
Hi, it is 4096.

Thanks!

Re: Too many files open error

Posted: Wed Mar 22, 2017 2:09 pm
by mcapra
If the hard nofile limit is 4096, I would suggest increasing it substantially, since it appears there are over 4,000,000 files open by Elasticsearch currently. That doesn't necessarily mean 4,000,000 distinct descriptors, but I would wager the hard limit of 4096 is prohibitive in this case.
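As a sketch of raising the limit (the value 65536 is illustrative, and targeting the nagios user is an assumption based on this thread; pick a value suited to your index count):

```shell
# Append higher nofile limits for the nagios user to limits.conf
# (65536 is illustrative -- choose a value that fits your deployment)
cat <<'EOF' | sudo tee -a /etc/security/limits.conf
nagios soft nofile 65536
nagios hard nofile 65536
EOF

# Verify from a fresh login shell as that user
su - nagios -c 'ulimit -Hn; ulimit -Sn'
```

Note that limits.conf changes only apply to new login sessions, so restart the services after making the change.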

Re: Too many files open error

Posted: Wed Mar 22, 2017 2:42 pm
by CameronWP
mcapra wrote:If the hard nofile limit is 4096, I would suggest increasing it substantially, since it appears there are over 4,000,000 files open by Elasticsearch currently. That doesn't necessarily mean 4,000,000 distinct descriptors, but I would wager the hard limit of 4096 is prohibitive in this case.

I am worried about this bug:

https://access.redhat.com/solutions/43926

Is there a sane value you have used before?

Thanks!

Re: Too many files open error

Posted: Wed Mar 22, 2017 4:25 pm
by mcapra
You should be fine as long as you follow that article's solution. You can register an account for free to view it. Otherwise, here it is:

Code:

Login denied after setting nofile limits. Secure log shows the error "error: PAM: pam_open_session(): Permission denied"
Solution Verified - Updated June 30 2016 at 11:43 AM - English
Environment

    Red Hat Enterprise Linux 5
    Red Hat Enterprise Linux 6

Issue

    Setting a hard limit in ulimit on nofile to anything higher than 1048576 (including unlimited) fails, and prevents logins from working.

    After setting the following in /etc/security/limits.conf

    * soft nofile 5000000
    * hard nofile 5000000 

no one can log in, and the /var/log/secure file shows

sshd[3889]: error: PAM: pam_open_session(): Permission denied

Resolution

Raise the value of the sysctl setting fs.nr_open (/proc/sys/fs/nr_open) to something greater than or equal to the value being set in ulimit.

# echo "5000000" > /proc/sys/fs/nr_open

(temporarily)

OR

# echo "fs.nr_open = 5000000" >> /etc/sysctl.conf
# sysctl -p

(persistently)

Now ulimit should allow a nofile value less than or equal to the fs.nr_open value.

Note
If all login sessions are terminated, the /etc/security/limits.conf file must be repaired from rescue mode.
Root Cause

setrlimit(2) does not allow for the nofile limit to be set to more than fs.nr_open.
