Too many files open error

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Too many files open error

Post by CameronWP »

Hello:

When I start my Nagios Log Server, the Elasticsearch log is filled with these errors:

[netty.channel.socket.nio.AbstractNioSelector] Failed to accept a connection.
java.io.IOException: Too many open files

I have tried the pool and connection file hotfix, raised the system file limit, and raised the Logstash file limit, all to no avail. Is there anything else I can try?

Thanks!
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Too many files open error

Post by mcapra »

Can you try raising the MAX_OPEN_FILES setting in /etc/sysconfig/elasticsearch? Try doubling it, restart the elasticsearch service, and see if the error is still produced. You'll need to do this on each instance if you have more than one.
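For reference, the change would look something like the sketch below. It runs against a scratch copy so the commands are safe to paste anywhere; on a real node you would edit /etc/sysconfig/elasticsearch itself, and your current value may differ from the 65535 assumed here.

```shell
# Sketch: double MAX_OPEN_FILES. Demonstrated on a scratch copy;
# on the Log Server node, edit /etc/sysconfig/elasticsearch instead.
cfg=$(mktemp)
echo 'MAX_OPEN_FILES=65535' > "$cfg"   # assumed current value

# Double the limit in place
sed -i 's/^MAX_OPEN_FILES=.*/MAX_OPEN_FILES=131070/' "$cfg"
cat "$cfg"

# On the real node, follow up with:
#   service elasticsearch restart
```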
Former Nagios employee
https://www.mcapra.com/
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Too many files open error

Post by CameronWP »

I made that change and restarted, but I am still seeing the same errors:

[2017-03-21 08:29:04,213][WARN ][netty.channel.socket.nio.AbstractNioSelector] Failed to accept a connection.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Another error:

[2017-03-21 08:28:55,202][WARN ][indices.cluster ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [[logstash-2017.01.23][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2017.01.23][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2017.01.23][1] failed to open reader on writer
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:211)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:233)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)

Another:

[2017-03-21 08:28:55,208][WARN ][indices.cluster ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [[logstash-2017.02.12][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2017.02.12][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2017.02.12][1] failed to open reader on writer
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:211)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:156)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1351)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1346)
at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:866)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:233)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
... 3 more
Caused by: java.nio.file.FileSystemException: /NagLogs/data/46670a84-8052-4f7e-8810-e4bbd8dfdacf/nodes/0/indices/logstash-2017.02.12/1/index/_pdg.cfs: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
at org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:172)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:733)
at org.apache.lucene.store.CompoundFileDirectory.<init>(CompoundFileDirectory.java:104)
at org.apache.lucene.index.SegmentReader.readFieldInfos(SegmentReader.java:274)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:107)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:239)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:109)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:112)
at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:196)
... 10 more
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Too many files open error

Post by mcapra »

Can you share the outputs of:

Code: Select all

# yum install lsof if needed
lsof | grep logstash | wc -l
netstat -an | wc -l

ulimit -Sn

# soft limit for the nagios user
su - nagios -c 'ulimit -Sn'
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Too many files open error

Post by CameronWP »

Hi:

Here are the results:
ulimit.JPG
Thanks!
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Too many files open error

Post by mcapra »

Hmm, how about the output of:

Code: Select all

ulimit -Hn
I think we may still be going over this system's hard limit for open files. I just want to rule that out.
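It can also be worth checking the limit the running process actually inherited, which can differ from what your shell reports. Per-process limits live in /proc/&lt;pid&gt;/limits; the sketch below uses the current shell's PID as a stand-in, since the Elasticsearch PID varies (finding it via pgrep is an assumption about the process name on your system):

```shell
# Show the open-files limit for a given PID; $$ is the current shell
# as a stand-in. For Elasticsearch, substitute its PID, e.g.
#   $(pgrep -f elasticsearch | head -n1)
grep 'Max open files' /proc/$$/limits
```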
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Too many files open error

Post by CameronWP »

Hi, it is 4096.

Thanks!
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Too many files open error

Post by mcapra »

If the hard nofile limit is 4096, I would suggest increasing it substantially, since it appears there are over 4,000,000 files open by Elasticsearch currently. That doesn't necessarily mean 4,000,000 distinct file descriptors, but I would wager the hard limit of 4096 is prohibitive in this case.
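A sketch of what that could look like in /etc/security/limits.conf, assuming Elasticsearch runs as the nagios user on Log Server (adjust the user and value for your environment; these numbers are examples, not a recommendation):

```
# /etc/security/limits.conf -- example values only
nagios soft nofile 65536
nagios hard nofile 65536
```

A fresh login session (or service restart) is needed before the new limits are picked up.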
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Too many files open error

Post by CameronWP »

mcapra wrote:If the hard nofile limit is 4096, I would suggest increasing it substantially, since it appears there are over 4,000,000 files open by Elasticsearch currently. That doesn't necessarily mean 4,000,000 distinct file descriptors, but I would wager the hard limit of 4096 is prohibitive in this case.

I am worried about this bug:

https://access.redhat.com/solutions/43926

Is there a sane value you have used before?

Thanks!
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Too many files open error

Post by mcapra »

You should be fine as long as you follow that article's solution. You can register an account for free to view the solution. Otherwise:

Code: Select all

Login denied after setting nofile limits. Secure log shows the error "error: PAM: pam_open_session(): Permission denied"
Solution Verified - Updated June 30 2016 at 11:43 AM - English
Environment

    Red Hat Enterprise Linux 5
    Red Hat Enterprise Linux 6

Issue

    Setting a hard limit in ulimit on nofile to anything higher than 1048576 (including unlimited) fails, and prevents logins from working.

    After setting the following in /etc/security/limits.conf
    Raw

    * soft nofile 5000000
    * hard nofile 5000000 

no one can login, and /var/log/secure file shows
Raw

sshd[3889]: error: PAM: pam_open_session(): Permission denied

Resolution

Raise the value of the sysctl setting fs.nr_open (/proc/sys/fs/nr_open) to something greater than or equal to the value being set in ulimit.
Raw

# echo "5000000" > /proc/sys/fs/nr_open

(temporarily)

OR
Raw

# echo "fs.nr_open = 5000000" >> /etc/sysctl.conf
# sysctl -p

(persistently)

Now ulimit should allow a nofile value less than or equal to the fs.nr_open value.

Note
If all login sessions are terminated, it's required to fix /etc/security/limits.conf file by rescue mode.
Root Cause

setrlimit(2) does not allow for the nofile limit to be set to more than fs.nr_open.

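Before editing limits.conf, it's worth checking the kernel ceiling the article is talking about: nofile values above fs.nr_open are rejected by setrlimit(2), which is exactly the login breakage described. A quick check (the persistent change is the article's own commands, repeated here only as comments):

```shell
# Inspect the kernel's per-process open-file ceiling
nr_open=$(cat /proc/sys/fs/nr_open)
echo "fs.nr_open = $nr_open"

# To raise it persistently, as root (values taken from the article above):
#   echo "fs.nr_open = 5000000" >> /etc/sysctl.conf
#   sysctl -p
# Then keep the nofile value in /etc/security/limits.conf <= fs.nr_open.
```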