I just noticed a surprising upward trend in the number of open files on my NLS nodes.
Any tips on how to troubleshoot what's going on here before the whole thing explodes are welcome. See the screenshot:
The only thing that really changed lately is that we added 16 GB of RAM to each node, so they now have 32 GB each. The servers were rebooted during this process. Could it be normal for elasticsearch to have more open files now that it has more memory?
It seems a reboot of a node brings the number of open files back down to a very low level again. The next few days will show whether it starts rising again.
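For reference, this is roughly how I'm checking which processes hold the descriptors (a sketch via Linux /proc; run as root to see every process):

```shell
# Rank processes by open file descriptor count using /proc.
# Linux-specific sketch; counts may be incomplete without root.
for pid in /proc/[0-9]*; do
  n=$(ls "$pid/fd" 2>/dev/null | wc -l)
  [ "$n" -gt 0 ] && printf '%6d %s\n' "$n" \
    "$(cat "$pid/cmdline" 2>/dev/null | tr '\0' ' ' | cut -c1-60)"
done | sort -rn | head -15
```

The top entries show which service (elasticsearch, logstash, etc.) is actually accumulating the files.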
Rising amount of open files
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
WillemDH wrote: It seems a reboot of a node brings the number of open files back to a very low level again.
I would suggest checking whether restarting the logstash service specifically brings the file count down. There is a known issue in logstash with a large number of tcp inputs:
https://github.com/elastic/logstash/issues/4225
https://github.com/elastic/logstash/issues/4815
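To confirm whether logstash is the process leaking descriptors, something like this can compare its fd count against its limit (a sketch; the process name and paths are assumptions about a typical Linux NLS host):

```shell
# Sketch: compare logstash's open fd count against its configured limit.
# Assumes logstash runs under this process name on a Linux host.
pid=$(pgrep -f logstash | head -1)
if [ -n "$pid" ]; then
  echo "logstash open fds: $(ls /proc/$pid/fd 2>/dev/null | wc -l)"
  grep 'Max open files' "/proc/$pid/limits"
fi
# System-wide view: allocated fds, free fds, and the kernel maximum.
cat /proc/sys/fs/file-nr
```

If the open fd count climbs toward the "Max open files" soft limit between restarts, the leak is in logstash rather than elasticsearch.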
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
Restarting the Logstash service doesn't really seem to change anything. It would be nice to be able to compare my number of open files with another environment of about the same size (35-50 GB / day and 32 GB RAM / node).
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
WillemDH wrote: Restarting the Logstash service doesn't really seem to change anything.
That is good.
On the topic of adding instances/nodes: an increase in the number of open files is expected there, mostly because elasticsearch needs to track shard allocation across the cluster on the back end.
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
Mcapra,
FYI, we did have an actual problem on our ELK/NLS stack. I didn't notice it immediately the day I made this thread, but in the meantime it became clear that the system job "backup_maintenance" was somehow stuck again. The issue was again solved after resetting all jobs in the GUI. I forgot to execute the curator commands as asked in https://support.nagios.com/forum/viewto ... 38&t=40346.
Please leave this thread open. I highly suspect this issue will reoccur, so I will execute the curator jobs manually next time and post the output.
What is generally the reason the maintenance jobs fail? I can't be the only person having this issue.
Willem
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
The elasticsearch log from the last run of backup_maintenance (around Oct 22nd, by the looks of it) might shed some light on this issue. That particular job isn't doing anything more sophisticated than running a few curator commands, but if elasticsearch was having issues during that period, that could be one source of the problem. If the job failed for some reason, it should be flagged as such and rescheduled in the immediate future. Perhaps the job didn't properly detect a failure.
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
And where can I find this log?
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
/var/log/elasticsearch/<cluster_id>.log is where it is usually located. I'm not sure whether there will still be a copy from October 24th, but it would be labeled something like <cluster_id>.log-20161024.gz. We may need to examine other days as well. I haven't heard of that particular job hanging before, but it is definitely troubling.
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
Haven't had this issue for some time now. See the screenshot. This thread can be closed.
Nagios XI 5.8.1
https://outsideit.net