I just noticed a surprising upward trend in the number of open files on my NLS nodes.
Any tips on how to troubleshoot what's going on here before the whole thing explodes are welcome. See the screenshot:
The only thing that really changed lately is that we added 16 GB of RAM to each node, so they now have 32 GB each. The servers were rebooted during this process. Could it be normal for elasticsearch to have more open files now that it has more memory?
It seems a reboot of a node brings the number of open files back down to a very low level again. The next few days will show whether it starts rising again.
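For reference, this is roughly how I'm checking which processes hold the descriptors (a sketch via Linux /proc; run as root to see every process):

```shell
# Rank processes by open file descriptor count using /proc.
# Linux-specific sketch; counts may be incomplete without root.
for pid in /proc/[0-9]*; do
  n=$(ls "$pid/fd" 2>/dev/null | wc -l)
  [ "$n" -gt 0 ] && printf '%6d %s\n' "$n" \
    "$(cat "$pid/cmdline" 2>/dev/null | tr '\0' ' ' | cut -c1-60)"
done | sort -rn | head -15
```

The top entries show which service (elasticsearch, logstash, etc.) is actually accumulating the files.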
Rising amount of open files
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
WillemDH wrote: It seems a reboot of a node brings the number of open files back to a very low level again.
I would suggest checking whether restarting the logstash service specifically brings the file count down. There is a known issue in logstash with a large number of tcp inputs:
https://github.com/elastic/logstash/issues/4225
https://github.com/elastic/logstash/issues/4815
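To confirm whether logstash is the process leaking descriptors, something like this can compare its fd count against its limit (a sketch; the process name and paths are assumptions about a typical Linux NLS host):

```shell
# Sketch: compare logstash's open fd count against its configured limit.
# Assumes logstash runs under this process name on a Linux host.
pid=$(pgrep -f logstash | head -1)
if [ -n "$pid" ]; then
  echo "logstash open fds: $(ls /proc/$pid/fd 2>/dev/null | wc -l)"
  grep 'Max open files' "/proc/$pid/limits"
fi
# System-wide view: allocated fds, free fds, and the kernel maximum.
cat /proc/sys/fs/file-nr
```

If the open fd count climbs toward the "Max open files" soft limit between restarts, the leak is in logstash rather than elasticsearch.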
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
Restarting the Logstash service doesn't really seem to change anything. It would be nice to be able to compare my number of open files with another environment of about the same size (35-50 GB / day and 32 GB RAM / node).
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
WillemDH wrote: Restarting the Logstash service doesn't really seem to change anything.
That is good.
On the topic of adding instances/nodes: an increase in the number of open files is expected there, mostly because elasticsearch needs to track shard allocation across the cluster on the back end.
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
Mcapra,
FYI, we did have an actual problem on our ELK/NLS stack. I didn't notice it immediately the day I made this thread, but in the meantime it became clear that the system job "backup_maintenance" was somehow stuck again. The issue was again solved after resetting all jobs in the GUI. I forgot to execute the curator commands as asked in https://support.nagios.com/forum/viewto ... 38&t=40346.
Please leave this thread open. I highly suspect this issue will reoccur, so I will execute the curator jobs manually next time and post the output.
What is generally the reason the maintenance jobs fail? I can't be the only person having this issue.
Willem
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
The elasticsearch log from the last run of backup_maintenance (around Oct 22nd, by the looks of it) might shed some light on this issue. That particular job isn't doing anything more sophisticated than running a few curator commands, but if elasticsearch was having issues during that period, that could be one source of the problem. If the job failed for some reason, it should be flagged as such and rescheduled in the immediate future. Perhaps the job didn't properly detect a failure.
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
And where can I find this log?
Nagios XI 5.8.1
https://outsideit.net
Re: Rising amount of open files
/var/log/elasticsearch/<cluster_id>.log is where it is usually located. I'm not sure whether there will still be a copy from October 24th, but it would be labeled something like <cluster_id>.log-20161024.gz. We may need to examine other days as well. I haven't heard of that particular job hanging before, but it is definitely troubling.
Former Nagios employee
https://www.mcapra.com/
Re: Rising amount of open files
Haven't had this issue for some time now. See the screenshot. This thread can be closed.
Nagios XI 5.8.1
https://outsideit.net