Rising amount of open files

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Rising amount of open files

Post by WillemDH »

Just noticed a surprisingly steep upward trend in the number of open files on my NLS nodes.

Any tips on how to troubleshoot this before the whole thing explodes are welcome. See the screenshot.

The only thing that really changed lately is that we added 16 GB of RAM to each node, so they have 32 GB each now. The servers were rebooted during this process. Could it be normal that Elasticsearch keeps more files open now that it has more memory?

It seems a reboot of a node brings the number of open files back to a very low level again. The next few days will show whether it starts rising again.
Nagios XI 5.8.1
https://outsideit.net
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Rising amount of open files

Post by mcapra »

WillemDH wrote: It seems a reboot of a node brings the number of open files back to a very low level again.
I would suggest checking whether restarting the logstash service specifically brings the file count down. There is a known issue with a large number of TCP inputs in Logstash:
https://github.com/elastic/logstash/issues/4225
https://github.com/elastic/logstash/issues/4815
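
To compare counts before and after a service restart, something like the following could help. It is only a rough sketch, assuming a Linux node with a readable /proc filesystem; the function names and the keyword matching on the command line are illustrative choices, not anything shipped with Nagios Log Server.

```python
import os


def count_open_fds(pid):
    """Return the number of open file descriptors for a PID, or None if unreadable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        # Process exited, /proc is unavailable, or we lack permission.
        return None


def fds_by_keyword(keywords=("logstash", "elasticsearch")):
    """Sum open descriptors of all processes whose command line contains each keyword."""
    totals = {keyword: 0 for keyword in keywords}
    try:
        entries = os.listdir("/proc")
    except OSError:
        return totals  # no Linux-style /proc filesystem here
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue
        count = count_open_fds(entry)
        if count is None:
            continue
        for keyword in keywords:
            if keyword in cmdline:
                totals[keyword] += count
    return totals


# Example (run as root so other users' processes are readable):
#   before = fds_by_keyword()
#   ... restart the logstash service ...
#   after = fds_by_keyword()
```

If the logstash total barely moves across a restart while the elasticsearch total stays high, the growth is probably not coming from the TCP input issue linked above.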
Former Nagios employee
https://www.mcapra.com/

Re: Rising amount of open files

Post by WillemDH »

Restarting the Logstash service doesn't really seem to change anything. It would be nice to be able to compare my number of open files with another environment of about the same size (35-50 GB / day and 32 GB RAM / node).

Re: Rising amount of open files

Post by mcapra »

WillemDH wrote: Restarting the Logstash service doesn't really seem to change anything.
That is good.

On the topic of adding instances/nodes: an increase in the number of open files is expected, mostly because Elasticsearch needs to keep better track of shard allocation on the back end.
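
For comparing nodes or environments, Elasticsearch itself reports the descriptor count per node through the node stats API (`_nodes/stats/process`, field `open_file_descriptors`). A minimal sketch follows, assuming the API is reachable on localhost:9200 without authentication; the URL and the helper names are assumptions for illustration only.

```python
import json
from urllib.request import urlopen


def open_fds_per_node(stats):
    """Map node name -> open_file_descriptors from a _nodes/stats/process response."""
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        name = node.get("name", node_id)  # fall back to the node ID if unnamed
        fds = node.get("process", {}).get("open_file_descriptors")
        if fds is not None:
            result[name] = fds
    return result


def fetch_stats(base_url="http://localhost:9200"):
    """Fetch process stats for all nodes (base_url is an assumed default)."""
    with urlopen(f"{base_url}/_nodes/stats/process", timeout=10) as resp:
        return json.load(resp)


# Example:
#   for name, fds in sorted(open_fds_per_node(fetch_stats()).items()):
#       print(f"{name}: {fds} open file descriptors")
```

Logging these numbers at regular intervals would show whether the count levels off after the cluster settles or keeps climbing.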

Re: Rising amount of open files

Post by WillemDH »

Mcapra,

FYI, we did have an actual problem on our ELK/NLS stack. I didn't notice it on the day I started this thread, but in the meantime it became clear that the system job "backup_maintenance" was stuck again somehow. The issue was again solved by resetting all jobs in the GUI. I forgot to execute the curator commands as asked in https://support.nagios.com/forum/viewto ... 38&t=40346.

Please leave this thread open. I strongly suspect this issue will recur, so I will execute the curator jobs manually next time and post the output.

What generally causes the maintenance jobs to fail? I can't be the only person having this issue.

Willem

Re: Rising amount of open files

Post by mcapra »

The Elasticsearch log from the last run of backup_maintenance (around Oct 22nd by the looks of it) might shed some light on this issue. That particular job isn't doing anything more sophisticated than running a few curator commands, but if Elasticsearch was having issues during that period, that could be one source of the problem. If the job failed for some reason, it should be flagged as such and rescheduled in the immediate future. Perhaps the job didn't properly detect a failure.

Re: Rising amount of open files

Post by WillemDH »

And where can I find this log?

Re: Rising amount of open files

Post by mcapra »

/var/log/elasticsearch/<cluster_id>.log is where it is usually located. I'm not sure if there will still be a copy from October 24th, but it would be labeled something like <cluster_id>.log-20161024.gz. We may need to examine other days as well. I haven't heard of that particular job hanging before, but it is definitely troubling.
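
To skim a current or rotated log for trouble around the time the job hung, a small filter along these lines may be enough. This is a sketch only: it assumes log lines carry a bracketed level such as `[WARN ]` or `[ERROR]`, which may not match every logging configuration, and the function names are made up for the example.

```python
import gzip
import re

# Assumed line format: "[timestamp][LEVEL][component] message"
LEVEL_RE = re.compile(r"\[(WARN|ERROR)\s*\]")


def open_log(path):
    """Open a plain or gzip-rotated log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def problem_lines(path):
    """Yield every line whose bracketed level is WARN or ERROR."""
    with open_log(path) as fh:
        for line in fh:
            if LEVEL_RE.search(line):
                yield line.rstrip("\n")


# Example, with <cluster_id> replaced by the real cluster ID:
#   for line in problem_lines("/var/log/elasticsearch/<cluster_id>.log-20161024.gz"):
#       print(line)
```

Running it over the files for a few days around the failure makes it easier to spot whether Elasticsearch was already struggling before the maintenance job started.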

Re: Rising amount of open files

Post by WillemDH »

We haven't had this issue for some time now. See screenshot. The thread can be closed.