Logstash/Elasticsearch Services Crashing

Posted: Wed Apr 29, 2020 12:23 pm
by surunqu
Hello,

The elasticsearch and logstash services keep crashing. I am noticing the used memory on the server keeps increasing until one or both services crash, then returns to normal. Sometimes just the elasticsearch service will crash, and sometimes both the elasticsearch and then the logstash service will crash. If I reboot the server, everything will run fine until the memory gets low again. I am using version 2.1.6 of Nagios Log Server. The VM is running CentOS 7, with 16GB RAM and 16 vCPU. The system has a 4TB disk for the indexes. I have 705 devices sending logs to a single instance of Nagios Log Server. Any suggestions on how to troubleshoot further?

Last entry in /var/log/logstash/logstash.log:
{:timestamp=>"2020-04-28T19:16:49.402000-0700", :message=>"syslog listener died", :protocol=>:udp, :address=>"0.0.0.0:1514", :exception=>#<SocketError: recvfrom: name or service not known>, :backtrace=>["/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:138:in `udp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:97:in `run'"], :level=>:warn}

Last entry in /var/log/elasticsearch/c2180ab3-41d3-42ab-bbd4-dfd9f9f655fb.log
[2020-04-28 18:58:48,225][WARN ][index.merge.scheduler ] [7a763a0a-1a65-4738-9485-b96612b66187] [logstash-2020.04.24][1] failed to merge
org.apache.lucene.store.AlreadyClosedException: refusing to delete any files: this IndexWriter hit an unrecoverable exception
at org.apache.lucene.index.IndexFileDeleter.ensureOpen(IndexFileDeleter.java:354)
at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:719)
at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:451)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3826)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
Caused by: java.lang.OutOfMemoryError: Java heap space

Thanks.

Re: Logstash/Elasticsearch Services Crashing

Posted: Wed Apr 29, 2020 4:47 pm
by cdienger
How many indices are open, and how large are they on average? You can find this information under Admin > System > Cluster Status. The number of open indices can be controlled with the 'Close indexes older than' field under Admin > Snapshots & Maintenance. I suggest setting this to keep open only the indices typically needed for day-to-day searches - you can always reopen closed indices if you need to search further back. This helps keep only necessary data in memory.
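If the UI is slow because of the memory pressure, the same information is available from Elasticsearch's REST API on the command line. The commands below are a sketch assuming Elasticsearch is listening on its default port 9200 on the Log Server host - adjust the host/port if your install differs:

```shell
# List every index with its open/close state, document count, and on-disk size
curl -s 'http://localhost:9200/_cat/indices?v'

# Cluster-wide totals: number of indices, shards, and memory in use
curl -s 'http://localhost:9200/_cluster/stats?pretty'

# Per-node JVM heap usage; a heap_used_percent that climbs toward 100
# between crashes matches the OutOfMemoryError in the elasticsearch log
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'
```

Watching the JVM heap output while the indices are being searched should show whether the heap fills up gradually or spikes on particular queries.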

Keep in mind that Elasticsearch will only take up to half of the system's memory for itself, so in this case it is limited to 8GB, which can be used up pretty quickly depending on the amount of data being searched. You may benefit from increasing it, but don't go over 64GB of total memory on the system - past that point (a heap above roughly 32GB) the JVM loses compressed object pointers and performance can actually suffer.
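If you do add RAM to the VM, the heap allocated to the bundled Elasticsearch is typically set with the ES_HEAP_SIZE variable in its sysconfig file. The path and the 12g value below are an example, assuming a box upgraded to 24GB RAM - verify the file location on your own install before editing:

```shell
# /etc/sysconfig/elasticsearch (path may vary by install)
# Give Elasticsearch half of system RAM, e.g. 12g on a 24GB box.
# Stay at or below ~31g so the JVM keeps using compressed object pointers.
ES_HEAP_SIZE=12g
```

After changing the value, restart the service (e.g. `systemctl restart elasticsearch`) and confirm the new heap size in the _nodes/stats/jvm output.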