Nagios Log server stops writing at 96GB limit.
Posted: Wed Sep 21, 2016 7:58 am
by mulo
Hi,
We've encountered an issue with Nagios Log Server.
We are running it as a single instance, and the primary data size is around 95 GB. We've expanded the filesystem to 500 GB, and the new size is visible in the Nagios Log Server GUI, but Log Server stops writing at 96 GB.
Any idea how we can fix this?
Thanks in advance!
Re: Nagios Log server stops writing at 96GB limit.
Posted: Wed Sep 21, 2016 11:17 am
by rkennedy
What is the output of df -H on the NLS machine?
Re: Nagios Log server stops writing at 96GB limit.
Posted: Wed Sep 21, 2016 6:34 pm
by Box293
You may also be hitting the "watermark" which is explained in this KB article:
https://support.nagios.com/kb/article.php?id=469
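For reference, the "watermark" behaviour that KB article describes is controlled by Elasticsearch's disk-based shard allocation settings. A sketch of what those look like for the ES 1.x versions NLS ships (the defaults shown are my assumptions; verify against your own elasticsearch.yml before changing anything):

```yaml
# Disk-based shard allocation watermarks (Elasticsearch 1.x; values shown
# are assumed defaults -- confirm against your own elasticsearch.yml).
cluster.routing.allocation.disk.threshold_enabled: true
# Below "low" free-space threshold, no new shards are allocated to the node:
cluster.routing.allocation.disk.watermark.low: 85%
# Above "high", ES attempts to relocate shards off the node:
cluster.routing.allocation.disk.watermark.high: 90%
```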
Re: Nagios Log server stops writing at 96GB limit.
Posted: Mon Oct 03, 2016 3:26 am
by mulo
Hi guys,
Unfortunately, it's neither of the above.
The machine is well past the 96 GB limit. It's now at 126 GB, and it was working fine until today.
According to the Elasticsearch logs:
[2016-10-03 08:11:24,995][DEBUG][action.search.type ] [752da908-2510-4b59-bdd3-b1b48ed5134a] All shards failed for phase: [query]
org.elasticsearch.search.query.QueryPhaseExecutionException: [logstash-2016.10.03][4]: query[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to execute global facets]
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:193)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:171)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:289)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:300)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [2440298496/2.2gb]
at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:80)
at org.elasticsearch.search.facet.datehistogram.CountDateHistogramFacetExecutor$Collector.setNextReader(CountDateHistogramFacetExecutor.java:88)
at org.elasticsearch.common.lucene.search.FilteredCollector.setNextReader(FilteredCollector.java:67)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:186)
... 9 more
This looks like a memory error, but I can't find the specific config file. Any ideas?
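For what it's worth, the "[FIELDDATA] Data too large" exception above is Elasticsearch's fielddata circuit breaker tripping: loading @timestamp fielddata for the facet would push past a fixed share of the JVM heap (the [2440298496/2.2gb] figure). In the ES 1.x versions NLS runs, the relevant knobs are standard Elasticsearch settings in elasticsearch.yml (location varies by install). A hedged sketch, not NLS-specific advice:

```yaml
# Standard Elasticsearch 1.x settings (assumed defaults shown -- check
# your own elasticsearch.yml; raising these only papers over a heap that
# is too small for the amount of open index data).

# Fielddata circuit breaker: trips when loading fielddata (e.g. for
# @timestamp facets) would exceed this share of the JVM heap.
indices.breaker.fielddata.limit: 60%

# Fielddata cache: unbounded by default in ES 1.x; capping it forces
# evictions instead of breaker trips, at the cost of re-loading fielddata.
indices.fielddata.cache.size: 40%
```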
Re: Nagios Log server stops writing at 96GB limit.
Posted: Mon Oct 03, 2016 12:40 pm
by rkennedy
Your machine is indeed out of memory. A couple of things I'll need from you -
1. How much memory do you have allocated to this machine?
2. Please post a screenshot of your 'Backup & Maintenance' page.
It may just be a matter of closing your indexes sooner, or adding more RAM to the machine. The Elasticsearch / Logstash startup scripts automatically adjust based on how much memory is available on the machine.
Re: Nagios Log server stops writing at 96GB limit.
Posted: Tue Oct 04, 2016 2:35 am
by mulo
Hi rkennedy,
See below for the requested items.
1. Output of free -h:
              total        used        free      shared  buff/cache   available
Mem:           7.6G        2.9G        725M         48M        4.0G        4.4G
Swap:          2.0G        7.4M        1.9G
2: See attachment.
Thanks!
Re: Nagios Log server stops writing at 96GB limit.
Posted: Tue Oct 04, 2016 9:15 am
by rkennedy
Ack, I forgot to ask - how large are all of your indexes for the past 25 days?
Re: Nagios Log server stops writing at 96GB limit.
Posted: Wed Oct 05, 2016 7:48 am
by mulo
Usually between 2 GB and 5 GB.
Re: Nagios Log server stops writing at 96GB limit.
Posted: Wed Oct 05, 2016 9:01 am
by rkennedy
This is going to be the problem. Here's a quick overview of how NLS works:
- Logs come in and are stored to disk and memory until the index is closed.
- You have 2-5 GB per day, for 25 days, so let's look at the most minimal side: 2 GB × 25 days = 50 GB.
- The machine only has 8 GB of RAM. While ES can compress, I usually recommend to our customers a 2x ratio at most. Currently, at your minimum, 50 GB / 8 GB is a 6.25x ratio.
You'll need to either A. close indexes sooner, or B. expand the memory on the machine. Remember, after closing an index you can always re-open it.
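The arithmetic above can be written up as a quick back-of-the-envelope check. This is my own illustration of rkennedy's rule of thumb from this thread, not an official NLS sizing formula; the function name and the 2x ceiling are assumptions taken from the post above:

```python
# Rule-of-thumb sizing check from this thread: keep the total size of
# open indexes within roughly 2x the machine's RAM.

def open_data_ratio(daily_index_gb: float, open_days: int, ram_gb: float) -> float:
    """Ratio of open index data to RAM; above ~2, close indexes sooner."""
    return (daily_index_gb * open_days) / ram_gb

# mulo's minimum case: 2 GB/day, 25 open days, 8 GB of RAM
print(open_data_ratio(2, 25, 8))  # 6.25 -- well above the suggested 2x
```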
Re: Nagios Log server stops writing at 96GB limit.
Posted: Fri Oct 07, 2016 3:58 am
by mulo
What is the downside of closing an index sooner?