Hi,
We've encountered an issue with Nagios Log Server.
We are running it on a single instance, and the primary data size is around 95GB. We've expanded the filesystem to 500GB, and the new size is visible in the Nagios Log Server GUI, but Nagios Log Server stops writing at 96GB.
Any idea how we can fix this?
Thanks in advance!
Nagios Log server stops writing at 96GB limit.
Re: Nagios Log server stops writing at 96GB limit.
What is the output of df -H on the NLS machine?
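For reference, a quick way to spot a filesystem approaching the Elasticsearch disk watermark (a sketch; 85% is the stock low-watermark default, and the mount points reported will depend on your install):

```shell
#!/bin/sh
# List every mounted filesystem with its use percentage and flag any at
# or above 85% (the default Elasticsearch low disk watermark).
df -H | awk 'NR > 1 {
    gsub(/%/, "", $5)
    flag = ($5 + 0 >= 85) ? "  <-- above watermark" : ""
    print $6 ": " $5 "% used" flag
}'
```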
Former Nagios Employee
- Box293
Re: Nagios Log server stops writing at 96GB limit.
You may also be hitting the "watermark" which is explained in this KB article:
https://support.nagios.com/kb/article.php?id=469
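The watermark behavior the KB article covers can also be inspected from the command line. A sketch, assuming Elasticsearch is listening on localhost:9200 (the NLS default); the stock watermarks are low=85% / high=90%, and once a node's disk use passes the low watermark ES stops allocating new shards to it:

```shell
# Overrides (if any) show up in the cluster settings; the defaults
# themselves are not listed unless they have been changed:
curl -s 'localhost:9200/_cluster/settings?pretty'

# To relax the watermarks temporarily, e.g. after growing the disk:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}'
```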
Re: Nagios Log server stops writing at 96GB limit.
Hi guys,
Unfortunately, it's neither of the above.
The machine is well past the 96GB limit (now at 126GB), and it was working fine until today.
According to the Elasticsearch logs:
[2016-10-03 08:11:24,995][DEBUG][action.search.type ] [752da908-2510-4b59-bdd3-b1b48ed5134a] All shards failed for phase: [query]
org.elasticsearch.search.query.QueryPhaseExecutionException: [logstash-2016.10.03][4]: query[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to execute global facets]
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:193)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:171)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:289)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:300)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [2440298496/2.2gb]
at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:80)
at org.elasticsearch.search.facet.datehistogram.CountDateHistogramFacetExecutor$Collector.setNextReader(CountDateHistogramFacetExecutor.java:88)
at org.elasticsearch.common.lucene.search.FilteredCollector.setNextReader(FilteredCollector.java:67)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:186)
... 9 more
This looks like a memory error, but I can't find the relevant config file. Any ideas?
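The `[FIELDDATA] Data too large` line is Elasticsearch's fielddata circuit breaker tripping: loading `@timestamp` fielddata for the query would exceed the breaker limit, which defaults to 60% of the JVM heap (2440298496 bytes is about 2.2GB, consistent with a ~4GB heap). The limit is an Elasticsearch setting rather than a Log Server config file. A sketch of inspecting and raising it via the cluster settings API (assumes ES on localhost:9200, the NLS default); note that raising it only postpones the problem if the heap itself is too small:

```shell
# Show any breaker overrides currently in effect:
curl -s 'localhost:9200/_cluster/settings?pretty'

# Raise the fielddata breaker from the default 60% of heap to 75%.
# This only buys headroom; if the heap is genuinely too small, add RAM
# or close older indexes instead.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "indices.breaker.fielddata.limit": "75%" }
}'
```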
Re: Nagios Log server stops writing at 96GB limit.
Your machine is indeed out of memory. A couple of things I'll need from you:
1. How much memory do you have allocated to this machine?
2. Please post a screenshot of your 'Backup & Maintenance' page.
It may just be a matter of closing your indexes sooner, or adding more ram to the machine. The elasticsearch / logstash startup scripts automatically change based on how much memory is available on the machine.
Former Nagios Employee
Re: Nagios Log server stops writing at 96GB limit.
Hi rkennedy,
See below for the requested items.
1. Output of free -h:

              total   used   free  shared  buff/cache  available
Mem:           7.6G   2.9G   725M     48M        4.0G       4.4G
Swap:          2.0G   7.4M   1.9G
2: See attachment.
Thanks!
Re: Nagios Log server stops writing at 96GB limit.
Ack, I forgot to ask - how large are all of your indexes for the past 25 days?
Former Nagios Employee
Re: Nagios Log server stops writing at 96GB limit.
Usually between 2GB and 5GB.
Re: Nagios Log server stops writing at 96GB limit.
This is going to be the problem. Here's a quick overview of how NLS works:
- Logs come in and are stored on disk and in memory until the index is closed.
- You have 2-5GB per day for 25 days, so even on the most minimal side, 2GB x 25 days is 50GB.
- The machine only has 8GB of RAM. While ES can compress, I usually recommend our customers a 2x ratio at most. Currently, at your minimum, 50GB / 8GB is a 6.25x ratio.
You'll need to either A. close indexes sooner, or B. expand the memory on the machine. Remember, after closing an index you can always re-open it.
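Closing an index unloads it from memory but leaves its data on disk, and re-opening loads it back. NLS normally drives this from the Backup & Maintenance page, but underneath it is the standard Elasticsearch open/close API. A sketch, assuming ES on localhost:9200 (the NLS default) and a hypothetical index name:

```shell
# Close an older daily index to free heap (data stays on disk):
curl -XPOST 'localhost:9200/logstash-2016.09.08/_close'

# Re-open it later if you need to search that day again:
curl -XPOST 'localhost:9200/logstash-2016.09.08/_open'
```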
Former Nagios Employee
Re: Nagios Log server stops writing at 96GB limit.
What is the downside of closing an index sooner?