CircuitBreakingException with >30days indexes open

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Post by vAJ »

At the request of our engineering team, we tried keeping more than 30 days of indexes open on our 4-node cluster and hit the CircuitBreakingException below. Our previous 2-node cluster couldn't handle more than 14 days. I'm really hoping there are further Elasticsearch performance tweaks that would let this 4-node cluster crunch more data.

[2018-05-24 16:41:02,462][DEBUG][action.search.type       ] [41b94e87-cf97-48ac-a5c6-ed795f9e33f2] All shards failed for phase: [query]
org.elasticsearch.transport.RemoteTransportException: [a1d229b5-12fa-4c5e-8d5b-746ece0d27aa][inet[/10.50.30.107:9300]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.ElasticsearchException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [host.raw] would be larger than limit of [13450084352/12.5gb]
        at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:80)
        at org.elasticsearch.search.aggregations.support.ValuesSource$MetaData.load(ValuesSource.java:88)
        at org.elasticsearch.search.aggregations.support.AggregationContext.bytesField(AggregationContext.java:180)
        at org.elasticsearch.search.aggregations.support.AggregationContext.valuesSource(AggregationContext.java:143)
        at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.create(ValuesSourceAggregatorFactory.java:53)
        at org.elasticsearch.search.aggregations.AggregatorFactories.createAndRegisterContextAware(AggregatorFactories.java:53)
        at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:157)
        at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:79)
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:100)
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:301)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:312)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:776)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:767)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.common.util.concurrent.UncheckedExecutionException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [host.raw] would be larger than limit of [13450084352/12.5gb]
        at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
        at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937)
        at org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:167)
        at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:74)
        ... 17 more
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [host.raw] would be larger than limit of [13450084352/12.5gb]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:97)
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:148)
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData$PagedBytesEstimator.beforeLoad(PagedBytesIndexFieldData.java:217)
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData.loadDirect(PagedBytesIndexFieldData.java:89)
        at org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData.loadDirect(PagedBytesIndexFieldData.java:43)
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:180)
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:167)
        at org.elasticsearch.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
        at org.elasticsearch.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
        at org.elasticsearch.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
        at org.elasticsearch.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
        at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
        ... 21 more
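
For anyone triaging a similar trace, the useful facts sit in the `Caused by` line: which breaker tripped, which field was being loaded, and what the limit was. Below is a minimal Python sketch that pulls those values out of a message in the format shown above. The function name `parse_breaker_message` and the regular expression are illustrative assumptions based on this one sample, not part of Elasticsearch or Nagios Log Server.

```python
import re

# Pattern for messages shaped like the one in the trace above. This is an
# assumption derived from that single sample, not an official format spec.
BREAKER_RE = re.compile(
    r"\[(?P<breaker>[A-Z_]+)\] Data too large, data for \[(?P<field>[^\]]+)\] "
    r"would be larger than limit of \[(?P<limit_bytes>\d+)/(?P<limit_human>[^\]]+)\]"
)


def parse_breaker_message(line):
    """Return breaker name, field, and limit from a CircuitBreakingException line."""
    match = BREAKER_RE.search(line)
    if match is None:
        return None
    limit_bytes = int(match.group("limit_bytes"))
    return {
        "breaker": match.group("breaker"),
        "field": match.group("field"),
        "limit_bytes": limit_bytes,
        "limit_gib": limit_bytes / 2**30,  # bytes converted to GiB
        "limit_human": match.group("limit_human"),
    }


sample = (
    "org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] "
    "Data too large, data for [host.raw] would be larger than limit of "
    "[13450084352/12.5gb]"
)
info = parse_breaker_message(sample)
print(info["breaker"], info["field"], info["limit_human"])
```

Here the breaker is `FIELDDATA` and the field is `host.raw`, so the search failed while loading fielddata for an aggregation on that field.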

Date        # Docs      Index Size (GB)
2018.04.24  80,123,435  23.80
2018.04.25  79,154,864  23.30
2018.04.26  75,573,485  21.90
2018.04.27  72,603,365  20.90
2018.04.28  13,752,469   3.90
2018.04.29  11,210,996   3.30
2018.04.30  34,017,591  12.00
2018.05.01  30,140,906  10.70
2018.05.02  27,746,199   9.80
2018.05.03  21,758,601   7.60
2018.05.04  31,733,987  12.10
2018.05.05   8,342,020   2.90
2018.05.06   8,754,301   3.10
2018.05.07  39,043,424  15.10
2018.05.08  37,447,506  14.40
2018.05.09  36,286,876  13.70
2018.05.10  39,211,018  14.60
2018.05.11  34,709,762  12.40
2018.05.12  11,404,047   3.50
2018.05.13  11,151,603   3.30
2018.05.14  44,699,775  16.70
2018.05.15  46,745,086  17.50
2018.05.16  33,705,539  12.10
2018.05.17  30,380,948  10.90
2018.05.18  27,627,349  10.10
2018.05.19   8,345,161   2.80
2018.05.20   7,643,582   2.60
2018.05.21  36,876,143  13.70
2018.05.22  36,335,297  13.20
2018.05.23  37,057,332  13.10
2018.05.24  17,505,744   6.60

All nodes have 8 CPUs and 64 GB of RAM.
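
To put the listing in perspective, here is a short Python sketch that totals the index sizes from the table above. The variable names are illustrative. Dividing evenly by four nodes is a simplifying assumption: the listing does not say whether the sizes include replicas, and shard allocation is rarely perfectly even.

```python
# Index sizes in GB, copied from the listing above (2018.04.24 to 2018.05.24).
index_sizes_gb = [
    23.80, 23.30, 21.90, 20.90, 3.90, 3.30, 12.00, 10.70, 9.80, 7.60,
    12.10, 2.90, 3.10, 15.10, 14.40, 13.70, 14.60, 12.40, 3.50, 3.30,
    16.70, 17.50, 12.10, 10.90, 10.10, 2.80, 2.60, 13.70, 13.20, 13.10,
    6.60,
]

NODE_COUNT = 4  # cluster size stated in the post

total_gb = sum(index_sizes_gb)
per_node_gb = total_gb / NODE_COUNT  # assumes a perfectly even spread

print(f"{len(index_sizes_gb)} daily indexes, {total_gb:.1f} GB total")
print(f"~{per_node_gb:.1f} GB per node if spread evenly across {NODE_COUNT} nodes")
```

The 31 open indexes add up to roughly 351.6 GB on disk. Fielddata for a field like `host.raw` is a separate in-memory structure, so this total only gives a feel for how much data an aggregation over the full range has to cover.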
Andrew J. - Do you even grok?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: CircuitBreakingException with >30days indexes open

Post by scottwilkerson »

You should be able to counter this by setting the following in
/usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
on every node and then restarting Elasticsearch on each server:

indices.fielddata.cache.size: 20%

Reference: https://www.elastic.co/guide/en/elastic ... usage.html
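
The idea behind the setting is that a bounded fielddata cache evicts older entries instead of growing until the breaker trips. One way to see whether the cap is in effect is to look at per-node fielddata usage and eviction counts. Below is a minimal Python sketch that summarizes a node stats response. It assumes the JSON shape returned by `GET /_nodes/stats/indices/fielddata`, which should be checked against your Elasticsearch version. The sample payload and node names are hypothetical, and `summarize_fielddata` is an illustrative helper, not an existing API.

```python
import json


def summarize_fielddata(stats):
    """Return (node_name, fielddata_bytes, evictions) per node, largest first."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        fielddata = node.get("indices", {}).get("fielddata", {})
        rows.append((
            node.get("name", node_id),
            fielddata.get("memory_size_in_bytes", 0),
            fielddata.get("evictions", 0),
        ))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample payload, trimmed to the keys this sketch reads.
sample = json.loads("""
{
  "nodes": {
    "node-a": {"name": "node-a", "indices": {"fielddata":
      {"memory_size_in_bytes": 4294967296, "evictions": 12}}},
    "node-b": {"name": "node-b", "indices": {"fielddata":
      {"memory_size_in_bytes": 1073741824, "evictions": 0}}}
  }
}
""")

for name, size_bytes, evictions in summarize_fielddata(sample):
    print(f"{name}: {size_bytes / 2**30:.1f} GiB fielddata, {evictions} evictions")
```

A node that reports far more fielddata than its peers, with zero evictions, is a candidate for a missing cache setting.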
Former Nagios employee

Re: CircuitBreakingException with >30days indexes open

Post by vAJ »

Thanks, Scott!

Checking my config, I realized that this setting was present on only one node (we recently rebuilt the cluster with three new servers). Setting it on every node and restarting appears to allow searches over larger time ranges. Building the charts is still a little slow, but that's livable.
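
Since the root cause was a setting present on one node but missing on the others, a quick consistency check across config files can catch this kind of drift after a rebuild. The following Python sketch is one rough way to do it. The helper names, node names, and config excerpts are all hypothetical, and the line-based parsing is deliberately simple: it handles only top-level `key: value` lines, not full YAML.

```python
def read_setting(config_text, key):
    """Return the value of a top-level `key: value` line, or None if absent.

    Comments (anything after '#') are ignored; nested YAML is not handled.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        if name.strip() == key:
            return value.strip()
    return None


def find_inconsistent_nodes(configs, key, expected):
    """Return names of nodes whose config lacks `key` or sets another value."""
    return sorted(
        node for node, text in configs.items()
        if read_setting(text, key) != expected
    )


# Hypothetical elasticsearch.yml excerpts, keyed by made-up node names.
configs = {
    "node1": "cluster.name: example\nindices.fielddata.cache.size:  20%\n",
    "node2": "cluster.name: example\n",
    "node3": "cluster.name: example\n# indices.fielddata.cache.size: 20%\n",
    "node4": "cluster.name: example\n",
}

missing = find_inconsistent_nodes(configs, "indices.fielddata.cache.size", "20%")
print(missing)  # prints ['node2', 'node3', 'node4']
```

In practice the config text would be read from each server's elasticsearch.yml. How the files are collected (SSH, configuration management, and so on) is left out of this sketch.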

I'll keep an eye on it, but we can close this thread for now.

Eagerly awaiting your next release... ;)

Re: CircuitBreakingException with >30days indexes open

Post by scottwilkerson »

Excellent! Glad to be of assistance.