Hi,
We've encountered an issue with Nagios Log Server.
We are running it on a single instance, and the primary data size is around 95GB. We've expanded the filesystem to 500GB, and the new size is visible in the Nagios Log Server GUI, but Nagios Log Server stops writing at 96GB.
Any idea how we can fix this?
Thanks in advance!
Nagios Log server stops writing at 96GB limit.
Re: Nagios Log server stops writing at 96GB limit.
What is the output of df -H on the NLS machine?
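For reference, a quick way to spot a filesystem approaching the Elasticsearch disk watermark (a sketch; 85% is the stock low-watermark default, and the mount points reported will depend on your install):

```shell
#!/bin/sh
# List every mounted filesystem with its use percentage and flag any at
# or above 85% (the default Elasticsearch low disk watermark).
df -H | awk 'NR > 1 {
    gsub(/%/, "", $5)
    flag = ($5 + 0 >= 85) ? "  <-- above watermark" : ""
    print $6 ": " $5 "% used" flag
}'
```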
Former Nagios Employee
- Box293
Re: Nagios Log server stops writing at 96GB limit.
You may also be hitting the "watermark" which is explained in this KB article:
https://support.nagios.com/kb/article.php?id=469
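The watermark behavior the KB article covers can also be inspected from the command line. A sketch, assuming Elasticsearch is listening on localhost:9200 (the NLS default); the stock watermarks are low=85% / high=90%, and once a node's disk use passes the low watermark ES stops allocating new shards to it:

```shell
# Overrides (if any) show up in the cluster settings; the defaults
# themselves are not listed unless they have been changed:
curl -s 'localhost:9200/_cluster/settings?pretty'

# To relax the watermarks temporarily, e.g. after growing the disk:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}'
```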
Re: Nagios Log server stops writing at 96GB limit.
Hi guys,
Unfortunately, it's neither of the above.
The machine is well past the 96GB limit (now at 126GB), and it was working fine until today.
According to the Elasticsearch logs:
[2016-10-03 08:11:24,995][DEBUG][action.search.type ] [752da908-2510-4b59-bdd3-b1b48ed5134a] All shards failed for phase: [query]
org.elasticsearch.search.query.QueryPhaseExecutionException: [logstash-2016.10.03][4]: query[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to execute global facets]
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:193)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:171)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:289)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:300)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [2440298496/2.2gb]
at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:80)
at org.elasticsearch.search.facet.datehistogram.CountDateHistogramFacetExecutor$Collector.setNextReader(CountDateHistogramFacetExecutor.java:88)
at org.elasticsearch.common.lucene.search.FilteredCollector.setNextReader(FilteredCollector.java:67)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:186)
... 9 more
This looks like a memory error, but I can't find the relevant config file. Any ideas?
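The `[FIELDDATA] Data too large` line is Elasticsearch's fielddata circuit breaker tripping: loading `@timestamp` fielddata for the query would exceed the breaker limit, which defaults to 60% of the JVM heap (2440298496 bytes is about 2.2GB, consistent with a ~4GB heap). The limit is an Elasticsearch setting rather than a Log Server config file. A sketch of inspecting and raising it via the cluster settings API (assumes ES on localhost:9200, the NLS default); note that raising it only postpones the problem if the heap itself is too small:

```shell
# Show any breaker overrides currently in effect:
curl -s 'localhost:9200/_cluster/settings?pretty'

# Raise the fielddata breaker from the default 60% of heap to 75%.
# This only buys headroom; if the heap is genuinely too small, add RAM
# or close older indexes instead.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "indices.breaker.fielddata.limit": "75%" }
}'
```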
Re: Nagios Log server stops writing at 96GB limit.
Your machine is indeed out of memory. A couple of things I'll need from you:
1. How much memory do you have allocated to this machine?
2. Please post a screenshot of your 'Backup & Maintenance' page.
It may just be a matter of closing your indexes sooner, or adding more ram to the machine. The elasticsearch / logstash startup scripts automatically change based on how much memory is available on the machine.
Former Nagios Employee
Re: Nagios Log server stops writing at 96GB limit.
Hi rkennedy,
See below for the requested items.
1. Output of free -h:

              total   used   free  shared  buff/cache  available
Mem:           7.6G   2.9G   725M     48M        4.0G       4.4G
Swap:          2.0G   7.4M   1.9G
2: See attachment.
Thanks!
Re: Nagios Log server stops writing at 96GB limit.
Ack, I forgot to ask - how large are all of your indexes for the past 25 days?
Former Nagios Employee
Re: Nagios Log server stops writing at 96GB limit.
Usually between 2GB and 5GB.
Re: Nagios Log server stops writing at 96GB limit.
This is going to be the problem. Here's a quick overview of how NLS works:
- Logs come in and are stored on disk and in memory until the index is closed.
- You have 2-5GB per day for 25 days, so even on the most minimal side, 2GB x 25 days is 50GB.
- The machine only has 8GB of RAM. While ES can compress, I usually recommend our customers a 2x ratio at most. Currently, at your minimum, 50GB / 8GB is a 6.25x ratio.
You'll need to either A. close indexes sooner, or B. expand the memory on the machine. Remember, after closing an index you can always re-open it.
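Closing an index unloads it from memory but leaves its data on disk, and re-opening loads it back. NLS normally drives this from the Backup & Maintenance page, but underneath it is the standard Elasticsearch open/close API. A sketch, assuming ES on localhost:9200 (the NLS default) and a hypothetical index name:

```shell
# Close an older daily index to free heap (data stays on disk):
curl -XPOST 'localhost:9200/logstash-2016.09.08/_close'

# Re-open it later if you need to search that day again:
curl -XPOST 'localhost:9200/logstash-2016.09.08/_open'
```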
Former Nagios Employee
Re: Nagios Log server stops writing at 96GB limit.
What is the downside of closing an index sooner?