Nagios Log server stops writing at 96GB limit.

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
mulo
Posts: 6
Joined: Wed Sep 21, 2016 7:48 am

Nagios Log server stops writing at 96GB limit.

Post by mulo »

Hi,

We've encountered an issue with Nagios Log Server.
We are running it on one instance and the primary disk usage is around 95 GB. We've expanded the filesystem to 500 GB, and the new size is visible in the Nagios Log Server GUI, but Nagios Log Server stops writing at 96 GB.
Any idea how we can fix this?

Thanks in advance!
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Nagios Log server stops writing at 96GB limit.

Post by rkennedy »

What is the output of df -H on the NLS machine?
Former Nagios Employee
Box293
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
Contact:

Re: Nagios Log server stops writing at 96GB limit.

Post by Box293 »

You may also be hitting the "watermark" which is explained in this KB article:

https://support.nagios.com/kb/article.php?id=469
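For reference, the disk watermarks the KB article describes are controlled by Elasticsearch settings along these lines (a sketch of the ES 1.x defaults; the config path may differ on your install, so treat the values as illustrative, not a recommendation):

```yaml
# /etc/elasticsearch/elasticsearch.yml (assumed path; adjust for your install)
# ES 1.x defaults: new shards stop being allocated at the low watermark,
# and shards are relocated away from the node at the high watermark.
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
```

If the filesystem was expanded after the node crossed a watermark, a cluster settings refresh or service restart may be needed before allocation resumes.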
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
mulo
Posts: 6
Joined: Wed Sep 21, 2016 7:48 am

Re: Nagios Log server stops writing at 96GB limit.

Post by mulo »

Hi guys,

Unfortunately, it's neither of the above.
The machine is well past the 96 GB limit (now at 126 GB) and it was working fine until today.
According to the Elasticsearch logs:
[2016-10-03 08:11:24,995][DEBUG][action.search.type ] [752da908-2510-4b59-bdd3-b1b48ed5134a] All shards failed for phase: [query]
org.elasticsearch.search.query.QueryPhaseExecutionException: [logstash-2016.10.03][4]: query[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to execute global facets]
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:193)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:171)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:289)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:300)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [2440298496/2.2gb]
at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:80)
at org.elasticsearch.search.facet.datehistogram.CountDateHistogramFacetExecutor$Collector.setNextReader(CountDateHistogramFacetExecutor.java:88)
at org.elasticsearch.common.lucene.search.FilteredCollector.setNextReader(FilteredCollector.java:67)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
at org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:186)
... 9 more

This looks like a memory error, but I can't find the specific config file. Any ideas?
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Nagios Log server stops writing at 96GB limit.

Post by rkennedy »

Your machine is indeed out of memory. A couple of things I'll need from you:
1. How much memory do you have allocated to this machine?
2. Please post a screenshot of your 'Backup & Maintenance' page.

It may just be a matter of closing your indexes sooner, or adding more RAM to the machine. The Elasticsearch / Logstash startup scripts automatically adjust based on how much memory is available on the machine.
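For what it's worth, the [FIELDDATA] breaker in the stack trace above is governed by Elasticsearch settings like the following (a hedged sketch for ES 1.x; the file path is an assumption, and raising the limit only postpones the problem rather than fixing the underlying memory shortage):

```yaml
# /etc/elasticsearch/elasticsearch.yml (assumed path; adjust for your install)
# In ES 1.x the fielddata circuit breaker defaults to 60% of heap,
# which matches the ~2.2 GB limit shown in the exception above.
indices.breaker.fielddata.limit: 60%
# Optionally cap the fielddata cache itself so old entries are evicted:
indices.fielddata.cache.size: 40%
```

Closing old indexes or adding RAM is still the real fix; these knobs only change where the breaker trips.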
Former Nagios Employee
mulo
Posts: 6
Joined: Wed Sep 21, 2016 7:48 am

Re: Nagios Log server stops writing at 96GB limit.

Post by mulo »

Hi rkennedy,

See below for the requested items.
1.
              total        used        free      shared  buff/cache   available
Mem:           7.6G        2.9G        725M         48M        4.0G        4.4G
Swap:          2.0G        7.4M        1.9G

2: See attachment.

Thanks!
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Nagios Log server stops writing at 96GB limit.

Post by rkennedy »

Ack, I forgot to ask: how large are all of your indexes for the past 25 days?
Former Nagios Employee
mulo
Posts: 6
Joined: Wed Sep 21, 2016 7:48 am

Re: Nagios Log server stops writing at 96GB limit.

Post by mulo »

Usually between 2 GB and 5 GB.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Nagios Log server stops writing at 96GB limit.

Post by rkennedy »

This is going to be the problem. Here's a quick overview of how NLS works:

- Logs come in and are stored to disk and memory until the index is closed.
- You have 2-5 GB per day, for 25 days. Taking the most minimal side, 2 GB x 25 days = 50 GB of open-index data.
- The machine only has 8 GB of RAM. While ES can compress, I usually recommend to our customers a 2x ratio at most. Currently, even at your minimum, 50 GB / 8 GB is a 6.25x ratio.
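The arithmetic above can be sketched as a tiny helper (hypothetical function name; the numbers are the ones from this thread):

```python
def open_index_ratio(daily_gb: float, days_open: int, ram_gb: float) -> float:
    """Ratio of open-index data to machine RAM (hypothetical helper).

    A ratio above ~2x is where problems start, per the advice above.
    """
    return daily_gb * days_open / ram_gb

# Numbers from this thread: 2 GB/day minimum, 25 days of open indexes, 8 GB RAM.
print(open_index_ratio(2, 25, 8))  # 6.25, well above the suggested 2x ceiling
```

Working backwards, keeping only about 8 days of indexes open (2 GB x 8 / 8 GB = 2.0x) would bring this machine back inside the suggested ratio.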

You'll need to either (a) close indexes sooner, or (b) expand the memory on the machine. Remember, after closing an index you can always re-open it.
Former Nagios Employee
mulo
Posts: 6
Joined: Wed Sep 21, 2016 7:48 am

Re: Nagios Log server stops writing at 96GB limit.

Post by mulo »

What is the down side for closing an index sooner?
Locked