scaling issues/too many indices

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
jvestrum
Posts: 46
Joined: Tue Mar 03, 2015 10:45 am

Post by jvestrum »

We have been trying to pull in 5 years' worth of old logs, and keep running into scaling issues. We've worked past some of them - the memlock ulimit, php running out of memory, backups getting stuck. The current issue is elasticsearch running out of file descriptors. We've raised the "nofile" ulimit to 262144, but elasticsearch still runs out of file descriptors and crashes, so we had to drop the older indices to get it to respond again.

It seems like all these problems lead back to having too many indices. Is there some way we can go to 1 index per month instead of 1 per day? Has anyone else been successful in pulling in several years of logs? It's not all that much data, in the tens of GB.
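
To put rough numbers on the index count, here is a minimal sketch. It assumes the Elasticsearch defaults of that era (5 primary shards and 1 replica per index) and a made-up five-year date range; neither figure is taken from this cluster, and the function name is hypothetical. Each shard is a separate Lucene index that keeps its own files open, which is why the shard count matters for file descriptors.

```python
from datetime import date


def index_and_shard_counts(start, end, primaries=5, replicas=1):
    """Count daily vs. monthly indices between two dates (inclusive)
    and the total number of shards each layout implies."""
    days = (end - start).days + 1
    months = (end.year - start.year) * 12 + (end.month - start.month) + 1
    # Every primary shard has `replicas` copies, so one index holds this many shards.
    shards_per_index = primaries * (1 + replicas)
    return {
        "daily_indices": days,
        "daily_shards": days * shards_per_index,
        "monthly_indices": months,
        "monthly_shards": months * shards_per_index,
    }


# Assumed date range, for illustration only.
counts = index_and_shard_counts(date(2010, 6, 1), date(2015, 5, 31))
for name, value in counts.items():
    print(f"{name}: {value}")
```

Under these assumptions the daily layout needs 18260 shards across the cluster, against 600 for the monthly layout.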
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: scaling issues/too many indices

Post by jolson »

How are you importing this information? Would you be alright with one index with today's date holding all of your data, or does it need to be sorted by date?
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
jvestrum
Posts: 46
Joined: Tue Mar 03, 2015 10:45 am

Re: scaling issues/too many indices

Post by jvestrum »

jolson wrote: How are you importing this information? Would you be alright with one index with today's date holding all of your data, or does it need to be sorted by date?
We are using a logstash agent to ship the logs into elasticsearch, applying some custom grok filters along the way.

It should be okay having all old logs go into one index, but the dates/timestamps on each record do need to be preserved. I think a single index will break date-based filtering from the NLS web interface, because of how it builds the queries. For this past data we mostly plan to mine it directly with our own elasticsearch queries, so that might be okay.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: scaling issues/too many indices

Post by jolson »

According to Elastic:
"The date filter is especially important for sorting events and for backfilling old data. If you don’t get the date correct in your event, then searching for them later will likely sort out of order.
In the absence of this filter, logstash will choose a timestamp based on the first time it sees the event (at input time), if the timestamp is not already set in the event. For example, with file input, the timestamp is set to the time of each read."
I am assuming that your timestamp field is currently being set by a 'date' filter. In the absence of a date filter, your logs will be stamped with the arrival time, as opposed to the date present in the log. In short order you could also write some grok that would allow you to parse out the date into a different field for sorting purposes.
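
The difference between the two timestamps can be sketched in a few lines of Python. This is not Logstash code, only an illustration of the behaviour described above: the line format, the pattern, the UTC assumption, and the `build_event` helper are all hypothetical.

```python
import re
from datetime import datetime, timezone

# Hypothetical line format: "YYYY-MM-DD HH:MM:SS" followed by the message.
LINE_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<message>.*)$"
)


def build_event(line, arrival_time):
    """Return an event whose timestamp comes from the line itself when it
    can be parsed, and from the arrival time otherwise."""
    event = {"message": line, "@timestamp": arrival_time}
    match = LINE_PATTERN.match(line)
    if match:
        parsed = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        # Assumption: the old logs were written in UTC.
        event["@timestamp"] = parsed.replace(tzinfo=timezone.utc)
        event["message"] = match.group("message")
    return event


arrival = datetime(2015, 6, 9, 15, 43, 57, tzinfo=timezone.utc)
old = build_event("2011-02-03 04:05:06 disk quota exceeded", arrival)
unparsed = build_event("no timestamp here", arrival)

print(old["@timestamp"].isoformat())       # timestamp taken from the log line
print(unparsed["@timestamp"].isoformat())  # falls back to the arrival time
```

A backfilled record only lands in the right place chronologically in the first case; in the second, five-year-old data would appear to have happened at import time.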

If that doesn't work for you, let us know and we can try to come up with a different way to approach this.
jvestrum
Posts: 46
Joined: Tue Mar 03, 2015 10:45 am

Re: scaling issues/too many indices

Post by jvestrum »

Ideally I'd like our past, present, and future data to all "look the same", with standard timestamps and arranged into consistent, logical indices. What we're trying now is re-indexing the data into one-per-month indices named logstash-YYYY.MM. I found the config option in the Dashboard interface where I can set the timestamping interval to monthly, and that seems to work - queries are using the new indexes. I've also lowered the primary shards per index to 2 (with 1 replica). I'll update later on the status of the reindexing.
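
For anyone planning the same re-index, the mapping from daily to monthly index names can be sketched as below. It assumes the daily indices follow the usual logstash-YYYY.MM.DD naming; the helper names are hypothetical, and the sketch only works out which daily indices belong to which monthly index without touching Elasticsearch.

```python
from collections import defaultdict
from datetime import datetime


def monthly_index_name(timestamp, prefix="logstash"):
    """Name of the monthly index (prefix-YYYY.MM) that a timestamp falls into."""
    return f"{prefix}-{timestamp:%Y.%m}"


def group_daily_indices(daily_names, prefix="logstash"):
    """Map each monthly index name to the daily index names it would absorb."""
    groups = defaultdict(list)
    for name in sorted(daily_names):
        # Assumed daily naming scheme: prefix-YYYY.MM.DD
        day = datetime.strptime(name, f"{prefix}-%Y.%m.%d")
        groups[monthly_index_name(day, prefix)].append(name)
    return dict(groups)


daily = ["logstash-2015.06.09", "logstash-2015.05.31", "logstash-2015.06.01"]
plan = group_daily_indices(daily)
for monthly, sources in plan.items():
    print(monthly, "<-", ", ".join(sources))
```

Because the monthly name is derived from each record's own timestamp, the original dates are preserved while the number of indices drops by roughly a factor of thirty.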
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: scaling issues/too many indices

Post by jolson »

Sounds good - be sure to let us know. Thank you!

For anyone who is looking over this post, the option that jvestrum is using can be found in 'Dashboards -> Configure Dashboard -> Index':
2015-06-09 15_43_57-Dashboard • Nagios Log Server - Firefox Developer Edition.png