Storage

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
gregwhite
Posts: 206
Joined: Wed Jun 01, 2011 12:40 pm

Storage

Post by gregwhite »

To help manage storage usage, can de-duplication and/or compression be used? If yes, how would you set that up? Would this be done at the storage level, independent of the log server?

Thanks,
Greg
User avatar
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Storage

Post by vtrac »

Hi Greg,
How are you doing?
There is no mechanism for additional compression; the data in Log Server is already compressed.
However, it also stores an index of the data for fast searching, which can make the total size somewhat larger.


Best Regards,
Vinh
gregwhite
Posts: 206
Joined: Wed Jun 01, 2011 12:40 pm

Re: Storage

Post by gregwhite »

Trying to stay cool!
Thanks. Is there any way to de-dupe the data, since they are flat files? Our manager is concerned that the data will grow exponentially over time, since the log server creates a second copy of the primary data for node 2. Each time we add a terabyte of data, we have to add 2 TB.

Thanks,
Greg
User avatar
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Storage

Post by vtrac »

Hi Greg,
Yes, trying to stay cool is a good idea .... haha!!

I checked with development about your question.

Here's what I got from the development team:
If they're really concerned about disk space usage, they can take a look at https://assets.nagios.com/downloads/nag ... enance.pdf
Specifically, the "optimize/close/delete" indices options change how long the data lives on their servers.


For maintaining a large amount of logs, close indexes earlier and use shared storage for snapshots. Do not, do not, do not use shared storage for indexes. Elasticsearch has a safeguard: once a node's disk is a certain percentage full, it stops receiving new data and expects another node in the cluster to receive it. If both nodes in the cluster sit on the same storage, it fills at the same rate for both, and incoming data is simply dropped.
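To make the shared-storage warning concrete, here is a small sketch (illustrative only, not Log Server code — the real check is Elasticsearch's disk-based shard allocator, whose high watermark defaults to roughly 90% disk usage):

```python
# Sketch of Elasticsearch's disk-watermark behavior (illustrative only;
# the real logic lives in Elasticsearch's disk-based shard allocator).
HIGH_WATERMARK = 0.90  # assumed default: stop allocating near 90% full

def accepts_new_data(disk_used_fraction):
    """A node stops accepting new data once its disk passes the watermark."""
    return disk_used_fraction < HIGH_WATERMARK

# Separate storage: one node can fill up while the other keeps receiving data.
node_a, node_b = 0.92, 0.60
print(accepts_new_data(node_a), accepts_new_data(node_b))  # False True

# Shared storage: both nodes see the same disk, so they cross the
# watermark together and incoming data has nowhere to go.
shared = 0.92
print(accepts_new_data(shared), accepts_new_data(shared))  # False False
```

With separate disks, the cluster still has somewhere to route data when one node fills; with shared storage, both nodes go read-only at the same moment.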
Hope this helps!!

Regards,
Vinh
gregwhite
Posts: 206
Joined: Wed Jun 01, 2011 12:40 pm

Re: Storage

Post by gregwhite »

Thank you for your response. I will go through the document you provided a link to.
The question the developers didn't answer was about inline de-duplication. Another vendor (not open source) talked about using inline de-duplication, which is where my question originated. You can do source or target de-duplication (deduped either before or after being sent to the log server). De-duplication can significantly reduce storage usage. I didn't know if this was something the log server could use, since the logs are flat files. (If not, maybe a feature request?)

After reading the document you linked to, I will let you know if I have additional questions.

Thanks for your help,
Greg
User avatar
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Storage

Post by vtrac »

Great!! .... :-)
gregwhite
Posts: 206
Joined: Wed Jun 01, 2011 12:40 pm

Re: Storage

Post by gregwhite »

I need to verify that I understand how the storage is used. If you have 2 TB of actual data, then to replicate it you need 2 nodes, each with 2 TB of storage. Now, if you add more nodes, the shards that make up the primary and replicated copies get distributed among the 3, 4, or 5 nodes that make up the cluster. It's not that you need 10 TB of storage (2 TB on each of the 5 nodes), correct? The log server only replicates the data once, then distributes the shards across however many nodes are in the cluster.
Is this correct?

Thanks,
Greg
User avatar
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Storage

Post by vtrac »

Hi Greg,
Good morning ... :-)

Here are the KB articles I found; hopefully they will answer any questions you might have about Nagios Log Server clusters.

Managing Indices:
https://assets.nagios.com/downloads/nag ... ndices.pdf

Managing Clusters:
https://assets.nagios.com/downloads/nag ... usters.pdf


Best Regards,
Vinh
gregwhite
Posts: 206
Joined: Wed Jun 01, 2011 12:40 pm

Re: Storage

Post by gregwhite »

I read both documents, and yes, they helped me understand how the shards and indexes work, but my question is still unanswered. But first, the document on indices says "Closing an index means that the log data will no longer be searched in queries and will not be replicated across instances." Does this mean that when you close an index, the replicated copy gets deleted? Does the primary index remain on whichever node it originally resides on?

OK, my primary question was about how much storage the cluster uses. My manager is quite concerned that the data growth will be unsustainable and is about to cancel the project. As I understand it, if we have 2 TB of data and a 2-node cluster, the 2 TB gets replicated, therefore using 4 TB of storage (2 TB per node). If we add a node, the 4 TB of data will be distributed across all three nodes; you do NOT need to add additional storage. Each node will then hold less than 2 TB of data. If you add a 4th node, the data will be distributed across all 4 nodes. This assumes your data has not grown beyond the original 2 TB. Now, if you need to add an additional 1 TB, do you double that amount to account for replication, then divide the 2 TB by the number of nodes and add that amount to each one?
I'm looking at adding nodes and adding storage as two different tasks: you add nodes to gain search performance, and you add storage as your log data increases.
Hope this is clear,
Thanks,
Greg
User avatar
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: Storage

Post by vtrac »

Hi Greg,
You are correct.
You add nodes to gain performance in your searches and you add storage as your log data increases.
In a 2 instance cluster, for each index the 5 primary shards will exist on one (Master) instance and the 5 replica shards will exist on the other (Replica) instance.

In a cluster of 3 or more instances, the shards are distributed across the instances as evenly as possible. A replica shard will never exist on the same instance as its primary shard.
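That placement rule can be sketched like this (illustrative only, not Elasticsearch's actual allocator — node names and shard IDs are hypothetical):

```python
# Sketch of the replica-placement rule: a replica shard never lands on
# the node that holds its primary (illustrative, not Elasticsearch code).
def place_replicas(primaries, nodes):
    """primaries: {shard_id: primary_node}; returns {shard_id: replica_node}."""
    placement = {}
    for shard, primary_node in primaries.items():
        # Only nodes other than the primary's node are eligible.
        candidates = [n for n in nodes if n != primary_node]
        # Pick the least-loaded eligible node to keep things balanced.
        placement[shard] = min(
            candidates, key=lambda n: list(placement.values()).count(n)
        )
    return placement

nodes = ["node1", "node2", "node3"]
primaries = {0: "node1", 1: "node2", 2: "node3", 3: "node1", 4: "node2"}
replicas = place_replicas(primaries, nodes)
# No replica shares a node with its primary.
assert all(replicas[s] != primaries[s] for s in primaries)
```

This is why a single node's failure never takes out both copies of a shard.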

As nodes join or leave a cluster, the cluster automatically reorganizes itself to evenly distribute the data across the available nodes.
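Greg's arithmetic can be written out as a quick sketch (hypothetical numbers; assumes the default of one replica copy per primary shard):

```python
def cluster_storage(primary_tb, replicas=1, nodes=2):
    """Return (total cluster storage used, average share per node) in TB."""
    total = primary_tb * (1 + replicas)  # primaries plus replica copies
    return total, total / nodes

# 2 TB of data, one replica, 2 nodes: 4 TB total, 2 TB per node.
print(cluster_storage(2))           # (4, 2.0)

# Adding a third node redistributes the same 4 TB: about 1.33 TB per node.
print(cluster_storage(2, nodes=3))

# Adding 1 TB of new data adds 2 TB cluster-wide (1 TB plus its replica).
print(cluster_storage(3, nodes=3))  # (6, 2.0)
```

So adding a node spreads the existing total across more disks, while adding data always costs twice its size cluster-wide because of the replica.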


Best Regards,
Vinh
Locked