number_of_shards

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Locked
rocheryderm
Posts: 69
Joined: Fri Jul 13, 2018 1:09 pm

number_of_shards

Post by rocheryderm »

Hello

My apologies if this is covered elsewhere - if so, I couldn't locate it.

Why is it that on a 4 instance NLS cluster the default is to create 10 shards for every index?

5 primaries with 1 replica each = 10 shards.

Isn't that a colossal waste of disk space? (20%?)

Wouldn't it be more appropriate for the total number of primary shards to equal the number of instances?

If this is not already covered elsewhere, what is the recommendation?
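For reference, the shard arithmetic behind the question is just primaries times (1 + replicas); a quick sketch using the numbers from this thread:

```python
# Sketch of the shard math in question: total shards per index is
# primaries * (1 + replicas). Values are the NLS defaults discussed here.
def total_shards(primaries, replicas):
    return primaries * (1 + replicas)

print(total_shards(5, 1))  # 10 shards per daily index
```

Note that with 1 replica every log line is stored twice regardless of the primary count, so replication roughly doubles live disk usage.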

Mike
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: number_of_shards

Post by scottwilkerson »

It is because the system is designed to be redundant: if one of the servers in your 4-node cluster fails, you will not lose any data, and the system will still be online.

Not only that, it will automatically re-distribute to create a new replica of any of the missing shards from the failure.

Finally, both the primary and replica shards participate in returning results for queries, which is one reason why Nagios Log Server can deliver lightning-fast search results when searching through terabytes, or even petabytes, of logs: the load is distributed across the various machines in the cluster.
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
scottwilkerson
DevOps Engineer

Re: number_of_shards

Post by scottwilkerson »

Adding one more thing about your query:
rocheryderm wrote:Isn't that a colossal waste of disk space? (20%?)
On a 2-node cluster the hit is even worse: 100%.

But as your organization grows to larger cluster sizes this diminishes, very much like having a RAID 5 array.

Additionally, if you turn off live indexes and create snapshots of the data for archiving purposes, those are no longer duplicated, so you only have the replication in your live environment.
rocheryderm

Re: number_of_shards

Post by rocheryderm »

Well, I'm not exactly happy with the answer.

I understand the need for multiple primary shards.

Now that you've confirmed the number of 5, I will manage it down to 4. 1 primary shard per instance seems reasonable to me - if we need to scale out in the future and add more nodes, we can re-evaluate and add more primary shards at that time.
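For future daily indexes, a change like this would typically be made via an Elasticsearch index template; a sketch of what the JSON body might look like (field names are from Elasticsearch's template API, and the `logstash-*` pattern is an assumption about the index naming, so treat this as illustrative only):

```python
import json

# Hypothetical index-template body lowering the primary count for
# newly created daily indexes; existing indexes are unaffected.
template = {
    "template": "logstash-*",       # assumed pattern matching the daily indexes
    "settings": {
        "number_of_shards": 4,      # one primary per node, as proposed above
        "number_of_replicas": 1,
    },
}
print(json.dumps(template, indent=2))
```

Nagios Log Server also exposes shard settings in its UI, which is the safer route than hitting the API directly.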

Thank you
scottwilkerson
DevOps Engineer

Re: number_of_shards

Post by scottwilkerson »

You are missing something: the 5 primary shards are not duplicates of one another. Each primary shard contains 1/5 of the logs in that index. You cannot "manage it down to 4", as you would then be missing 1/5 of your data.

It is the BACKUP (replica) shards that contain the duplicate of the data.

Any number of these primary shards for each index may reside on the same server...
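A minimal sketch (not Elasticsearch's actual routing code) of why a primary cannot simply be dropped: each document is routed to exactly one primary, so each primary holds a disjoint slice of the index:

```python
# Hypothetical routing sketch: Elasticsearch routes on a hash of the
# document's routing value modulo number_of_shards; Python's hash()
# stands in here for its real murmur3 hash.
def primary_for(doc_id, num_primaries=5):
    return hash(doc_id) % num_primaries

docs = [f"log-{i}" for i in range(1000)]
slices = {}
for d in docs:
    slices.setdefault(primary_for(d), []).append(d)

# Each document lands on exactly one primary; removing a primary would
# therefore drop its entire slice unless a replica of it survives.
assert sum(len(s) for s in slices.values()) == len(docs)
```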
rocheryderm

Re: number_of_shards

Post by rocheryderm »

Yes, I get that.

If I have 4 instances/nodes, with 5 primaries and 1 replica, I could have the following data arrangement.

Clock strikes midnight and a new day-based index is created for incoming data. 5 primary shards and their replicas are created.

server1 : P1, P5, R2
server2 : P2, R3
server3 : P3, R4
server4 : P4, R5, R1
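As a sanity check on the layout above (a sketch using exactly the arrangement listed): any single server can fail and every shard number still has a surviving copy somewhere:

```python
# The hypothetical arrangement from the post, as a dict of shard copies.
layout = {
    "server1": {"P1", "P5", "R2"},
    "server2": {"P2", "R3"},
    "server3": {"P3", "R4"},
    "server4": {"P4", "R5", "R1"},
}

def survives_any_single_failure(layout):
    # Every shard number ("1".."5") must have a live copy (P or R)
    # on the surviving servers, whichever single server fails.
    numbers = {s[1:] for copies in layout.values() for s in copies}
    for failed in layout:
        alive = set().union(*(v for k, v in layout.items() if k != failed))
        if any(all(c[1:] != n for c in alive) for n in numbers):
            return False
    return True

print(survives_any_single_failure(layout))  # True
```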

Why would I want the extra primary shard and its replica?
scottwilkerson
DevOps Engineer

Re: number_of_shards

Post by scottwilkerson »

rocheryderm wrote:Why would I want the extra primary shard and its replica?
I guess I don't understand the question.

One thing is that when a server goes offline, or a rebalance needs to occur, only the shards that are affected need to be moved. So more shards break things up into smaller pieces, preventing the whole day's index from needing to be moved/copied.

More shards are actually better, up to a point; the downside is that each open shard has a slight cost in terms of allocated memory, which is why it isn't split up even more (like 100 shards/index).

5 is the default deemed to have the best balance by the creators of Logstash, and after testing we tend to agree; it is used in this configuration in millions of Logstash and Elasticsearch deployments.