Database sharding vs partitioning

Amit Sharma · Dec 25, 2013 · Viewed 89.5k times

I have been reading about scalable architectures recently. In that context, two words that keep showing up with regard to databases are sharding and partitioning. I looked up descriptions but still ended up confused.

Could the experts at Stack Overflow help me get the basics right?

  • What is the difference between sharding and partitioning?
  • Is it true that 'all sharded databases are essentially partitioned (over different nodes), but all partitioned databases are not necessarily sharded'?

Answer

Canis · Dec 25, 2013

Partitioning is the more generic term for dividing data across tables or databases. Sharding is one specific type of partitioning: it is a form of what is called horizontal partitioning.

With sharding you replicate the schema across (typically) multiple instances or servers, and use some kind of logic or identifier to know which instance or server holds the data you are looking for. An identifier of this kind is often called a "Shard Key".
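To make the shard key idea concrete, here is a minimal sketch of routing by key. The instance names and the choice of a SHA-256 hash are my own assumptions for illustration, not a prescribed method; any stable function from key to instance would do.

```python
import hashlib

# Hypothetical instance names; a real setup would hold connection details here.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]

def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Map a shard key to one instance using a stable hash of the key."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it to a list index.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# The same key always routes to the same instance.
print(shard_for("customer-42") == shard_for("customer-42"))
```

A cryptographic hash is used here only because Python's built-in hash() for strings is randomized per process by default, so it would not give the same answer on every application server.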

A common, key-less logic is to use the alphabet to divide the data: A-D is instance 1, E-G is instance 2, etc. Customer data is well suited for this, but the instances will end up unevenly sized if the partitioning does not take into account that some letters are more common than others.
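As a rough sketch of the alphabet approach: only the A-D and E-G ranges come from the example above, while the remaining split points and the instance names are made up for illustration.

```python
import bisect

# Inclusive upper bound of each letter range and the instance that owns it.
# Only A-D and E-G follow the text; the other ranges are illustrative.
RANGE_ENDS = ["D", "G", "M", "S", "Z"]
INSTANCES = ["instance-1", "instance-2", "instance-3", "instance-4", "instance-5"]

def instance_for_name(name: str) -> str:
    """Pick an instance from the first letter of a customer name."""
    first = name.strip().upper()[:1]
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot route name {name!r}: expected a leading A-Z letter")
    # bisect_left finds the first range whose upper bound is >= the letter.
    return INSTANCES[bisect.bisect_left(RANGE_ENDS, first)]

print(instance_for_name("Adams"))
print(instance_for_name("Evans"))
```

Because the ranges are fixed up front, a letter that is very common in your customer base makes its instance larger than the others, which is the size imbalance described above.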

Another common technique is to use a key-synchronization system or logic that ensures unique keys across the instances.
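One possible way to get unique keys without a central coordinator is to build the shard number into the key itself. The sketch below assumes a 64-bit ID made of a millisecond timestamp, a shard ID and a per-shard sequence; the bit widths, the epoch and the function names are illustrative assumptions, not a specific product's format.

```python
import time
from typing import Optional

# Illustrative layout of a 64-bit ID: time | shard | sequence.
TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10
CUSTOM_EPOCH_MS = 1_300_000_000_000  # hypothetical starting point in Unix ms

def make_id(shard_id: int, sequence: int, now_ms: Optional[int] = None) -> int:
    """Combine time, shard ID and a per-shard sequence into one integer key."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp out of range for this layout")
    seq = sequence % (1 << SEQ_BITS)
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq

def shard_of(key: int) -> int:
    """Recover the shard ID from a key produced by make_id."""
    return (key >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
```

Keys generated on different shards can never collide because the shard bits differ. Keys on the same shard stay unique as long as the sequence does not wrap around within a single millisecond. As a side effect, the key tells you which shard to query.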

A well-known example you can study is how Instagram solved their partitioning in the early days (see link below). They started out partitioned on very few servers, using Postgres to divide the data from the get-go. I believe it was several thousand logical shards mapped onto those few physical servers. Read their awesome writeup from 2012 here: Instagram Engineering - Sharding & IDs
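The logical-versus-physical split can be sketched like this. It is a simplified illustration of the general idea, not Instagram's actual implementation; the shard count, server names and the modulo rule are all assumptions.

```python
NUM_LOGICAL_SHARDS = 4096            # illustrative count, fixed for the system's lifetime
PHYSICAL_SERVERS = ["pg-a", "pg-b"]  # hypothetical hosts

# Assign logical shards to servers in contiguous blocks: 0-2047 -> pg-a, 2048-4095 -> pg-b.
logical_to_physical = {
    shard: PHYSICAL_SERVERS[shard * len(PHYSICAL_SERVERS) // NUM_LOGICAL_SHARDS]
    for shard in range(NUM_LOGICAL_SHARDS)
}

def logical_shard(user_id: int) -> int:
    """A row's logical shard depends only on its key and never changes."""
    return user_id % NUM_LOGICAL_SHARDS

def server_for(user_id: int) -> str:
    """Look up which physical server currently hosts the row's logical shard."""
    return logical_to_physical[logical_shard(user_id)]

print(server_for(4097))
# Scaling out: move logical shard 1 to a new server by updating the map only.
logical_to_physical[1] = "pg-c"
print(server_for(4097))
```

The point of having many more logical shards than servers is that adding hardware means moving whole logical shards and updating the map, rather than re-splitting the data under a new rule.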

See here as well: http://www.quora.com/Whats-the-difference-between-sharding-and-partition