For scaling/failover mongodb uses a “replica set” where there is a primary and one or more secondary servers. Primary is used for writes. Secondaries are used for reads. This is pretty much master slave pattern used in SQL programming. If the primary goes down a secondary in the cluster of secondaries takes its place. So the issue of horizontally scaling and failover is taken care of. However, this is not a solution which allows for sharding it seems. A true shard holds only a portion of the entire data, so if the secondary in a replica set is shard how can it qualify as primary when it doesn’t have all of the data needed to service the requests ?
Wouldn't we have to have a replica set for each one of the shards?
This obviously a beginner question so a link that visually or otherwise illustrates how this is done would be helpful.
Your assumption is correct, each shard contains a separate replica set. When a write request comes in, MongoS finds the right shard for it based on the shard key, and the data is written to the Primary of the replica set contained in that shard. This results in write scaling, as a (well chosen) shard key should distribute writes over all your shards.