I am looking at http://mongodb.github.io/node-mongodb-native/driver-articles/mongoclient.html and when you scroll to the section "A replicaset connect using no acknowledgment by default and readPreference for secondary"
it stated a connection string to replica set like this:
MongoClient.connect("mongodb://localhost:30000,localhost:30001/integration_test_?w=0&readPreference=secondary", function(err, db) {
}
I do not understand why we need to specify 2 hosts
. I thought the MongoDB documentation already stated that the replica set is transparent to client. That means, the client just need to connect to the primary replica set and MongoDB will do the job. Hence, the connection should just contain 1 host. MongoDB doc stated that there must be at least 3 hosts
in a replica set and this connection string only specified 2 hosts
.
In addition, why the connection string is not stating the "replicaSet" ?
The multiple servers in the connection string serve as a seed list for discovering the connection mode. You are correct in that you could just specify the primary server and things would work perfectly. That is, until the primary server goes down or is very busy. By specifying multiple machines in the connection string, you give the client more than one location to query for the replica set configuration.
When the connection mode resolves to a replica set (see more below), the driver will find the primary server even if it is not in the seed list, as long as at least one of the servers in the seed list responds (the response will contain the full replica set and the name of the current primary). In addition, other secondaries will also be discovered and added (or removed) into the mix automatically, even after initial connection. This will enable you to add and remove servers from the replica set and the driver will handle the changes automatically.
To answer your final question, because specifying multiple servers is ambiguous as to whether or not it is a replica set or multiple mongos (in a sharded setup), the driver will go through a discovery phase of connecting to the servers to determine their type. This has a little overhead at connection time and can be avoided by specifying a connection mode in the connection string - hence the replicaSet
keyword. So while it is not necessary, it can speed up your connection times to explicitly state the servers are in a replica set in the connection string.