How to configure MongoDB Java driver MongoOptions for production use?

mongodb production-environment database-performance database-tuning

Dan Polites · Jun 29, 2011 · Viewed 57.8k times

I've been searching the web for best practices for configuring MongoOptions for the MongoDB Java driver and I haven't come up with much other than the API. This search started after I ran into the "com.mongodb.DBPortPool$SemaphoresOut: Out of semaphores to get db connection" error; increasing the connections/multiplier solved that problem. I'm looking for links to best practices, or your own, for configuring these options for production.

The options for the 2.4 driver include: http://api.mongodb.org/java/2.4/com/mongodb/MongoOptions.html

autoConnectRetry
connectionsPerHost
connectTimeout
maxWaitTime
socketTimeout
threadsAllowedToBlockForConnectionMultiplier

The newer drivers have more options and I would be interested in hearing about those as well.

Answer

Updated to 2.9 :

autoConnectRetry simply means the driver will automatically attempt to reconnect to the server(s) after unexpected disconnects. In production environments you usually want this set to true.
connectionsPerHost is the number of physical connections a single Mongo instance (it is a singleton, so you usually have one per application) can establish to a mongod/mongos process. At the time of writing the Java driver will eventually establish this number of connections even if the actual query throughput is low (in other words, you will see the "conn" statistic in mongostat rise until it hits this number per app server).

There is no need to set this higher than 100 in most cases, but this setting is one of those "test it and see" things. Do note that you have to set it low enough that the total number of connections to your server, summed over all app servers, does not exceed

db.serverStatus().connections.available

In production we currently have this at 40.
connectTimeout. As the name suggests, the number of milliseconds the driver will wait before a connection attempt is aborted. Set the timeout to something long (15-30 seconds) unless there's a realistic, expected chance this will get in the way of otherwise successful connection attempts. Normally, if a connection attempt takes longer than a couple of seconds, your network infrastructure isn't capable of high throughput.
maxWaitTime. Number of milliseconds a thread will wait for a connection to become available on the connection pool; an exception is raised if this does not happen in time. Keep the default.
socketTimeout. Standard socket timeout value. Set to 60 seconds (60000).
threadsAllowedToBlockForConnectionMultiplier. Multiplier for connectionsPerHost that denotes the number of threads that are allowed to wait for connections to become available if the pool is currently exhausted. This is the setting that will cause the "com.mongodb.DBPortPool$SemaphoresOut: Out of semaphores to get db connection" exception. The driver throws this exception once the number of waiting threads exceeds connectionsPerHost multiplied by threadsAllowedToBlockForConnectionMultiplier. For example, if connectionsPerHost is 10 and this value is 5, up to 50 threads can block before the aforementioned exception is thrown.

If you expect big peaks in throughput that could temporarily cause large queues, increase this value. We have it at 1500 at the moment for exactly that reason. If your query load consistently outpaces the server, you should improve your hardware/scaling situation instead.
readPreference. (UPDATED, 2.8+) Used to determine the default read preference; replaces "slaveOk". Set up a ReadPreference through one of the class factory methods. A full description of the most common settings can be found at the end of this post.
w. (UPDATED, 2.6+) This value determines the "safety" of the write. When this value is -1 the write will not report any errors regardless of network or database errors. WriteConcern.NONE is the appropriate predefined WriteConcern for this. If w is 0 then network errors will make the write fail but mongo errors will not. This is typically referred to as "fire and forget" writes and should be used when performance is more important than consistency and durability. Use WriteConcern.NORMAL for this mode.

If you set w to 1 or higher the write is considered safe. Safe writes perform the write and follow it up with a request to the server to make sure the write succeeded, or to retrieve an error value if it did not (in other words, the driver sends a getLastError() command after you write). Note that the connection is reserved until this getLastError() command has completed. Because of that and the additional command, throughput will be significantly lower than for writes with w <= 0. With a w value of exactly 1 MongoDB guarantees the write succeeded (or verifiably failed) on the instance you sent the write to.

In the case of replica sets you can use higher values for w, which tell MongoDB to send the write to at least "w" members of the replica set before returning (or more accurately, to wait for the replication of your write to "w" members). You can also set w to the string "majority", which tells MongoDB to perform the write to the majority of replica set members (WriteConcern.MAJORITY). Typically you should set this to 1 unless you need raw performance (-1 or 0) or replicated writes (>1). Values higher than 1 have a considerable impact on write throughput.
fsync. Durability option that forces mongo to flush to disk after each write when enabled. I've never had any durability issues related to a write backlog so we have this on false (the default) in production.
j. (NEW, 2.7+) Boolean that when set to true forces MongoDB to wait for a successful journaling group commit before returning. If you have journaling enabled you can enable this for additional durability. Refer to http://www.mongodb.org/display/DOCS/Journaling to see what journaling gets you (and thus why you might want to enable this flag).
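
The pool arithmetic described for connectionsPerHost and threadsAllowedToBlockForConnectionMultiplier can be sketched in plain Java. This is only an illustration, not driver code: the PoolSizing class, its method names, the figure of 12 app servers and the value of 800 available connections are assumptions made up for the example. Only the 10 x 5 = 50 case and the production values of 40 and 1500 come from the text above.

```java
// Hypothetical helper for reasoning about pool sizes; not part of the MongoDB driver.
final class PoolSizing {

    // Threads allowed to wait for a connection before the
    // "Out of semaphores" exception is thrown.
    static long maxWaitingThreads(int connectionsPerHost, int multiplier) {
        return (long) connectionsPerHost * multiplier;
    }

    // Worst case: every app server (one Mongo instance each, an assumption)
    // eventually opens its full pool against the same mongod/mongos.
    static long worstCaseConnections(int appServers, int connectionsPerHost) {
        return (long) appServers * connectionsPerHost;
    }

    // True if the worst case stays within the value reported by
    // db.serverStatus().connections.available.
    static boolean fitsServerBudget(int appServers, int connectionsPerHost, long available) {
        return worstCaseConnections(appServers, connectionsPerHost) <= available;
    }

    public static void main(String[] args) {
        System.out.println(maxWaitingThreads(10, 5));       // 50, the example from the text
        System.out.println(maxWaitingThreads(40, 1500));    // 60000, the production values
        System.out.println(worstCaseConnections(12, 40));   // 480 for 12 hypothetical app servers
        System.out.println(fitsServerBudget(12, 40, 800L)); // true for a hypothetical budget of 800
    }
}
```

With the production values quoted above, up to 60000 threads may queue per Mongo instance before the exception appears, which is why such a high multiplier only makes sense for absorbing temporary peaks.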

ReadPreference

The ReadPreference class allows you to configure which mongod instances queries are routed to if you are working with replica sets. The following options are available :

ReadPreference.primary() : All reads go to the repset primary member only. Use this if you require all queries to return consistent (the most recently written) data. This is the default.
ReadPreference.primaryPreferred() : All reads go to the repset primary member if possible but may query secondary members if the primary node is not available. As such, reads become eventually consistent only while the primary is unavailable.
ReadPreference.secondary() : All reads go to secondary repset members and the primary member is used for writes only. Use this only if you can live with eventually consistent reads. Additional repset members can be used to scale up read performance although there are limits to the amount of (voting) members a repset can have.
ReadPreference.secondaryPreferred() : All reads go to secondary repset members if any of them are available. The primary member is used exclusively for writes unless all secondary members become unavailable. Other than the fallback to the primary member for reads this is the same as ReadPreference.secondary().
ReadPreference.nearest() : Reads go to the nearest repset member available to the database client. Use only if eventually consistent reads are acceptable. The nearest member is the member with the lowest latency between the client and the various repset members. Since busy members will eventually have higher latencies, this should also automatically balance read load, although in my experience secondary(Preferred) seems to do so better if member latencies are relatively consistent.

Note : All of the above have tag enabled versions of the same method which return TaggableReadPreference instances instead. A full description of replica set tags can be found here : Replica Set Tags
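
Pulling the values from this answer together, a configuration for the 2.9 driver might look roughly like the sketch below. It assumes the mongo-java-driver 2.9 jar is on the classpath and uses the public fields of MongoOptions. The class name ProductionMongoFactory and the localhost:27017 address are placeholders, and the numbers are the ones quoted above, not universal recommendations.

```java
import com.mongodb.Mongo;
import com.mongodb.MongoOptions;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;

import java.net.UnknownHostException;

// Hypothetical factory class; the name is illustrative only.
public class ProductionMongoFactory {

    public static Mongo create() throws UnknownHostException {
        MongoOptions options = new MongoOptions();

        options.autoConnectRetry = true;   // reconnect after unexpected disconnects
        options.connectionsPerHost = 40;   // value quoted in the answer; test your own
        options.connectTimeout = 15000;    // 15 seconds, in milliseconds
        options.socketTimeout = 60000;     // 60 seconds, in milliseconds
        options.threadsAllowedToBlockForConnectionMultiplier = 1500;
        // maxWaitTime is deliberately left at its default.

        options.w = 1;                     // safe writes on the targeted instance
        options.fsync = false;             // driver default
        options.j = false;                 // set to true to wait for the journal commit

        options.readPreference = ReadPreference.primary(); // consistent reads (default)

        // Placeholder address; replace with your mongod/mongos host and port.
        return new Mongo(new ServerAddress("localhost", 27017), options);
    }
}
```

Since the Mongo instance holds the connection pool, create it once per application and share it rather than calling the factory per request.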
