Does mongoDB have reconnect issues or am i doing it wrong?

rob_james picture rob_james · Dec 20, 2012 · Viewed 23.1k times · Source

I'm using nodejs and a mongoDB - and I'm having some connection issues.

Well, actually "wake" issues! It connects perfectly well - is super fast and I'm generally happy with the results.

My problem: If i don't use the connection for a while (i say while, because the timeframe varies 5+ mins) it seems to stall. I don't get disconnection events fired - it just hangs.

Eventually i get a response like Error: failed to connect to [ * .mongolab.com: * ] - ( * = masked values)

A quick restart of the app, and the connection's great again. Sometimes, if i don't restart the app, i can refresh and it reconnects happily.

This is why i think it is "wake" issues.

Rough outline of code:

I've not included the code - I don't think it's needed. It works (apart from the connection dropout)

Things to note: There is just the one "connect" - i never close it. I never reopen.

I'm using mongoose, socketio.

/* constants */

var mongoConnect = 'myworkingconnectionstring-includingDBname';


/* includes */

/* settings */

/* Schema */

var db = mongoose.connect(mongoConnect);

    /* Socketio */

io.configure(function (){
    io.set('authorization', function (handshakeData, callback) {

    });
});

io.sockets.on('connection', function (socket) {

});//sockets

io.sockets.on('disconnect', function(socket) {
    console.log('socket disconnection')
});

/* The Routing */

app.post('/login', function(req, res){  

});

app.get('/invited', function(req, res){

});

app.get('/', function(req, res){

});

app.get('/logout', function(req, res){

});

app.get('/error', function(req, res){

});

server.listen(port);
console.log('Listening on port '+port);

db.connection.on('error', function(err) {
    console.log("DB connection Error: "+err);
});
db.connection.on('open', function() {
    console.log("DB connected");
});
db.connection.on('close', function(str) {
    console.log("DB disconnected: "+str);
});

I have tried various configs here, like opening and closing all the time - I believe though, the general consensus is to do as i am with one open wrapping the lot. ??

I have tried a connection tester, that keeps checking the status of the connection... even though this appears to say everthing's ok - the issue still happens.

I have had this issue from day one. I have always hosted the MongoDB with MongoLab. The problem appears to be worse on localhost. But i still have the issue on Azure and now nodejit.su.

As it happens everywhere - it must be me, MongoDB, or mongolab.

Incidentally i have had a similar experience with the php driver too. (to confirm this is on nodejs though)

It would be great for some help - even if someone just says "this is normal"

thanks in advance

Rob

Answer

jared picture jared · Jan 19, 2013

UPDATE: Our support article for this topic (essentially a copy of this post) has moved to our connection troubleshooting doc.

There is a known issue that the Azure IaaS network enforces an idle timeout of roughly thirteen minutes (empirically arrived at). We are working with Azure to see if we can't make things more user-friendly, but in the meantime others have had success by configuring their driver options to work around the issue.

Max connection idle time

The most effective workaround we've found in working with Azure and our customers has been to set the max connection idle time below four minutes. The idea is to make the driver recycle idle connections before the firewall forces the issue. For example, one customer, who is using the C# driver, set MongoDefaults.MaxConnectionIdleTime to one minute and it cleared up their issues.

MongoDefaults.MaxConnectionIdleTime = TimeSpan.FromMinutes(1);

The application code itself didn't change, but now behind the scenes the driver aggressively recycles idle connections. The result can be seen in the server logs as well: lots of connection churn during idle periods in the app.

There are more details on this approach in the related mongo-user thread, SocketException using C# driver on azure.

Keepalive

You can also work around the issue by making your connections less idle with some kind of keepalive. This is a little tricky to implement unless your driver supports it out of the box, usually by taking advantage of TCP Keepalive. If you need to roll your own, make sure to grab each idle connection from the pool every couple minutes and issue some simple and cheap command, probably a ping.

Handling disconnects

Disconnects can happen from time to time even without an aggressive firewall setup. Before you get into production you want to be sure to handle them correctly.

First, be sure to enable auto reconnect. How to do so varies from driver to driver, but when the driver detects that an operation failed because the connection was bad turning on auto reconnect tells the driver to attempt to reconnect.

But this doesn't completely solve the problem. You still have the issue of what to do with the failed operation that triggered the reconnect. Auto reconnect doesn't automatically retry failed operations. That would be dangerous, especially for writes. So usually an exception is thrown and the app is asked to handle it. Often retrying reads is a no-brainer. But retrying writes should be carefully considered.

The mongo shell session below demonstrates the issue. The mongo shell by default has auto reconnect enabled. I insert a document in a collection named stuff then find all the documents in that collection. I then set a timer for thirty minutes and tried the same find again. It failed, but the shell automatically reconnected and when I immediately retried my find it worked as expected.

% mongo ds012345.mongolab.com:12345/mydatabase -u *** -p *** 
MongoDB shell version: 2.2.2 
connecting to: ds012345.mongolab.com:12345/mydatabase 
> db.stuff.insert({}) 
> db.stuff.find() 
{ "_id" : ObjectId("50f9b77c27b2e67041fd2245") } 
> db.stuff.find() 
Fri Jan 18 13:29:28 Socket recv() errno:60 Operation timed out 192.168.1.111:12345 
Fri Jan 18 13:29:28 SocketException: remote: 192.168.1.111:12345 error: 9001 socket exception [1] server [192.168.1.111:12345] 
Fri Jan 18 13:29:28 DBClientCursor::init call() failed 
Fri Jan 18 13:29:28 query failed : mydatabase.stuff {} to: ds012345.mongolab.com:12345 
Error: error doing query: failed 
Fri Jan 18 13:29:28 trying reconnect to ds012345.mongolab.com:12345 
Fri Jan 18 13:29:28 reconnect ds012345.mongolab.com:12345 ok 
> db.stuff.find() 
{ "_id" : ObjectId("50f9b77c27b2e67041fd2245") }

We're here to help

Of course, if you have any questions please feel free to contact us at [email protected]. We're here to help.