KeepAlive with WCF and TCP?

Banshee · Nov 4, 2014 · Viewed 15.5k times

I have a Windows Service hosting an advanced WCF service that communicates over TCP (netTcp) with protobuf-net, sometimes also with certificates.

The receiveTimeout is set to infinite so that the connection is never dropped due to inactivity. But from what I understand the connection could be dropped anyway, so I have created a simple two-way keepalive service method that the client calls every 9 minutes to keep the connection alive. It's very important that the connection never breaks.

Is this the correct way? Or could I simply remove my keepalive because the receiveTimeout is set to infinite?

Edit: Current app.config for the WCF service: http://1drv.ms/1uEVKIt

Answer

Erik Funkenbusch · Nov 4, 2014

No. This is widely misunderstood, and unfortunately there is much misinformation out there.

First, "Infinite" is only a semi-valid value. There are two special config serializers that convert "Infinite" to either TimeSpan.MaxValue or int.MaxValue (so they're not really "infinite" anyway), but not everything in WCF seems to recognize this. So it's always best to specify your timeouts explicitly with time values.

Second, you don't need a "keepalive" method in your service, since WCF provides what's called a "reliable session". If you add <reliableSession enabled="true" /> to the binding, then WCF will provide its own keepalive mechanism through "infrastructure messages".
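As a rough sketch of where that element might live in app.config: it goes inside a netTcpBinding binding configuration. The binding name "reliableTcp" is a placeholder chosen for illustration, not something taken from the question's config.

```xml
<!-- Sketch only: "reliableTcp" is a placeholder binding name. -->
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <binding name="reliableTcp">
        <!-- Turns on WCF's reliable session, including its infrastructure (keepalive) messages -->
        <reliableSession enabled="true" />
      </binding>
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```

An endpoint would then refer to it with bindingConfiguration="reliableTcp". Since the reliable session is part of the binding, the client and the service both need it enabled.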

By having your own "keepalive" mechanism, you're effectively doubling the load on your service, and you can actually create more problems than you solve.

Third, when using a reliable session, you use the inactivityTimeout setting of reliableSession. This does two things. First, it controls how frequently infrastructure (keepalive) messages are sent. They are sent at half the timeout value, so if you set it to 18 minutes, they will be sent every 9 minutes. Second, if no infrastructure or operation messages (i.e. messages that are part of your own service contract) are received within the inactivity timeout, the connection is aborted because there has likely been a problem (one side has crashed, there's a network problem, etc.).

receiveTimeout is the maximum amount of time that can pass without an operation message being received before the connection is aborted (the default is 10 minutes). Setting this to a large value (Int32.MaxValue milliseconds is just under 25 days) keeps the connection open, while setting inactivityTimeout to a smaller value (again, the default is 10 minutes), one that is less than twice the maximum amount of time before network routers will drop a connection for inactivity, keeps the connection alive.
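A minimal sketch of how the two timeouts could be combined in one binding, under some assumptions: the name "reliableTcp" is a placeholder, the 18-minute inactivityTimeout is just the example figure used above, and "24.20:31:23" is Int32.MaxValue milliseconds written as a d.hh:mm:ss time span. Pick values that suit your own network.

```xml
<!-- Sketch only: placeholder name and illustrative values. -->
<netTcpBinding>
  <!-- Large receiveTimeout: the connection is not aborted just because no operation is called -->
  <binding name="reliableTcp" receiveTimeout="24.20:31:23">
    <!-- inactivityTimeout of 18 minutes: infrastructure messages go out roughly every 9 minutes -->
    <reliableSession enabled="true" inactivityTimeout="00:18:00" />
  </binding>
</netTcpBinding>
```

The explicit time values follow the advice to avoid "Infinite" in favour of concrete timeouts.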

WCF handles all this for you. You can then simply subscribe to the connection aborted notifications (for example, the channel's Faulted event) to know when the connection has dropped for real reasons (app crashes, network timeouts, clients losing power, etc.) and recreate the connection.

Additionally, if you don't need ordered messages, set ordered="false", as this greatly reduces the overhead of reliable sessions. The default is true.
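For illustration, turning off ordering is a single attribute on the same element. The inactivityTimeout shown here is only an example value.

```xml
<!-- Sketch only: ordered="false" drops the in-order delivery guarantee; the timeout is illustrative. -->
<reliableSession enabled="true" inactivityTimeout="00:18:00" ordered="false" />
```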

Note: You may not receive a connection aborted event until the inactivityTimeout has expired (or you try to use the connection). Be aware of this, and set your timeouts accordingly.

Most recommendations on the internet are to set both receiveTimeout and inactivityTimeout to Infinite. This has two problems. First, infrastructure messages don't get sent in a timely manner, so routers will drop the connection, forcing you to do your own keepalives. Second, the large inactivity timeout means WCF won't recognize when a connection legitimately drops, and you have to rely on that ping aborting to know when a failure occurs. This is all completely unnecessary, and can in fact make your service more unreliable.

See also this: How do I correctly configure a WCF NetTcp Duplex Reliable Session?