How do you deal with transport-level errors in SqlConnection?

Eric Z Beard picture Eric Z Beard · Aug 19, 2008 · Viewed 23.6k times · Source

Every now and then in a high volume .NET application, you might see this exception when you try to execute a query:

System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server.

According to my research, this is something that "just happens" and not much can be done to prevent it. It does not happen as a result of a bad query, and generally cannot be duplicated. It just crops up maybe once every few days in a busy OLTP system when the TCP connection to the database goes bad for some reason.

I am forced to detect this error by parsing the exception message, and then retrying the entire operation from scratch, to include using a new connection. None of that is pretty.

Anybody have any alternate solutions?

Answer

Tim Farley picture Tim Farley · Oct 16, 2008

I posted an answer on another question on another topic that might have some use here. That answer involved SMB connections, not SQL. However it was identical in that it involved a low-level transport error.

What we found was that in a heavy load situation, it was fairly easy for the remote server to time out connections at the TCP layer simply because the server was busy. Part of the reason was the defaults for how many times TCP will retransmit data on Windows weren't appropriate for our situation.

Take a look at the registry settings for tuning TCP/IP on Windows. In particular you want to look at TcpMaxDataRetransmissions and maybe TcpMaxConnectRetransmissions. These default to 5 and 2 respectively, try upping them a little bit on the client system and duplicate the load situation.

Don't go crazy! TCP doubles the timeout with each successive retransmission, so the timeout behavior for bad connections can go exponential on you if you increase these too much. As I recall upping TcpMaxDataRetransmissions to 6 or 7 solved our problem in the vast majority of cases.