How to determine root cause for Communication link failure TCP Provider: The specified network name is no longer available?

Jon Jaussi picture Jon Jaussi · May 21, 2013 · Viewed 43.4k times · Source

Here is my latest effort at revising this question. But this time, I am trying to follow the good counsel given by Oded in his article Getting good answers on StackOverflow.

I need to find out how I can determine the root cause for the following error:

Communication link failure

TCP Provider: The specified network name is no longer available

From time to time, I am seeing this error when running a set of SSIS packages. This error can occur when one to many packages are run from:

  1. A SQL Server Agent Job
  2. A batch file
  3. In debug mode from BIDS

The full error message I see is as follows:

SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Communication link failure". An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "TCP Provider: The specified network name is no longer available. ".

SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Protocol error in TDS stream". An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Communication link failure". An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "TCP Provider: An existing connection was forcibly closed by the remote host."

This is an overview of how I have designed the ETL process:

  • Two servers
  • Both are virtual machines
  • The SSIS packages run on an application server
  • The SQL Server database lives on a database server

I use an OLE DB connection manager to connect from the SSIS package on the application server to the SQL Server database on the database server.

The packages run as a file system deployment on the application server and not as a database deployment on the database server.

The main reason for this is that the ETL is integrated with a set of tools no found on and drives not accessible to the database server. These tools include Apex Data Loader for Salesforce and pgAdmin III.

So far I cannot consistently reproduce this error. However, this is what I have observed:

  • Failure occurs more frequently during regular business hours
  • Failure occurs less frequently during off hours

For about a two hour period on a Friday morning I was able to successfully reproduce the error on a specific package.

The error occurred during a large data flow if a child package call that precedes the large data flow was enabled.

The error did not occur during the same large data flow if the child package call that precedes the large data flow was disabled.

The child package in question calls back to the database to retrieve a tiny amount of information for use in an email body and then sends the email.

It feels like maybe a resource limit is being exceeded?

Maybe a connection limit?

I am wondering what tools I should be using to try and determine the root cause of the error.

Technical details about the two servers involved are listed below:

SQL Server and Database Server info:
Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) Jun 17 2011 00:54:03 Copyright (c) Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1) (Hypervisor)

SSIS info:
Microsoft Visual Studio 2008 Version 9.0.30729.1 SP Microsoft .NET Framework Version 3.5 SP1

Application Server info:
OS name: Microsoft Windows Server 2008 R2 Standard Version: 6.1.7601 Service Pack 1 Build 7601

I have researched the error message online and found these, but would really like to get an expert's insight before proceeding:

Any help is appreciated.

Thanks

UPDATE:

Further testing shows that this is not "an SSIS thing" as the same error is seen at the same rate when using SQL Server Management Studio. The complexity of the query does not make the error more or less likely. In an attempt to resolve, we have tried one fix (below):

This was our first attempt. TCP Chimney is now disabled on the Application Server and the Database Server. Testing shows that the same error occurs at the same rate.

So where to go from here? Honestly I am not sure. One seemingly good option remains:

  • Application Server and Database Server SQL Server installations do not exactly match
  • Application Server = SQL Server 2008 (SP1) - 10.0.2531.0 (X64)
  • Database Server = SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64)

The plan is to upgrade the SQL Server installation on the Application server. Its kind of a hit and hope, but at this point this seems like the best option. Something in my brain tells me that this might be solved by fixing a hardware issue (by that I mean a repair or a replace) and that there might not be anything that hardware and software configuration can do about it.

However, I am still not sure how to go about determining a root cause. I am still left wondering what tools I should be using to diagnose the root cause.

Answer

Serge picture Serge · May 23, 2018

Do you have AV software on Application Server side? If yes, try to disable AV - sometime AV blocks TCP/IP traffic. Issue with "The specified network name is no longer available" was solved by disabling AV here: https://community.spiceworks.com/topic/239423-the-specified-network-name-is-no-longer-available-while-writing-to-shared-dir