Oracle Data Provider for .NET: Connection request timed out

DjD picture DjD · Sep 27, 2015 · Viewed 29.6k times · Source

We have a C# WCF web service hosted on Windows 2008 SP2/IIS 7 accessing an Oracle database. Usually data access works fine but during load testing, it often times out and logs and exception saying:

Error occurred when processing XXXXXXXX Web Service
Oracle.DataAccess.Client.OracleException Connection request timed out at Oracle.DataAccess.Client.OracleException.HandleErrorHelper(Int32 errCode, OracleConnection conn, IntPtr opsErrCtx, OpoSqlValCtx* pOpoSqlValCtx, Object src, String procedure, Boolean bCheck)
   at Oracle.DataAccess.Client.OracleException.HandleError(Int32 errCode, OracleConnection conn, IntPtr opsErrCtx, Object src)
   at Oracle.DataAccess.Client.OracleConnection.Open()
   at MyWorkspace.WorkForceDataAccess.CheckStaffIdInRSW()
   at MyWorkspace.MyClass.MyFunction(MyDataType MyData)

To query the database, we use something like this:

OracleConnection orConn = new OracleConnection();
orConn.ConnectionString = "user id=xxx; password=xxx; Connection Timeout=600; Max Pool Size=150; data source= (DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = MYHOST.MYDOMAIN.com)(PORT = 1771)) (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = MYSERVICE.MYDOMAIN.com)))";
orConn.Open();

using (var cmd = new OracleCommand("MY_UTIL.check_StaffIdInRSW", orConn) { CommandType = CommandType.StoredProcedure })
{
    cmd.Parameters.Add("P_Staff_Id", OracleDbType.Int32);
    cmd.Parameters["P_Staff_Id"].Direction = ParameterDirection.Input;
    cmd.Parameters["P_Staff_Id"].Value = Convert.ToInt32(MyDataObject.StaffId);

    cmd.Parameters.Add("P_retvalue", OracleDbType.Int32);
    cmd.Parameters["P_retvalue"].Direction = ParameterDirection.Output;

    cmd.ExecuteNonQuery(); // Execute the function

    //obtain result
    returnVal = int.Parse(cmd.Parameters["P_retvalue"].Value.ToString());
}

I am pretty confident that the stored procedure that is being invoked is not taking all the time. It is a pretty simple procedure that quickly checks if the P_Staff_Id exists in the table and returns the result.

Additionally, this occurs only during load testing. During normal operations things are fine but during heavy loads with 1 message per second, this occurs after running smooth for some time.

As a workaround, I have added "Connection Timeout=600; Max Pool Size=150“ to the connection string, but that did not fix the issue.

We have the same application running on a development server and it works fine. We never encountered this problem there.

Any suggestions as to what to try would be appreciated. It looks like I am running out of options.

Answer

R. Sridharan picture R. Sridharan · Mar 10, 2016

We had a similar issue, and it took a while to debug this and fix this. Our code on getting stressed with many input files, and many threads processing, each thread using Entity framework and opening Oracle db connection, and doing a bunch of db queries and inserts, used to file occasionally. But works most of the time.

I modified out DbContext constructor to explicitly open the OracleConnection. I added some code like this

for (i = 0; i < 5; i++)
   try {
       oracleConnection.Open();
   } catch (OracleException) {
     Sleep for 15 ms and retry. 
     On last attempt I also do OracleConnection.ClearAllPools()
   }

It improved, but still didn't solve it completely. I broke in the catch from debugger, and saw that many threads are trying to open and few threads are processing away. On Open in Oracle stack, Oracle for its internal purpose does ThreadPool.QueueUserWorkItem and waits for its completion. I can see on top of stack its wait. Here plenty of pooled connections are available (default is 100), I am hardly using 10. So it is not out of resource.

But the issue is in our code also we used ThreadPool.QueueUserWorkItem with no additional throttling. I thought that was cool to just queue all jobs we need to do, how much ever we need to this, and let .NET take care of this. But this has subtle issue. All our jobs have consumed the full queue count. When OracleConnection wants to get a pooled connection from the pool, it also queues to the thread pool. But it is never going to complete. Our jobs are all waiting for OracleConnection.Open, and its Queued Thread proc will still be in queue. So finally the wait will exit by timeout. It is a pity that even though there is plenty of pooled connection is available, we have consumed all ThreadPool proc, and Oracle's threadpool didn't even get a chance. Here setting ThreadPool.SetMaxThreads also isn't going to help. The issue is still the same. We hog all thread pool resource, and Orcale isn't going to find one and will still be in queue.

The fix is not to rely on only ThreadPool, but we add our own throttling as well. I used BlockingCollection and sempahores and add only some limit number of concurrent jobs in ThreadPool, say 5. In this way OracleConnection will always find a ThreadPool thread available, and wont fail.