Correct Implementation of Transient Fault Handling (Azure)

awj · Jan 6, 2015

For the past day or so I've been trying to implement Transient Fault Handling on an Azure SQL database. Although I have a working connection to the DB, I'm not convinced that it's handling the transient faults as expected.

So far my approach has involved the following setup method:

public static void SetRetryStratPol()
{
    const string defaultRetryStrategyName = "default";

    // Incremental strategy: 3 retries, initial interval 1 second, increment 2 seconds.
    var strategy = new Incremental(defaultRetryStrategyName, 3, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));
    var strategies = new List<RetryStrategy> { strategy };

    // Register the strategy with the default retry manager.
    var manager = new RetryManager(strategies, defaultRetryStrategyName);
    RetryManager.SetDefault(manager);

    // Only errors that SqlDatabaseTransientErrorDetectionStrategy classifies as transient are retried.
    retryPolicy = new RetryPolicy<SqlDatabaseTransientErrorDetectionStrategy>(strategy);

    // Log each retry so it shows up in the debug output.
    retryPolicy.Retrying += (obj, eventArgs) =>
    {
        var msg = String.Format("Retrying, CurrentRetryCount = {0} , Delay = {1}, Exception = {2}", eventArgs.CurrentRetryCount, eventArgs.Delay, eventArgs.LastException.Message);
        System.Diagnostics.Debug.WriteLine(msg);
    };
}

I call that method from Application_Start() in Global.asax. [retryPolicy is a static field on a static class, which also includes the next method.]

I then have a method that opens a reliable connection:

public static ReliableSqlConnection GetReliableConnection()
{
    var conn = new ReliableSqlConnection("Server=...,1433;Database=...;User ID=...;Password=...;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;", retryPolicy);

    conn.Open();

    return conn;
}

I then use this method as follows:

using (var conn = GetReliableConnection())
using (var cmd = conn.CreateCommand())
{
    cmd.CommandText = "SELECT COUNT(*) FROM ReliabilityTest";

    result = (int) cmd.ExecuteScalarWithRetry();

    return View(result);
}

So far, this works. Then, in order to test the retry policy, I've tried using a wrong username (a suggestion from here).

But when I step through that code, execution immediately jumps to my catch block with the message

Login failed for user '[my username]'.

I would have expected this exception to be caught only after several seconds of retries, but no delay is incurred at all.

Furthermore, I've also tried with the Entity Framework, following exactly this post, but get the same result.

What have I missed? Is there a configuration step or am I incorrectly inducing a transient fault?

Answer

Gaurav Mantri · Jan 6, 2015

The Transient Fault Handling block is for handling transient errors. A failed login caused by an incorrect username or password is certainly not one of them. From this web page: http://msdn.microsoft.com/en-us/library/dn440719%28v=pandp.60%29.aspx:

What Are Transient Faults?

When an application uses a service, errors can occur because of temporary conditions such as intermittent service, infrastructure-level faults, network issues, or explicit throttling by the service; these types of error occur more frequently with cloud-based services, but can also occur in on-premises solutions. If you retry the operation a short time later (maybe only a few milliseconds later) the operation may succeed. These types of error conditions are referred to as transient faults. Transient faults typically occur very infrequently, and in most cases, only a few retries are necessary for the operation to succeed.

You may want to check the source code for this application block (http://topaz.codeplex.com/) and see what error codes returned from SQL databases are considered transient errors and are thus retried.
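
To illustrate why the failed login surfaced with no delay, here is a minimal, self-contained sketch of what a retry policy conceptually does. It is not the application block's actual implementation: RetrySketch, its parameters, and the injected sleep callback are hypothetical names, and the linear delay formula (initial interval plus increment times the retry index) is an assumption about how an incremental strategy behaves.

```csharp
using System;

// Hypothetical illustration only; not the Transient Fault Handling block's code.
public static class RetrySketch
{
    // Runs the action and retries it only when isTransient(ex) returns true,
    // at most retryCount times. Any other exception is rethrown at once,
    // without sleeping. The sleep callback is injected so the sketch can be
    // exercised without real waiting (pass Thread.Sleep in a real program).
    public static T Execute<T>(
        Func<T> action,
        Func<Exception, bool> isTransient,
        int retryCount,
        TimeSpan initial,
        TimeSpan increment,
        Action<TimeSpan> sleep)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex)
            {
                // Non-transient error, or retries exhausted: give up immediately.
                if (!isTransient(ex) || attempt >= retryCount)
                {
                    throw;
                }

                // Assumed incremental back-off: initial + increment * attempt.
                var delay = initial + TimeSpan.FromTicks(increment.Ticks * attempt);
                sleep(delay);
            }
        }
    }
}
```

With retryCount = 3, an initial interval of 1 second, and an increment of 2 seconds, an error that the predicate classifies as transient would be retried after 1, 3, and 5 seconds in this sketch. An error the predicate rejects, such as a login failure, never reaches the sleep call, which matches the immediate jump to the catch block described in the question.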

You can always extend the functionality and include a failed login as one of the transient errors to test your code.

UPDATE

Do take a look at the source code here: http://topaz.codeplex.com/SourceControl/latest#source/Source/TransientFaultHandling.Data/SqlDatabaseTransientErrorDetectionStrategy.cs. This is where the retry magic happens. What you could do is create a class (let's call it CustomSqlDatabaseTransientErrorDetectionStrategy) and copy the entire code from the link into it. Then, for testing purposes, you can add the login-failed scenario as one of the transient errors and use this class in your application instead of SqlDatabaseTransientErrorDetectionStrategy.
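
As an alternative to copying the whole file, the same test-only behaviour could be sketched by wrapping an existing detector. This is a different technique from the copy-and-edit approach above, shown only as a rough illustration. To keep the sketch self-contained it does not reference the library: LoginFailedIsTransientForTesting and both constructor delegates are hypothetical names. The one external fact it relies on is that SQL Server reports "Login failed for user" as error number 18456.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, test-only detection logic: treats "login failed" as transient
// and defers every other decision to an existing detector.
public sealed class LoginFailedIsTransientForTesting
{
    // SQL Server error number for "Login failed for user ...".
    public const int LoginFailedErrorNumber = 18456;

    private readonly Func<Exception, IEnumerable<int>> getSqlErrorNumbers;
    private readonly Func<Exception, bool> fallback;

    // getSqlErrorNumbers: assumed helper that extracts SQL error numbers from an exception.
    // fallback: assumed to be the existing transient-error check being wrapped.
    public LoginFailedIsTransientForTesting(
        Func<Exception, IEnumerable<int>> getSqlErrorNumbers,
        Func<Exception, bool> fallback)
    {
        if (getSqlErrorNumbers == null) throw new ArgumentNullException("getSqlErrorNumbers");
        if (fallback == null) throw new ArgumentNullException("fallback");
        this.getSqlErrorNumbers = getSqlErrorNumbers;
        this.fallback = fallback;
    }

    // Same shape as the IsTransient(Exception) method of a detection strategy.
    public bool IsTransient(Exception ex)
    {
        if (ex == null)
        {
            return false;
        }

        // Test-only rule: a failed login counts as transient.
        if (getSqlErrorNumbers(ex).Contains(LoginFailedErrorNumber))
        {
            return true;
        }

        // Everything else keeps its normal classification.
        return fallback(ex);
    }
}
```

In a real application the class would presumably implement the block's ITransientErrorDetectionStrategy interface, the first delegate would read the Number of each entry in SqlException.Errors, and the fallback would call SqlDatabaseTransientErrorDetectionStrategy.IsTransient. Treat that wiring as an assumption to verify against the library source, and remove the login rule once testing is done, since retrying genuine login failures only delays the error.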