I am trying to create a Redis message bus failover scenario with a SignalR app.
At first, we tried a simple hardware load-balancer failover, that simply monitored two Redis servers. The SignalR application pointed to the singular HLB endpoint. I then failed one server, but was unable to successfully get any messages through on the second Redis server without recycling the SignalR app pool. Presumably this is because it needs to issue the setup commands to the new Redis message bus.
As of SignalR RC1, Microsoft.AspNet.SignalR.Redis.RedisMessageBus
uses Booksleeve's RedisConnection()
to connect to a single Redis for pub/sub.
I created a new class, RedisMessageBusCluster()
that uses Booksleeve's ConnectionUtils.Connect()
to connect to one in a cluster of Redis servers.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using BookSleeve;
using Microsoft.AspNet.SignalR.Infrastructure;
namespace Microsoft.AspNet.SignalR.Redis
{
/// <summary>
/// WIP: Getting scaleout for Redis working
/// </summary>
public class RedisMessageBusCluster : ScaleoutMessageBus
{
private readonly int _db;
private readonly string[] _keys;
private RedisConnection _connection;
private RedisSubscriberConnection _channel;
private Task _connectTask;
private readonly TaskQueue _publishQueue = new TaskQueue();
public RedisMessageBusCluster(string serverList, int db, IEnumerable<string> keys, IDependencyResolver resolver)
: base(resolver)
{
_db = db;
_keys = keys.ToArray();
// uses a list of connections
_connection = ConnectionUtils.Connect(serverList);
//_connection = new RedisConnection(host: server, port: port, password: password);
_connection.Closed += OnConnectionClosed;
_connection.Error += OnConnectionError;
// Start the connection - TODO: can remove this Open as the connection is already opened, but there's the _connectTask is used later on
_connectTask = _connection.Open().Then(() =>
{
// Create a subscription channel in redis
_channel = _connection.GetOpenSubscriberChannel();
// Subscribe to the registered connections
_channel.Subscribe(_keys, OnMessage);
// Dirty hack but it seems like subscribe returns before the actual
// subscription is properly setup in some cases
while (_channel.SubscriptionCount == 0)
{
Thread.Sleep(500);
}
});
}
protected override Task Send(Message[] messages)
{
return _connectTask.Then(msgs =>
{
var taskCompletionSource = new TaskCompletionSource<object>();
// Group messages by source (connection id)
var messagesBySource = msgs.GroupBy(m => m.Source);
SendImpl(messagesBySource.GetEnumerator(), taskCompletionSource);
return taskCompletionSource.Task;
},
messages);
}
private void SendImpl(IEnumerator<IGrouping<string, Message>> enumerator, TaskCompletionSource<object> taskCompletionSource)
{
if (!enumerator.MoveNext())
{
taskCompletionSource.TrySetResult(null);
}
else
{
IGrouping<string, Message> group = enumerator.Current;
// Get the channel index we're going to use for this message
int index = Math.Abs(group.Key.GetHashCode()) % _keys.Length;
string key = _keys[index];
// Increment the channel number
_connection.Strings.Increment(_db, key)
.Then((id, k) =>
{
var message = new RedisMessage(id, group.ToArray());
return _connection.Publish(k, message.GetBytes());
}, key)
.Then((enumer, tcs) => SendImpl(enumer, tcs), enumerator, taskCompletionSource)
.ContinueWithNotComplete(taskCompletionSource);
}
}
private void OnConnectionClosed(object sender, EventArgs e)
{
// Should we auto reconnect?
if (true)
{
;
}
}
private void OnConnectionError(object sender, BookSleeve.ErrorEventArgs e)
{
// How do we bubble errors?
if (true)
{
;
}
}
private void OnMessage(string key, byte[] data)
{
// The key is the stream id (channel)
var message = RedisMessage.Deserialize(data);
_publishQueue.Enqueue(() => OnReceived(key, (ulong)message.Id, message.Messages));
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_channel != null)
{
_channel.Unsubscribe(_keys);
_channel.Close(abort: true);
}
if (_connection != null)
{
_connection.Close(abort: true);
}
}
base.Dispose(disposing);
}
}
}
Booksleeve has its own mechanism for determining a master, and will automatically fail over to another server, and am now testing this with SignalR.Chat
.
In web.config
, I set the list of available servers:
<add key="redis.serverList" value="dbcache1.local:6379,dbcache2.local:6379"/>
Then in Application_Start()
:
// Redis cluster server list
string redisServerlist = ConfigurationManager.AppSettings["redis.serverList"];
List<string> eventKeys = new List<string>();
eventKeys.Add("SignalR.Redis.FailoverTest");
GlobalHost.DependencyResolver.UseRedisCluster(redisServerlist, eventKeys);
I added two additional methods to Microsoft.AspNet.SignalR.Redis.DependencyResolverExtensions
:
public static IDependencyResolver UseRedisCluster(this IDependencyResolver resolver, string serverList, IEnumerable<string> eventKeys)
{
return UseRedisCluster(resolver, serverList, db: 0, eventKeys: eventKeys);
}
public static IDependencyResolver UseRedisCluster(this IDependencyResolver resolver, string serverList, int db, IEnumerable<string> eventKeys)
{
var bus = new Lazy<RedisMessageBusCluster>(() => new RedisMessageBusCluster(serverList, db, eventKeys, resolver));
resolver.Register(typeof(IMessageBus), () => bus.Value);
return resolver;
}
Now the problem is that when I have several breakpoints enabled, until after a user name has been added, then disable all breakpoints, the application works as expected. However, with the breakpoints disabled from the beginning, there seems to be some race condition that may be failing during the connection process.
Thus, in RedisMessageCluster()
:
// Start the connection
_connectTask = _connection.Open().Then(() =>
{
// Create a subscription channel in redis
_channel = _connection.GetOpenSubscriberChannel();
// Subscribe to the registered connections
_channel.Subscribe(_keys, OnMessage);
// Dirty hack but it seems like subscribe returns before the actual
// subscription is properly setup in some cases
while (_channel.SubscriptionCount == 0)
{
Thread.Sleep(500);
}
});
I tried adding both a Task.Wait
, and even an additional Sleep()
(not shown above) - which were waiting/etc, but still getting errors.
The recurring error seems to be in Booksleeve.MessageQueue.cs
~ln 71:
A first chance exception of type 'System.InvalidOperationException' occurred in BookSleeve.dll
iisexpress.exe Error: 0 : SignalR exception thrown by Task: System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: The queue is closed
at BookSleeve.MessageQueue.Enqueue(RedisMessage item, Boolean highPri) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\MessageQueue.cs:line 71
at BookSleeve.RedisConnectionBase.EnqueueMessage(RedisMessage message, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\RedisConnectionBase.cs:line 910
at BookSleeve.RedisConnectionBase.ExecuteInt64(RedisMessage message, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\RedisConnectionBase.cs:line 826
at BookSleeve.RedisConnection.IncrementImpl(Int32 db, String key, Int64 value, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\IStringCommands.cs:line 277
at BookSleeve.RedisConnection.BookSleeve.IStringCommands.Increment(Int32 db, String key, Int64 value, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\IStringCommands.cs:line 270
at Microsoft.AspNet.SignalR.Redis.RedisMessageBusCluster.SendImpl(IEnumerator`1 enumerator, TaskCompletionSource`1 taskCompletionSource) in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Redis\RedisMessageBusCluster.cs:line 90
at Microsoft.AspNet.SignalR.Redis.RedisMessageBusCluster.<Send>b__2(Message[] msgs) in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Redis\RedisMessageBusCluster.cs:line 67
at Microsoft.AspNet.SignalR.TaskAsyncHelper.GenericDelegates`4.<>c__DisplayClass57.<ThenWithArgs>b__56() in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Core\TaskAsyncHelper.cs:line 893
at Microsoft.AspNet.SignalR.TaskAsyncHelper.TaskRunners`2.<>c__DisplayClass42.<RunTask>b__41(Task t) in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Core\TaskAsyncHelper.cs:line 821
--- End of inner exception stack trace ---
---> (Inner Exception #0) System.InvalidOperationException: The queue is closed
at BookSleeve.MessageQueue.Enqueue(RedisMessage item, Boolean highPri) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\MessageQueue.cs:line 71
at BookSleeve.RedisConnectionBase.EnqueueMessage(RedisMessage message, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\RedisConnectionBase.cs:line 910
at BookSleeve.RedisConnectionBase.ExecuteInt64(RedisMessage message, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\RedisConnectionBase.cs:line 826
at BookSleeve.RedisConnection.IncrementImpl(Int32 db, String key, Int64 value, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\IStringCommands.cs:line 277
at BookSleeve.RedisConnection.BookSleeve.IStringCommands.Increment(Int32 db, String key, Int64 value, Boolean queueJump) in c:\Projects\Frameworks\BookSleeve-1.2.0.5\BookSleeve\IStringCommands.cs:line 270
at Microsoft.AspNet.SignalR.Redis.RedisMessageBusCluster.SendImpl(IEnumerator`1 enumerator, TaskCompletionSource`1 taskCompletionSource) in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Redis\RedisMessageBusCluster.cs:line 90
at Microsoft.AspNet.SignalR.Redis.RedisMessageBusCluster.<Send>b__2(Message[] msgs) in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Redis\RedisMessageBusCluster.cs:line 67
at Microsoft.AspNet.SignalR.TaskAsyncHelper.GenericDelegates`4.<>c__DisplayClass57.<ThenWithArgs>b__56() in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Core\TaskAsyncHelper.cs:line 893
at Microsoft.AspNet.SignalR.TaskAsyncHelper.TaskRunners`2.<>c__DisplayClass42.<RunTask>b__41(Task t) in c:\Projects\Frameworks\SignalR\SignalR.1.0RC1\SignalR\src\Microsoft.AspNet.SignalR.Core\TaskAsyncHelper.cs:line 821<---
public void Enqueue(RedisMessage item, bool highPri)
{
lock (stdPriority)
{
if (closed)
{
throw new InvalidOperationException("The queue is closed");
}
Where a closed queue exception is being thrown.
I foresee another issue: Since the Redis connection is made in Application_Start()
there may be some issues in "reconnecting" to another server. However, I think this is valid when using the singular RedisConnection()
, where there is only one connection to choose from. However, with the intorduction of ConnectionUtils.Connect()
I'd like to hear from @dfowler
or the other SignalR guys in how this scenario is handled in SignalR.
The SignalR team has now implemented support for a custom connection factory with StackExchange.Redis, the successor to BookSleeve, which supports redundant Redis connections via ConnectionMultiplexer.
The initial problem encountered was that in spite of creating my own extension methods in BookSleeve to accept a collection of servers, fail-over was not possible.
Now, with the evolution of BookSleeve to StackExchange.Redis, we can now configure collection of servers/ports right in the Connect
initialization.
The new implementation is much simpler than the road I was going down, in creating a UseRedisCluster
method, and the back-end pluming now supports true fail-over:
var conn = ConnectionMultiplexer.Connect("redisServer1:6380,redisServer2:6380,redisServer3:6380,allowAdmin=true");
StackExchange.Redis also allows for additional manual configuration as outlined in the Automatic and Manual Configuration
section of the documentation:
ConfigurationOptions config = new ConfigurationOptions
{
EndPoints =
{
{ "redis0", 6379 },
{ "redis1", 6380 }
},
CommandMap = CommandMap.Create(new HashSet<string>
{ // EXCLUDE a few commands
"INFO", "CONFIG", "CLUSTER",
"PING", "ECHO", "CLIENT"
}, available: false),
KeepAlive = 180,
DefaultVersion = new Version(2, 8, 8),
Password = "changeme"
};
In essence, the ability to initialize our SignalR scale-out environment with a collection of servers now solves the initial problem.