SignalR .NET client disconnecting

smitra · Mar 20, 2013 · Viewed 8.6k times

We are coming across an interesting issue. Here is what our setup looks like:

  • SignalR server (an ASP.NET MVC application) on Windows Server 2012.
  • Sencha HTML5 apps (SignalR clients) on the same server (Windows Server 2012).
  • .NET Windows service on a Windows Server 2008 R2 server. This also acts as a SignalR client.

Initially we were using SignalR 0.5.3, and we started observing that the Windows service's connection to the SignalR server drops out. This happens anywhere from every few minutes to every few hours. It reconnects in most cases but occasionally fails to reconnect, so the Windows service loses its connection for good about once every couple of days. There is no set pattern to it, and it is not related to server restarts, backups, etc. We added logging to the Windows service to monitor the StateChanged event on the client connection and found that the event fires when it disconnects and reconnects, but not when it fails to reconnect.

Then we came across this thread: client constantly reconnecting

and decided to upgrade everything to SignalR 1.0.1 (we had to do it at some point anyway). The Windows service was also upgraded to .NET Framework 4.5 (from Framework 2.0) and now references the new Microsoft.AspNet.SignalR.Client.dll. This also allowed us (using a newly added connection property) to determine that the Windows service was in fact using the ServerSentEvents transport. The same Windows service installed on a Windows Server 2012 machine uses the WebSockets transport. This is in line with this thread: SignalR .NET Client doesn't support WebSockets on Windows 7

However, the behaviour of the service on the Windows Server 2008 R2 server did not change. It still disconnects and reconnects, and loses its connection once in a while. Due to a few limitations, we cannot use Windows Server 2012 for the Windows service and are stuck with older OSs. This isn't to say that running the Windows service over WebSockets would solve all our problems (we haven't tested that thoroughly).

The third thing we tried was to get the source code from GitHub, compile it, and upgrade the services (the SignalR server and the clients) with that build. This was done to ensure that we had the latest copy with any potential bug fixes.

But it did not help. We are now at a point where we feel we have exhausted our options. Suggestions would be greatly appreciated. Thanks.

=====================================

EDIT: MORE INFORMATION:

Okay, now we have some more information. We added some code to the Windows service (SignalR client) to log in to the SignalR server every 30 minutes (to test the connection).

Here is what happens on the client side every 30 minutes:

WriteEvent(Now(), "INFO", "PING", "Performing logon procedure with SiteCode = " & msSiteCode & ".")
trans.Invoke("login", New String() {msSiteCode, "", "SERVER", "", ""})

where trans is the client-side proxy for the server-side class that inherits from Hub, and WriteEvent simply writes a trace entry to a log file.

The client side also has an 'isLoggedIn' method, as follows:

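Private Sub isLoggedIn(ByVal bLoggedIn As String)
    ' bLoggedIn is declared As String, so this test relies on VB's implicit
    ' String-to-Boolean conversion (Option Strict Off).
    If bLoggedIn Then
        WriteEvent(Now(), "INFO", "", "SignalR Server: Authenticated")
    Else
        WriteEvent(Now(), "ERROR", "", "SignalR Server: Authentication failed")
    End If
End Sub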

On the server side we have the login method:

Public Sub login(ByVal sAccount As String, _
                 ByVal sCompanyCode As String, _
                 ByVal sClientId As String, _
                 ByVal sPassword As String, _
                 ByVal sModuleCode As String)
    Try
        'Some code omitted that validates the user and sets bValidated.

        If bValidated Then
            'Update user in cache
            ConnectionCache.Instance.UpdateCache(userId, Context.ConnectionId, UserCredential.Connection_Status.Connected)
            Clients.Caller.isLoggedIn(True)

            Dim connectionId As String = ConnectionCache.Instance.FindConnectionId(userId)
            LogEvent("Successful login for connectionid: " & connectionId & ". Context. User: " & userId, _
                     EventLogEntryType.Information)
        Else
            Clients.Caller.isLoggedIn(False, results)
        End If
    Catch ex As Exception
        LogEvent("Login: " & ex.Message, EventLogEntryType.Error)
    End Try
End Sub

If we look at the client log file, every 30 minutes we get the following log entries:

  • Performing logon procedure with SiteCode = ABCD.
  • SignalR Server: Authenticated

So we know that the login server-side method is being called, and the isLoggedIn client-side method is also called.

However, at some point the isLoggedIn client-side method stops being called, even though the server-side login method still is. From then on, every 30 minutes we get just one entry:

  • Performing logon procedure with SiteCode = ABCD.

In addition, the log event:

LogEvent("Successful login for connectionid: " & connectionId & ". Context. User: " & userId, EventLogEntryType.Information)

in the server-side login method is still written to the server-side log. So Clients.Caller.isLoggedIn(True) is called as expected, but we don't see its effect on the client side.

So what we seem to be looking at is this: the client can always reach the server and call the server-side (login) function, but from some point onward the server's call to the client-side (isLoggedIn) function no longer arrives.

Also, this could be something specific to .NET clients, as I am pretty sure we have not seen this happen with our HTML5/JavaScript clients.

Answer

smitra · Apr 30, 2014

In the end, we just created a simple "PINGING" function. This gets called every 15 minutes. The logic is as follows:

  1. SignalR Client has a timer that calls the server PING method every 15 minutes.
  2. The server calls the client's PINGCLIENT method in response.
  3. In the next PING timer event on the client (after 15 minutes) we check whether we got the response. If we did not, we suspend all activity and re-initialise the Hub connection. Then we restart the PINGING timer.
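
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions, not the code from the actual service: PingWatchdog, OnPingClient and OnTimerTick are hypothetical names, and the two delegates stand in for whatever invokes the server's PING hub method and whatever tears down and rebuilds the Hub connection. Sending a fresh PING straight after re-initialising is also an assumption made for the sketch.

```vb
Imports System

' Hypothetical helper: remembers whether the server answered the last PING.
Public Class PingWatchdog
    Private ReadOnly _sendPing As Action       ' assumed: invokes the server's PING method
    Private ReadOnly _reinitialise As Action   ' assumed: rebuilds the hub connection
    Private ReadOnly _lock As New Object()
    Private _awaitingReply As Boolean = False

    Public Sub New(sendPing As Action, reinitialise As Action)
        _sendPing = sendPing
        _reinitialise = reinitialise
    End Sub

    ' Call this from the client handler for the server's PINGCLIENT callback.
    Public Sub OnPingClient()
        SyncLock _lock
            _awaitingReply = False
        End SyncLock
    End Sub

    ' Call this from the timer handler (every 15 minutes in the answer).
    ' Returns True when the previous PING went unanswered.
    Public Function OnTimerTick() As Boolean
        Dim missed As Boolean
        SyncLock _lock
            missed = _awaitingReply
            _awaitingReply = True
        End SyncLock

        If missed Then
            _reinitialise()   ' no reply since the last tick: rebuild the connection
        End If
        _sendPing()           ' send the next PING (on the new connection if rebuilt)
        Return missed
    End Function
End Class

Module Demo
    Sub Main()
        Dim pings As Integer = 0
        Dim restarts As Integer = 0
        Dim sendPing As Action = Sub() pings += 1
        Dim reinitialise As Action = Sub() restarts += 1
        Dim dog As New PingWatchdog(sendPing, reinitialise)

        dog.OnTimerTick()    ' first tick: sends a PING
        dog.OnPingClient()   ' the server's reply arrives
        dog.OnTimerTick()    ' reply was seen: just sends the next PING
        dog.OnTimerTick()    ' no reply since the last tick: re-initialise, then PING

        Console.WriteLine("pings={0}, restarts={1}", pings, restarts)   ' pings=3, restarts=1
    End Sub
End Module
```

Keeping the check separate from the SignalR calls means the same class works regardless of which transport the connection ends up using, and the "missed reply" flag is the only state that has to survive between timer events.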

So while we gave up on trying to figure out what the cause was, we have a workaround to manage loss of the "server-to-client" connection when it happens. Note that this is in addition to the built-in reconnection logic in SignalR.

We also maintain logs, and on average this happens (the client does not get a PING back from the server) about once a day.