I have a Function app in Azure that is triggered when an item is put on a queue. It looks something like this (greatly simplified):
public static async Task Run(string myQueueItem, TraceWriter log)
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri(Config.APIUri);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
        HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
        response.EnsureSuccessStatusCode();
        string json = await response.Content.ReadAsStringAsync();
        ApiResponse apiResponse = JsonConvert.DeserializeObject<ApiResponse>(json);
        log.Info($"Activity data successfully sent to platform in {apiResponse.elapsed}ms. Tracking number: {apiResponse.tracking}");
    }
}
This all works great and runs pretty well. Every time an item is put on the queue, we send the data to some API on our side and log the response. Cool.
The problem happens when there's a big spike in "the thing that generates queue messages" and a lot of items are put on the queue at once, typically around 1,000 to 1,500 items in a minute. The error log will then show something like this:
2017-02-14T01:45:31.692 mscorlib: Exception while executing function: Functions.SendToLimeade. f-SendToLimeade__-1078179529: An error occurred while sending the request. System: Unable to connect to the remote server. System: Only one usage of each socket address (protocol/network address/port) is normally permitted 123.123.123.123:443.
At first, I thought this was an issue with the Azure Function app running out of local sockets, as illustrated here. However, then I noticed the IP address. The IP address 123.123.123.123 (of course changed for this example) is our IP address, the one that the HttpClient is posting to. So, now I'm wondering if it is our servers running out of sockets to handle these requests.
Either way, we have a scaling issue going on here. I'm trying to figure out the best way to solve it.
Some ideas:
1. Req.ServicePoint.BindIPEndPointDelegate: this seems promising, but what do you do when you truly need to scale? I don't want this problem coming back in 2 years.
2. Set serviceBus.maxConcurrentCalls to 1 so that only a single message is processed at once. Maybe I could set this to a relatively low number instead. At some point our queue will be filling up faster than we can process it, but at that point the answer is adding more servers on our end.

Any insight on a recommended (scalable!) design for this sort of system would be greatly appreciated!
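The serviceBus.maxConcurrentCalls idea throttles at the host level through configuration. A related in-process option, named plainly here as a swapped-in technique rather than anything the Functions runtime provides, is to cap concurrent outbound calls with a SemaphoreSlim. The class name ApiThrottle and the limit of 8 are hypothetical; note that this only limits concurrency within one host process and does not coordinate across scaled-out instances.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: caps how many calls to the downstream API run at once
// inside a single process. It does NOT coordinate across scaled-out hosts.
public static class ApiThrottle
{
    // Assumption: at most 8 concurrent requests per process; tune for your API.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(8);

    public static async Task<T> RunAsync<T>(Func<Task<T>> call)
    {
        await Gate.WaitAsync();   // wait asynchronously for a free slot
        try
        {
            return await call();  // run the caller's request
        }
        finally
        {
            Gate.Release();       // always free the slot, even if the call throws
        }
    }
}
```

Inside the function body, the existing call would be wrapped as `await ApiThrottle.RunAsync(() => client.PostAsync("/api/devices/data", httpContent));`. Extra invocations simply wait for a slot instead of opening more connections at once.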
I think I've figured out a solution for this. I've been running these changes for the past 6 hours, and I've had zero socket errors. Before, I would get these errors in large batches every 30 minutes or so.
First, I added a new class to manage the HttpClient.
public static class Connection
{
    public static HttpClient Client { get; private set; }

    static Connection()
    {
        Client = new HttpClient();
        Client.BaseAddress = new Uri(Config.APIUri);
        Client.DefaultRequestHeaders.Add("Connection", "Keep-Alive");
        Client.DefaultRequestHeaders.Add("Keep-Alive", "timeout=600");
        Client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    }
}
Now we have a static instance of HttpClient that we use for every call to the function. From my research, keeping HttpClient instances around for as long as possible is highly recommended: its request methods (such as PostAsync) are thread safe, and the underlying handler pools and reuses open connections to the same host. Notice I also set the Keep-Alive headers (I think this is the default, but I figured I'd be explicit).
In my function, I just grab the static HttpClient instance like this:
var client = Connection.Client;
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
I haven't really done any in-depth analysis of what's happening at the socket level (I'll have to ask our IT guys whether they can see this traffic on the load balancer), but I'm hoping it just keeps a single socket open to our server and makes a bunch of HTTP calls over it as the queue items are processed. Anyway, whatever it's doing seems to be working. Maybe someone has some thoughts on how to improve it.
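One possible refinement, assuming the function runs on .NET Core 2.1 or later (the 2017 Functions runtime that uses TraceWriter ran on .NET Framework, where SocketsHttpHandler does not exist): a long-lived static client keeps its pooled connections indefinitely and so never picks up DNS changes, and it does not cap how many sockets it opens to your server. SocketsHttpHandler exposes PooledConnectionLifetime and MaxConnectionsPerServer for exactly these two concerns. The class name PooledConnection, the placeholder URI (standing in for Config.APIUri), and the values 5 minutes and 20 connections are all hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Sketch of a shared client for .NET Core 2.1+ (assumption: not the
// .NET Framework runtime the original function used).
public static class PooledConnection
{
    // Hypothetical placeholder; the original reads Config.APIUri.
    private const string ApiUri = "https://api.example.com";

    // One handler and one client for the whole process, created once.
    public static HttpClient Client { get; } = CreateClient();

    private static HttpClient CreateClient()
    {
        var handler = new SocketsHttpHandler
        {
            // Retire pooled connections after a while so DNS changes are picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(5),
            // Cap simultaneous connections to the API; extra requests wait for a free one.
            MaxConnectionsPerServer = 20
        };
        var client = new HttpClient(handler) { BaseAddress = new Uri(ApiUri) };
        // HTTP/1.1 connections are persistent by default, so no Keep-Alive headers are set.
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return client;
    }
}
```

The function body stays the same, using `PooledConnection.Client` in place of `Connection.Client`. Capping connections per server also makes the socket count to the load balancer predictable, which may make the traffic easier for the IT team to reason about.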