.NET sockets vs C++ sockets at high performance

mdarsigny · Dec 11, 2011

My question is to settle an argument with my co-workers on C++ vs C#.

We have implemented a server that receives a large number of UDP streams. This server was developed in C++ using asynchronous sockets and overlapped I/O with completion ports. We use 5 completion ports with 5 threads. This server can easily handle 500 Mbps of throughput on a gigabit network without any packet loss or errors (we didn't push our tests beyond 500 Mbps).

We have tried to re-implement the same kind of server in C#, and we have not been able to reach the same incoming throughput. We use asynchronous receives via the ReceiveAsync method and a pool of SocketAsyncEventArgs objects to avoid the overhead of creating a new object for every receive call. Each SocketAsyncEventArgs has its own buffer assigned, so we do not need to allocate memory for every receive. The pool is very large, so we can queue more than 100 receive requests. This server cannot handle more than 240 Mbps of incoming throughput; above that limit, we lose packets in our UDP streams.
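The buffer-pooling idea described above can be sketched in C++ (the language of the original server) as a fixed set of buffers carved out of one allocation. This is a hypothetical illustration, not the poster's code and not the .NET API: `BufferPool`, `Rent` and `Return` are invented names, and the design is only loosely analogous to slicing one large `byte[]` among many `SocketAsyncEventArgs` objects with `SetBuffer(byte[], int, int)`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical illustration of a preallocated receive-buffer pool.
// All buffer memory is allocated once in the constructor; Rent and Return
// only move indices, so the steady-state receive path does no heap allocation.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t bufferSize)
        : storage_(count * bufferSize), count_(count), bufferSize_(bufferSize) {
        assert(count > 0 && bufferSize > 0);
        free_.reserve(count);  // Return never needs to grow this vector
        for (std::size_t i = 0; i < count; ++i) free_.push_back(i);
    }

    // Hands out a free buffer index, or -1 if every buffer is in flight.
    long Rent() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return -1;
        std::size_t index = free_.back();
        free_.pop_back();
        return static_cast<long>(index);
    }

    // Gives a buffer back once the receive that used it has been processed.
    void Return(long index) {
        assert(index >= 0 && static_cast<std::size_t>(index) < count_);
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(static_cast<std::size_t>(index));
    }

    // Start of a rented buffer inside the single shared allocation.
    unsigned char* Data(long index) {
        return storage_.data() + static_cast<std::size_t>(index) * bufferSize_;
    }

    std::size_t Capacity() const { return count_; }
    std::size_t BufferSize() const { return bufferSize_; }

    std::size_t Available() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    std::vector<unsigned char> storage_;  // one contiguous block for all buffers
    std::vector<std::size_t> free_;       // indices of buffers not currently rented
    std::size_t count_;
    std::size_t bufferSize_;
    mutable std::mutex mutex_;            // guards free_ across receive threads
};
```

In this sketch a receive loop would rent a buffer before posting each asynchronous receive and return it after the datagram is processed, so keeping 100+ receives outstanding requires at least that many buffers in the pool.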

My question is this: should I expect the same performance using C++ sockets and C# sockets? My opinion is that performance should be the same if memory is managed correctly in .NET.

Side question: does anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?

Answer

Richard · Dec 11, 2011

would anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?

I suspect the only reference would be the implementation itself (i.e. Reflector or another assembly decompiler). With that you will find that all asynchronous I/O goes through an I/O completion port, with callbacks processed on the I/O thread pool (which is separate from the normal thread pool).

use 5 completion ports

I would expect to use a single completion port feeding all the I/O into a single pool of threads, with roughly one thread per processor servicing completions (assuming you are doing any other I/O, including disk, asynchronously as well).
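As a portable sketch of that layout (deliberately not Win32 code), one shared queue can stand in for the single completion port, drained by a pool of worker threads sized to the core count. On Windows the real building blocks would be `CreateIoCompletionPort` and `GetQueuedCompletionStatus`; `Completion`, `CompletionQueue`, `ServiceCompletions` and `DefaultWorkerCount` below are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a completion packet: which buffer finished
// and how many bytes arrived in it.
struct Completion {
    int bufferIndex;
    std::size_t bytesTransferred;
};

// One queue shared by every worker, playing the role of a single completion port.
class CompletionQueue {
public:
    void Post(Completion c) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(c);
        }
        ready_.notify_one();
    }

    // Blocks until a completion is available; returns false once closed and drained.
    bool Wait(Completion& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Completion> items_;
    bool closed_ = false;
};

// Roughly one worker per core; hardware_concurrency() may report 0 if unknown.
unsigned DefaultWorkerCount() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Runs workerCount threads against the one queue until it is closed and empty,
// then returns the total number of bytes the workers handled.
std::size_t ServiceCompletions(CompletionQueue& queue, unsigned workerCount) {
    assert(workerCount > 0);
    std::atomic<std::size_t> totalBytes{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i) {
        workers.emplace_back([&queue, &totalBytes] {
            Completion c{};
            while (queue.Wait(c)) {
                totalBytes += c.bytesTransferred;  // real code would parse the datagram here
            }
        });
    }
    for (auto& w : workers) w.join();
    return totalBytes.load();
}
```

The point of the design is that every thread waits on the same queue, so whichever worker is free picks up the next completion; splitting the work across several independent queues only pays off when some queues must be favoured over others.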

Multiple completion ports would make sense if you have some form of prioritisation going on.

My question is this: should I expect the same performance using C++ sockets and C# sockets?

Yes or no, depending on how narrowly you define the "using ... sockets" part. In terms of the operations from the start of the asynchronous operation until the completion is posted to the completion port I would expect no significant difference (all the processing is in the Win32 API or Windows kernel).

However, the safety that the .NET runtime provides will add some overhead, e.g. buffer lengths are checked, delegates validated, etc. If the application is CPU-bound, this is likely to make a difference, and at the extreme a small per-operation difference can easily add up.
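A quick back-of-the-envelope calculation shows why small per-receive costs matter at these rates. The sketch below assumes fixed 1,250-byte datagrams (a made-up size; the question does not state one) and a single thread doing all the work; `PacketsPerSecond` and `BudgetMicroseconds` are hypothetical helper names.

```cpp
#include <cassert>

// Datagrams per second needed to carry a given bit rate,
// assuming every datagram has the same (hypothetical) size.
double PacketsPerSecond(double megabitsPerSecond, double bytesPerPacket) {
    assert(bytesPerPacket > 0);
    return megabitsPerSecond * 1e6 / (bytesPerPacket * 8.0);
}

// Time available to handle each datagram on one thread, in microseconds.
double BudgetMicroseconds(double packetsPerSecond) {
    assert(packetsPerSecond > 0);
    return 1e6 / packetsPerSecond;
}
```

With those assumed numbers, 500 Mbps works out to 50,000 datagrams per second, or 20 µs per datagram on one thread, while 240 Mbps is 24,000 datagrams per second. A hypothetical extra 2 µs of per-receive overhead would consume 10% of the 500 Mbps budget; spreading work over several threads raises the per-thread budget but does not remove the per-datagram cost.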

Also, the .NET version will occasionally pause for GC (.NET 4.5 does asynchronous collection, so this will get better in the future). There are techniques to minimise the garbage that accumulates (e.g. reusing objects rather than creating new ones, and using structures while avoiding boxing).

In the end, if the C++ version works and is meeting your performance needs, why port?