I/O performance - async vs TPL vs Dataflow vs RX

Eliezer Kohen · Apr 17, 2013 · Viewed 7.7k times

I have a piece of C# 5.0 code that generates a ton of network and disk I/O. I need to run multiple copies of this code in parallel. Which of the following technologies is likely to give me the best performance:

  • async methods with await

  • using Task from the TPL directly

  • the TPL Dataflow NuGet package

  • Reactive Extensions

I'm not very good at this parallel stuff, but if using something lower-level, like Thread, can give me much better performance, I'd consider that too.

Answer

Ana Betts · Apr 17, 2013

This is like trying to optimize the length of your transatlantic flight by asking the quickest method to remove your seatbelt.

Ok, some real advice, since I was kind of a jerk

Let's give a helpful answer. Think of performance in terms of "classes" of activities, each one at least an order of magnitude slower than the one before it:

  1. Only using the CPU, with very little memory access (e.g. rendering very simple graphics to a very fast GPU, or calculating digits of Pi)
  2. Only using the CPU and in-memory data, nothing on disk (e.g. a well-written game)
  3. Accessing the disk
  4. Accessing the network

If you do even one activity from class #3, there's no point in doing the optimizations typical of classes #1 and #2, like tuning threading libraries: they're completely overshadowed by the disk hit. The same goes for CPU tricks: if you're constantly incurring L2/L3 cache misses, sparing a few CPU cycles by hand-writing assembly isn't worth it (which is why things like loop unrolling are usually a bad idea these days).

So, what can we derive from this? There are two ways to make your program faster: either move up from #3 to #2 (which often isn't possible, depending on what you're doing), or do less I/O. Disk and network I/O is the rate-limiting factor in most modern applications, and that's what you should be trying to optimize.
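As one illustration of "doing less I/O", here is a minimal sketch of a read-through cache, assuming the workload reads the same files repeatedly and their contents don't change while the program runs. `FileCache` and its `DiskReads` counter are hypothetical names made up for this example, not part of any library; only `ConcurrentDictionary`, `Lazy<T>`, `Interlocked` and `File.ReadAllBytes` come from the framework.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical helper: keeps file contents in memory so that repeated
// requests for the same path touch the disk only once.
public sealed class FileCache
{
    // Lazy<T> ensures the read for a given path runs at most once,
    // even if several threads ask for that path at the same time.
    private readonly ConcurrentDictionary<string, Lazy<byte[]>> _cache =
        new ConcurrentDictionary<string, Lazy<byte[]>>();

    private int _diskReads;

    // Number of times the disk was actually hit (for illustration only).
    public int DiskReads
    {
        get { return _diskReads; }
    }

    public byte[] Read(string path)
    {
        return _cache.GetOrAdd(path, p => new Lazy<byte[]>(() =>
        {
            Interlocked.Increment(ref _diskReads);
            return File.ReadAllBytes(p); // the only disk access
        })).Value;
    }
}
```

Calling `Read` many times with the same path should leave `DiskReads` at 1; every later call is a class #2 (in-memory) operation. The trade-offs are the usual ones for a cache: it uses more memory, it grows without bound as written, and it returns stale data if a file changes on disk, so it only fits workloads where those assumptions hold.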