Get a reference to a struct inside array

kaalus picture kaalus · Aug 20, 2011 · Viewed 16.5k times · Source

I want to modify a field of a struct which is inside an array without having to set entire struct. In the example below, I want to set one field of element 543 in the array. I don't want to have to copy entire element (because copying MassiveStruct would hurt performance).

class P
{
    struct S
    {
      public int a;
      public MassiveStruct b;
    }

    void f(ref S s)
    {
      s.a = 3;
    }

    public static void Main()
    {
      S[] s = new S[1000];
      f(ref s[543]);  // Error: An object reference is required for the non-static field, method, or property
    }
}

Is there a way to do it in C#? Or do I always have to copy entire struct out of array, modify the copy, and then put the modified copy back into array.

Answer

Glenn Slayden picture Glenn Slayden · Jan 11, 2013

[edit 2017: see important comments regarding C# 7 at the end of this post]

After many years of wrestling with this exact problem, I'll summarize the few techniques and solu­tions I have found. Stylistic tastes aside, arrays of structs are really the o̲n̲l̲y in-memory bulk storage method available in C#. If your app truly processes millions of medium-sized objects under high throughput conditions, there no other managed alternative.

I agree with @kaalus that object headers and GC pressure can quickly mount; nevertheless my NLP grammar pro­cess­ing system can manipulate 8-10 gigabytes (or more) of structural analyses in less than a min­ute when parsing and/or generating lengthy natural language sentences. Cue the chorus: “C# isn't meant for such problems...,” “Switch to assembly language...,” “Wire-wrap up an FPGA...,” etc.

Well, instead let's run some tests. First of all, it is critical to have total understanding of the full spectrum of value-type (struct) man­agement issues and the class vs. struct tradeoff sweet-spots. Also of course boxing, pinning/unsafe code, fixed buffers, GCHandle, IntPtr, and more, but most importantly of all in my opinion, wise use of managed pointers (a.k.a. "interior pointers").

Your mastery of these topics will also include knowledge of the fact that, should you happen to include in your struct one or more references to managed types (as opposed to just blittable primitives), then your options for accessing the struct with unsafe pointers are greatly reduced. This is not a problem for the managed pointer method I'll mention below. So generally, including object referen­ces is fine and doesn't change much regarding this discussion.

Oh, and if you do really need to preserve your unsafe access, you can use a GCHandle in 'Normal' mode to store object reference(s) in your struct indefinitely. Fortunately, putting the GCHandle into your struct does not trigger the unsafe-access prohibition. (Note that GCHandle is itself a value-type, and you can even define and go to town with

var gch = GCHandle.Alloc("spookee",GCHandleType.Normal);
GCHandle* p = &gch;
String s = (String)p->Target;

...and so forth. As a value type, the GCHandle is imaged directly into your struct, but obviously any reference types it stores are not. They are out in the heap, not included in the physical layout of your array. Finally on GCHandle, beware of its copy-semantics, though, because you'll have a memory leak if you don't eventually Free each GCHandle you allocate.

@Ani reminds us that some people consider mutable struct instances "evil," but it's really the fact that they are accident prone that's the problem. Indeed, the OP's example...

s[543].a = 3;

...illustrates exactly what we're trying to achieve: access our data records in-situ. (Beware: the syntax for an array of reference-type 'class' instances has identical appearance, but in this article we're specifically discussing only non-jagged arrays of user-defined value-types here.) For my own programs, I generally consider it a severe bug if I encounter an oversized blittable struct that has (accidentally) been wholly imaged out of its array storage row:

rec no_no = s[543];   // don't do
no_no.a = 3           // it like this

As far as how big (wide) your struct can or should be, it won't matter, because you are going to be careful never to let the struct do what was just shown in the previous example, that is, migrate in-toto out of its embedding array. In fact, this points to a fundamental premise of this entire article:

rule:
For arrays of struct, always access individual fields in-situ; never "mention" (in C#) the struct instance itself in-toto.

Unfortunately, the C# language offers no way to systematically flag or forbid code that violates this rule, so success here generally depends on careful programming discipline.

Since our "jumbo-structs" are never imaged out of their array, they're really just templates over memory. In other words, the right thinking is to conceive of the struct as overlaying the array elements. We always think of each as a vacuous "memory temp­late," as opposed to a transferrable or portable encapsulator or data container. For array-bound "jumbo" value-types, we never want to invoke that most existential characteristic of a "struct", namely, pass-by-value.

Example:

public struct rec
{
    public int a, b, c, d, e, f;
}

Here we overlay 6 ints for a total of 24 bytes per "record." You'll want to consider and be aware of packing options to obtain an alignment-friendly size. But excessive padding can cut into your memory budget: because a more important consideration is the 85,000 byte limit on non-LOH objects. Make sure your record size multiplied by the expected number of rows does not exceed this limit.

So for the example given here, you would be best advised to keep your array of recs to no more 3,000 rows each. Hopefully your application can be designed around this sweet-spot. This is not so limiting when you remember that--alternatively--each row would be a separate garbage-collected object, instead of just the one array. You've cut your object proliferation by a three orders of magnitude, which is good for a day's work. Thus the .NET environment here is strongly steering us with a pretty specific constraint: it seems that if you target your app's memory design towards monolithic alloc­ations in the 30-70 KB range, then you really can get away with lots and lots of them, and in fact you'll instead become limited by a thornier set of performance bottlenecks (namely, bandwidth on the hardware bus).

So now you have a single .NET reference type (array) with 3,000 6-tuples in physically contiguous tabular storage. First and foremost, we must be super-careful to never "pick up" one of the structs. As Jon Skeet notes above, "Massive structs will often perform worse than classes," and this is absolutely correct. There's no better way to paralyze your memory bus than to start throwing plump value types around willy-nilly.

So let's capitalize on an infrequently-mentioned aspect of the array of structs: All objects (and fields of those objects or structs) of all rows of the entire array are always initialized to their default values. You can start plugging values in, one at a time, in any row or column (field), anywhere in the array. You can leave some fields at their default values, or replace neighbor fields without dis­turbing one in the middle. Gone is that annoying manual initialization required with stack-resident (local variable) structs before use.

Sometimes it's hard to maintain the field-by-field approach because .NET is always trying to get us to blast in an entire new'd-up struct--but to me, this so-called "initialization" is just a violation of our taboo (against plucking the whole struct out of the array), in a different guise.

Now we get to the crux of the matter. Clearly, accessing your tabular data in-situ minimizes data-shuffling busywork. But often this is an inconvenient hassle. Array accesses can be slow in .NET, due to bounds-checking. So how do you maintain a "working" pointer into the interior of an array, so as to avoid having the system constantly recomputing the indexing offsets.

Evaluation

Let's evaluate the performance of five different methods for the manipulation of individual fields within value-type array storage rows. The test below is designed to measure the efficiency of intensively accessing the data fields of a struct positioned at some array index, in situ--that is, "where they lie," without extracting or rewriting the entire struct (array element). Five different access methods are compared, with all other factors held the same.

The five methods are as follows:

  1. Normal, direct array access via square-brackets and the field specifier dot. Note that, in .NET, arrays are a special and unique primitive of the Common Type System. As @Ani mentions above, this syntax cannot be used to change an individual field of a reference instance, such as a list, even when it is parameterized with a value-type.
  2. Using the undocumented __makeref C# language keyword.
  3. Managed pointer via a delegate which uses the ref keyword
  4. "Unsafe" pointers
  5. Same as #3, but using a C# function instead of a delegate.

Before I give the C# test results, here's the test harness implementation. These tests were run on .NET 4.5, an AnyCPU release build running on x64, Workstation gc. (Note that, because the test isn't interested the efficiency of allocating and de-allocating the array itself, the LOH consideration mentioned above does not apply.)

const int num_test = 100000;
static rec[] s1, s2, s3, s4, s5;
static long t_n, t_r, t_m, t_u, t_f;
static Stopwatch sw = Stopwatch.StartNew();
static Random rnd = new Random();

static void test2()
{
    s1 = new rec[num_test];
    s2 = new rec[num_test];
    s3 = new rec[num_test];
    s4 = new rec[num_test];
    s5 = new rec[num_test];

    for (int x, i = 0; i < 5000000; i++)
    {
        x = rnd.Next(num_test);
        test_m(x); test_n(x); test_r(x); test_u(x); test_f(x);
        x = rnd.Next(num_test);
        test_n(x); test_r(x); test_u(x); test_f(x); test_m(x);
        x = rnd.Next(num_test);
        test_r(x); test_u(x); test_f(x); test_m(x); test_n(x);
        x = rnd.Next(num_test);
        test_u(x); test_f(x); test_m(x); test_n(x); test_r(x);
        x = rnd.Next(num_test);
        test_f(x); test_m(x); test_n(x); test_r(x); test_u(x);
        x = rnd.Next(num_test);
    }
    Debug.Print("Normal (subscript+field):          {0,18}", t_n);
    Debug.Print("Typed-reference:                   {0,18}", t_r);
    Debug.Print("C# Managed pointer: (ref delegate) {0,18}", t_m);
    Debug.Print("C# Unsafe pointer:                 {0,18}", t_u);
    Debug.Print("C# Managed pointer: (ref func):    {0,18}", t_f);
}

Because the code fragments which implement the test for each specific method are long-ish, I'll give the results first. Time is 'ticks;' lower means better.

Normal (subscript+field):             20,804,691
Typed-reference:                      30,920,655
Managed pointer: (ref delegate)       18,777,666   // <- a close 2nd
Unsafe pointer:                       22,395,806
Managed pointer: (ref func):          18,767,179   // <- winner

I was surprised that these results were so unequivocal. TypedReferences are slowest, presumably because they lug around type information along with the pointer. Considering the heft of the IL-code for the belabored "Normal" version, it performed surprisingly well. Mode transitions seem to hurt unsafe code to the point where you really have to justify, plan, and measure each place you're going to deploy it.

But the hands down fastest times are achieved by leveraging the ref keyword in functions' parameter passing for the purpose of pointing to an interior part of the array, thus eliminating the "per-field-access" array indexing computation.

Perhaps the design of my test favors this one, but the test scenarios are representative of empirical use patterns in my app. What surprised my about those numbers is that the advantage of staying in managed mode--while having your pointers, too--was not cancelled by having to call a function or invoke through a delegate.

The Winner

Fastest one: (And perhaps simplest too?)

static void f(ref rec e)
{
    e.a = 4;
    e.e = e.a;
    e.b = e.d;
    e.f = e.d;
    e.b = e.e;
    e.a = e.c;
    e.b = 5;
    e.d = e.f;
    e.c = e.b;
    e.e = e.a;
    e.b = e.d;
    e.f = e.d;
    e.c = 6;
    e.b = e.e;
    e.a = e.c;
    e.d = e.f;
    e.c = e.b;
    e.e = e.a;
    e.d = 7;
    e.b = e.d;
    e.f = e.d;
    e.b = e.e;
    e.a = e.c;
    e.d = e.f;
    e.e = 8;
    e.c = e.b;
    e.e = e.a;
    e.b = e.d;
    e.f = e.d;
    e.b = e.e;
    e.f = 9;
    e.a = e.c;
    e.d = e.f;
    e.c = e.b;
    e.e = e.a;
    e.b = e.d;
    e.a = 10;
    e.f = e.d;
    e.b = e.e;
    e.a = e.c;
    e.d = e.f;
    e.c = e.b;
}
static void test_f(int ix)
{
    long q = sw.ElapsedTicks;
    f(ref s5[ix]);
    t_f += sw.ElapsedTicks - q;
}

But it has the disadvantage that you can't keep related logic together in your program: the imple­mentation of the function is divided across two C# functions, f and test_f.

We can address this particular problem with only a tiny sacrifice in performance. The next one is basically identical to the foregoing, but embeds one of the functions within the other as a lambda function...

A Close Second

Replacing the static function in the preceding example with an inline delegate requires the use of ref arguments, which in turn precludes the use of the Func<T> lambda syntax; instead you must use an explicit delegate from old-style .NET.

By adding this global declaration once:

delegate void b(ref rec ee);

...we can use it throughout the program to directly ref into elements of array rec[], accessing them inline:

static void test_m(int ix)
{
    long q = sw.ElapsedTicks;
    /// the element to manipulate "e", is selected at the bottom of this lambda block
    ((b)((ref rec e) =>
    {
        e.a = 4;
        e.e = e.a;
        e.b = e.d;
        e.f = e.d;
        e.b = e.e;
        e.a = e.c;
        e.b = 5;
        e.d = e.f;
        e.c = e.b;
        e.e = e.a;
        e.b = e.d;
        e.f = e.d;
        e.c = 6;
        e.b = e.e;
        e.a = e.c;
        e.d = e.f;
        e.c = e.b;
        e.e = e.a;
        e.d = 7;
        e.b = e.d;
        e.f = e.d;
        e.b = e.e;
        e.a = e.c;
        e.d = e.f;
        e.e = 8;
        e.c = e.b;
        e.e = e.a;
        e.b = e.d;
        e.f = e.d;
        e.b = e.e;
        e.f = 9;
        e.a = e.c;
        e.d = e.f;
        e.c = e.b;
        e.e = e.a;
        e.b = e.d;
        e.a = 10;
        e.f = e.d;
        e.b = e.e;
        e.a = e.c;
        e.d = e.f;
        e.c = e.b;
    }))(ref s3[ix]);
    t_m += sw.ElapsedTicks - q;
}

Also, although it may look like a new lambda function is being instantiated on each call, this won't happen if you're careful: when using this method, make sure you do not "close over" any local variables (that is, refer to variables which are outside the lambda function, from within its body), or do anything else that will bar your delegate instance from being static. If a local variable happens to fall into your lambda and the lambda thus gets promoted to an instance/class, you'll "probably" notice a difference as it tries to create five million delegates.

As long as you keep the lambda function clear of these side-effects, there won't be multiple instances; what's happening here is that, whenever C# determines that a lambda has no non-explicit dependencies, it lazily creates (and caches) a static singleton. It's a little unfortunate that a performance alternation this drastic is hidden from our view as a silent optimization. Overall, I like this method. It's fast and clutter-free--except for the bizarre parentheses, none of which can be omitted here.

And the rest

For completeness, here are the rest of the tests: normal bracketing-plus-dot; TypedReference; and unsafe pointers.

static void test_n(int ix)
{
    long q = sw.ElapsedTicks;
    s1[ix].a = 4;
    s1[ix].e = s1[ix].a;
    s1[ix].b = s1[ix].d;
    s1[ix].f = s1[ix].d;
    s1[ix].b = s1[ix].e;
    s1[ix].a = s1[ix].c;
    s1[ix].b = 5;
    s1[ix].d = s1[ix].f;
    s1[ix].c = s1[ix].b;
    s1[ix].e = s1[ix].a;
    s1[ix].b = s1[ix].d;
    s1[ix].f = s1[ix].d;
    s1[ix].c = 6;
    s1[ix].b = s1[ix].e;
    s1[ix].a = s1[ix].c;
    s1[ix].d = s1[ix].f;
    s1[ix].c = s1[ix].b;
    s1[ix].e = s1[ix].a;
    s1[ix].d = 7;
    s1[ix].b = s1[ix].d;
    s1[ix].f = s1[ix].d;
    s1[ix].b = s1[ix].e;
    s1[ix].a = s1[ix].c;
    s1[ix].d = s1[ix].f;
    s1[ix].e = 8;
    s1[ix].c = s1[ix].b;
    s1[ix].e = s1[ix].a;
    s1[ix].b = s1[ix].d;
    s1[ix].f = s1[ix].d;
    s1[ix].b = s1[ix].e;
    s1[ix].f = 9;
    s1[ix].a = s1[ix].c;
    s1[ix].d = s1[ix].f;
    s1[ix].c = s1[ix].b;
    s1[ix].e = s1[ix].a;
    s1[ix].b = s1[ix].d;
    s1[ix].a = 10;
    s1[ix].f = s1[ix].d;
    s1[ix].b = s1[ix].e;
    s1[ix].a = s1[ix].c;
    s1[ix].d = s1[ix].f;
    s1[ix].c = s1[ix].b;
    t_n += sw.ElapsedTicks - q;
}


static void test_r(int ix)
{
    long q = sw.ElapsedTicks;
    var tr = __makeref(s2[ix]);
    __refvalue(tr, rec).a = 4;
    __refvalue(tr, rec).e = __refvalue( tr, rec).a;
    __refvalue(tr, rec).b = __refvalue( tr, rec).d;
    __refvalue(tr, rec).f = __refvalue( tr, rec).d;
    __refvalue(tr, rec).b = __refvalue( tr, rec).e;
    __refvalue(tr, rec).a = __refvalue( tr, rec).c;
    __refvalue(tr, rec).b = 5;
    __refvalue(tr, rec).d = __refvalue( tr, rec).f;
    __refvalue(tr, rec).c = __refvalue( tr, rec).b;
    __refvalue(tr, rec).e = __refvalue( tr, rec).a;
    __refvalue(tr, rec).b = __refvalue( tr, rec).d;
    __refvalue(tr, rec).f = __refvalue( tr, rec).d;
    __refvalue(tr, rec).c = 6;
    __refvalue(tr, rec).b = __refvalue( tr, rec).e;
    __refvalue(tr, rec).a = __refvalue( tr, rec).c;
    __refvalue(tr, rec).d = __refvalue( tr, rec).f;
    __refvalue(tr, rec).c = __refvalue( tr, rec).b;
    __refvalue(tr, rec).e = __refvalue( tr, rec).a;
    __refvalue(tr, rec).d = 7;
    __refvalue(tr, rec).b = __refvalue( tr, rec).d;
    __refvalue(tr, rec).f = __refvalue( tr, rec).d;
    __refvalue(tr, rec).b = __refvalue( tr, rec).e;
    __refvalue(tr, rec).a = __refvalue( tr, rec).c;
    __refvalue(tr, rec).d = __refvalue( tr, rec).f;
    __refvalue(tr, rec).e = 8;
    __refvalue(tr, rec).c = __refvalue( tr, rec).b;
    __refvalue(tr, rec).e = __refvalue( tr, rec).a;
    __refvalue(tr, rec).b = __refvalue( tr, rec).d;
    __refvalue(tr, rec).f = __refvalue( tr, rec).d;
    __refvalue(tr, rec).b = __refvalue( tr, rec).e;
    __refvalue(tr, rec).f = 9;
    __refvalue(tr, rec).a = __refvalue( tr, rec).c;
    __refvalue(tr, rec).d = __refvalue( tr, rec).f;
    __refvalue(tr, rec).c = __refvalue( tr, rec).b;
    __refvalue(tr, rec).e = __refvalue( tr, rec).a;
    __refvalue(tr, rec).b = __refvalue( tr, rec).d;
    __refvalue(tr, rec).a = 10;
    __refvalue(tr, rec).f = __refvalue( tr, rec).d;
    __refvalue(tr, rec).b = __refvalue( tr, rec).e;
    __refvalue(tr, rec).a = __refvalue( tr, rec).c;
    __refvalue(tr, rec).d = __refvalue( tr, rec).f;
    __refvalue(tr, rec).c = __refvalue( tr, rec).b;
    t_r += sw.ElapsedTicks - q;
}

static void test_u(int ix)
{
    long q = sw.ElapsedTicks;

    fixed (rec* p = &s4[ix])
    {
        p->a = 4;
        p->e = p->a;
        p->b = p->d;
        p->f = p->d;
        p->b = p->e;
        p->a = p->c;
        p->b = 5;
        p->d = p->f;
        p->c = p->b;
        p->e = p->a;
        p->b = p->d;
        p->f = p->d;
        p->c = 6;
        p->b = p->e;
        p->a = p->c;
        p->d = p->f;
        p->c = p->b;
        p->e = p->a;
        p->d = 7;
        p->b = p->d;
        p->f = p->d;
        p->b = p->e;
        p->a = p->c;
        p->d = p->f;
        p->e = 8;
        p->c = p->b;
        p->e = p->a;
        p->b = p->d;
        p->f = p->d;
        p->b = p->e;
        p->f = 9;
        p->a = p->c;
        p->d = p->f;
        p->c = p->b;
        p->e = p->a;
        p->b = p->d;
        p->a = 10;
        p->f = p->d;
        p->b = p->e;
        p->a = p->c;
        p->d = p->f;
        p->c = p->b;
    }
    t_u += sw.ElapsedTicks - q;
}

Summary

For memory-intensive work in large-scale C# apps, using managed pointers to directly access the fields of value-typed array elements in-situ is the way to go.

If you're really serious about performance, this might be enough reason to use C++/CLI (or CIL, for that matter) instead of C# for the relevant parts of your app, because those languages allow you to directly declare managed pointers within a function body.

In C#, the only way to create a managed pointer is to declare a function with a ref or out argument, and then the callee will observe the managed pointer. Thus, to get the performance benefits in C#, you have to use one of the (top two) methods shown above. [see C#7 below]

Sadly, these deploy the kludge of splitting a function into multiple parts just for the purpose of accessing an array element. Although considerably less elegant than the equivalent C++/CLI code would be, tests indicate that even in C#, for high-throughput applications we still obtain a big performance benefit versus naïve value-type array access.


[edit 2017: While perhaps conferring a small degree of prescience to this article's exhortations in general, the release of C# 7 in Visual Studio 2017 concomitantly renders the specific methods described above entirely obsolete. In short, the new ref locals feature in the language permits you to declare your own managed pointer as a local variable, and use it to consolidate the single array dereferencing operation. So given for example the test structure from above...

public struct rec { public int a, b, c, d, e, f; }
static rec[] s7 = new rec[100000];

...here is how the same test function from above can now be written:

static void test_7(int ix)
{
    ref rec e = ref s7[ix];         // <---  C#7 ref local
    e.a = 4;  e.e = e.a; e.b = e.d; e.f = e.d; e.b = e.e; e.a = e.c;
    e.b = 5;  e.d = e.f; e.c = e.b; e.e = e.a; e.b = e.d; e.f = e.d;
    e.c = 6;  e.b = e.e; e.a = e.c; e.d = e.f; e.c = e.b; e.e = e.a;
    e.d = 7;  e.b = e.d; e.f = e.d; e.b = e.e; e.a = e.c; e.d = e.f;
    e.e = 8;  e.c = e.b; e.e = e.a; e.b = e.d; e.f = e.d; e.b = e.e;
    e.f = 9;  e.a = e.c; e.d = e.f; e.c = e.b; e.e = e.a; e.b = e.d;
    e.a = 10; e.f = e.d; e.b = e.e; e.a = e.c; e.d = e.f; e.c = e.b;
}

Notice how this completely eliminates the need for kludges such as those I discussed above. The sleeker use of a managed pointer avoids the unnecessary function call that was used in "the winner," the best-performing methodology of those I reviewed. Therefore, the performance with the new feature can only be better than the winner of methods compared above.

Ironically enough, C# 7 also adds local functions, a feature which would directly solve the complaint about poor encapsulation I raised for two of the aforementioned hacks. Happily enough the whole enterprise of proliferating dedicated functions just for the purpose of gaining access to managed pointers is now completely moot.