I was trying to determine the overhead of the header on a .NET array (in a 32-bit process) using this code:
long bytes1 = GC.GetTotalMemory(false);
object[] array = new object[10000];
for (int i = 0; i < 10000; i++)
    array[i] = new int[1];
long bytes2 = GC.GetTotalMemory(false);
array[0] = null; // ensure no garbage collection before this point
Console.WriteLine(bytes2 - bytes1);
// Calculate array overhead in bytes by subtracting the size of
// the array elements (40000 for object[10000] and 4 for each
// array), and dividing by the number of arrays (10001)
Console.WriteLine("Array overhead: {0:0.000}",
    ((double)(bytes2 - bytes1) - 40000) / 10001 - 4);
Console.Write("Press any key to continue...");
Console.ReadKey();
The result was
204800
Array overhead: 12.478
In a 32-bit process, object[1] should be the same size as int[1], but it isn't: changing new int[1] to new object[1] in the loop makes the overhead jump by about 3.28 bytes, to
237568
Array overhead: 15.755
Anyone know why?
(By the way, if anyone's curious, the overhead for non-array objects, e.g. (object)i in the loop above, is about 8 bytes (8.384). I heard it's 16 bytes in 64-bit processes.)
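That figure comes from the same kind of loop, just boxing each int instead of allocating an int[1]. Roughly like this sketch (the arithmetic subtracts the 40000 bytes of reference slots in the object[] and the 4-byte int payload of each box; variable names and constants here are just illustrative):
long before = GC.GetTotalMemory(true);
object[] boxes = new object[10000];
for (int i = 0; i < 10000; i++)
    boxes[i] = (object)i; // box each int
long after = GC.GetTotalMemory(true);
GC.KeepAlive(boxes);
// What's left after removing the reference slots and the int payloads
// is the per-object overhead.
Console.WriteLine("Boxed int overhead: {0:0.000}",
    ((double)(after - before) - 40000) / 10000 - 4);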
Here's a slightly neater (IMO) short but complete program to demonstrate the same thing:
using System;

class Test
{
    const int Size = 100000;

    static void Main()
    {
        object[] array = new object[Size];

        long initialMemory = GC.GetTotalMemory(true);
        for (int i = 0; i < Size; i++)
        {
            array[i] = new string[0];
        }
        long finalMemory = GC.GetTotalMemory(true);
        GC.KeepAlive(array);

        long total = finalMemory - initialMemory;
        Console.WriteLine("Size of each element: {0:0.000} bytes",
                          ((double)total) / Size);
    }
}
But I get the same results - the overhead for any reference type array is 16 bytes, whereas the overhead for any value type array is 12 bytes. I'm still trying to work out why that is, with the help of the CLI spec. Don't forget that reference type arrays are covariant, which may be relevant...
EDIT: With the help of cordbg, I can confirm Brian's answer - the type pointer of a reference-type array is the same regardless of the actual element type. Presumably there's some funkiness in object.GetType()
(which is non-virtual, remember) to account for this.
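You can see the visible half of that from managed code - nothing surprising, but it confirms that GetType() still reports the precise array type even though the arrays share a type pointer:
object[] a = new object[1];
string[] b = new string[1];
Console.WriteLine(a.GetType());                // System.Object[]
Console.WriteLine(b.GetType());                // System.String[]
Console.WriteLine(a.GetType() == b.GetType()); // False, despite the shared type pointer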
So, with code of:
object[] x = new object[1];
string[] y = new string[1];
int[] z = new int[1];
z[0] = 0x12345678;
lock(z) {}
We end up with something like the following:
Variables:
x=(0x1f228c8) <System.Object[]>
y=(0x1f228dc) <System.String[]>
z=(0x1f228f0) <System.Int32[]>
Memory:
0x1f228c4: 00000000 003284dc 00000001 00326d54 00000000 // Data for x
0x1f228d8: 00000000 003284dc 00000001 00329134 00000000 // Data for y
0x1f228ec: 00000000 00d443fc 00000001 12345678 // Data for z
Note that I've dumped the memory 1 word before the value of the variable itself.
For x and y, the values are:
- the sync block / hash code word (0 here)
- the type pointer (0x003284dc - the same for both arrays)
- the length of the array (1)
- the element type pointer (0x00326d54 for object, 0x00329134 for string)
- the first element (a null reference)

For z, the values are:
- the sync block / hash code word (0 here)
- the type pointer (0x00d443fc)
- the length of the array (1)
- the first element (0x12345678)
Different value type arrays (byte[], int[] etc) end up with different type pointers, whereas all reference type arrays use the same type pointer, but have a different element type pointer. The element type pointer is the same value as you'd find as the type pointer for an object of that type. So if we looked at a string object's memory in the above run, it would have a type pointer of 0x00329134.
The word before the type pointer certainly has something to do with either the monitor or the hash code: calling GetHashCode()
populates that bit of memory, and I believe the default object.GetHashCode()
obtains a sync block to ensure hash code uniqueness for the lifetime of the object. However, just doing lock(x){}
didn't do anything, which surprised me...
All of this is only valid for "vector" types, by the way - in the CLR, a "vector" type is a single-dimensional array with a lower-bound of 0. Other arrays will have a different layout - for one thing, they'd need the lower bound stored...
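As a quick illustration of the distinction, Array.CreateInstance lets you build a single-dimensional array with a non-zero lower bound, and the CLR treats it as a different type from the vector:
// A single-dimensional int array with indexes 1..3 - not a vector.
Array oddArray = Array.CreateInstance(typeof(int), new[] { 3 }, new[] { 1 });
int[] vector = new int[3];
Console.WriteLine(oddArray.GetType());                      // System.Int32[*]
Console.WriteLine(vector.GetType());                        // System.Int32[]
Console.WriteLine(oddArray.GetType() == vector.GetType());  // False
Console.WriteLine(oddArray.GetLowerBound(0));               // 1 - has to be stored somewhere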
So far this has been experimentation, but here's the guesswork - the reason I think the system is implemented the way it is. From here on, I really am just guessing.

All reference type arrays can share the same JIT code. They're going to behave the same way in terms of memory allocation, array access, the Length property and (importantly) the layout of references for the GC. Compare that with value type arrays, where different value types may have different GC "footprints" (e.g. one might have a byte and then a reference, others will have no references at all, etc).

Every time you assign a value within an object[], the runtime needs to check that it's valid: the type of the object whose reference you're using for the new element value has to be compatible with the element type of the array. For instance:
object[] x = new object[1];
object[] y = new string[1];
x[0] = new object(); // Valid
y[0] = new object(); // Invalid - will throw an exception
This is the covariance I mentioned earlier. Now given that this is going to happen for every single assignment, it makes sense to reduce the number of indirections. In particular, I suspect you don't really want to blow the cache by having to go to the type object for each assignment to get the element type. I suspect (and my x86 assembly isn't good enough to verify this) that the test is something like:
- Is the new value a null reference? If so, the assignment is valid. (Done.)
- Is the type pointer of the object the new value refers to equal to the element type pointer stored in the array? If so, the assignment is valid. (Done.)
- Is the array's element type pointer the one for System.Object? If so, anything goes and the assignment is valid. (Done.)
- Otherwise, fall back to a full (and slower) compatibility check.
If we can terminate the search in the first three steps, there's not a lot of indirection - which is good for something that's going to happen as often as array assignments. None of this needs to happen for value type assignments, because that's statically verifiable.
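To be clear, that list is my mental model rather than the real CLR code. A managed-level sketch of the same logic (using Type objects where the runtime would compare raw type pointers, and a made-up StoreChecked helper name) might look like this:
static void StoreChecked(object[] array, int index, object value)
{
    if (value != null) // a null reference is always a valid element
    {
        Type elementType = array.GetType().GetElementType();
        if (value.GetType() != elementType &&       // exact match: fine
            elementType != typeof(object) &&        // object[] accepts any reference: fine
            !elementType.IsInstanceOfType(value))   // otherwise do the full check
        {
            throw new ArrayTypeMismatchException();
        }
    }
    array[index] = value; // (this line of course repeats the runtime's own check)
}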
So, that's why I believe reference type arrays are slightly bigger than value type arrays.
Great question - really interesting to delve into it :)