Large Arrays, and LOH Fragmentation. What is the accepted convention?

Gorchestopher H picture Gorchestopher H · Mar 8, 2010 · Viewed 8.8k times · Source

I have an other active question HERE regarding some hopeless memory issues that possibly involve LOH Fragmentation among possibly other unknowns.

What my question now is, what is the accepted way of doing things? If my app needs to be done in Visual C#, and needs to deal with large arrays to the tune of int[4000000], how can I not be doomed by the garbage collector's refusal to deal with the LOH?

It would seem that I am forced to make any large arrays global, and never use the word "new" around any of them. So, I'm left with ungraceful global arrays with "maxindex" variables instead of neatly sized arrays that get passed around by functions.

I've always been told that this was bad practice. What alternative is there?

Is there some kind of function to the tune of System.GC.CollectLOH("Seriously") ? Are there possibly some way to outsource garbage collection to something other than System.GC?

Anyway, what are the generally accepted rules for dealing with large (>85Kb) variables?

Answer

Paul Ruane picture Paul Ruane · Mar 8, 2010

Firstly, the garbage collector does collect the LOH, so do not be immediately scared by its prescence. The LOH gets collected when generation 2 gets collected.

The difference is that the LOH does not get compacted, which means that if you have an object in there that has a long lifetime then you will effectively be splitting the LOH into two sections — the area before and the area after this object. If this behaviour continues to happen then you could end up with the situation where the space between long-lived objects is not sufficiently large for subsequent assignments and .NET has to allocate more and more memory in order to place your large objects, i.e. the LOH gets fragmented.

Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).

Starting from .NET 4.5.1, LOH could be compacted, see GCSettings.LargeObjectHeapCompactionMode property.

Strategies to avoid LOH fragmentation are:

  • Avoid creating large objects that hang around. Basically this just means large arrays, or objects which wrap large arrays (such as the MemoryStream which wraps a byte array), as nothing else is that big (components of complex objects are stored separately on the heap so are rarely very big). Also watch out for large dictionaries and lists as these use an array internally.
  • Watch out for double arrays — the threshold for these going into the LOH is much, much smaller — I can't remember the exact figure but its only a few thousand.
  • If you need a MemoryStream, considering making a chunked version that backs onto a number of smaller arrays rather than one huge array. You could also make custom version of the IList and IDictionary which using chunking to avoid stuff ending up in the LOH in the first place.
  • Avoid very long Remoting calls, as Remoting makes heavy use of MemoryStreams which can fragment the LOH during the length of the call.
  • Watch out for string interning — for some reason these are stored as pages on the LOH and can cause serious fragmentation if your application continues to encounter new strings to intern, i.e. avoid using string.Intern unless the set of strings is known to be finite and the full set is encountered early on in the application's life. (See my earlier question.)
  • Use Son of Strike to see what exactly is using the LOH memory. Again see this question for details on how to do this.
  • Consider pooling large arrays.

Edit: the LOH threshold for double arrays appears to be 8k.