Using a hashtable inside a Parallel.ForEach?

Vin · Nov 1, 2009 · Viewed 8.1k times

I have a Parallel.ForEach loop running an intensive operation inside the body.

The operation produces a value that later loop items can reuse. After the intensive operation completes, I add the result to a Hashtable; subsequent loop items can then look up the key and reuse the cached object instead of running the intensive operation again.

However, because I am using Parallel.ForEach, the Hashtable.Add and ContainsKey(key) calls are not thread safe: they can run in parallel and corrupt the table. Introducing locks may cause performance issues.

Here's the sample code:

Hashtable myTable = new Hashtable();
Parallel.ForEach(items, (item, loopState) =>
{
    object myObj;
    // If the key exists in myTable, reuse the cached object; otherwise compute and add it
    if (myTable.ContainsKey(item.Key))
    {
        myObj = myTable[item.Key];
    }
    else
    {
        myObj = SomeIntensiveOperation();
        myTable.Add(item.Key, myObj); // Issue is here: throws at runtime under concurrent access
    }
    // Do something with myObj
    // some code here
});

There must be some API or property setting in the TPL that handles this scenario. Is there?

Answer

Sam Harwell · Nov 1, 2009

You're looking for System.Collections.Concurrent.ConcurrentDictionary&lt;TKey, TValue&gt;. The new concurrent collections use significantly improved locking mechanisms and should perform excellently in parallel algorithms.

Edit: The result might look like this:

ConcurrentDictionary<TKey, TValue> cache = ...;
Parallel.ForEach(items, (item, loopState) =>
{
    TValue value;
    if (!cache.TryGetValue(item.Key, out value))
    {
        value = SomeIntensiveOperation();
        cache.TryAdd(item.Key, value);
    }

    // Do something with value
});

Word of warning: if the elements in items do not all have unique item.Key values, SomeIntensiveOperation can be called more than once for the same key. When that happens, different threads can run the "Do something with value" code with different values for the same key, and only one of those values ends up stored in the cache (not necessarily the first one computed, either). If that's a problem, you'd need a parallel lazy factory. Also, for obvious reasons, SomeIntensiveOperation should be thread safe.
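One common way to build such a parallel lazy factory is to cache Lazy&lt;TValue&gt; wrappers via ConcurrentDictionary.GetOrAdd: extra Lazy objects may be created under contention, but Lazy's default thread-safety mode guarantees the factory delegate runs at most once per key. A minimal sketch (ComputeValue and the int key/value types are illustrative stand-ins for SomeIntensiveOperation):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class LazyCacheDemo
{
    static int calls = 0;

    // Stand-in for SomeIntensiveOperation; counts how often it actually runs.
    static int ComputeValue(int key)
    {
        Interlocked.Increment(ref calls);
        Thread.Sleep(10); // simulate expensive work
        return key * key;
    }

    static void Main()
    {
        var cache = new ConcurrentDictionary<int, Lazy<int>>();
        var items = new[] { 1, 2, 2, 3, 3, 3, 4 }; // duplicate keys on purpose

        Parallel.ForEach(items, item =>
        {
            // GetOrAdd may invoke the valueFactory more than once per key,
            // but only one Lazy<int> wins; its Value runs ComputeValue
            // exactly once (LazyThreadSafetyMode.ExecutionAndPublication).
            int value = cache.GetOrAdd(
                item,
                k => new Lazy<int>(() => ComputeValue(k))).Value;

            // Do something with value
        });

        // One computation per distinct key, regardless of duplicates.
        Console.WriteLine(calls);
        Console.WriteLine(string.Join(",", cache.Keys.OrderBy(k => k)));
    }
}
```

This guarantees every thread that asks for a given key observes the same value, at the cost of a small Lazy allocation per lookup miss.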