Quickest way to compare two generic lists for differences

Frank picture Frank · Oct 9, 2012 · Viewed 345.7k times · Source

What is the quickest (and least resource intensive) to compare two massive (>50.000 items) and as a result have two lists like the ones below:

  1. items that show up in the first list but not in the second
  2. items that show up in the second list but not in the first

Currently I'm working with the List or IReadOnlyCollection and solve this issue in a linq query:

var list1 = list.Where(i => !list2.Contains(i)).ToList();
var list2 = list2.Where(i => !list.Contains(i)).ToList();

But this doesn't perform as good as i would like. Any idea of making this quicker and less resource intensive as i need to process a lot of lists?

Answer

Jon Skeet picture Jon Skeet · Oct 9, 2012

Use Except:

var firstNotSecond = list1.Except(list2).ToList();
var secondNotFirst = list2.Except(list1).ToList();

I suspect there are approaches which would actually be marginally faster than this, but even this will be vastly faster than your O(N * M) approach.

If you want to combine these, you could create a method with the above and then a return statement:

return !firstNotSecond.Any() && !secondNotFirst.Any();

One point to note is that there is a difference in results between the original code in the question and the solution here: any duplicate elements which are only in one list will only be reported once with my code, whereas they'd be reported as many times as they occur in the original code.

For example, with lists of [1, 2, 2, 2, 3] and [1], the "elements in list1 but not list2" result in the original code would be [2, 2, 2, 3]. With my code it would just be [2, 3]. In many cases that won't be an issue, but it's worth being aware of.