What is the quickest (and least resource intensive) to compare two massive (>50.000 items) and as a result have two lists like the ones below:
Currently I'm working with the List or IReadOnlyCollection and solve this issue in a linq query:
var list1 = list.Where(i => !list2.Contains(i)).ToList();
var list2 = list2.Where(i => !list.Contains(i)).ToList();
But this doesn't perform as good as i would like. Any idea of making this quicker and less resource intensive as i need to process a lot of lists?
Use Except
:
var firstNotSecond = list1.Except(list2).ToList();
var secondNotFirst = list2.Except(list1).ToList();
I suspect there are approaches which would actually be marginally faster than this, but even this will be vastly faster than your O(N * M) approach.
If you want to combine these, you could create a method with the above and then a return statement:
return !firstNotSecond.Any() && !secondNotFirst.Any();
One point to note is that there is a difference in results between the original code in the question and the solution here: any duplicate elements which are only in one list will only be reported once with my code, whereas they'd be reported as many times as they occur in the original code.
For example, with lists of [1, 2, 2, 2, 3]
and [1]
, the "elements in list1 but not list2" result in the original code would be [2, 2, 2, 3]
. With my code it would just be [2, 3]
. In many cases that won't be an issue, but it's worth being aware of.