Algorithm: efficient way to remove duplicate integers from an array

ejel picture ejel · Oct 7, 2009 · Viewed 199.9k times · Source

I got this problem from an interview with Microsoft.

Given an array of random integers, write an algorithm in C that removes duplicated numbers and return the unique numbers in the original array.

E.g Input: {4, 8, 4, 1, 1, 2, 9} Output: {4, 8, 1, 2, 9, ?, ?}

One caveat is that the expected algorithm should not required the array to be sorted first. And when an element has been removed, the following elements must be shifted forward as well. Anyway, value of elements at the tail of the array where elements were shifted forward are negligible.

Update: The result must be returned in the original array and helper data structure (e.g. hashtable) should not be used. However, I guess order preservation is not necessary.

Update2: For those who wonder why these impractical constraints, this was an interview question and all these constraints are discussed during the thinking process to see how I can come up with different ideas.

Answer

ejel picture ejel · Oct 7, 2009

A solution suggested by my girlfriend is a variation of merge sort. The only modification is that during the merge step, just disregard duplicated values. This solution would be as well O(n log n). In this approach, the sorting/duplication removal are combined together. However, I'm not sure if that makes any difference, though.