How to remove duplicates from unsorted std::vector while keeping the original ordering using algorithms?

WilliamKF picture WilliamKF · Aug 30, 2012 · Viewed 27.1k times · Source

I have an array of integers that I need to remove duplicates from while maintaining the order of the first occurrence of each integer. I can see doing it like this, but imagine there is a better way that makes use of STL algorithms better? The insertion is out of my control, so I cannot check for duplicates before inserting.

int unsortedRemoveDuplicates(std::vector<int> &numbers) {
    std::set<int> uniqueNumbers;
    std::vector<int>::iterator allItr = numbers.begin();
    std::vector<int>::iterator unique = allItr;
    std::vector<int>::iterator endItr = numbers.end();

    for (; allItr != endItr; ++allItr) {
        const bool isUnique = uniqueNumbers.insert(*allItr).second;

        if (isUnique) {
            *unique = *allItr;
            ++unique;
        }
    }

    const int duplicates = endItr - unique;

    numbers.erase(unique, endItr);
    return duplicates;
}

How can this be done using STL algorithms?

Answer

juanchopanza picture juanchopanza · Aug 30, 2012

Sounds like a job for std::copy_if. Define a predicate that keeps track of elements that already have been processed and return false if they have.

If you don't have C++11 support, you can use the clumsily named std::remove_copy_if and invert the logic.

This is an untested example:

template <typename T>
struct NotDuplicate {
  bool operator()(const T& element) {
    return s_.insert(element).second; // true if s_.insert(element);
  }
 private:
  std::set<T> s_;
};

Then

std::vector<int> uniqueNumbers;
NotDuplicate<int> pred;
std::copy_if(numbers.begin(), numbers.end(), 
             std::back_inserter(uniqueNumbers),
             std::ref(pred));

where an std::ref has been used to avoid potential problems with the algorithm internally copying what is a stateful functor, although std::copy_if does not place any requirements on side-effects of the functor being applied.