Pushing unique data into vector

user1155299 picture user1155299 · Apr 29, 2012 · Viewed 47.3k times · Source

I have the following data:

FolioA Name1 100
FolioA Name2 110
FolioA Name3 100
FolioB Name1 100
FolioB Name3 106
FolioC Name1 108
FolioC Name2 102
FolioC Name3 110

I want to only insert unique names(i.e. Name1, Name2 and Name3, each once) into

std::vector<std::string> name;

as I iterate through the data.

So, I have the following code where I have stored the data in a map called test:

std::map<std::string, std::map<std::string, double> >test;
std::map<std::string, std::map<std::string, double > >::iterator it1 = test.begin(), end1 = test.end();
    while (it1 !=end1) {
        std::map<std::string, double>::iterator it2 = it1->second.begin(), end2=it1->second.end();
        **name.push_back(it2->first);**
        ++it2;
    }
    ++it1;
}

But, currently by pushing the data into name the way I am has 3 instances of Name1, 2 of Name2, and 3 of Name3, which is expected from my code. How do I fix it to only have unique names.

Answer

juanchopanza picture juanchopanza · Apr 29, 2012

Since you want to keep the first instance for a given name, you will have to perform a name lookup at some point. A simple algorithm involving only your vector would be to can check if the the entry already exists using std::find

std::vector<std::string> name;

....
if (std::find(name.begin(), name.end(), someName) == name.end()) {
  // someName not in name, add it
  name.push_back(someName);
}

But here you are performing a search each time you want to insert an element, and this (by itself) is up to O(N) complexity, giving O(N*N) for the whole algorithm. So you could optimize by using an intermediary container with fast look up, such as an std::set as suggested by @Chad and which has O(logN) complexity for look-up, giving O(N*logN) overall, or a hash container such as C++11's std::unordered_set, which has close to constant time look-up, giving ~O(N) overall complexity.

#include <unordered_set>

std::unordered_set<std::string> name_set;
....

// still need to search, since you want to keep 
// the first instance of each name, and not the last.
// But unordered_set performs the look-up at insertion,
// only inserting if someName not already in the set
name_set.insert(someName);

and then, following @Chad's example,

std::vector<std::string> name(names_set.begin(), name_set.end());

If you don't have C++11, hash map alternatives are boost::hash_map and tr1::hash_map.