Fastest way to sort a list of numbers and their indexes

Vincent · Apr 23, 2012 · Viewed 23.6k times

I have a question that may seem very basic, but it arises in a context where "every CPU tick counts" (this is part of a larger algorithm that will be used on supercomputers).

The problem is quite simple: what is the fastest way to sort a list of unsigned long long int numbers together with their original indexes? (At the beginning, the numbers are in completely random order.)

Example:
Before
Numbers: 32 91 11 72
Indexes:  0  1  2  3
After
Numbers: 11 32 72 91
Indexes:  2  0  3  1

By "fastest way", I mean: which algorithm should I use (std::sort, C qsort, or another sorting algorithm available on the web)? Which container should I use (C array, std::vector, std::map...)? How should I sort the indexes at the same time (structures, std::pair, std::map...)?

How many elements to sort? Typically 4 GB of numbers.

Answer

Jerry Coffin · Apr 23, 2012

The obvious starting point would be a structure holding each number and its index, plus a comparison function object that orders the structures by number:

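#include <cstddef>  // std::size_t

struct data {
    unsigned long long int number;  // value to sort by
    std::size_t index;              // position in the original input
};

// Function object that orders data by number only.
struct by_number {
    bool operator()(data const &left, data const &right) const {
        return left.number < right.number;
    }
};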

...and an std::vector to hold the data:

 std::vector<data> items;

and std::sort to do the sorting:

 std::sort(items.begin(), items.end(), by_number());

The simple fact is that the standard containers (and algorithms) are efficient enough that using them doesn't make your code substantially slower. You might be able to do better by writing some part differently, but you might about as easily do worse. Start from solid, readable code and measure it; don't (attempt to) optimize prematurely.

Edit: of course, in C++11 you can use a lambda expression instead:

std::sort(items.begin(), items.end(), 
          [](data const &a, data const &b) { return a.number < b.number; });

This is generally a little more convenient to write. Readability depends: for something simple like this, I'd say sort ... by_number is pretty readable, but that depends (heavily) on the name you give to the comparison object. The lambda puts the actual sorting criterion right at the call site, so you don't need to choose a name carefully for the code to be readable.