Radix Sort on an Array of Strings?

Question 1

Radix Sort on an Array of Strings?

c++ arrays sorting radix-sort

Kavix0 · Apr 13, 2014 · Viewed 9.4k times · Source

Answer

Answer

The slides you've found are great! But where did those queues come from in your code?

Anyway, here you are (live example):

template <typename E>
size_t bin(const E& elem, size_t digit)
{
    return elem.size() > digit ? size_t(elem[digit]) + 1 : 0;
}

template <size_t R, typename C, typename P>
void radix_sort(P& pos, const C& data, size_t digit)
{
    using A = std::array<size_t, R + 1>;
    A count = {};
    P prev(pos);

    for (auto i : prev)
        ++count[bin(data[i], digit)];

    A done = {}, offset = {{0}};
    std::partial_sum(count.begin(), count.end() - 1, offset.begin() + 1);

    for (auto i : prev)
    {
        size_t b = bin(data[i], digit);
        pos[offset[b] + done[b]++] = i;
    }
}

struct shorter
{
    template <typename A>
    bool operator()(const A& a, const A& b) { return a.size() < b.size(); }
};

template <size_t R, typename C>
std::vector<size_t> radix_sort(const C& data)
{
    std::vector<size_t> pos(data.size());
    std::iota(pos.begin(), pos.end(), 0);

    size_t width = std::max_element(data.begin(), data.end(), shorter())->size();

    for (long digit = long(width) - 1; digit >= 0; --digit)
        radix_sort<R>(pos, data, size_t(digit));

    return pos;
}

which you can use like that

int main()
{
    std::vector<std::string> data = generate();
    std::vector<size_t> pos = radix_sort<128>(data);
    for (auto i : pos)
        std::cout << data[i] << std::endl;
}

where generate() is included in the live example and generates the strings given in your question.

I am not trying to explain how this works here, I assume you can figure out since you are working on the problem. But a few comments are in order.

We are neither sorting the input sequence in-place, nor returning a sorted copy; we are just returning a sequence of positions of input elements in the sorted sequence.
We are processing strings from right to left.
The complexity is O(lw) where l is the input length (number of input strings) and w is the maximum input width (max. length of all input strings). So this algorithm makes sense if the string width does not vary too much.
The first template parameter R of radix_sort() is the number of possible values for each digit (letter) in the input. E.g. R = 128 means that possible values are 0..127. This should be fine for your input. I haven't tried to do anything clever with respect to ASCII codes, but you can customize function bin() for that.
In the output of bin(), value 0 is reserved to mean "we are past the end of this string". Such strings are placed before others that are still continuing.
I have tried to give self-explanatory names to variables and functions, and use standard library calls for common tasks where possible.
The code is generic, e.g. it can sort any random access container containing random access containers, not just vectors of strings.
I am using C++11 features here and there for convenience, but nothing is really necessary: one could easily do the same just with C++03.

Question 2

I've been researching around, and while I've figured out the general idea of using Radix Sort to alphabetize an array of strings, I know I'm going the wrong direction.

This is what I have so far:

void radixSort(string* sortMe, int l)
{
    queue<string>* sections = new queue<string>[27];    //Have a-z, and also one for strings that are null terminated.
    for(int i = 0; i < numElements; i++)
    {
        if(!(sortMe[i][l] == 32))
            sections[sortMe[i][l]-96].push(sortMe[i]);      //-96 because the ascii code for a is 97. If, for example a is the character, it will be placed at 1. 0 is left for null characters
    }

    for(int i =0; i < 26; i++)
    {
        while(!sections[i].empty())
        {
            temp.push_back(sections[i].front());
            sections[i].pop();
        }
    }
}

What I have so far sorts all the strings by the first character, and I know that I then have to go through and make subarrays of the remaining characters and sort them, but how can I implement it efficiently? The strings are of variable size and can include spaces, for example:

subdivides
main street
pants
impaled decolonizing
argillaceous
axial satisfactoriness
temperamental
hypersensitiveness
bears
hairbreadths
creams surges
unlaboured
hoosier
buggiest
mauritanians
emanators
acclaiming
zouaves dishpan
traipse
solarisms
remunerativeness
solubilizing
chiseled
jugular
ooziness
toastier
baud
suffixed
powerless tiding
disassimilated
gasps
flirtier
uh

This is something I found that seems like it will be of use: http://algs4.cs.princeton.edu/lectures/51DemoKeyIndexedCounting.pdf

Radix Sort on an Array of Strings?

Answer

Related questions