Is there any way to sort strings in all languages?

emeraldhieu · Oct 3, 2011 · Viewed 10.3k times

I have this code. It sorts correctly in French and Russian, even though I used Locale.US. Does this solution work correctly for all languages, for example Chinese, Korean, or Japanese? If not, what is a better solution?

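import java.text.Collator;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

public class CollationTest {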
    public static void main(final String[] args) {
        final Collator collator = Collator.getInstance(Locale.US);
        final SortedSet<String> set = new TreeSet<String>(collator);

        set.add("abîmer");
        set.add("abîmé");
        set.add("aberrer");
        set.add("abhorrer");
        set.add("aberrance");
        set.add("abécédaire");
        set.add("abducteur");
        set.add("abdomen");

        set.add("государственно-монополистический");
        set.add("гостить");
        set.add("гостевой");
        set.add("гостеприимный");
        set.add("госпожа");
        set.add("госплан");
        set.add("господи");
        set.add("господа");

        for(final String s : set) {
            System.out.println(s);
        }
    }
}

Update: Sorry, I don't need the set to hold strings from all languages at once. I mean that the set contains strings in one language at a time, and it should sort correctly whichever language that is.

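import java.text.Collator;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

public class CollationTest {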
    public static void main(final String[] args) {
        final Collator collator = Collator.getInstance(Locale.US);
        final SortedSet<String> set = new TreeSet<String>(collator);

        // Sorting in French.
        set.clear();
        set.add("abîmer");
        set.add("abîmé");
        set.add("aberrer");
        set.add("abhorrer");
        set.add("aberrance");
        set.add("abécédaire");
        set.add("abducteur");
        set.add("abdomen");
        for(final String s : set) {
            System.out.println(s);
        }

        // Sorting in Russian.
        set.clear();
        set.add("государственно-монополистический");
        set.add("гостить");
        set.add("гостевой");
        set.add("гостеприимный");
        set.add("госпожа");
        set.add("госплан");
        set.add("господи");
        set.add("господа");
        for(final String s : set) {
            System.out.println(s);
        }
    }
}

Answer

Cemo · Oct 3, 2011

You cannot, because every language has its own alphabetical order. For example, the letter с that you mentioned for Russian sorts differently than in Turkish.

You should always use a Collator for the language you are sorting. What I can suggest is to use the Collections API:

    // Define a collator for the German language.
    Collator collator = Collator.getInstance(Locale.GERMAN);

    // Sort the list using the collator.
    Collections.sort(words, collator);
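
To tie this back to the question, here is a minimal sketch of sorting a list under the rules of whichever locale the data is in. The class name `LocaleSortSketch` and the helper `sortedFor` are made up for illustration and are not part of any library; only `Collator.getInstance(Locale)`, `Locale.forLanguageTag`, and `List.sort` are standard API.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, written only to illustrate the idea above.
class LocaleSortSketch {

    // Returns a sorted copy of words, ordered by the collation rules of the given locale.
    static List<String> sortedFor(Locale locale, List<String> words) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        List<String> german = Arrays.asList("Zebra", "Apfel", "Ärger", "Banane");
        System.out.println(sortedFor(Locale.GERMAN, german));

        List<String> russian = Arrays.asList("гостить", "господа", "госплан");
        System.out.println(sortedFor(Locale.forLanguageTag("ru-RU"), russian));
    }
}
```

The locale is passed in rather than hard-coded, so the same helper can be reused once you know which language a given set of strings is in.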

For further information, check the documentation. As stated there:

This program shows what can happen when you sort the same list of words with two different collators:

Collator fr_FRCollator = Collator.getInstance(new Locale("fr","FR"));

Collator en_USCollator = Collator.getInstance(new Locale("en","US"));

The method for sorting, called sortStrings, can be used with any Collator. Notice that the sortStrings method invokes the compare method:

public static void sortStrings(Collator collator, String[] words) {
    String tmp;
    for (int i = 0; i < words.length; i++) {
        for (int j = i + 1; j < words.length; j++) {
            if (collator.compare(words[i], words[j]) > 0) {
                tmp = words[i];
                words[i] = words[j];
                words[j] = tmp;
            }
        }
    }
}

The English Collator sorts the words as follows:

peach péché pêche sin

According to the collation rules of the French language, the preceding list is in the wrong order. In French péché should follow pêche in a sorted list. The French Collator sorts the array of words correctly, as follows:

peach pêche péché sin
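
The comparison above can be tried out with a short program. This is only a sketch, not the quoted tutorial's code: the class name `CollatorComparison` and the helper `sortWith` are assumptions, and it uses `Arrays.sort` with the collator as the comparator in place of the hand-written loop. The relative order of the accented words depends on the collation data shipped with your JDK, so it is worth running it rather than relying on a listing.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class for comparing two collators on the same words.
class CollatorComparison {

    // Sorts a copy of the array with the given collator; the input is left untouched.
    static String[] sortWith(Collator collator, String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"péché", "sin", "pêche", "peach"};

        Collator english = Collator.getInstance(Locale.US);
        Collator french = Collator.getInstance(Locale.FRANCE);

        System.out.println("en_US: " + String.join(" ", sortWith(english, words)));
        System.out.println("fr_FR: " + String.join(" ", sortWith(french, words)));
    }
}
```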