Java array sort UTF-8

Minutis · Feb 13, 2012 · Viewed 7.4k times

I want to sort an ArrayList<String>, but the characters of my native language cause a problem. My alphabet goes like this: a, ą, b, c, č, d, e, f ... z, ž. As you can see, z is second from the end and ą is second in the alphabet, so after I sort my list it is ordered incorrectly: all strings starting with my native-language characters are moved to the end. Example:

package lt;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class test {
    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add("bbc");
        items.add("ąbc");
        items.add("abc");
        items.add("zzz");

        System.out.println("Unsorted: ");
        for(String str : items) {
            System.out.println(str);
        }

        Collections.sort(items);
        System.out.println();

        System.out.println("Sorted: ");
        for(String str : items) {
            System.out.println(str);
        }
    }
}

Output:

Unsorted: 
bbc
ąbc
abc
zzz

Sorted: 
abc
bbc
zzz
ąbc

Should be:

Sorted:
abc
ąbc
bbc
zzz

Answer

Vic · Feb 13, 2012

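You should use the Collator class (java.text.Collator), which compares strings according to a locale's rules instead of raw character codes.

For example: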

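// Language and country are separate parts of a locale; "lt_LT" passed as a single language code is not valid.
Locale lithuanian = Locale.forLanguageTag("lt-LT");
Collator lithuanianCollator = Collator.getInstance(lithuanian);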

And then sort the collection using this collator:

Collections.sort(theList, lithuanianCollator);
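
Putting the pieces together, here is a minimal sketch of a complete program, assuming Java 7 or later. The class name CollatorSortDemo and the helper sortLithuanian are made up for illustration and are not part of any library. It sorts the list from the question twice, once with the natural String ordering (which compares UTF-16 code units, so ą, U+0105, lands after z) and once with the Lithuanian collator.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; the name and the helper method are illustrative only.
class CollatorSortDemo {

    // Returns a sorted copy of the input using Lithuanian collation rules.
    static List<String> sortLithuanian(List<String> input) {
        // "lt-LT" is the BCP 47 tag for Lithuanian as used in Lithuania.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("lt-LT"));
        List<String> copy = new ArrayList<>(input);
        // Collator implements Comparator<Object>, so it can be passed directly.
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // \u0105 is the letter ą, escaped so the source file encoding does not matter.
        List<String> items = Arrays.asList("bbc", "\u0105bc", "abc", "zzz");

        // Natural ordering uses String.compareTo, which compares UTF-16 code units.
        List<String> natural = new ArrayList<>(items);
        Collections.sort(natural);
        System.out.println("Natural order:  " + natural);

        // Locale-aware ordering places ą between a and b.
        System.out.println("Collator order: " + sortLithuanian(items));
    }
}
```

The helper sorts a copy so the caller's list is left untouched; if you would rather sort in place, call Collections.sort(theList, lithuanianCollator) directly as shown above. Whether the console displays ą correctly depends on your terminal encoding, which is separate from the sort order.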