Is there a better way to get rid of accents and turn those letters into their plain equivalents than calling String.replaceAll() and replacing the letters one by one?
Example:
Input: orčpžsíáýd
Output: orcpzsiayd
It doesn't need to cover every letter with accents, or alphabets such as the Russian or the Chinese one.
Use java.text.Normalizer to handle this for you.
string = Normalizer.normalize(string, Normalizer.Form.NFD);
// or Normalizer.Form.NFKD for a more "compatible" decomposition
This splits each accented character into its base letter followed by separate accent marks. Then you just need to check each character and throw out the ones that aren't plain ASCII.
string = string.replaceAll("[^\\p{ASCII}]", "");
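Put together, the two steps above might look like the following minimal sketch. The class name AsciiFold and the method name toAscii are made up for illustration; the sample input is the one from the question, written with Unicode escapes so the source file encoding doesn't matter.

```java
import java.text.Normalizer;

// Hypothetical helper class, named only for this example.
final class AsciiFold {

    // Decompose accented characters, then drop everything outside ASCII.
    static String toAscii(String input) {
        // NFD turns e.g. U+010D (c with caron) into 'c' followed by U+030C.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // The combining marks are not ASCII, so this removes them.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // The question's sample input, written with Unicode escapes.
        String input = "or\u010Dp\u017Es\u00ED\u00E1\u00FDd";
        System.out.println(toAscii(input)); // prints orcpzsiayd
    }
}
```

Note that letters with no canonical decomposition, such as ł or ø, are not split by NFD, so this filter removes them entirely rather than mapping them to l or o.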
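If your text contains non-ASCII characters that you want to keep (letters from other scripts, for example), you should remove only the accent marks instead: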
string = string.replaceAll("\\p{M}", "");
For Unicode, \\P{M} matches the base glyph and \\p{M} (lowercase) matches each accent.
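To see how the two patterns differ, here is a small sketch that runs both on text mixing Latin and Cyrillic letters. The class and method names (MarkStripper, stripMarks, asciiOnly) are hypothetical and chosen only for this example.

```java
import java.text.Normalizer;

// Hypothetical helper class, named only for this example.
final class MarkStripper {

    // Removes combining marks (category M) but keeps every base character.
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Removes everything that is not ASCII, including non-Latin letters.
    static String asciiOnly(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, a space, then three Cyrillic letters.
        String mixed = "caf\u00E9 \u043C\u0438\u0440";
        System.out.println(stripMarks(mixed)); // accent removed, Cyrillic letters kept
        System.out.println(asciiOnly(mixed));  // accent and Cyrillic letters both removed
    }
}
```

One thing to keep in mind: \\p{M} removes any combining mark that NFD produces, so letters in other scripts that decompose (the Cyrillic й, for instance) lose their marks as well.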
Thanks to GarretWilson for the pointer and regular-expressions.info for the great Unicode guide.