Removing diacritics in Polish

empi · Aug 24, 2010

I'm trying to remove diacritics from a pangram in Polish. I'm using the code from Michael Kaplan's blog (http://www.siao2.com/2007/05/14/2629747.aspx), but with no success.

Consider the following pangram: "Pchnąć w tę łódź jeża lub ośm skrzyń fig.". Everything works fine except for the letter "ł", which still comes out as "ł". I guess the problem is that "ł" is represented as a single Unicode character, so after normalization there is no following NonSpacingMark to strip.
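To make the problem concrete, here is a minimal sketch of the usual normalization-based approach (decompose to Form D, drop every NonSpacingMark, recompose). It is assumed to be close to the blog's code rather than copied from it, and the names `DiacriticsDemo` and `StripNonSpacingMarks` are made up for this illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper: decompose to NFD, drop combining marks, recompose to NFC.
    public static string StripNonSpacingMarks(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except combining marks such as the ogonek or acute accent.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // "ą" (U+0105) decomposes into "a" + U+0328 (combining ogonek): prints 2.
        Console.WriteLine("\u0105".Normalize(NormalizationForm.FormD).Length);
        // "ł" (U+0142) has no canonical decomposition, so it stays one char: prints 1.
        Console.WriteLine("\u0142".Normalize(NormalizationForm.FormD).Length);
        // Prints: Pchnac w te łodz jeza lub osm skrzyn fig.
        Console.WriteLine(StripNonSpacingMarks("Pchnąć w tę łódź jeża lub ośm skrzyń fig."));
    }
}
```

Every other Polish letter decomposes into a base letter plus a combining mark, so the filter removes the mark. The stroke in "ł" is part of the character itself, which is why it passes through unchanged.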

Do you have any idea how I can fix it without relying on a custom mapping in some dictionary? I'm looking for some kind of Unicode conversion.

Answer

sinnerinc · Feb 16, 2015

Some time ago I came across this solution, which seems to work fine:

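    using System.Text;

    public static class StringExtensions
    {
        // Encoding to the "Cyrillic" code page, which has no accented Latin
        // letters, relies on .NET's best-fit mapping to substitute the base
        // letter (assumes this code page is available, as on .NET Framework).
        public static string RemoveDiacritics(this string s)
        {
            byte[] bestFitBytes = Encoding.GetEncoding("Cyrillic").GetBytes(s);
            return Encoding.ASCII.GetString(bestFitBytes);
        }
    }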