How to transliterate Cyrillic to Latin text

ckknight · Dec 3, 2009

I have a method that turns any Latin-script text (e.g. English, French, German, Polish) into its slug form, for example:

Alpha Bravo Charlie => alpha-bravo-charlie
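The question doesn't show the slug method itself. Below is a minimal sketch of what such a method might look like. It assumes the common approach: decompose accented letters (Unicode FormD), drop the combining marks, lowercase, and collapse every run of other characters into a single hyphen. `SlugHelper.Slugify` is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SlugHelper
{
    // Hypothetical helper: turns Latin-script text into a URL slug.
    public static string Slugify(string text)
    {
        // Split accented letters into base letter + combining mark, e.g. "é" -> "e" + U+0301.
        string decomposed = text.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        bool pendingHyphen = false;

        foreach (char c in decomposed)
        {
            // Drop the combining marks that were split off above.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            if (c < 128 && char.IsLetterOrDigit(c))
            {
                // One hyphen per run of separators, never at the start of the slug.
                if (pendingHyphen && sb.Length > 0)
                    sb.Append('-');
                pendingHyphen = false;
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                // Spaces, punctuation and any remaining non-ASCII character act as separators.
                pendingHyphen = true;
            }
        }

        return sb.ToString();
    }
}
```

With this approach `"Crème Brûlée!"` becomes `"creme-brulee"`, but Cyrillic letters have no ASCII decomposition, so `"Работа с кириллицей"` collapses to an empty string. Letters such as Polish "ł", which also don't decompose, would need special-casing in a real implementation.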

But it doesn't work for Cyrillic text (e.g. Russian), so what I want to do is transliterate the Cyrillic text to Latin characters and then slugify that.

Does anyone know a way to do such transliteration, either with source code or a library?

I'm coding in C#, so a .NET library will work. Alternatively, if you have non-C# code, I'm sure I could convert it.
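Because source code is also acceptable, here is a minimal table-driven sketch that covers Russian only. It assumes the BGN/PCGN-style mapping used in the answer's alphabet test, including "ё" → "yo". `RussianTranslit` is a hypothetical name, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RussianTranslit
{
    // Lowercase Russian letters -> Latin, BGN/PCGN-style, with "ё" -> "yo".
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        ['а'] = "a", ['б'] = "b", ['в'] = "v", ['г'] = "g", ['д'] = "d",
        ['е'] = "e", ['ё'] = "yo", ['ж'] = "zh", ['з'] = "z", ['и'] = "i",
        ['й'] = "y", ['к'] = "k", ['л'] = "l", ['м'] = "m", ['н'] = "n",
        ['о'] = "o", ['п'] = "p", ['р'] = "r", ['с'] = "s", ['т'] = "t",
        ['у'] = "u", ['ф'] = "f", ['х'] = "kh", ['ц'] = "ts", ['ч'] = "ch",
        ['ш'] = "sh", ['щ'] = "shch", ['ъ'] = "\"", ['ы'] = "y", ['ь'] = "'",
        ['э'] = "e", ['ю'] = "yu", ['я'] = "ya",
    };

    public static string Transliterate(string text)
    {
        var sb = new StringBuilder(text.Length * 2);
        foreach (char c in text)
        {
            char lower = char.ToLowerInvariant(c);
            if (Map.TryGetValue(lower, out string latin))
            {
                // For capitals, uppercase only the first Latin letter: "Щ" -> "Shch".
                if (c != lower)
                    latin = char.ToUpperInvariant(latin[0]) + latin.Substring(1);
                sb.Append(latin);
            }
            else
            {
                // Not a Russian letter: copy it through unchanged.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Every other character passes through unchanged, so the output can be handed straight to an existing slug method. The mapping is a single dictionary and easy to adjust, but it only covers Russian. A library is the better choice if other Cyrillic languages or scripts need handling.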

Answer

Dima Stefantsov · Jun 18, 2012

You can use UnidecodeSharpFork, an open-source .NET library, to transliterate Cyrillic and many other scripts to Latin.

Example usage:

Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

Testing Cyrillic:

/// <summary>
/// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
/// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
/// Except that "ё" is converted to "yo" instead of BGN/PCGN's "ë".
/// </summary>
[TestMethod]
public void RussianAlphabetTest()
{
    string russianAlphabetLowercase = "а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я";
    string russianAlphabetUppercase = "А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я";

    string expectedLowercase = "a b v g d e yo zh z i y k l m n o p r s t u f kh ts ch sh shch \" y ' e yu ya";
    string expectedUppercase = "A B V G D E Yo Zh Z I Y K L M N O P R S T U F Kh Ts Ch Sh Shch \" Y ' E Yu Ya";

    Assert.AreEqual(expectedLowercase, russianAlphabetLowercase.Unidecode());
    Assert.AreEqual(expectedUppercase, russianAlphabetUppercase.Unidecode());
}

It's simple, fast and powerful, and the transliteration table is easy to extend or modify if you need to.