I have a method which turns any Latin text (e.g. English, French, German, Polish) into its slug form, e.g. Alpha Bravo Charlie => alpha-bravo-charlie
But it doesn't work for Cyrillic text (e.g. Russian), so what I want to do is transliterate the Cyrillic text to Latin characters and then slugify that.
Does anyone have a way to do such transliteration, whether as actual source code or as a library?
I'm coding in C#, so a .NET library will work. Alternatively, if you have non-C# code, I'm sure I could convert it.
You can use the open-source .NET library UnidecodeSharpFork to transliterate Cyrillic, and many other scripts, to Latin.
Example usage:
Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());
Testing Cyrillic:
/// <summary>
/// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
/// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
/// Except that "ё" is converted to "yo".
/// </summary>
[TestMethod]
public void RussianAlphabetTest()
{
    string russianAlphabetLowercase = "а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я";
    string russianAlphabetUppercase = "А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я";
    string expectedLowercase = "a b v g d e yo zh z i y k l m n o p r s t u f kh ts ch sh shch \" y ' e yu ya";
    string expectedUppercase = "A B V G D E Yo Zh Z I Y K L M N O P R S T U F Kh Ts Ch Sh Shch \" Y ' E Yu Ya";
    Assert.AreEqual(expectedLowercase, russianAlphabetLowercase.Unidecode());
    Assert.AreEqual(expectedUppercase, russianAlphabetUppercase.Unidecode());
}
Simple, fast and powerful. It's also easy to extend or modify the transliteration table if you want to.
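For the slug use case in the question, here is a minimal, self-contained sketch of the same table-based idea. It does not use UnidecodeSharpFork: the SlugSketch class, its Table and the Slugify method are hypothetical names made up for illustration, and the table only covers the lowercase Russian alphabet, with values taken from the test above. The one deliberate difference is that ъ and ь map to an empty string. The test above expects " and ' for them, and a slugifier that treats every non-alphanumeric character as a separator would then split the word at that point.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SlugSketch
{
    // Hypothetical table for illustration only: lowercase Russian letters,
    // romanized as in the test above, except that the hard and soft signs
    // are dropped so they do not break a word in the slug.
    private static readonly Dictionary<char, string> Table = new Dictionary<char, string>
    {
        { 'а', "a" }, { 'б', "b" }, { 'в', "v" }, { 'г', "g" }, { 'д', "d" },
        { 'е', "e" }, { 'ё', "yo" }, { 'ж', "zh" }, { 'з', "z" }, { 'и', "i" },
        { 'й', "y" }, { 'к', "k" }, { 'л', "l" }, { 'м', "m" }, { 'н', "n" },
        { 'о', "o" }, { 'п', "p" }, { 'р', "r" }, { 'с', "s" }, { 'т', "t" },
        { 'у', "u" }, { 'ф', "f" }, { 'х', "kh" }, { 'ц', "ts" }, { 'ч', "ch" },
        { 'ш', "sh" }, { 'щ', "shch" }, { 'ъ', "" }, { 'ы', "y" }, { 'ь', "" },
        { 'э', "e" }, { 'ю', "yu" }, { 'я', "ya" }
    };

    public static string Slugify(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingHyphen = false;

        // Lowercase first so the table only needs lowercase keys.
        foreach (char c in input.ToLowerInvariant())
        {
            // Characters not in the table pass through unchanged.
            string mapped = Table.TryGetValue(c, out string latin) ? latin : c.ToString();

            foreach (char m in mapped)
            {
                bool isSlugChar = (m >= 'a' && m <= 'z') || (m >= '0' && m <= '9');
                if (isSlugChar)
                {
                    // Emit one hyphen per run of separators, never at the start.
                    if (pendingHyphen && sb.Length > 0) sb.Append('-');
                    sb.Append(m);
                    pendingHyphen = false;
                }
                else
                {
                    pendingHyphen = true;
                }
            }
        }

        // A trailing separator run is never flushed, so there is no trailing hyphen.
        return sb.ToString();
    }
}
```

With this sketch, SlugSketch.Slugify("Работа с кириллицей") gives "rabota-s-kirillitsey" and SlugSketch.Slugify("Alpha Bravo Charlie") gives "alpha-bravo-charlie". Accented Latin letters such as Č are not in the table and are treated as separators here, so for real input the library's much larger table is the better choice; the same hyphen-collapsing loop could be run over the result of Unidecode() instead.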