Detect language of text

Nikhil · Sep 23, 2009 · Viewed 33.6k times

Is there a C# library that can detect the language of a piece of text? For example, given the input "This is a sentence", it should detect the language as "English", and given "Esto es una sentencia", it should detect "Spanish".

I understand that language detection from text is not a deterministic problem. But both Google Translate and Bing Translator have an "Auto detect" option, which best-guesses the input language. Is there something similar available publicly, preferably in C#?

Answer

Ivan Akcheurov · May 23, 2011

Yes indeed, TextCat is very good for language identification, and it has many implementations in different programming languages.

There was no .NET port, so I wrote one: NTextCat (NuGet, Online Demo).

It is a pure .NET Standard 2.0 DLL plus a command-line interface to it. By default, it uses a profile of 14 languages.
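For readers wondering what a language "profile" is: TextCat-style identifiers follow the n-gram approach of Cavnar and Trenkle, where each language is represented by a ranked list of its most frequent character n-grams, and a text is assigned to the language whose ranking it disagrees with least. The sketch below illustrates that idea only. It is not NTextCat's code or API; the function names, the `missing_penalty` value, and the tiny training sentences are all assumptions made for the example.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the text's most frequent character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile, missing_penalty=300):
    """Sum of rank differences; n-grams absent from the language profile
    cost a fixed penalty (an assumed value, not taken from any library)."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    return sum(
        abs(i - rank[gram]) if gram in rank else missing_penalty
        for i, gram in enumerate(doc_profile)
    )


def detect(text, profiles):
    """Return the name of the profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical toy training text; real profiles are built from far larger corpora.
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and this is "
               "what the people say when they are there",
    "Spanish": "el rápido zorro marrón salta sobre el perro perezoso y esto "
               "es lo que la gente dice cuando están allí",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

print(detect("This is a sentence", PROFILES))
print(detect("Esto es una sentencia", PROFILES))
```

With training samples this small, the guess for a short input can easily be wrong; the point is only to show how a ranked n-gram profile is built and compared. The result is always a best guess rather than a certainty, which is why the question calls the problem non-deterministic.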

Any feedback is much appreciated! New ideas and feature requests are welcome too :)