How to detect language of user entered text?

ManBugra · Jul 12, 2010 · Viewed 49.8k times

I am working on an application that accepts user input in different languages (currently a fixed set of three). The requirement is that users can simply enter text without having to select the language via a checkbox in the UI.

Is there an existing Java library to detect the language of a text?

I want something like this:

text = "To be or not to be, that's the question."

// returns ISO 639 Alpha-2 code
language = detect(text);

print(language);

result:

EN

I don't want to know how to build a language detector myself (I have seen plenty of blogs that try to do that). The library should provide a simple API and work completely offline. Open source or commercial closed source doesn't matter.

I also found these questions on SO (and a few more):

How to detect language
How to detect language of text?

Answer

yvespeirsman · May 15, 2015

The Language Detection Library for Java is reported to give more than 99% accuracy for 53 languages.

Alternatively, there is Apache Tika, a library for content analysis that offers much more than just language detection.
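
Whichever library you pick, it may help to hide it behind the single `detect(text)` call the question asks for. The following is only a minimal sketch of such a wrapper, not either library's API: the class name `LanguageGuesser` and the `backend` and `supported` parameters are hypothetical, and `backend` stands in for the actual library call. It normalizes the library's answer to an upper-case ISO 639 alpha-2 code and rejects anything outside the fixed set of languages the application accepts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical adapter: wraps any detection library behind one detect(text) call. */
final class LanguageGuesser {
    // Two-letter ISO 639 codes known to the JDK, e.g. "en", "de", "fr".
    private static final Set<String> ISO_639_ALPHA2 =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    private final Function<String, String> backend; // assumption: the chosen library call goes here
    private final Set<String> supported;            // lower-case codes the application accepts

    LanguageGuesser(Function<String, String> backend, Set<String> supported) {
        this.backend = backend;
        this.supported = supported;
    }

    /** Returns an upper-case ISO 639 alpha-2 code, or empty if the result is unusable. */
    Optional<String> detect(String text) {
        if (text == null || text.trim().isEmpty()) {
            return Optional.empty();
        }
        String raw = backend.apply(text);
        if (raw == null) {
            return Optional.empty();
        }
        // Some detectors return tags with a region suffix such as "zh-cn"; keep the language part.
        String code = raw.trim().toLowerCase(Locale.ROOT).split("[-_]")[0];
        if (!ISO_639_ALPHA2.contains(code) || !supported.contains(code)) {
            return Optional.empty();
        }
        return Optional.of(code.toUpperCase(Locale.ROOT));
    }
}
```

With a stub backend such as `text -> "en"` and a supported set of `en`, `de` and `fr`, `detect("To be or not to be")` yields `Optional.of("EN")`. In a real application the lambda would delegate to the chosen library, and an empty result could fall back to asking the user to pick the language.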