Python langdetect: choose between one language or the other only

vandernath · May 15, 2016

I'm using langdetect to determine the language of a set of strings which I know are either in English or French.

Sometimes, langdetect tells me the language is Romanian for a string I know is in French.

How can I make langdetect choose between English and French only, ignoring all other languages?

Thanks!

Answer

Philip Bergström · Aug 24, 2018

Option 1

One option is to use the langid package instead, which lets you restrict the candidate languages with a single call:

import langid
langid.set_languages(['fr', 'en'])  # ISO 639-1 codes
lang, score = langid.classify('This is a french or english text')
print(lang) # en

Option 2

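If you really want to use the langdetect package, you can copy the package folder into your project and remove the profiles you don't need from the folder langdetect\profiles in that copy. If you're not sure where the package is installed, python -c "import langdetect; print(langdetect.__file__)" prints its location; python -m site --user-site only shows the right directory for a per-user install.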
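The pruning step can be scripted. This is a minimal sketch, assuming each profile in the copied folder is a plain file named after its language code (en, fr, ro, and so on); prune_profiles and the path in the usage note are hypothetical names, not part of langdetect:

```python
from pathlib import Path


def prune_profiles(profiles_dir, keep=("en", "fr")):
    """Delete every profile file in profiles_dir whose name is not in keep.

    Assumes one file per language, named after its language code.
    Returns the sorted names of the profiles that were removed.
    """
    removed = []
    for profile in Path(profiles_dir).iterdir():
        if profile.is_file() and profile.name not in keep:
            profile.unlink()
            removed.append(profile.name)
    return sorted(removed)
```

Run it against the copy only, for example prune_profiles("my_project/langdetect/profiles"), so the installed package stays intact.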

This is not a very dynamic solution though, since the allowed languages are fixed on disk rather than chosen at run time.
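If editing the package feels too static, a possible alternative is to keep langdetect untouched and filter its ranked guesses instead. The sketch below assumes detect_langs returns objects with lang and prob attributes and that DetectorFactory.seed makes results repeatable, as described in the langdetect README; pick_allowed and detect_restricted are illustrative names, not part of the library:

```python
def pick_allowed(candidates, allowed=("en", "fr")):
    """Return the most probable allowed code from (code, prob) pairs, or None."""
    allowed_candidates = [(code, prob) for code, prob in candidates if code in allowed]
    if not allowed_candidates:
        return None
    return max(allowed_candidates, key=lambda pair: pair[1])[0]


def detect_restricted(text, allowed=("en", "fr")):
    """Run langdetect on text and keep only guesses from the allowed set."""
    # Imported here so pick_allowed stays usable without langdetect installed.
    from langdetect import DetectorFactory, detect_langs

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    candidates = [(guess.lang, guess.prob) for guess in detect_langs(text)]
    return pick_allowed(candidates, allowed)
```

Unlike the other options, this does not change how the detector scores the text: detect_restricted returns None when neither allowed language appears among the reported guesses, so the caller still needs a fallback for that case.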