I'm using langdetect to determine the language of a set of strings which I know are either in English or French.
Sometimes, langdetect tells me the language is Romanian for a string I know is in French.
How can I make langdetect choose between English and French only, and ignore all other languages?
Thanks!
Option 1
One option would be to use the langid package instead. Then you can simply restrict the languages with a method call:
import langid
langid.set_languages(['fr', 'en']) # ISO 639-1 codes
lang, score = langid.classify('This is a french or english text')
print(lang) # en
Option 2
If you really want to use the langdetect package, you can copy the package folder (if you're not sure where it is, pip show langdetect prints the install location; python -m site --user-site only helps for per-user installs) and remove the profiles you don't need from the folder langdetect\profiles.
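The copy-and-trim idea can also be scripted instead of done by hand. Below is a rough sketch, not a tested recipe: copy_profiles, make_restricted_factory and detect_with are names I made up, and I'm assuming langdetect exposes PROFILES_DIRECTORY, DetectorFactory.load_profile() and DetectorFactory.create() the way its source suggests, and that each profile file is named after its language code. The target directory should be empty, since load_profile() tries to read every file in it.

```python
import shutil
from pathlib import Path


def copy_profiles(src_dir, dest_dir, langs=("en", "fr")):
    """Copy only the profile files named in `langs` from src_dir to dest_dir.

    Returns the list of language codes that were actually copied.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for lang in langs:
        src = Path(src_dir) / lang
        if src.is_file():
            shutil.copy(src, dest / lang)
            copied.append(lang)
    return copied


def make_restricted_factory(dest_dir, langs=("en", "fr")):
    """Build a detector factory that only knows the given languages."""
    # Imported lazily; assumes langdetect exports these two names.
    from langdetect import DetectorFactory, PROFILES_DIRECTORY

    copy_profiles(PROFILES_DIRECTORY, dest_dir, langs)
    factory = DetectorFactory()
    factory.load_profile(str(dest_dir))  # loads every file found in dest_dir
    return factory


def detect_with(factory, text):
    """Run one detection with a factory built by make_restricted_factory."""
    detector = factory.create()
    detector.append(text)
    return detector.detect()
```

You would build the factory once, e.g. factory = make_restricted_factory("my_profiles"), and then call detect_with(factory, some_text) for each string. The installed package stays untouched this way.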
This is not a very dynamic solution though, since the set of languages is fixed by whichever profile files you leave in place.
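If you'd rather not touch the package files at all, another possibility is to keep langdetect as it is and filter its answers. langdetect's detect_langs() returns a list of candidates, each with a .lang code and a .prob probability, so you can take the most probable candidate that is English or French. A minimal sketch (pick_allowed and detect_en_or_fr are hypothetical helper names, not part of langdetect):

```python
ALLOWED = ("en", "fr")


def pick_allowed(candidates, allowed=ALLOWED, default=None):
    """Return the most probable language among `allowed`.

    `candidates` is an iterable of (language_code, probability) pairs.
    Returns `default` if none of the candidates is an allowed language.
    """
    best, best_prob = default, 0.0
    for lang, prob in candidates:
        if lang in allowed and prob > best_prob:
            best, best_prob = lang, prob
    return best


def detect_en_or_fr(text, default=None):
    """Detect the language of `text`, considering only English and French."""
    # Imported lazily so pick_allowed works without langdetect installed.
    from langdetect import DetectorFactory, detect_langs

    DetectorFactory.seed = 0  # langdetect is non-deterministic without a seed
    candidates = [(c.lang, c.prob) for c in detect_langs(text)]
    return pick_allowed(candidates, default=default)
```

One caveat: detect_langs() only reports candidates above a probability threshold, so if langdetect is very sure a French string is Romanian, neither 'fr' nor 'en' may show up and you get the default back. Passing default='fr' (or whichever language dominates your data) is one way to handle that. Empty or feature-less strings still raise LangDetectException.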