NLTK available languages for stopwords

gal007 · Feb 7, 2019

I'm wondering where I can find the full list of languages (and their keys) supported by the NLTK stopwords corpus.

I found a list at https://pypi.org/project/stop-words/, but it doesn't give the key for each language. So it's unclear whether you can retrieve a list simply with stopwords.words("Bulgarian"). In fact, that call throws an error.

I checked the NLTK site: a search for "stopwords" returns 4 documents, but none of them lists the available languages. https://www.nltk.org/search.html?q=stopwords&check_keywords=yes&area=default

The NLTK book doesn't say anything about it either: http://www.nltk.org/book/ch02.html#stopwords_index_term

So, does anyone know where I can find the list of keys?

Answer

Grad student at NU · Sep 25, 2019
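Each stopword list is stored as a file named after its language in the stopwords corpus directory, so listing that directory shows the valid keys (here the data was downloaded under /root/nltk_data; adjust the path to your installation):

import os
os.listdir('/root/nltk_data/corpora/stopwords/')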

['hungarian',
 'swedish',
 'kazakh',
 'norwegian',
 'finnish',
 'arabic',
 'indonesian',
 'portuguese',
 'turkish',
 'azerbaijani',
 'slovene',
 'spanish',
 'danish',
 'nepali',
 'romanian',
 'greek',
 'dutch',
 'README',
 'tajik',
 'german',
 'english',
 'russian',
 'french',
 'italian']