Corpora/stopwords not found when importing the nltk library

Frits Verstraten · Jan 12, 2017 · Viewed 84.1k times

I am trying to use the nltk package in Python 2.7:

  import nltk
  stopwords = nltk.corpus.stopwords.words('english')
  print(stopwords[:10])

Running this gives me the following error:

LookupError: 
**********************************************************************
Resource 'corpora/stopwords' not found.  Please use the NLTK
Downloader to obtain the resource:  >>> nltk.download()

So I opened my Python terminal and did the following:

import nltk  
nltk.download()

Which gives me:

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

However, this never seems to finish, and running my script again still gives me the same error. Any thoughts on where this goes wrong?

Answer

Kurt Bourbaki · Jan 13, 2017

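Calling nltk.download() with no arguments opens the interactive downloader for everything in nltk data rather than fetching a single resource, so it can take a long time or look like it never finishes. You can instead download only the stopwords corpus that you need: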

import nltk
nltk.download('stopwords')

Or from the command line (thanks to Rafael Valero's answer):

python -m nltk.downloader stopwords
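
If you want your script to fetch the corpus only when it is actually missing, something like the following might work. This is a minimal sketch, not part of NLTK itself: ensure_nltk_resource is a hypothetical helper name, and its find/download parameters are my own addition so the logic can be tried without network access. By default it relies on nltk.data.find, which raises LookupError for a missing resource, and on nltk.download.

```python
def ensure_nltk_resource(resource_path, package_id, find=None, download=None):
    """Download package_id only if resource_path is not installed yet.

    Hypothetical helper. find and download default to nltk.data.find and
    nltk.download; they are parameters only so stand-ins can be passed in.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stand-ins work without nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)   # raises LookupError when the resource is missing
        return False          # already present, nothing to download
    except LookupError:
        download(package_id)  # fetch just this one package
        return True
```

Calling ensure_nltk_resource('corpora/stopwords', 'stopwords') once before nltk.corpus.stopwords.words('english') should then avoid the LookupError on a fresh machine, without downloading anything on later runs.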
