I'm trying to install NLTK for Python 3.4. The actual NLTK module appears to have installed fine. I then ran
import nltk
nltk.download()
and chose to download everything. However, after it was done, the window simply says 'out of date'. I tried refreshing and downloading again, yet it stays 'out of date', as shown here: NLTK Window 1
I looked online and tried various fixes, but I haven't found any that helped my case yet.
I also tried to manually find the missing parts, which turned out to be 'Open Multilingual Wordnet' and 'Wordnet'. Here's how I found which parts were missing: Open Multilingual Wordnet.
What should I do? Should I uninstall and reinstall NLTK? I haven't really found a way to delete the packages (except for manually deleting them).
EDIT: Regarding Solution 2 and Solution 3: For more clarification on the Solution 2 issue:
If something has successfully downloaded, this is the output:
>>> nltk.download('subjectivity')
[nltk_data] Downloading package subjectivity to
[nltk_data] C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data] Package subjectivity is already up-to-date!
True
However, for 'wordnet' and 'omw', this is what happens when I redownload:
>>> nltk.download('omw')
[nltk_data] Downloading package omw to
[nltk_data] C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data] Unzipping corpora\omw.zip.
True
In short:
Don't use the GUI; download all the packages from within the Python interpreter.
$ python3
>>> import nltk
>>> nltk.download('all')
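Equivalently, NLTK ships a command-line downloader, so the same bulk download can be run straight from a shell with no GUI involved; the exact invocation below assumes a reasonably recent NLTK:
$ python3 -m nltk.downloader all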
In long:
It is probably due to the recent addition of Open Multilingual WordNet; something is not working correctly between the NLTK download GUI and the package indices.
Solution 1:
Simply use the nltk.download() GUI and download the two packages without selecting 'all'. (This may not work, but it is worth a try.)
Solution 2:
Install the packages individually through the Python interpreter:
>>> import nltk
>>> nltk.download('wordnet')
>>> nltk.download('omw') # Open Multilingual WordNet
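If either package still shows up as stale afterwards, you can force a fresh copy; this is a minimal sketch assuming the force keyword of nltk.download() (present in recent NLTK releases), which re-downloads even when a package is already marked as installed:
>>> nltk.download('wordnet', force=True)   # re-fetch even if already present
>>> nltk.download('omw', force=True)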
Solution 3:
Let nltk.download('all') check through all the packages in its index and download any that are not available.
>>> import nltk
>>> nltk.download('all')
Note: If any files were corrupted, possibly due to a broken internet connection, find the directory where the NLTK data is stored, delete the affected files, and then proceed with Solution 3.
To find where nltk_data is stored, check nltk.data.path, which lists the possible locations:
>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
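To pin down which of those locations actually holds a given package (useful if you want to delete a corrupted copy by hand before re-running Solution 3), nltk.data.find() returns the path of an installed resource and raises a LookupError if it is missing:
>>> import nltk
>>> nltk.data.find('corpora/wordnet')   # path to the installed corpus, or LookupError
>>> nltk.data.find('corpora/omw')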
Since the point of downloading the data is to use it, you should verify that the components you need are not missing. If those are wordnet and omw, you can try this:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0]
Synset('bank.n.01')
>>> wn.synsets('bank')[0].lemma_names('spa')
['margen', 'orilla', 'vera']
>>> wn.synsets('bank')[0].lemma_names('fre')
['rive', 'banque']
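You can also list which Open Multilingual WordNet languages were actually installed; assuming your NLTK version exposes wn.langs(), it returns the language codes the corpus reader can see:
>>> sorted(wn.langs())   # ISO-639 language codes available once omw is installed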
Don't worry too much about what is shown in the GUI. Once nltk.download('all') completes without errors, you have all the corpora and models that NLTK supports.
But as good practice, please raise an issue at https://github.com/nltk/nltk_data/issues so that the developers can check whether the problem can be replicated. Include a few more screenshots of the error, both before and after trying the proposed solutions =)