Change nltk.download() path directory from default ~/nltk_data

shenglih · Jul 1, 2017

I was trying to download/update Python NLTK packages on a computing server, and it failed with an [Errno 122] Disk quota exceeded error.

Specifically:

[nltk_data] Downloading package stopwords to /home/sh2264/nltk_data...
[nltk_data] Error downloading u'stopwords' from
[nltk_data] <https://raw.githubusercontent.com/nltk/nltk_data/gh-
[nltk_data] pages/packages/corpora/stopwords.zip>: [Errno 122]
[nltk_data] Disk quota exceeded:
[nltk_data] u'/home/sh2264/nltk_data/corpora/stopwords.zip'
False

How can I change the download path for NLTK packages, and what other changes do I need to make so that NLTK loads them without errors?

Answer

Ortomala Lokni · Jul 6, 2017

According to the documentation:

By default, packages are installed in either a system-wide directory (if Python has sufficient access to write to it); or in the current user’s home directory. However, the download_dir argument may be used to specify a different installation target, if desired.

To specify the download directory, use for example:

import nltk
nltk.download('treebank', download_dir='/mnt/data/treebank')
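For the second half of the question (loading without errors), NLTK also has to be told to look in the new location, since it only searches the directories listed in nltk.data.path. The following is a minimal sketch, not a definitive recipe: the helper names prepare_nltk_dir and download_to are hypothetical, and the directory you pass in is assumed to be somewhere outside the quota-limited home directory. Note that download_dir acts as the data root, so packages end up in subfolders such as corpora/ beneath it, as the paths in the error log above suggest.

```python
import os
from pathlib import Path


def prepare_nltk_dir(data_dir):
    """Create data_dir if it is missing and return it as an absolute path string.

    Hypothetical helper: it only prepares the directory and does not touch NLTK.
    """
    path = Path(data_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


def download_to(package, data_dir):
    """Download an NLTK package into data_dir and make NLTK search that directory.

    Hypothetical helper built on nltk.download(..., download_dir=...) and the
    nltk.data.path list of search directories.
    """
    import nltk  # imported here so prepare_nltk_dir works without NLTK installed

    target = prepare_nltk_dir(data_dir)
    ok = nltk.download(package, download_dir=target)
    # Register the directory so later loads (e.g. nltk.corpus.stopwords) find it.
    if target not in nltk.data.path:
        nltk.data.path.append(target)
    return ok
```

A call such as download_to('stopwords', '/mnt/data/nltk_data') would then be followed by the usual from nltk.corpus import stopwords. The nltk.data.path change only lasts for the current Python process, so a new session has to append the directory again. Alternatively, setting the NLTK_DATA environment variable to the same directory before starting Python should add it to the search path without any code changes.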