NLTK: why does NLTK not recognize the CLASSPATH variable for stanford-ner?

chapman · Sep 28, 2015

This is my code:

from nltk.tag import StanfordNERTagger
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')

And I get:

NLTK was unable to find stanford-ner.jar! Set the CLASSPATH
  environment variable.

This is what my .bashrc looks like in Ubuntu:

export CLASSPATH=/home/wolfgang/Downloads/stanford-ner-2015-04-20/stanford-ner-3.5.2.jar
export STANFORD_MODELS=/home/wolfgang/Downloads/stanford-ner-2015-04-20/classifiers
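The tagger in the code above is given a bare model file name, which only works because STANFORD_MODELS points at the classifiers directory. The sketch below shows, under stated assumptions, how such a bare name can be resolved against that variable. `resolve_model` is a hypothetical helper written for illustration, not NLTK's actual lookup code, and it assumes STANFORD_MODELS may hold several directories separated by `os.pathsep`.

```python
import os


def resolve_model(model_name, models_dirs=None):
    """Hypothetical helper: locate a model file, trying STANFORD_MODELS dirs.

    Assumption: this mirrors the idea of NLTK's lookup, not its exact code.
    """
    # A path that already points at an existing file is used as-is.
    if os.path.isfile(model_name):
        return model_name
    # Otherwise fall back to the directories listed in STANFORD_MODELS.
    if models_dirs is None:
        models_dirs = os.environ.get("STANFORD_MODELS", "")
    for directory in models_dirs.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. when the variable is unset
        candidate = os.path.join(directory, model_name)
        if os.path.isfile(candidate):
            return candidate
    return None  # not found anywhere
```

With the .bashrc above, `resolve_model('english.all.3class.distsim.crf.ser.gz')` would be expected to return the path inside `.../stanford-ner-2015-04-20/classifiers`, provided the model file actually exists there.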

I also tried printing the environment variable in Python this way:

import os
os.environ.get('CLASSPATH')

And I receive:

'/home/wolfgang/Downloads/stanford-ner-2015-04-20/stanford-ner-3.5.2.jar'

So the variables are being set!

What is wrong, then?

Why doesn't NLTK recognize my environment variables?

Answer

wolfgang · Sep 28, 2015

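Rename the .jar file from stanford-ner-3.5.2.jar to stanford-ner.jar, and update the environment variable to point at the renamed file:

export CLASSPATH=/home/wolfgang/Downloads/stanford-ner-2015-04-20/stanford-ner.jar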
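If you would rather keep the versioned jar untouched, a file named stanford-ner.jar next to it should serve the same purpose as the rename. This is a minimal sketch, assuming a Linux filesystem where symlinks are allowed (it falls back to copying otherwise); `make_unversioned_jar` is a hypothetical helper, not part of NLTK.

```python
import os
import shutil


def make_unversioned_jar(versioned_jar, alias_name="stanford-ner.jar"):
    """Hypothetical helper: create stanford-ner.jar beside the versioned jar.

    Returns the path of the alias, which is what CLASSPATH should point at.
    """
    alias = os.path.join(os.path.dirname(versioned_jar), alias_name)
    # lexists also catches a dangling symlink left by an earlier attempt.
    if not os.path.lexists(alias):
        try:
            # Relative target: the link resolves inside the same directory.
            os.symlink(os.path.basename(versioned_jar), alias)
        except OSError:
            # Assumption: copying is an acceptable fallback if symlinks fail.
            shutil.copyfile(versioned_jar, alias)
    return alias
```

Calling it on `/home/wolfgang/Downloads/stanford-ner-2015-04-20/stanford-ner-3.5.2.jar` would give the path to use in the `export CLASSPATH=...` line.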

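Apparently NLTK's jar lookup (the name_pattern argument used when searching for the jar in nltk/internals.py) only accepts a CLASSPATH entry whose file name matches stanford-ner.jar, so a versioned file name such as stanford-ner-3.5.2.jar is ignored even though the variable is set.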
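To see why a correctly set CLASSPATH can still be rejected, here is a simplified, hypothetical version of that kind of check: it compares only the file name of each CLASSPATH entry against the expected jar name. `find_jar_on_classpath` is an illustrative name, not NLTK's API, and unlike the real lookup it does not verify that the entry exists on disk, so it can run without the jar installed.

```python
import os


def find_jar_on_classpath(name, classpath=None):
    """Hypothetical sketch: return the first CLASSPATH entry named `name`.

    Only the file name (basename) of each entry is compared, so the
    directory part is irrelevant but the exact jar name matters.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    for entry in classpath.split(os.pathsep):
        if os.path.basename(entry) == name:
            return entry
    return None  # no entry has the expected file name
```

With the question's original value, `find_jar_on_classpath('stanford-ner.jar')` returns `None` because the basename is `stanford-ner-3.5.2.jar`; after the rename, it returns the full path.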