NLTK Named Entity Recognition with Custom Data

user1502248 · Jul 4, 2012 · Viewed 25.8k times

I'm trying to extract named entities from my text using NLTK. NLTK's NER is not accurate enough for my purpose, and I also want to add some tags of my own. I've been trying to find a way to train my own NER, but I can't find the right resources. I have a few questions about NLTK:

  1. Can I use my own data to train a named entity recognizer in NLTK?
  2. If I can train using my own data, is named_entity.py the file to modify?
  3. Does the input file have to be in IOB format, e.g. Eric NNP B-PERSON?
  4. Are there any resources, apart from the NLTK cookbook and NLP with Python, that I can use?

I would really appreciate any help with this.

Answer

jjdubs · Jul 9, 2012

Are you committed to using NLTK/Python? I ran into the same problems as you and had much better results using Stanford's named-entity recognizer: http://nlp.stanford.edu/software/CRF-NER.shtml. The process for training the classifier on your own data is well documented in the FAQ.

If you really need to use NLTK, I'd hit up the mailing list for some advice from other users: http://groups.google.com/group/nltk-users.
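On the IOB question: whichever tool you end up training, it helps to check your annotated data first. Here is a minimal sketch in plain Python (no NLTK calls) that reads lines like "Eric NNP B-PERSON" and pulls out the entity spans. It assumes one whitespace-separated "word POS tag" triple per line and a blank line between sentences. The names read_iob, extract_entities and SAMPLE are made up for illustration, and the exact file layout your trainer expects may differ.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, IOB tag)

# Hypothetical sample data in the assumed "word POS IOB-tag" layout.
SAMPLE = """\
Eric NNP B-PERSON
Smith NNP I-PERSON
works VBZ O
at IN O
Acme NNP B-ORGANIZATION
. . O

He PRP O
lives VBZ O
in IN O
Paris NNP B-LOCATION
"""


def read_iob(text: str) -> List[List[Token]]:
    """Split IOB-formatted text into sentences of (word, pos, iob) tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, iob = line.split()
        current.append((word, pos, iob))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from one IOB-tagged sentence."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None
    for word, _pos, iob in sentence:
        starts_new = iob.startswith("B-") or (
            iob.startswith("I-") and iob[2:] != label
        )
        if starts_new:
            # Close any open entity, then open a new one.
            if words:
                entities.append((" ".join(words), label))
            words, label = [word], iob[2:]
        elif iob.startswith("I-"):
            # Continuation of the entity that is currently open.
            words.append(word)
        else:
            # An "O" tag closes any open entity.
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities


if __name__ == "__main__":
    for sent in read_iob(SAMPLE):
        print(extract_entities(sent))
```

Running it over your training file and eyeballing the extracted spans is a cheap way to catch mislabeled or broken B-/I- sequences before you spend time training anything.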

Hope this helps!