Is it possible to train the Stanford NER system to recognize more named entity types?

JudyJiang · Mar 3, 2014 · Viewed 17.2k times

I'm using some NLP libraries now (Stanford and NLTK). I've seen the Stanford demo, but I want to ask whether it is possible to use it to identify more entity types.

Currently the Stanford NER system (as the demo shows) can recognize entities as person (name), organization, or location. But the organizations it recognizes are limited to universities or a few big organizations. I'm wondering whether I can use its API to write a program that handles more entity types, so that if my input is "Apple" or "Square" it is recognized as a company.

Do I have to make my own training dataset?

Furthermore, if I want to extract entities and the relationships between them, I feel I should use the Stanford dependency parser. That is, first extract the named entities and the other parts tagged as "noun", then find the relations between them.

Am I correct?

Thanks.

Answer

mbatchkarov · Mar 4, 2014

Yes, you need your own training set. The pre-trained Stanford models recognise the word "Stanford" as a named entity only because they were trained on data in which that word (or words that look very similar under the feature set they use; I don't know exactly what that is) was marked as a named entity. A name such as "Square" that was never labelled as a company in the training data will not be tagged as one.

Once you have more data, you need to put it in the right format, which is described in this question and in the Stanford tutorial.
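As a rough illustration of what that format looks like: to my knowledge, the Stanford CRF trainer reads one token per line with its label in a second, tab-separated column, and "O" for tokens that are not part of any entity. The sketch below builds such a file body in Python. The sentences, the COMPANY label, and the helper name to_stanford_tsv are all made up for the example, and separating sentences with a blank line is an assumption you should check against the tutorial.

```python
# Hypothetical hand-labelled sentences: each token is paired with a label.
# "O" marks tokens outside any entity; COMPANY is an illustrative label of our own.
labelled_sentences = [
    [("Apple", "COMPANY"), ("hired", "O"), ("new", "O"),
     ("engineers", "O"), (".", "O")],
    [("Square", "COMPANY"), ("is", "O"), ("based", "O"), ("in", "O"),
     ("San", "LOCATION"), ("Francisco", "LOCATION"), (".", "O")],
]


def to_stanford_tsv(sentences):
    """Render sentences as token<TAB>label lines, with a blank line between sentences."""
    blocks = ["\n".join(f"{token}\t{label}" for token, label in sentence)
              for sentence in sentences]
    return "\n\n".join(blocks) + "\n"


tsv = to_stanford_tsv(labelled_sentences)
# The string can then be written to a training file, e.g.:
# with open("train.tsv", "w", encoding="utf-8") as f:
#     f.write(tsv)
```

The resulting file is what you would point the trainer at; the real training set needs far more than two sentences, and it should include plenty of "O" tokens and ambiguous uses (for example "apple" the fruit) so the model learns from context rather than memorising names.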