Algorithms for named entity recognition

caw · Jun 22, 2009 · Viewed 9.2k times

I would like to use named entity recognition (NER) to find adequate tags for texts in a database.

I know there is a Wikipedia article about this and lots of other pages describing NER, but I would prefer to hear about this topic from you:

  • What experience have you had with the various algorithms?
  • Which algorithm would you recommend?
  • Which algorithm is the easiest to implement (PHP/Python)?
  • How do the algorithms work? Is manual training necessary?

Example:

"Last year, I was in London where I saw Barack Obama." => Tags: London, Barack Obama

I hope you can help me. Thank you very much in advance!

Answer

Ale · Jun 22, 2009

To start with, check out http://www.nltk.org/ if you plan to work with Python. As far as I know the code isn't "industrial strength", but it will get you started.

Check out section 7.5 of http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html, but to understand the algorithms you will probably have to read through a lot of the book.
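For a rough idea of what that looks like in code, here is a minimal sketch using NLTK's bundled named entity chunker. It assumes a recent NLTK 3 release (where tree nodes expose label(); older releases used a different attribute) and that the tokenizer, POS tagger and NE chunker data have already been fetched with nltk.download(); the exact resource names vary between NLTK versions. The helper names collect_entities and tag_text are made up for this example.

```python
def collect_entities(chunks):
    """Collect (label, text) pairs from the output of nltk.ne_chunk.

    Named-entity chunks are subtrees exposing label() and leaves();
    ordinary tokens are plain (word, pos_tag) tuples and are skipped.
    """
    entities = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            words = [word for word, _tag in chunk.leaves()]
            entities.append((chunk.label(), " ".join(words)))
    return entities


def tag_text(text):
    """Tokenize, POS-tag and NE-chunk a text, then return its entities."""
    # Imported here so collect_entities can be reused without NLTK installed.
    import nltk

    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    return collect_entities(nltk.ne_chunk(tagged))
```

Calling tag_text("Last year, I was in London where I saw Barack Obama.") should return a list of (label, text) pairs. Which entities are found, and with which labels, depends on the pretrained model shipped with your NLTK version, so no manual training is needed for this particular route.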

Also check out http://nlp.stanford.edu/software/CRF-NER.shtml. It's written in Java.

NER isn't an easy subject, and probably nobody will tell you "this is the best algorithm"; most of them have their pros and cons.
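To give a feel for why it isn't easy, here is a deliberately naive baseline in plain Python. It is not one of the algorithms from the links above, just a capitalization heuristic I'm making up for illustration: it treats runs of capitalized words as tags, skipping the first word of each sentence and the pronoun "I". The function name naive_tags and the sentence-splitting rule are assumptions of this sketch.

```python
import re


def naive_tags(text):
    """Return runs of capitalized words that do not start a sentence."""
    tags = []
    # Rough sentence split: whitespace that follows '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        run = []
        for index, word in enumerate(words):
            # Skip the sentence-initial word and the pronoun "I".
            is_candidate = word[0].isupper() and index > 0 and word != "I"
            if is_candidate:
                run.append(word)
            elif run:
                tags.append(" ".join(run))
                run = []
        if run:
            tags.append(" ".join(run))
    return tags


print(naive_tags("Last year, I was in London where I saw Barack Obama."))
# prints ['London', 'Barack Obama']
```

It handles the example from the question, but naive_tags("Obama visited London.") returns only ['London'] because the name sits at the start of the sentence, and weekdays or month names would be tagged as well. Cases like these are why the real systems rely on statistical models or large gazetteers, each with their own trade-offs.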

My 0.05 of a dollar.

Cheers,