Semantic Role Labeling using NLTK

Prahalad Deshpande · Dec 14, 2013

I have a list of sentences and I want to analyze every sentence and identify the semantic roles within that sentence. How do I do that?

I came across the PropbankCorpusReader in NLTK, which adds semantic labeling information to the Penn Treebank. My research on the internet also suggests that this module is used to perform Semantic Role Labeling (SRL).

I am, however, unable to find a short HOWTO that explains how to use the PropbankCorpusReader to perform SRL on arbitrary text.

Can someone point me to examples of using PropbankCorpusReader to perform SRL on arbitrary sentences?

Answer

cjm · Dec 14, 2013

SRL is not at all a trivial problem, and not really something that can be done out of the box using nltk.

You can break down the task of SRL into 3 separate steps:

  1. Identifying the predicate.
  2. Performing word sense disambiguation on the predicate to determine which semantic arguments it accepts.
  3. Identifying the semantic arguments in the sentence.
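To make the three steps concrete, here is a deliberately naive sketch of how they chain together. Everything in it is an assumption for illustration: the tiny verb lexicon, the roleset table and the "before the verb / after the verb" heuristic are hypothetical stand-ins, not PropBank data and not a usable SRL system.

```python
# Toy illustration only: the lexicon, roleset table and heuristics below are
# hypothetical stand-ins, not real PropBank data or a usable SRL system.
TOY_VERBS = {"gave": "give", "gives": "give", "ate": "eat"}
TOY_ROLESETS = {
    "give": {"id": "give.01", "roles": ["ARG0", "ARG1", "ARG2"]},
    "eat": {"id": "eat.01", "roles": ["ARG0", "ARG1"]},
}


def identify_predicate(tokens):
    """Step 1: return (index, lemma) of the first known verb, or None."""
    for i, tok in enumerate(tokens):
        lemma = TOY_VERBS.get(tok.lower())
        if lemma:
            return i, lemma
    return None


def disambiguate(lemma):
    """Step 2: pick a roleset; a real system must choose among several senses."""
    return TOY_ROLESETS[lemma]


def identify_arguments(tokens, pred_index, roleset):
    """Step 3: naive heuristic, text before the verb is ARG0, text after is ARG1."""
    args = {}
    before = tokens[:pred_index]
    after = tokens[pred_index + 1:]
    if before and "ARG0" in roleset["roles"]:
        args["ARG0"] = " ".join(before)
    if after and "ARG1" in roleset["roles"]:
        args["ARG1"] = " ".join(after)
    return args


def label(sentence):
    """Run the three steps in sequence on a whitespace-tokenized sentence."""
    tokens = sentence.split()
    found = identify_predicate(tokens)
    if found is None:
        return None
    idx, lemma = found
    roleset = disambiguate(lemma)
    return {
        "predicate": tokens[idx],
        "roleset": roleset["id"],
        "arguments": identify_arguments(tokens, idx, roleset),
    }


print(label("The cat ate the fish"))
```

In a real system each of these functions would be replaced by a trained model; the point here is only the shape of the pipeline.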

Most current approaches to this problem use supervised machine learning, where a classifier is trained on a subset of Propbank or FrameNet sentences and then tested on the remaining sentences to measure its accuracy. Researchers tend to focus on tweaking features and algorithms, as well as tinkering with whether the above steps are done sequentially or simultaneously, and in what order.
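The train-then-test protocol can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data set is a hypothetical hand-written toy (real features would be derived from Propbank or FrameNet annotations), and the "classifier" is just a most-frequent-label lookup standing in for whatever learning algorithm you would actually use.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: ((phrase type, position relative to predicate), role).
DATA = [
    (("NP", "before"), "ARG0"),
    (("NP", "after"), "ARG1"),
    (("PP", "after"), "ARGM-LOC"),
    (("NP", "before"), "ARG0"),
    (("NP", "after"), "ARG1"),
    (("PP", "after"), "ARGM-LOC"),
    (("NP", "before"), "ARG0"),
    (("NP", "after"), "ARG1"),
    (("PP", "after"), "ARGM-LOC"),
    (("NP", "before"), "ARG1"),
]


def train(examples):
    """Memorize the most frequent role for each feature tuple."""
    counts = defaultdict(Counter)
    for feats, role in examples:
        counts[feats][role] += 1
    model = {feats: c.most_common(1)[0][0] for feats, c in counts.items()}
    # Fallback for unseen feature tuples: the most frequent role overall.
    fallback = Counter(role for _, role in examples).most_common(1)[0][0]
    return model, fallback


def predict(trained, feats):
    model, fallback = trained
    return model.get(feats, fallback)


def accuracy(trained, examples):
    """Fraction of held-out examples whose role is predicted correctly."""
    correct = sum(predict(trained, feats) == role for feats, role in examples)
    return correct / len(examples)


# Hold out the last 30% of the data for testing.
split = len(DATA) * 7 // 10
train_set, test_set = DATA[:split], DATA[split:]
trained = train(train_set)
print("held-out accuracy:", accuracy(trained, test_set))
```

The same split-train-score structure applies if you swap the lookup table for a real learner, for example `nltk.NaiveBayesClassifier.train` scored with `nltk.classify.accuracy`.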

Some papers you might want to check out are:

The Markov Logic approach is promising, but in my own experience it runs into severe scalability issues (I've only ever used Alchemy, though Alchemy Lite looks interesting). It's not a huge amount of work to implement some kind of classifier using the nltk Propbank data, and some off-the-shelf classifiers already exist in Python.
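As a starting point for such a classifier, the annotations can be read through `nltk.corpus.propbank`. The sketch below assumes NLTK is installed and the Propbank sample has been fetched with `nltk.download('propbank')`; the helper name `propbank_training_pairs` is my own, and it returns an empty list when either is missing. Note that this gives you gold annotations for the bundled Treebank sample only; it does not label arbitrary new sentences.

```python
def propbank_training_pairs(limit=5):
    """Return (roleset, [argument labels]) pairs from NLTK's Propbank sample.

    Hypothetical helper for illustration. Returns [] if NLTK or the corpus
    data is not available (fetch it with nltk.download('propbank')).
    """
    try:
        from nltk.corpus import propbank

        # Each instance describes one annotated predicate occurrence.
        instances = propbank.instances()[:limit]
        return [
            # inst.arguments is a sequence of (tree pointer, label) pairs.
            (inst.roleset, [argid for _, argid in inst.arguments])
            for inst in instances
        ]
    except (ImportError, LookupError):
        return []


for roleset, labels in propbank_training_pairs():
    print(roleset, labels)
```

Accessing `inst.tree` to get the parse tree for an instance additionally requires the `treebank` corpus data.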

EDIT: This assignment from the University of Edinburgh gives some examples of how to parse Propbank data, and part of a school project I did implements a complete Propbank feature parser, though the features are geared specifically towards use in Markov Logic Networks in the style of Meza-Ruiz and Riedel (2009).
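If you would rather parse the annotation file yourself, each line of `prop.txt` is whitespace-separated: six header fields followed by `pointer-LABEL` items, one of which is the predicate marked `rel`. Below is a minimal stdlib sketch, not the code from the assignment or the project mentioned above. The sample line is illustrative, the pointers are kept as plain strings, and the two derived features (predicate lemma, argument position) are my own choices.

```python
def parse_propbank_line(line):
    """Split one Propbank annotation line into its fields."""
    pieces = line.split()
    if len(pieces) < 7:
        raise ValueError("Badly formatted Propbank line: %r" % line)
    fileid, sentnum, wordnum, tagger, roleset, inflection = pieces[:6]
    predicate, arguments = None, []
    for piece in pieces[6:]:
        # "9:1-ARG1" -> pointer "9:1" (word index:height), label "ARG1"
        pointer, arg_label = piece.split("-", 1)
        if arg_label == "rel":
            predicate = pointer
        else:
            arguments.append((pointer, arg_label))
    return {
        "fileid": fileid,
        "sentnum": int(sentnum),
        "wordnum": int(wordnum),
        "roleset": roleset,
        "predicate": predicate,
        "arguments": arguments,
    }


def argument_features(parsed):
    """Illustrative features: predicate lemma and position of each argument."""
    lemma = parsed["roleset"].split(".")[0]
    feats = []
    for pointer, arg_label in parsed["arguments"]:
        first_word = int(pointer.split(":")[0])
        position = "before" if first_word < parsed["wordnum"] else "after"
        feats.append(((lemma, position), arg_label))
    return feats


# Illustrative line in the prop.txt layout.
SAMPLE = "wsj/00/wsj_0001.mrg 0 8 gold join.01 vf--a 0:2-ARG0 7:0-ARGM-MOD 8:0-rel 9:1-ARG1"
print(argument_features(parse_propbank_line(SAMPLE)))
```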