I have a list of sentences and I want to analyze every sentence and identify the semantic roles within that sentence. How do I do that?
I came across the PropBankCorpusReader in the NLTK module, which adds semantic labeling information to the Penn Treebank. My research on the internet also suggests that this module is used to perform Semantic Role Labeling (SRL).
However, I am unable to find a short HOWTO that explains how to use the PropBankCorpusReader to perform SRL on arbitrary text.
Can someone point to examples of using PropbankCorpusReader to perform SRL on arbitrary sentences?
SRL is not at all a trivial problem, and not really something that can be done out of the box using nltk.
You can break down the task of SRL into 3 separate steps:
Most current approaches to this problem use supervised machine learning, where the classifier would train on a subset of Propbank or FrameNet sentences and then test on the remaining subset to measure its accuracy. Researchers tend to focus on tweaking features and algorithms, as well as tinkering with whether the above steps are done sequentially or simultaneously, and in what order.
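To make the train/test setup concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the examples are hand-made toy data (not real Propbank or FrameNet annotations), the three features are hypothetical, and the "classifier" is a simple lookup baseline rather than any method from the literature.

```python
from collections import Counter, defaultdict

# Toy, hand-made examples (NOT real Propbank data). Each candidate argument
# is described by a few hypothetical features and labeled with a role.
EXAMPLES = [
    ({"phrase": "NP", "position": "before", "voice": "active"}, "ARG0"),
    ({"phrase": "NP", "position": "after", "voice": "active"}, "ARG1"),
    ({"phrase": "NP", "position": "before", "voice": "passive"}, "ARG1"),
    ({"phrase": "PP", "position": "after", "voice": "active"}, "ARGM-LOC"),
    ({"phrase": "NP", "position": "before", "voice": "active"}, "ARG0"),
    ({"phrase": "NP", "position": "after", "voice": "active"}, "ARG1"),
    ({"phrase": "NP", "position": "before", "voice": "active"}, "ARG0"),
    ({"phrase": "PP", "position": "after", "voice": "passive"}, "ARG0"),
]


def feature_key(feats):
    """Turn a feature dict into a hashable, order-independent key."""
    return tuple(sorted(feats.items()))


def train(examples):
    """Count labels per exact feature combination, plus overall label counts."""
    per_key = defaultdict(Counter)
    overall = Counter()
    for feats, label in examples:
        per_key[feature_key(feats)][label] += 1
        overall[label] += 1
    return per_key, overall


def predict(model, feats):
    """Most frequent label for this feature combination, else the global majority."""
    per_key, overall = model
    key = feature_key(feats)
    if key in per_key:
        return per_key[key].most_common(1)[0][0]
    return overall.most_common(1)[0][0]


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(model, feats) == label for feats, label in examples)
    return correct / len(examples)


# Hold out the last quarter of the data for testing.
split = int(len(EXAMPLES) * 0.75)
train_set, test_set = EXAMPLES[:split], EXAMPLES[split:]
model = train(train_set)
print("held-out accuracy:", accuracy(model, test_set))
```

A real system would replace the toy examples with features extracted from annotated sentences and the lookup table with a proper learner; the split, train, and evaluate loop stays the same.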
Some papers you might want to check out are:
The Markov Logic approach is promising, but in my own experience it runs into severe scalability issues (I've only ever used Alchemy, though Alchemy Lite looks interesting). It's not a huge amount of work to implement some kind of classifier using the nltk Propbank data, and some off-the-shelf classifiers already exist in Python.
EDIT: This assignment from the University of Edinburgh gives some examples of how to parse Propbank data, and part of a school project I did implements a complete Propbank feature parser, though the features are geared specifically towards use in Markov Logic Networks in the style of Meza-Ruiz and Riedel (2009).
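As a starting point for working with the nltk Propbank data, here is a rough sketch of pulling the gold role annotations out of Propbank instances. It only reads existing annotations; it does not label arbitrary text. It assumes the `propbank` and `treebank` corpora have been fetched with `nltk.download`, and the helper names `argument_spans` and `load_instances` are my own, not part of nltk.

```python
def argument_spans(inst):
    """Return (role_label, text) pairs for one Propbank instance.

    `inst` is expected to look like an nltk PropbankInstance: it has a
    `tree` attribute (the parse tree, or None when the parse is not
    available) and `arguments`, a sequence of (tree_pointer, label) pairs.
    """
    tree = inst.tree
    if tree is None:
        # nltk ships only a sample of the Penn Treebank, so many
        # instances have no parse tree to select from.
        return []
    spans = []
    for pointer, label in inst.arguments:
        subtree = pointer.select(tree)  # subtree covered by this argument
        spans.append((label, " ".join(subtree.leaves())))
    return spans


def load_instances(limit=5):
    """Load the first few Propbank instances from nltk (needs corpus data)."""
    # Imported lazily so the helper above can be used without nltk installed.
    # Assumes nltk.download("propbank") and nltk.download("treebank") were run.
    from nltk.corpus import propbank

    return propbank.instances()[:limit]
```

Usage would be along the lines of `for inst in load_instances(): print(inst.roleset, argument_spans(inst))`. Note that the leaves can include trace tokens from the Treebank parse, so the text is not always a clean substring of the sentence. The (label, text) pairs are the raw material for the kind of feature extraction a classifier would need.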