I've chunked a sentence using:
import nltk

grammar = '''
    NP:
        {<DT>*(<NN.*>|<JJ.*>)*<NN.*>}
    NVN:
        {<NP><VB.*><NP>}
    '''
chunker = nltk.chunk.RegexpParser(grammar)
tree = chunker.parse(tagged)  # tagged: list of (word, POS tag) pairs
print(tree)
The result looks like:
(S
  (NVN
    (NP The_Pigs/NNS)
    are/VBP
    (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
  that/WDT
  formed/VBN
  in/IN
  1977/CD
  ./.)
But now I'm stuck trying to figure out how to navigate that. I want to be able to find the NVN subtree, and access the left-side noun phrase ("The_Pigs"), the verb ("are") and the right-side noun phrase ("a Bristol-based punk rock band"). How do I do that?
Try walking the tree recursively:
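import nltk

ROOT = 'ROOT'
tree = ...

def getNodes(parent):
    # Visit each child of `parent`; recurse into subtrees, print plain leaves.
    for node in parent:
        if isinstance(node, nltk.Tree):
            if node.label() == ROOT:
                print("======== Sentence =========")
                # join() needs string leaves; (word, tag) tuples would fail here
                print("Sentence:", " ".join(node.leaves()))
            else:
                print("Label:", node.label())
                print("Leaves:", node.leaves())
            getNodes(node)
        else:
            print("Word:", node)

getNodes(tree)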
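If you only need the NVN chunk and its three parts rather than a full walk, a more direct route is to filter with `Tree.subtrees()` and unpack the children. The following is a minimal sketch, not the only way to do it: the tree is built by hand to mirror the output printed in the question (so it runs without a tagger), and the helper names `words` and `extract_nvn` are made up for illustration. It assumes every NVN chunk has exactly three children (NP, verb, NP), which is what the grammar in the question produces.

```python
from nltk import Tree

# Hand-built copy of the chunked tree shown in the question, so the
# example runs without a tagger or chunker.
tree = Tree('S', [
    Tree('NVN', [
        Tree('NP', [('The_Pigs', 'NNS')]),
        ('are', 'VBP'),
        Tree('NP', [('a', 'DT'), ('Bristol-based', 'JJ'), ('punk', 'NN'),
                    ('rock', 'NN'), ('band', 'NN')]),
    ]),
    ('that', 'WDT'), ('formed', 'VBN'), ('in', 'IN'), ('1977', 'CD'), ('.', '.'),
])

def words(chunk):
    """Join the words of a chunk, dropping the POS tags (hypothetical helper)."""
    return " ".join(word for word, tag in chunk.leaves())

def extract_nvn(tree):
    """Return (left NP, verb, right NP) strings for every NVN chunk."""
    triples = []
    for nvn in tree.subtrees(filter=lambda t: t.label() == 'NVN'):
        left, verb, right = nvn  # children: NP subtree, (word, tag), NP subtree
        triples.append((words(left), verb[0], words(right)))
    return triples

print(extract_nvn(tree))
# [('The_Pigs', 'are', 'a Bristol-based punk rock band')]
```

Because a `Tree` behaves like a list of its children, you can also index the pieces directly, e.g. `nvn[0]` for the left noun phrase and `nvn[1]` for the `(word, tag)` pair of the verb.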