Extract verb phrases using spaCy

Nidhi · Dec 17, 2017 · Viewed 9.2k times

I have been using spaCy to extract noun chunks with the Doc.noun_chunks property. How can I extract verb phrases of the form 'VERB? ADV* VERB+' from input text using the spaCy library?

Answer

Programmer_nltk · Dec 23, 2017

This might help you.

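# Older textacy API (textacy.Doc, extract.pos_regex_matches); see the approach below for newer releases
import textacy

sentence = 'The author is writing a new book.'
# POS regex: an optional VERB, any number of ADVs, then one or more VERBs
pattern = r'<VERB>?<ADV>*<VERB>+'
doc = textacy.Doc(sentence, lang='en_core_web_sm')
# Yields spaCy Span objects whose POS sequence matches the pattern
verb_phrases = textacy.extract.pos_regex_matches(doc, pattern)
for phrase in verb_phrases:
    print(phrase.text)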

Output:

is writing
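To make the POS-regex idea concrete, here is a minimal pure-Python sketch of the general technique (not textacy's actual implementation): each token's coarse tag is written as `<TAG>`, the tags are concatenated into one string, and the pattern is run as an ordinary regular expression over that string. The function name `pos_regex_spans` is hypothetical, and the (word, tag) pairs are hard-coded as an assumption standing in for spaCy's tagger; `is` is tagged VERB because the output above implies the model did so.

```python
import re


def pos_regex_spans(tagged, pattern):
    """Return the text of each non-overlapping token run whose tags match `pattern`.

    `tagged` is a list of (word, pos) pairs; `pattern` is assumed to consist only
    of <TAG> units with optional quantifiers, e.g. r'<VERB>?<ADV>*<VERB>+'.
    """
    # Wrap each <TAG> in a non-capturing group so a quantifier applies to the whole tag
    regex = re.compile(re.sub(r'(<[A-Z]+>)', r'(?:\1)', pattern))

    # Build the tag string and remember where each token's "<TAG>" starts
    tag_string = ''
    offset_to_index = {}
    for i, (_, pos) in enumerate(tagged):
        offset_to_index[len(tag_string)] = i
        tag_string += '<%s>' % pos
    offset_to_index[len(tag_string)] = len(tagged)

    phrases = []
    for m in regex.finditer(tag_string):
        if m.end() == m.start():
            continue  # skip empty matches (possible with patterns like '<ADV>*')
        # A non-empty match starts at a "<" and ends after a ">", i.e. on token boundaries
        start, end = offset_to_index[m.start()], offset_to_index[m.end()]
        phrases.append(' '.join(word for word, _ in tagged[start:end]))
    return phrases


# Hypothetical tags for 'The author is writing a new book.'
tagged = [('The', 'DET'), ('author', 'NOUN'), ('is', 'VERB'), ('writing', 'VERB'),
          ('a', 'DET'), ('new', 'ADJ'), ('book', 'NOUN'), ('.', 'PUNCT')]
print(pos_regex_spans(tagged, r'<VERB>?<ADV>*<VERB>+'))  # ['is writing']
```

Because `finditer` returns leftmost, greedy, non-overlapping matches, each verb group is reported once rather than also as its shorter sub-spans.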

For highlighting the verb phrases, see the link below.

Highlight verb phrases using spacy and html

Another approach:

I recently noticed that textacy has changed how regex matching works. Based on the new approach, I tried the following:

# Newer textacy API: make_spacy_doc() plus extract.matches() with spaCy Matcher-style token patterns
# (later textacy releases reorganized textacy.extract again)
import textacy

sentence = 'The cat sat on the mat. The dog jumped into the water. The author is writing a book.'
# Each dict matches one token; 'OP' works like a regex quantifier
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]
doc = textacy.make_spacy_doc(sentence, lang='en_core_web_sm')
verb_phrases = textacy.extract.matches(doc, pattern)
for phrase in verb_phrases:
    print(phrase.text)

Output:

sat
jumped
writing

I checked the POS matches in the matcher demo linked below, and the result does not seem to be the intended one.

https://explosion.ai/demos/matcher

Has anybody tried writing token patterns over POS tags, instead of a regex pattern, to find verb phrases?
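The Matcher-style patterns above are token-based rather than string-based, and spaCy's `Matcher` reports every span that matches, overlapping ones included. The sketch below mimics those semantics for the `POS` and `OP` keys only; `to_regex` and `all_matches` are hypothetical helpers, not spaCy's implementation, and the tags are hard-coded assumptions. The outputs above (`writing` rather than `is writing`) suggest the newer model tagged `is` as AUX; under that assumption `VERB? ADV* VERB+` can only find `writing`, while inserting an `AUX*` step recovers `is writing`.

```python
import re


def to_regex(pattern):
    """Translate Matcher-style dicts (only 'POS' and 'OP' supported) into a regex over '<TAG>' units."""
    # A missing 'OP' means "exactly one", i.e. no quantifier
    return re.compile(''.join('(?:<%s>)%s' % (t['POS'], t.get('OP', '')) for t in pattern))


def all_matches(tags, pattern):
    """Return every (start, end) token range whose tags fully match `pattern`, overlaps included."""
    regex = to_regex(pattern)
    found = []
    for start in range(len(tags)):
        for end in range(start + 1, len(tags) + 1):
            if regex.fullmatch(''.join('<%s>' % t for t in tags[start:end])):
                found.append((start, end))
    return found


words = ['The', 'author', 'is', 'writing', 'a', 'book', '.']
tags = ['DET', 'NOUN', 'AUX', 'VERB', 'DET', 'NOUN', 'PUNCT']  # assumed: "is" tagged AUX

without_aux = [{'POS': 'VERB', 'OP': '?'},
               {'POS': 'ADV', 'OP': '*'},
               {'POS': 'VERB', 'OP': '+'}]
# Same pattern with an AUX* step before the main verb(s)
with_aux = without_aux[:2] + [{'POS': 'AUX', 'OP': '*'}] + without_aux[2:]

for pattern in (without_aux, with_aux):
    print([' '.join(words[s:e]) for s, e in all_matches(tags, pattern)])
# ['writing']
# ['is writing', 'writing']
```

Note that the second pattern reports both `is writing` and `writing`; some filtering step is needed to keep only the longest span.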

Edit 2:

import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

nlp = spacy.load('en_core_web_sm')

sentence = 'The cat sat on the mat. He quickly ran to the market. The dog jumped into the water. The author is writing a book.'
# AUX* lets an auxiliary such as "is" start the phrase ("is writing")
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'AUX', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]

# instantiate a Matcher instance
matcher = Matcher(nlp.vocab)
# Current signature takes a list of patterns; older spaCy v2 code used matcher.add("Verb phrase", None, pattern)
matcher.add("Verb phrase", [pattern])

doc = nlp(sentence)
# call the matcher to find matches; it returns every match, including overlapping ones
matches = matcher(doc)
spans = [doc[start:end] for _, start, end in matches]

# keep only the longest non-overlapping spans
print(filter_spans(spans))

Output:

[sat, quickly ran, jumped, is writing]

Based on help from mdmjsh's answer.
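`filter_spans` is what turns the Matcher's overlapping hits (for example both `quickly ran` and `ran`) into the clean list above; spaCy documents that when spans overlap, the (first) longest span is preferred. The sketch below applies that rule to plain `(start, end)` token offsets. `filter_offsets` is a hypothetical name, not spaCy's function, and the offsets are illustrative.

```python
def filter_offsets(spans):
    """Keep the longest non-overlapping (start, end) ranges; on equal length the earlier one wins."""
    # Longest first; among equal lengths, the smaller start comes first
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept, used = [], set()
    for start, end in ordered:
        tokens = set(range(start, end))
        if not tokens & used:  # no token already claimed by a longer (or earlier) span
            kept.append((start, end))
            used |= tokens
    return sorted(kept)


# "He quickly ran": the Matcher would report both "quickly ran" (1, 3) and "ran" (2, 3)
print(filter_offsets([(1, 3), (2, 3)]))          # [(1, 3)]
print(filter_offsets([(0, 1), (0, 2), (3, 4)]))  # [(0, 2), (3, 4)]
```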

Edit 3: Strange behavior. With the following pattern, the verb phrase in the sentence below is identified correctly in https://explosion.ai/demos/matcher:

pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]

The very black cat must be really meowing really loud in the yard.

But running it from code outputs the following:

[must, really meowing]
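One possible reading of this result: `VERB? ADV* VERB+` has no slot for an auxiliary, so if the local model tags `be` as AUX, no single match can cover `must be really meowing`. The sketch below assumes such tags (picked to be consistent with the printed output, not taken from a real spaCy run, and it is not verified what the demo page's model produces), repeats compact versions of the hypothetical matching and longest-span helpers so it runs on its own, and compares the original pattern with a hypothetical variant that allows `AUX*` and then `ADV*` after the optional VERB.

```python
import re


def all_matches(tags, pattern):
    """Every (start, end) whose tags fully match a Matcher-style POS/OP pattern."""
    regex = re.compile(''.join('(?:<%s>)%s' % (t['POS'], t.get('OP', '')) for t in pattern))
    return [(s, e) for s in range(len(tags)) for e in range(s + 1, len(tags) + 1)
            if regex.fullmatch(''.join('<%s>' % t for t in tags[s:e]))]


def longest_non_overlapping(spans):
    """Longest spans win; ties go to the earlier span (the rule filter_spans documents)."""
    kept, used = [], set()
    for s, e in sorted(spans, key=lambda x: (x[0] - x[1], x[0])):
        if not used & set(range(s, e)):
            kept.append((s, e))
            used |= set(range(s, e))
    return sorted(kept)


words = 'The very black cat must be really meowing really loud in the yard .'.split()
# Assumed tags, consistent with the output above ("must" as VERB, "be" as AUX)
tags = ['DET', 'ADV', 'ADJ', 'NOUN', 'VERB', 'AUX', 'ADV', 'VERB',
        'ADV', 'ADV', 'ADP', 'DET', 'NOUN', 'PUNCT']

original = [{'POS': 'VERB', 'OP': '?'}, {'POS': 'ADV', 'OP': '*'}, {'POS': 'VERB', 'OP': '+'}]
# Hypothetical variant: allow auxiliaries, then adverbs, before the main verb
variant = [{'POS': 'VERB', 'OP': '?'}, {'POS': 'AUX', 'OP': '*'},
           {'POS': 'ADV', 'OP': '*'}, {'POS': 'VERB', 'OP': '+'}]

for pattern in (original, variant):
    spans = longest_non_overlapping(all_matches(tags, pattern))
    print([' '.join(words[s:e]) for s, e in spans])
# ['must', 'really meowing']
# ['must be really meowing']
```

Under these assumed tags the original pattern reproduces the output above exactly, which suggests the outcome depends on how the model tags the auxiliaries rather than on the Matcher itself.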