I have been using Spacy for noun chunks extraction using Doc.noun_chunks property provided by Spacy. How could I extract verb phrases from input text using Spacy library (of the form 'VERB ? ADV * VERB +' )?
This might help you.
from __future__ import unicode_literals
import spacy,en_core_web_sm
import textacy
nlp = en_core_web_sm.load()
sentence = 'The author is writing a new book.'
pattern = r'<VERB>?<ADV>*<VERB>+'
doc = textacy.Doc(sentence, lang='en_core_web_sm')
lists = textacy.extract.pos_regex_matches(doc, pattern)
for list in lists:
print(list.text)
Output:
is writing
On how to highlight the verb phrases do check the link below.
Highlight verb phrases using spacy and html
Another Approach:
Recently observed Textacy has made some changes to regex matches. Based on that approach i tried this way.
from __future__ import unicode_literals
import spacy,en_core_web_sm
import textacy
nlp = en_core_web_sm.load()
sentence = 'The cat sat on the mat. He dog jumped into the water. The author is writing a book.'
pattern = [{'POS': 'VERB', 'OP': '?'},
{'POS': 'ADV', 'OP': '*'},
{'POS': 'VERB', 'OP': '+'}]
doc = textacy.make_spacy_doc(sentence, lang='en_core_web_sm')
lists = textacy.extract.matches(doc, pattern)
for list in lists:
print(list.text)
Output:
sat
jumped
writing
I checked the POS matches in this links seems the result is not the intended one.
[https://explosion.ai/demos/matcher][1]
Did anybody try framing POS tags instead of Regexp pattern for finding Verb phrases?
Edit 2:
import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans
nlp = spacy.load('en_core_web_sm')
sentence = 'The cat sat on the mat. He quickly ran to the market. The dog jumped into the water. The author is writing a book.'
pattern = [{'POS': 'VERB', 'OP': '?'},
{'POS': 'ADV', 'OP': '*'},
{'POS': 'AUX', 'OP': '*'},
{'POS': 'VERB', 'OP': '+'}]
# instantiate a Matcher instance
matcher = Matcher(nlp.vocab)
matcher.add("Verb phrase", None, pattern)
doc = nlp(sentence)
# call the matcher to find matches
matches = matcher(doc)
spans = [doc[start:end] for _, start, end in matches]
print (filter_spans(spans))
Output:
[sat, quickly ran, jumped, is writing]
Based on help from mdmjsh's answer.
Edit3: Strange behavior. The following sentence for the following pattern the verb phrase gets identified correctly in https://explosion.ai/demos/matcher
pattern = [{'POS': 'VERB', 'OP': '?'},
{'POS': 'ADV', 'OP': '*'},
{'POS': 'VERB', 'OP': '+'}]
The very black cat must be really meowing really loud in the yard.
But outputs the following while running from code.
[must, really meowing]