The input text is always a list of dish names, each made up of 1 to 3 adjectives and a noun.
Inputs:
thai iced tea
spicy fried chicken
sweet chili pork
thai chicken curry
Outputs:
thai tea, iced tea
spicy chicken, fried chicken
sweet pork, chili pork
thai chicken, chicken curry, thai curry
Basically, I want to parse the sentence tree and generate bigrams by pairing each adjective with the noun.
I would like to do this with spaCy or NLTK.
I used spaCy 2.0 with the English model to split each input into nouns and "not-nouns", then paired every not-noun with the noun to build the desired output.
Your input:
s = ["thai iced tea",
     "spicy fried chicken",
     "sweet chili pork",
     "thai chicken curry"]
spaCy solution:
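import spacy

# 'en' is the spaCy 2.x shortcut; spaCy 3+ needs the full model name, e.g. 'en_core_web_sm'
nlp = spacy.load('en')  # load the English model

def noun_notnoun(phrase):
    doc = nlp(phrase)  # tokenize and POS-tag the phrase
    token_not_noun = []
    notnoun_noun_list = []
    noun = None  # stays None if the tagger finds no noun
    for item in doc:
        if item.pos_ != "NOUN":  # collect every non-noun token
            token_not_noun.append(item.text)
        else:
            noun = item.text  # keep the last noun seen
    if noun is None:
        return notnoun_noun_list  # nothing to pair with
    for notnoun in token_not_noun:
        notnoun_noun_list.append(notnoun + " " + noun)
    return notnoun_noun_list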
Call function:
for phrase in s:
    print(noun_notnoun(phrase))
Results:
['thai tea', 'iced tea']
['spicy chicken', 'fried chicken']
['sweet pork', 'chili pork']
['thai chicken', 'curry chicken']
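Note that the last result does not match the output asked for in the question (thai chicken, chicken curry, thai curry), because the function pairs everything with a single noun and here only "chicken" was tagged as one. Below is a minimal sketch of one way to handle phrases with more than one noun: pair each noun with every word that comes before it. The helper name pair_with_nouns is hypothetical, and it assumes the tags have already been produced, for example with [(t.text, t.pos_) for t in nlp(phrase)]. The tags in the example call are written by hand for illustration and are not real tagger output.

```python
def pair_with_nouns(tagged):
    """Pair every noun with each word that precedes it.

    tagged: list of (word, pos) tuples in phrase order (hand-written
    here; in practice they would come from a POS tagger).
    """
    pairs = []
    for j, (head, pos) in enumerate(tagged):
        if pos != "NOUN":
            continue  # only nouns can be the second word of a pair
        for word, _ in tagged[:j]:
            pairs.append(word + " " + head)
    return pairs


# Assumed tags: both "chicken" and "curry" treated as nouns.
print(pair_with_nouns([("thai", "ADJ"), ("chicken", "NOUN"), ("curry", "NOUN")]))
# ['thai chicken', 'thai curry', 'chicken curry']
```

The result still depends on what the tagger returns: if "curry" is not tagged as a noun, this gives only 'thai chicken', so it is worth printing the tags for your own inputs first.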