Get position of word in sentence with spacy

jack west · Sep 5, 2017

I'm aware of the basic spaCy workflow for getting various attributes from a document. However, I can't find a built-in function that returns the position (start/end) of a word within a sentence.

Does anyone know if this is possible with spaCy?

Answer

DhruvPathak · Sep 5, 2017

These positions are available as attributes of each token in the parsed document. The spaCy documentation for Token lists:

idx (int): The character offset of the token within the parent document.

i (int): The index of the token within the parent document.

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')  # the 'en' shortcut no longer works in spaCy v3
>>> parsed_sentence = nlp('This is my sentence')
>>> [(token.text, token.i) for token in parsed_sentence]  # token index
[('This', 0), ('is', 1), ('my', 2), ('sentence', 3)]
>>> [(token.text, token.idx) for token in parsed_sentence]  # start character offset
[('This', 0), ('is', 5), ('my', 8), ('sentence', 11)]
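
The question also asks for the end position, which idx alone does not give. A minimal sketch, assuming the end offset can be taken as the start offset plus the length of the token text: the helper name token_spans is hypothetical (it is not part of spaCy), and it relies only on the token attributes text, i and idx quoted above.

```python
from typing import Iterable, List, Tuple


def token_spans(doc: Iterable) -> List[Tuple[str, int, int, int]]:
    """Return (text, token_index, start_char, end_char) for each token.

    `token_spans` is a hypothetical helper, not a spaCy function.
    `doc` is expected to be a spaCy Doc or Span; only the token
    attributes `text`, `i` and `idx` are used.
    """
    spans = []
    for token in doc:
        start = token.idx              # character offset where the token begins
        end = start + len(token.text)  # exclusive end offset, derived here
        spans.append((token.text, token.i, start, end))
    return spans
```

With a tokenizer-only pipeline (spacy.blank('en') needs no model download), this should give:

>>> import spacy
>>> nlp = spacy.blank('en')
>>> token_spans(nlp('This is my sentence'))
[('This', 0, 0, 4), ('is', 1, 5, 7), ('my', 2, 8, 10), ('sentence', 3, 11, 19)]

The end offset is exclusive, so slicing the original string with text[start:end] returns the token text.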