Software to extract word functions like subject, predicate, object, etc.

Max Koretskyi · May 18, 2015 · Viewed 13.5k times

I need to extract the relations between the words in a sentence. I'm mostly interested in identifying the subject, predicate, and object. For example, for the following sentence:

She gave him a pen

I'd like to have:

She_subject gave_predicate him a pen_object.

Can Stanford NLP do that? I've tried their relation annotator, but it didn't work as I expected. Is there other software that can produce this result?

Answer

tsleyson · May 18, 2015

According to http://nlp.stanford.edu/software/lex-parser.shtml, Stanford NLP does have a parser that can identify the subject and predicate of a sentence. You can try it out online at http://nlp.stanford.edu:8080/parser/index.jsp. The typed dependencies it produces let you identify the subject, predicate, and object.

From the example page, the sentence My dog also likes eating sausage will give you this parse:

(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (NP (NN sausage)))))
    (. .)))
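As a rough illustration of how the bracketed parse above could be read programmatically, here is a minimal Python sketch. It does not use the Stanford API; it only post-processes the printed tree. The function names (parse_tree, leaves, subject_and_predicate_phrases) are hypothetical, and treating the first NP and first VP directly under S as the subject and predicate phrases is a simplifying assumption that will not hold for every sentence.

```python
TREE = """
(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (NP (NN sausage)))))
    (. .)))
"""

def parse_tree(text):
    """Turn a bracketed parse string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()

def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]

def subject_and_predicate_phrases(tree):
    """Heuristic (an assumption): first NP and first VP directly under the top S."""
    clause = tree[1][0]               # ROOT -> S
    subject = predicate = None
    for child in clause[1]:
        if isinstance(child, str):
            continue
        if child[0] == "NP" and subject is None:
            subject = " ".join(leaves(child))
        elif child[0] == "VP" and predicate is None:
            predicate = " ".join(leaves(child))
    return subject, predicate

print(subject_and_predicate_phrases(parse_tree(TREE)))
```

This gives whole phrases rather than single head words; the dependency output is usually the more convenient route to individual words.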

The parser can also generate dependencies:

poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)

The nsubj dependency links the main predicate to its subject: in this case, likes and dog. The numbers give the position of each word in the sentence (one-indexed, for some reason). The dobj dependency links a verb to its direct object; here the object sausage attaches to eating rather than to likes. The xcomp dependency marks eating as an open clausal complement of likes, so it describes the internal structure of the predicate.
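To make that reading of the dependencies concrete, here is a minimal Python sketch that works on the printed dependency lines rather than on the Stanford API. The names (parse_dependencies, find_roles) are hypothetical, and following xcomp from the root to find a direct object is a heuristic I am assuming for this example, not something the parser does for you.

```python
import re
from collections import namedtuple

DEPS = """
poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
"""

Dep = namedtuple("Dep", "rel head head_idx dep dep_idx")

# Matches lines such as "nsubj(likes-4, dog-2)".
DEP_LINE = re.compile(r"^([\w:]+)\((.+?)-(\d+), (.+?)-(\d+)\)$")

def parse_dependencies(text):
    """Parse printed typed dependencies into Dep tuples, skipping other lines."""
    deps = []
    for line in text.splitlines():
        m = DEP_LINE.match(line.strip())
        if m:
            rel, head, head_idx, dep, dep_idx = m.groups()
            deps.append(Dep(rel, head, int(head_idx), dep, int(dep_idx)))
    return deps

def find_roles(deps):
    """Pick out the subject, predicate, and object (heuristic, see lead-in)."""
    root = next(d for d in deps if d.rel == "root")
    pred_idx = root.dep_idx
    # Look for objects on the root and on any xcomp complement of the root.
    verb_idxs = {pred_idx} | {
        d.dep_idx for d in deps if d.rel == "xcomp" and d.head_idx == pred_idx
    }
    subject = next(
        (d.dep for d in deps if d.rel == "nsubj" and d.head_idx == pred_idx), None
    )
    obj = next(
        (d.dep for d in deps if d.rel == "dobj" and d.head_idx in verb_idxs), None
    )
    return {"subject": subject, "predicate": root.dep, "object": obj}

print(find_roles(parse_dependencies(DEPS)))
# {'subject': 'dog', 'predicate': 'likes', 'object': 'sausage'}
```

Matching on word positions rather than on the words themselves avoids confusion when the same word appears twice in a sentence.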

This also works when the predicate is not a verb: My dog is large and in charge gives:

poss(dog-2, My-1)
nsubj(large-4, dog-2)
cop(large-4, is-3)
root(ROOT-0, large-4)
cc(large-4, and-5)
conj(large-4, in-6)
pobj(in-6, charge-7)

This tells us that large is the main predicate (nsubj(large-4, dog-2)), and that it is linked to the subject by a copula (cop(large-4, is-3)). The remaining dependencies show a conjunction and a preposition with an object.
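The copula case can be handled the same way when post-processing the printed output. The following Python sketch is only an illustration under that assumption; describe_predicate is a hypothetical helper, not part of the Stanford tools.

```python
import re

COPULA_DEPS = """
poss(dog-2, My-1)
nsubj(large-4, dog-2)
cop(large-4, is-3)
root(ROOT-0, large-4)
cc(large-4, and-5)
conj(large-4, in-6)
pobj(in-6, charge-7)
"""

# Matches lines such as "cop(large-4, is-3)".
DEP_LINE = re.compile(r"^([\w:]+)\((.+?)-(\d+), (.+?)-(\d+)\)$")

def describe_predicate(dep_text):
    """Return the subject, the copula (if any), and the root predicate word."""
    rows = []
    for line in dep_text.splitlines():
        m = DEP_LINE.match(line.strip())
        if m:
            rel, head, head_idx, dep, dep_idx = m.groups()
            rows.append((rel, head, int(head_idx), dep, int(dep_idx)))

    root = next(r for r in rows if r[0] == "root")
    pred_word, pred_idx = root[3], root[4]

    def dependent(rel):
        # First dependent of the predicate with the given relation, or None.
        return next((r[3] for r in rows if r[0] == rel and r[2] == pred_idx), None)

    return {
        "subject": dependent("nsubj"),
        "copula": dependent("cop"),
        "predicate": pred_word,
    }

print(describe_predicate(COPULA_DEPS))
```

When the copula entry is None, the root is an ordinary verbal predicate; when it is present, the root is the adjective or noun that the copula links to the subject.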

I'm not familiar with the API, so I can't give exact code. Perhaps someone else who knows the API can do that. The parser is documented at the Stanford NLP doc site. You might also find the answer to Tools for text simplification (Java) helpful. There's more information about the dependency format in The Stanford Dependency Manual.