How can I split a text into sentences using the Stanford parser?

S Gaber · Feb 29, 2012

How can I split a text or paragraph into sentences using the Stanford parser?

Is there a method that extracts sentences, like the getSentencesFromString() that is provided for Ruby?

Answer

Kenston Choi · Feb 29, 2012

You can use the DocumentPreprocessor class, which yields one sentence at a time when you iterate over it. Below is a short snippet. There may be other ways to do what you want.

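import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.SentenceUtils;
import edu.stanford.nlp.process.DocumentPreprocessor;

String paragraph = "My 1st sentence. “Does it work for questions?” My third sentence.";
Reader reader = new StringReader(paragraph);
// Iterating over a DocumentPreprocessor yields one tokenized sentence per step
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
List<String> sentenceList = new ArrayList<>();

for (List<HasWord> sentence : dp) {
   // Newer releases name this class SentenceUtils; older ones called it Sentence
   String sentenceString = SentenceUtils.listToString(sentence);
   sentenceList.add(sentenceString);
}

for (String sentence : sentenceList) {
   System.out.println(sentence);
}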
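
As one of the "other ways" mentioned above: if you only need sentence boundaries and not the Stanford tokens, the JDK's own java.text.BreakIterator may be enough. To be clear, this is not part of the Stanford parser. It is a plain-JDK alternative, and its rule-based splitting will probably handle abbreviations and unusual punctuation less well. The class name SentenceSplitSketch and the helper splitSentences are hypothetical names made up for this sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch class; not part of Stanford CoreNLP.
final class SentenceSplitSketch {

    // Hypothetical helper: returns the sentences of `text` as trimmed substrings.
    static List<String> splitSentences(String text) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        // Walk consecutive boundary pairs; each pair delimits one sentence.
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "My 1st sentence. Does it work for questions? My third sentence.";
        for (String sentence : splitSentences(paragraph)) {
            System.out.println(sentence);
        }
    }
}
```

Unlike the DocumentPreprocessor snippet, this returns substrings of the original text rather than re-joined tokens, so spacing and punctuation stay exactly as they were in the input.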