Please have a look at the following.
String[] sentenceHolder = titleAndBodyContainer.split("\n|\\.(?!\\d)|(?<!\\d)\\.");
This is how I tried to split a paragraph into sentences. But there is a problem: my paragraph includes dates like Jan. 13, 2014, words like U.S and numbers like 2.2. They all got split by the above code. So basically, this code splits on a lot of dots, whether they are full stops or not.
I also tried
String[] sentenceHolder = titleAndBodyContainer.split(".\n");
and
String[] sentenceHolder = titleAndBodyContainer.split("\\.");
All of them failed.
How can I split a paragraph into sentences "properly"?
You can try this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2. They all got split by the above code.";
// A sentence runs until a '.', '!' or '?' that is followed by whitespace or the end of the input.
Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher reMatcher = re.matcher(str);
while (reMatcher.find()) {
    System.out.println(reMatcher.group());
}
Output:
This is how I tried to split a paragraph into a sentence.
But, there is a problem.
My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2.
They all got split by the above code.
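Note that this pattern ends a sentence at any dot followed by whitespace, so a date written with a space, like "Jan. 13, 2014" in the question, would still be cut after "Jan.". A minimal sketch of one possible workaround, assuming every real sentence starts with an uppercase letter: split only on whitespace that follows '.', '!' or '?' and precedes an uppercase letter. The names SentenceSplitSketch and splitSentences are made up for illustration, and the heuristic is not complete (it would still break after "Mr." in "Mr. Smith", for example).

```java
import java.util.Arrays;
import java.util.List;

class SentenceSplitSketch {

    // Hypothetical helper: splits at whitespace that is preceded by '.', '!' or '?'
    // and followed by an uppercase letter. Assumes sentences start with a capital.
    static List<String> splitSentences(String text) {
        return Arrays.asList(text.trim().split("(?<=[.!?])\\s+(?=\\p{Lu})"));
    }

    public static void main(String[] args) {
        String text = "My paragraph includes dates like Jan. 13, 2014, words like U.S "
                + "and numbers like 2.2. They all got split.";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
        // Prints two lines:
        // My paragraph includes dates like Jan. 13, 2014, words like U.S and numbers like 2.2.
        // They all got split.
    }
}
```

Because the dot in "Jan. 13" is followed by a digit rather than a capital letter, it is left alone here. If your text has abbreviations followed by capitalized words, you would need an explicit abbreviation list on top of this.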