Split a text into sentences

thelolcat picture thelolcat · May 4, 2013 · Viewed 9.6k times · Source

How can I split a text into an array of sentences?

Example text:

Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End

Should output:

0 => Fry me a Beaver.
1 => Fry me a Beaver!
2 => Fry me a Beaver?
3 => Fry me Beaver no. 4?!
4 => Fry me many Beavers...
5 => End

I tried some solutions that I've found on SO through search, but they all fail, especially at the 4th sentence.

/(?<=[!?.])./

/\.|\?|!/

/((?<=[a-z0-9)][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])/

/(?<=[.!?]|[.!?][\'"])\s+/    // <- closest one

Answer

HamZa picture HamZa · May 4, 2013

Since you want to "split" sentences why are you trying to match them ?

For this case let's use preg_split().

Code:

$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';
$sentences = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', $str);
print_r($sentences);

Output:

Array
(
    [0] => Fry me a Beaver.
    [1] => Fry me a Beaver!
    [2] => Fry me a Beaver?
    [3] => Fry me Beaver no. 4?!
    [4] => Fry me many Beavers...
    [5] => End
)

Explanation:

Well to put it simply we are spliting by grouped space(s) \s+ and doing two things:

  1. (?<=[.?!]) Positive look behind assertion, basically we search if there is a point or question mark or exclamation mark behind the space.

  2. (?=[a-z]) Positive look ahead assertion, searching if there is a letter after the space, this is kind of a workaround for the no. 4 problem.