How can I split a text into an array of sentences?
Example text:
Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End
Should output:
0 => Fry me a Beaver.
1 => Fry me a Beaver!
2 => Fry me a Beaver?
3 => Fry me Beaver no. 4?!
4 => Fry me many Beavers...
5 => End
I tried some solutions that I've found on SO through search, but they all fail, especially at the 4th sentence.
/(?<=[!?.])./
/\.|\?|!/
/((?<=[a-z0-9)][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])/
/(?<=[.!?]|[.!?][\'"])\s+/ // <- closest one
Since you want to "split" sentences why are you trying to match them ?
For this case let's use preg_split().
Code:
$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';
$sentences = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', $str);
print_r($sentences);
Output:
Array
(
[0] => Fry me a Beaver.
[1] => Fry me a Beaver!
[2] => Fry me a Beaver?
[3] => Fry me Beaver no. 4?!
[4] => Fry me many Beavers...
[5] => End
)
Explanation:
Well to put it simply we are spliting by grouped space(s) \s+ and doing two things:
(?<=[.?!]) Positive look behind assertion, basically we search if there is a point or question mark or exclamation mark behind the space.
(?=[a-z]) Positive look ahead assertion, searching if there is a letter after the space, this is kind of a workaround for the no. 4
problem.