I have been using
explode(".",$mystring)
to split a paragraph into sentences. However this doen't cover sentences that have been concluded with different punctuation such as ! ? : ;
Is there a way of using an array as a delimiter instead of a single character? Alternativly is there another neat way of splitting using various punctuation?
I tried
explode(("." || "?" || "!"),$mystring)
hopefully but it didn't work...
You can use preg_split()
combined with a PCRE lookahead condition to split the string after each occurance of .
, ;
, :
, ?
, !
, .. while keeping the actual punctuation intact:
Code:
$subject = 'abc sdfs. def ghi; this is [email protected]! asdasdasd? abc xyz';
// split on whitespace between sentences preceded by a punctuation mark
$result = preg_split('/(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => def ghi;
[2] => this is [email protected]!
[3] => asdasdasd?
[4] => abc xyz
)
You can also add a blacklist for abbreviations (Mr., Mrs., Dr., ..) that should not be split into own sentences by inserting a negative lookbehind assertion:
$subject = 'abc sdfs. Dr. Foo said he is not a sentence; asdasdasd? abc xyz';
// split on whitespace between sentences preceded by a punctuation mark
$result = preg_split('/(?<!Mr.|Mrs.|Dr.)(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => Dr. Foo said he is not a sentence;
[2] => asdasdasd?
[3] => abc xyz
)