Are preg_match() and preg_replace() slow?

Jasko Koyn picture Jasko Koyn · Jan 15, 2013 · Viewed 15.4k times · Source

I've been coding in PHP for a while and I keep reading that you should only use preg_match and preg_replace when you have to because it slows down performance. Why is this? Would it really be bad to use 20 preg_matches in one file instead of using another PHP function.

Answer

Elias Van Ootegem picture Elias Van Ootegem · Jan 15, 2013

As Mike Brant said in his answer: There's nothing wrong with using any of the preg_* functions, if you need them.
You want to know if it's a good idea to have something like 20 preg_match calls in a single file, well, honestly: I'd say that's too many. I've often stated that "if your solution to a problem relies on more than 3 regex's at any given time, you're part of the problem". I have occasionally sinned against my own mantra, though.

If you are using 20 preg_match calls, chances are you can halve that number simply by having a closer look at the actual regular expressions. Regex's, especially the Perl regex, are incredibly powerful, and are well worth the time to get to know them. The reason why they tend to be slower is simply because the regex has to be parsed, and "translated" to a considerable number of branches and loops at some low level. If, say, you want to replace all lower-case a's with an upper-case char, you could use a regular expression, sure, but in PHP this would look like this:

preg_replace('/a/','A',$string);

Look at the expression, the first argument: it's a string that is passed as an argument. This string will be parsed (when parsing, the delimiters are checked, a match string is created and then the string is iterated, each char is compared to the pattern (in this case a), and if the substring matches, it's replaced.
Seems like a bit of a hasstle, especially considering that the last step (comparing substrings and replace matches) is all we really want.

$string = str_replace('a','A',$string);

Does just that, without the additional checks performed when a regular expression is parsed and validated.
Don't forget that preg_match also constructs an array of matches, and constructing an array isn't free either.

In short: regex's are slower because the expression is parsed, validated and finally translated into a set of simple, low-level instructions.

Note that, in some cases people use explode and implode for string manipulations. This, too, creates an array which is -again- not free. Considering that you're imploding that very same array shortly thereafter. Perhaps another option is more desirable (and in some cases preg_replace can be faster here).
Basically: regex's need additional processing, that simple string functions don't require. But when in doubt, there's only 1 way to be absolutely sure: set up a test script...