PHP - preg_replace backreferencing

Sven · Feb 24, 2013 · Viewed 10.1k times

I have big problems understanding how to use preg_replace with backreferencing.

I have a plain-text string and want to replace every link with the HTML syntax for a link. So "www.mydomain.tld" or "http://www.mydomain.tld" or "http://mydomain.tld" should be wrapped in an HTML a-tag. I have found a working function that does this online, but I want to understand how to do it myself.

In the function I found, this is the replacement:

"\\1<a href=\"http://\\2\" target=\"_blank\" rel=\"nofollow\">\\2</a>"

I see some escaped quotation marks in there and these bits: \\1 \\2.
According to the PHP documentation these are backreferences. But how do I use them, what do they do?

I found nothing about that in the spec, so any help would be greatly appreciated!

Answer

cryptic ツ · Feb 24, 2013

This will do the job for you. See below for an explanation of how it all works.

$string = 'some text www.example.com more text http://example.com more text https://www.example.com more text';

// Group 1 captures the optional "s" of https; group 2 captures the host name.
$string = preg_replace('#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#', "<a href='http$1://$2'>http$1://$2</a>", $string);

echo $string;
// Output:
// some text <a href='http://www.example.com'>http://www.example.com</a> more text <a href='http://example.com'>http://example.com</a> more text <a href='https://www.example.com'>https://www.example.com</a> more text

\b matches a word boundary.

(?:http(s?)://)? optionally matches a leading 'http://' or 'https://'. The outer (?:...) group is non-capturing; the inner (s?) captures the 's' when the scheme is https, so the correct URL can be rebuilt later.

(?:[a-z\d-]+\.)+ matches one or more runs of letters, digits, or hyphens, each followed by a period.

[a-z]+ matches one or more letters: the TLD. Note that new TLDs are now open for purchase, so the length can't be limited anymore. See http://tinyurl.com/cle6jqb
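To see what the pattern accepts, it can help to run it against a few inputs on their own. This is only a quick sketch: the sample strings are made up for illustration, and the `i` modifier in the last step is an addition that is not part of the pattern in this answer.

```php
<?php
// Pattern copied from the answer.
$pattern = '#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#';

// Hypothetical sample inputs, chosen only for illustration.
$samples = ['example.com', 'sub.domain.example.org', 'localhost', 'Example.COM'];

$results = [];
foreach ($samples as $s) {
    // preg_match returns 1 when the pattern matches somewhere in the string.
    $results[$s] = preg_match($pattern, $s) === 1;
}
var_dump($results);

// 'localhost' has no period, so the "label followed by a period" part fails.
// 'Example.COM' fails because the character classes only list lowercase letters.
// Appending the i modifier (an addition, not in the original) makes it match.
$withI = preg_match($pattern . 'i', 'Example.COM') === 1;
var_dump($withI);
```

If mixed-case host names matter for your input, the `i` modifier is probably worth adding to the pattern.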

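Parentheses are what create the backreferences. The (s?) group is the first capturing group, so the 's' is available as $1. The last two parts sit together inside the second capturing group, so the whole host name is available as $2. Groups that start with ?: are non-capturing and do not get a number.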
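A simple way to see what each group holds is to pass a third argument to preg_match and inspect the array it fills. The sketch below reuses the pattern from this answer; the input strings and variable names are just examples.

```php
<?php
// Pattern copied from the answer.
$pattern = '#\b(?:http(s?)://)?((?:[a-z\d-]+\.)+[a-z]+)\b#';

// Index 0 is the whole match; 1 and 2 are the capturing groups.
preg_match($pattern, 'visit https://www.example.com today', $https);
print_r($https);
// [0] => https://www.example.com, [1] => s, [2] => www.example.com

// Without a scheme, group 1 does not take part in the match and is an empty string.
preg_match($pattern, 'visit www.example.com today', $bare);
print_r($bare);
// [0] => www.example.com, [1] => (empty string), [2] => www.example.com
```

The values at index 1 and 2 are exactly what preg_replace substitutes for $1 and $2 in the replacement string.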

We then build the URL:

<a href='http$1://$2'>http$1://$2</a>

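http$1:// builds the scheme. If the original URL used HTTPS, the backreference $1 contains an 's' and the result is 'https://'; otherwise $1 is empty and the result is 'http://'.

$2 contains the domain name. The rebuilt URL is used both as the href and as the link text.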
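The \\1 and \\2 in the replacement from the question are the same kind of backreference, written in the backslash form; preg_replace accepts \n, $n and ${n}. The sketch below uses a simplified one-group pattern and a made-up input string to show that the three spellings give the same result. It does not reproduce the function from the question, whose pattern isn't shown.

```php
<?php
// Simplified pattern with a single capturing group (the host name).
$pattern = '#\b((?:[a-z\d-]+\.)+[a-z]+)\b#';
$text = 'see example.com';

// Backslash form. Inside double quotes, "\\1" is the two characters \1.
$a = preg_replace($pattern, "<a href=\"http://\\1\">\\1</a>", $text);

// Dollar form. Single quotes avoid having to escape the double quotes.
$b = preg_replace($pattern, '<a href="http://$1">$1</a>', $text);

// Braces form, needed when a literal digit directly follows the reference.
$c = preg_replace($pattern, '<a href="http://${1}">${1}</a>', $text);

echo $a, "\n"; // see <a href="http://example.com">example.com</a>
var_dump($a === $b && $b === $c); // bool(true)
```

The escaped quotation marks in the question's replacement are only there because the whole string is written in double quotes; they are not part of the backreference syntax.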