Finding urls from text string via php and regex?

Sisir picture Sisir · May 25, 2011 · Viewed 12k times · Source

I know the question title looks very repetitive. But some of the solution i did not find here.

I need to find urls form text string:

$pattern = '`.*?((http|https)://[\w#$&+,\/:;[email protected]]+)[^\w#$&+,\/:;[email protected]]*?`i';

    if (preg_match_all($pattern,$url_string,$matches)) {
        print_r($matches[1]);
    }

using this pattern i was able to find urls with http:// and https:// which is okey. But i have user input where people add url like www.domain.com even domain.com

So, i need to validate the string first where i can replace www.domain.com domain.com with common protocol http:// before them. Or i need to comeup with more good pattern?

I am not good with regex and don't know what to do.

My idea is first finding the urls with http:// and https:// the put them in an array then replace these url with space(" ") in the text string then use other patterns for it. But i am not sure what pattern to use.

I am using this $url_string = preg_replace($pattern, ' ', $url_string ); but that removes if any www.domain.com or domain.com url between two valid url with http:// or https://

If you can help that will be great.

To make things more clear:

i need a pattern or some other method where i can find all urls in a text sting. the example of url are:

  1. domain.com
  2. www.domain.com
  3. http://www.domain.com
  4. http://domain.com
  5. https://www.domain.com
  6. https://domain.com

thanks! 5.

Answer

Adrian B picture Adrian B · May 25, 2011
$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);