I need to process links within an HTML string in several different ways.
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
$links = extractLinks($str);
foreach ($links as $link) {
$pattern = "#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#i";
if (preg_match($pattern, $link['href'])) {
// Process Remote links
// For example, replace url with short url,
// or replace long anchor text with truncated
} else {
// Process Local Links, Anchors
}
}
function extractLinks($str) {
// First, I tried DomDocument
$dom = new DomDocument();
$dom->loadHTML($str);
return $dom->getElementsByTagName('a');
// But this just returns:
// DOMNodeList Object
// (
// [length] => 3
// )
// Then I tried Regex
if(preg_match_all("|<a.*(?=href=\"([^\"]*)\")[^>]*>([^<]*)</a>|i", $str, $matches)) {
print_r($matches);
}
// But this didn't work either.
}
Desired result of extractLinks($str):
[0] => Array(
    'str' => '<a href="http://example.com/abc" rel="link">string</a>',
    'href' => 'http://example.com/abc',
    'anchorText' => 'string'
),
[1] => Array(
    'str' => '<a href="/local/path" title="with attributes">number</a>',
    'href' => '/local/path',
    'anchorText' => 'number'
),
[2] => Array(
    'str' => '<a href="#anchor" data-attr="lots">links</a>',
    'href' => '#anchor',
    'anchorText' => 'links'
)
I need all of these so I can do things like edit the href (add tracking, shorten, etc.) or replace the whole tag with something else (<a href="/u/username">username</a> could become username).
Here's a demo of what I'm trying to do.
You just need to change it as follows:
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
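$dom = new DOMDocument();
$dom->loadHTML($str);
$output = array();
// getElementsByTagName() returns a DOMNodeList; iterate it to reach each <a> element
foreach ($dom->getElementsByTagName('a') as $item) {
    $output[] = array(
        // saveHTML() accepts a node argument as of PHP 5.3.6
        'str' => $dom->saveHTML($item),
        'href' => $item->getAttribute('href'),
        'anchorText' => $item->nodeValue
    );
}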
By looping over the node list and using getAttribute, nodeValue, and saveHTML(THE_NODE), you get the desired output.
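If it helps, the loop can be wrapped back into the extractLinks() function from the question. This is only a rough sketch, assuming PHP 7 or later for the type hints. The isRemoteLink() helper is a hypothetical name, not something from the question; it uses parse_url() to check the scheme instead of the regex, which is a swapped-in approach and may not match every case the original pattern was meant to cover.

```php
<?php
// Sketch: extractLinks() wraps the loop shown above.
// isRemoteLink() is a hypothetical helper added for illustration.

function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since the input is a fragment rather than a full document.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $links[] = [
            'str'        => $dom->saveHTML($a),       // the whole tag
            'href'       => $a->getAttribute('href'), // the link target
            'anchorText' => $a->nodeValue,            // the visible text
        ];
    }
    return $links;
}

function isRemoteLink(string $href): bool
{
    // parse_url() yields null for the scheme of "/local/path" and "#anchor".
    $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https', 'ftp'], true);
}

$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';

$links = extractLinks($str);
foreach ($links as $link) {
    $kind = isRemoteLink($link['href']) ? 'remote' : 'local';
    echo $link['href'], ' => ', $kind, PHP_EOL;
}
```

From there, each branch of the if/else in the question can work on $link['href'] and $link['anchorText'] directly.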