The function below is designed to apply rel="nofollow" attributes to all external links, and to no internal links unless the link's path matches a predefined root URL, defined as $my_folder below.
So given the variables...
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
And the content...
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com">external</a>
The end result, after replacement, should be...
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator" rel="nofollow">internal cloaked link</a>
<a href="http://cnn.com" rel="nofollow">external</a>
Notice that the first link is not altered, since it's an internal link.
The link on the second line is also an internal link, but since it matches our $my_folder string, it gets the nofollow too.
The third link is the easiest: since it does not match $blog_url, it's obviously an external link.
However, in the script below, ALL of my links are getting nofollow. How can I fix the script to do what I want?
function save_rseo_nofollow($content) {
    $my_folder = $rseo['nofollow_folder'];
    $blog_url = get_bloginfo('url');
    preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
    for ( $i = 0; $i <= sizeof($matches[0]); $i++){
        if ( !preg_match( '~nofollow~is',$matches[0][$i])
            && (preg_match('~' . $my_folder . '~', $matches[0][$i])
            || !preg_match( '~'.$blog_url.'~',$matches[0][$i]))){
            $result = trim($matches[0][$i],">");
            $result .= ' rel="nofollow">';
            $content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
        }
    }
    return $content;
}
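One likely cause, assuming $rseo is a global array that is populated elsewhere in the plugin: inside the function $rseo is not in scope, so $my_folder ends up empty and the pattern built from it is '~~', which matches every link. Below is a minimal sketch of a guard against that. link_matches_folder is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper for illustration; not part of the original plugin.
function link_matches_folder(string $tag, ?string $folder): bool
{
    // An unset or empty option would produce the pattern '~~',
    // which matches any string, so treat it as "no match".
    if ($folder === null || $folder === '') {
        return false;
    }

    // preg_quote() escapes regex metacharacters in the URL (such as dots).
    return preg_match('~' . preg_quote($folder, '~') . '~', $tag) === 1;
}

// The empty pattern matches everything, which is why every link qualified.
var_dump(preg_match('~~', '<a href="http://localhost/mytest/">') === 1); // bool(true)
```

In the original function, adding global $rseo; at the top (if that is indeed where the setting lives) and changing the loop condition from <= to < are two other things worth checking.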
Here is the DOMDocument solution...
$str = '<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com" rel="me">external</a>
<a href="http://google.com">external</a>
<a href="http://example.com" rel="nofollow">external</a>
<a href="http://stackoverflow.com" rel="junk in the rel">external</a>
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($str);

$a = $dom->getElementsByTagName('a');

// Host name of the current site, without any port suffix.
$host = strtok($_SERVER['HTTP_HOST'], ':');

foreach ($a as $anchor) {
    // getAttribute() returns '' when the anchor has no href.
    $href = $anchor->getAttribute('href');

    // Skip internal links (same host as the current site).
    if (preg_match('/^https?:\/\/' . preg_quote($host, '/') . '/', $href)) {
        continue;
    }

    $noFollowRel = 'nofollow';
    $oldRelAtt = $anchor->attributes->getNamedItem('rel');

    if ($oldRelAtt === null) {
        $newRel = $noFollowRel;
    } else {
        $oldRel = explode(' ', $oldRelAtt->nodeValue);

        // Already marked nofollow: nothing to do.
        if (in_array($noFollowRel, $oldRel)) {
            continue;
        }

        $oldRel[] = $noFollowRel;
        // Separator first: the reversed argument order was removed in PHP 8.
        $newRel = implode(' ', $oldRel);
    }

    // Creates the rel attribute or overwrites the existing one.
    $anchor->setAttribute('rel', $newRel);
}

var_dump($dom->saveHTML());
string(509) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com" rel="me nofollow">external</a>
<a href="http://google.com" rel="nofollow">external</a>
<a href="http://example.com" rel="nofollow">external</a>
<a href="http://stackoverflow.com" rel="junk in the rel nofollow">external</a>
</body></html>
"
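Note that the run above leaves the /go/ link untouched, because it only compares hosts. Here is a sketch of how the same DOMDocument approach could also cover the $my_folder rule from the question. add_nofollow and its parameters are hypothetical names, and it assumes the blog URL and the folder are absolute URLs, as in the question.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function add_nofollow(string $html, string $blogUrl, string $myFolder): string
{
    if (trim($html) === '') {
        return $html;
    }

    $dom = new DOMDocument();
    // Collect parser warnings for sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = rtrim($blogUrl, '/');

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');

        // Assumption: relative links, anchors, mailto: etc. are left alone.
        if (!preg_match('~^https?://~i', $href)) {
            continue;
        }

        $isInternal = $href === $base || strpos($href, $base . '/') === 0;
        $isCloaked  = $myFolder !== '' && strpos($href, $myFolder) === 0;

        // Plain internal links stay untouched; cloaked and external ones continue.
        if ($isInternal && !$isCloaked) {
            continue;
        }

        // Keep existing rel tokens and append nofollow only if it is missing.
        $rel = preg_split('/\s+/', $anchor->getAttribute('rel'), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $rel, true)) {
            $rel[] = 'nofollow';
            $anchor->setAttribute('rel', implode(' ', $rel));
        }
    }

    // Serialize only the children of <body>, dropping the doctype/html wrapper.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Called as add_nofollow($content, 'http://localhost/mytest', 'http://localhost/mytest/go/') on the three sample links from the question, this should leave the first link alone and add rel="nofollow" to the cloaked and external ones.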