I'm trying to get the "link" elements from certain webpages, but I can't figure out what I'm doing wrong. I'm getting the following error:
Severity: Warning
Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 536
Filename: controllers/test.php
Line Number: 34
Line 34 is the following in the code:
$dom->loadHTML($html);
My code:
$url = "http://www.amazon.com/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
if($html = curl_exec($ch)){
    // parse the html into a DOMDocument
    $dom = new DOMDocument();
    $dom->recover = true;
    $dom->strictErrorChecking = false;
    $dom->loadHTML($html);
    $hrefs = $dom->getElementsByTagName('a');
    echo "<pre>";
    print_r($hrefs);
    echo "</pre>";
    curl_close($ch);
}else{
    echo "The website could not be reached.";
}
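It means some of the HTML is invalid: "htmlParseEntityRef: no name" is libxml's complaint about an "&" that is not followed by an entity name (for example, a bare & in text). This is just a warning, not an error, so your script will still process the document. To suppress the warnings, call this before loadHTML():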
libxml_use_internal_errors(true);
Or you could silence the warning for that one call with the @ operator:
@$dom->loadHTML($html);
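Here is a minimal sketch of the libxml_use_internal_errors() approach, in case you want to see what the parser complained about rather than just hiding it. The $html string is a made-up sample standing in for the cURL response (assumption: it contains a bare "&", the kind of markup that usually triggers this warning). The exact messages collected, if any, depend on your libxml2 version.

```php
<?php
// Hypothetical sample markup standing in for the cURL response;
// the bare "&" characters are the kind of input that triggers the warning.
$html = '<html><body><a href="/a?x=1&y=2">Tom & Jerry</a><a href="/b">B</a></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Grab whatever the parser reported, then empty the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// print_r() on a DOMNodeList shows very little, so read the attributes instead.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

// Each entry is a LibXMLError object with message, line, column, etc.
foreach ($errors as $error) {
    echo trim($error->message), " (line ", $error->line, ")\n";
}
print_r($hrefs);
```

Unlike @, this keeps the parse problems available for logging, and it only affects libxml rather than hiding every error raised by that call.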