Extract Data from HTML using PHP

David · Sep 6, 2010

Here is what I am looking for:

I have a link that displays some data in HTML format:

http://www.118.com/people-search.mvc...0&pageNumber=1

The data comes in the following format:

<div class="searchResult regular"> 

Bird John

56 Leathwaite Road
London
SW11 6RS
020 7228 5576

I want my PHP page to request the above URL, extract the data from the resulting HTML page based on the tags above (h2 = name, address = address, telephoneNumber = phone number), and display it in a table.
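For the tabular display, one minimal approach is to escape each extracted field with `htmlspecialchars()` and emit a plain HTML `<table>`. This is only a sketch: it assumes the results have already been extracted into an array of associative arrays with the hypothetical keys `name`, `address` and `phone`, and the sample record simply reuses the entry shown above.

```php
<?php
// Render extracted search results as an HTML table.
// Assumes each record is an associative array with the hypothetical
// keys 'name', 'address' and 'phone'.
function render_table(array $records): string
{
    $html = "<table>\n<tr><th>Name</th><th>Address</th><th>Phone</th></tr>\n";
    foreach ($records as $r) {
        // Escape every field so characters like & or < cannot break the markup.
        $html .= '<tr>'
            . '<td>' . htmlspecialchars($r['name']) . '</td>'
            . '<td>' . htmlspecialchars($r['address']) . '</td>'
            . '<td>' . htmlspecialchars($r['phone']) . '</td>'
            . "</tr>\n";
    }
    return $html . "</table>\n";
}

$records = [
    ['name' => 'Bird John', 'address' => '56 Leathwaite Road, London, SW11 6RS', 'phone' => '020 7228 5576'],
];
echo render_table($records);
```

Escaping at output time keeps the extraction step independent of how the data is eventually displayed.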

I have the following, which works to an extent, but it only outputs the whole page rather than the extracted fields:

<?php
// Fetch a URL with cURL and return the response body as a string.
function get_content($url)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);            // exclude response headers from the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it

    $string = curl_exec($ch);
    curl_close($ch);

    return $string;
}

// Fetch and print pages 1 to 4 of the search results.
for ($page = 1; $page <= 4; $page++) {
    $content = get_content("http://www.118.com/people-search.mvc?Supplied=true&Name=william&Location=Crabtree&pageSize=50&pageNumber=" . $page);
    echo $content;
}

?>
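The page URLs could also be built from their individual query parameters instead of being written out in full. The sketch below uses PHP's built-in `http_build_query()` with the parameter names visible in the URLs above; the helper name `search_url` is hypothetical, and the page count of 4 just mirrors the example.

```php
<?php
// Build the search URL for one results page.
// Parameter names come from the URLs in the question;
// search_url() itself is a hypothetical helper.
function search_url(string $name, string $location, int $page, int $pageSize = 50): string
{
    // http_build_query() URL-encodes each value and joins the pairs with '&'.
    $query = http_build_query([
        'Supplied'   => 'true',
        'Name'       => $name,
        'Location'   => $location,
        'pageSize'   => $pageSize,
        'pageNumber' => $page,
    ]);
    return 'http://www.118.com/people-search.mvc?' . $query;
}

// Print the URLs for pages 1 to 4; each could be passed to a fetch function.
for ($page = 1; $page <= 4; $page++) {
    echo search_url('william', 'Crabtree', $page), "\n";
}
```

Letting `http_build_query()` do the encoding means names containing spaces or other special characters are escaped correctly.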

Answer

jkilbride · Sep 6, 2010

You need to use a DOM parser, such as Simple HTML DOM or something similar.

Then read the page into a DOM object and parse it using the appropriate selectors:

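include 'simple_html_dom.php'; // the Simple HTML DOM library

$html = new simple_html_dom("http://www.118.com/people-search.mvc...0&pageNumber=1");

// Select every div whose class list contains "searchResult"
foreach ($html->find('div.searchResult') as $div) {
  // parse div contents here to extract name and address etc.,
  // e.g. $div->find('h2', 0)->plaintext for the name
}
$html->clear();
unset($html);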
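If you would rather not depend on a third-party library, PHP's built-in DOM extension can do a similar job with `DOMDocument::loadHTML()` and an XPath query. This is a swapped-in technique, not the Simple HTML DOM approach above. It assumes the markup described in the question: each result is a `div` whose class list contains `searchResult`, the name is in an `h2`, the address is in an `address` element, and the phone number is in an element with the class `telephoneNumber`. The live page's exact structure is not shown, so the sample HTML below is hypothetical.

```php
<?php
// Extract name/address/phone records from search-result HTML using
// PHP's built-in DOM extension. The markup structure is an assumption.
function extract_results(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match divs whose class attribute contains the token "searchResult".
    $divs = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' searchResult ')]");

    $records = [];
    foreach ($divs as $div) {
        // Relative queries (".//") search only inside the current result div.
        $name    = $xpath->query('.//h2', $div)->item(0);
        $address = $xpath->query('.//address', $div)->item(0);
        $phone   = $xpath->query(".//*[contains(concat(' ', normalize-space(@class), ' '), ' telephoneNumber ')]", $div)->item(0);

        $records[] = [
            'name'    => $name ? trim($name->textContent) : '',
            // Collapse the line breaks and indentation inside the address.
            'address' => $address ? trim(preg_replace('/\s+/', ' ', $address->textContent)) : '',
            'phone'   => $phone ? trim($phone->textContent) : '',
        ];
    }
    return $records;
}

// Hypothetical sample markup modelled on the result shown in the question.
$sample = <<<HTML
<html><body>
<div class="searchResult regular">
  <h2>Bird John</h2>
  <address>
    56 Leathwaite Road
    London
    SW11 6RS
  </address>
  <span class="telephoneNumber">020 7228 5576</span>
</div>
</body></html>
HTML;

print_r(extract_results($sample));
```

The `concat(' ', normalize-space(@class), ' ')` idiom matches a whole class token, so `class="searchResult regular"` is found while a class such as `searchResultHeader` is not.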

For more information, see the Simple HTML DOM documentation.