Get text between HTML tags

Ryan Cooper · Apr 18, 2011

OK, this is a pretty basic question, I'm sure, but I'm new to PHP and haven't been able to figure it out. The input string is $data, and I'm trying to pull from it and use only the first match. Is the code below incorrect? This may not even be the best way to do it; I'm just trying to pull the contents between two HTML tags (the first set found) and discard the rest of the data. I know there are similar questions and I've read them all, but my question is a mix of two things: whether there's a better way to do this, and how I can define the match as the new input for the rest of the code. If I change $matches to $data2 and use it from there on, it returns errors.

preg_match('/<h2>(.*?)<\/h2>/s', $data, $matches);

Answer

diEcho · Apr 18, 2011

Don't parse HTML with preg_match; use this PHP class instead:

The DOMDocument class

Example:

<?php

$html = "<p>hi</p>
<h1>H1 title</h1>
<h2>H2 title</h2>
<h3>H3 title</h3>";

// create a new DOM object
$dom = new DOMDocument('1.0', 'utf-8');
// discard whitespace (parser options must be set before loading)
$dom->preserveWhiteSpace = false;
// load the HTML into the object
$dom->loadHTML($html);
// collect every <h2> element; use your desired tag here
$hTwo = $dom->getElementsByTagName('h2');
// print the text of the first match: "H2 title"
echo $hTwo->item(0)->nodeValue;
?>
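
The question also asks how to carry the first match forward as the new input. Here is a minimal sketch of one way to do that, building on the example above. The helper name firstTagText and the sample headings are made up for illustration and are not part of the original answer; the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of
// the first <$tag> element found in $html, or null when there is none.
function firstTagText(string $html, string $tag): ?string
{
    // Nothing to parse; loadHTML() rejects an empty string.
    if ($html === '') {
        return null;
    }

    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) is the first matching element, or null if nothing matched.
    $node = $dom->getElementsByTagName($tag)->item(0);

    return $node === null ? null : $node->nodeValue;
}

// Sample input (assumed for this sketch).
$data = "<p>hi</p>\n<h2>First heading</h2>\n<h2>Second heading</h2>";

// Reassign $data so the rest of the script works on the extracted text only.
$first = firstTagText($data, 'h2');
if ($first !== null) {
    $data = $first;
}

echo $data; // prints "First heading"
```

If you stay with the preg_match call from the question, the same idea applies: $matches is an array, so using it directly as a string is the likely cause of the errors. $matches[0] holds the full match including the tags, and $matches[1] holds the captured text between them, so that is the element to assign to the new variable.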
