I use the strip_tags() function, but I need to remove some tags along with all of their contents.
For example:
<div>
<p class="test">
Test A
</p>
<span>
Test B
</span>
<div>
Test C
</div>
</div>
Let's say I need to get rid of the P and SPAN tags and keep only:
<div>
<div>
Test C
</div>
</div>
strip_tags() expects as its second parameter the tags that you want to KEEP.
In this particular example I could use strip_tags($html, "<div>"); but the HTML I'm scraping and the tags that need to be removed are different every time.
I searched for hours for a function that suits my needs, but couldn't find anything useful.
Any ideas?
Use a regular expression. Something like this should work:
$tags = array( 'p', 'span');
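// \1 is a backreference to the tag name captured by the first group
// (inside a pattern, $1 would be read as an end-of-string anchor followed by a literal 1)
$text = preg_replace( '#<(' . implode( '|', $tags) . ')>.*?<\/\1>#s', '', $text);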
The demo shows it replacing the desired tags with nothing.
Note that you may need to tweak it more, say, to compensate for whitespace within the tags, or other unknowns that your example does not demonstrate.
Here is the regex to use to capture tags with or without attributes (requiring whitespace before the attributes keeps a tag name such as p from also matching pre):
'#<(' . implode( '|', $tags) . ')(?:\s[^>]*)?>.*?<\/\1>#s'
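To put the pieces together, here is a minimal sketch that wraps the pattern in a reusable function. The name strip_tags_with_content is made up for this example (it is not a PHP built-in), and the preg_quote() call, the case-insensitive flag, and the tolerance for whitespace in the closing tag are additions of mine rather than part of the pattern above.

```php
<?php
// Hypothetical helper (not a PHP built-in): removes the listed tags
// together with everything between their opening and closing tags.
function strip_tags_with_content(string $html, array $tags): string
{
    if ($tags === []) {
        return $html; // nothing to remove
    }

    // Escape each tag name so it is safe to embed in the pattern.
    $names = array_map(
        function ($tag) {
            return preg_quote($tag, '#');
        },
        $tags
    );

    // <tag> or <tag attr="...">, then the shortest run up to </tag>.
    // \1 refers back to whichever tag name the first group matched.
    // Flags: i = case-insensitive, s = let . match newlines.
    $pattern = '#<(' . implode('|', $names) . ')(?:\s[^>]*)?>.*?</\1\s*>#is';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

$html = '<div><p class="test">Test A</p><span>Test B</span><div>Test C</div></div>';
echo strip_tags_with_content($html, array('p', 'span'));
// prints: <div><div>Test C</div></div>
```

Keep in mind that the lazy .*? stops at the first matching closing tag, so a tag nested inside another tag of the same name (a div inside a div, say) will not be removed cleanly. If your scraped HTML does that, a DOM parser is probably the safer route.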