I am working with PHP XPath, trying to quickly pull certain links out of an HTML page.
The following will find all links with an href attribute on mypage.html:
$nodes = $x->query("//a[@href]");
Whereas the following will find all href links where the description matches my needle:
$nodes = $x->query("//a[contains(@href,'click me')]");
What I am trying to achieve is matching on the href itself, more specifically finding URLs that contain certain parameters. Is that possible within an XPath query, or should I just start manipulating the output from the first XPath query?
I am not sure I understand the question correctly, but your second XPath expression already does what you are describing. It does not match against the text node of the a element, but against its href attribute:
$html = <<< HTML
<ul>
<li>
<a href="http://example.com/page?foo=bar">Description</a>
</li>
<li>
<a href="http://example.com/page?lang=de">Description</a>
</li>
</ul>
HTML;
$xml = simplexml_load_string($html);
$list = $xml->xpath("//a[contains(@href,'foo')]");
var_dump($list); // dump the matches to produce the output below
Outputs:
array(1) {
[0]=>
object(SimpleXMLElement)#2 (2) {
["@attributes"]=>
array(1) {
["href"]=>
string(31) "http://example.com/page?foo=bar"
}
[0]=>
string(11) "Description"
}
}
As you can see, the returned array contains only the a element whose href contains foo (which I understand is what you are looking for). It contains the entire element, because the XPath translates to "fetch all a elements with an href attribute containing foo". You would then access the attribute with
echo $list[0]['href']; // gives "http://example.com/page?foo=bar"
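To collect the URL of every match rather than just the first one, you can loop over the result. The following is only a sketch: the third link and the `$urls` variable are made up for illustration, and the needle `foo=` is an assumption about what "certain parameters" looks like in your URLs.

```php
<?php
// Hypothetical sample markup, same shape as the example above
// plus one extra link so that more than one element matches.
$html = <<<HTML
<ul>
<li><a href="http://example.com/page?foo=bar">Description</a></li>
<li><a href="http://example.com/page?lang=de">Description</a></li>
<li><a href="http://example.com/other?foo=baz">Description</a></li>
</ul>
HTML;

$xml = simplexml_load_string($html);

// Cast each href attribute to a plain string while collecting it.
$urls = [];
foreach ($xml->xpath("//a[contains(@href,'foo=')]") as $a) {
    $urls[] = (string) $a['href'];
}

print_r($urls);
```

Keep in mind that contains() is a plain substring test, so 'foo=' would also match a parameter named xfoo. If that matters, a stricter check on the collected strings (for example with parse_url and parse_str) is probably easier than doing it all in XPath 1.0.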
If you only want to return the attribute itself, you'd have to do
//a[contains(@href,'foo')]/@href
Note that in SimpleXML, this still returns a SimpleXMLElement rather than a plain string:
array(1) {
[0]=>
object(SimpleXMLElement)#3 (1) {
["@attributes"]=>
array(1) {
["href"]=>
string(31) "http://example.com/page?foo=bar"
}
}
}
but with $list holding the result of that query, you can now output the URL with
echo $list[0]; // gives "http://example.com/page?foo=bar"
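Since your question uses $x->query(), here is roughly how the same thing might look with DOMDocument and DOMXPath. This is a sketch under assumptions: the markup is invented, the needle `foo=` stands in for whatever parameter you are after, and I am guessing that $x in your code is a DOMXPath instance.

```php
<?php
// Hypothetical page content; both links share the same link text.
$html = <<<HTML
<html><body>
<a href="http://example.com/page?foo=bar">click me</a>
<a href="http://example.com/page?lang=de">click me</a>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings for sloppy HTML quiet
$doc->loadHTML($html);
libxml_clear_errors();

$x = new DOMXPath($doc);

// Match on the attribute value and select the attribute nodes directly.
$hrefs = [];
foreach ($x->query("//a[contains(@href,'foo=')]/@href") as $attr) {
    $hrefs[] = $attr->nodeValue;
}

// For comparison: this matches on the link text, not on the URL.
$byText = $x->query("//a[contains(.,'click me')]");

print_r($hrefs);
echo $byText->length, "\n";
```

The second query is there only to show the difference: contains(@href, ...) looks at the URL, while contains(., ...) looks at the text inside the link.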