I have some HTML that looks like:
<dt>
<a href="#">Hello</a>
(2009)
</dt>
I already have all my HTML loaded into a variable called record. I need to parse out the year, i.e. 2009, if it exists.
How can I get the text inside the dt tag but not the text inside the a tag? I've used record.search("dt").inner_text and this gives me everything.
It's a trivial question but I haven't managed to figure this out.
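To get only the text nodes that are direct children of the dt, and not the text nested inside child elements such as the a tag, you can use XPath like so (doc here is the parsed document, i.e. record in the question):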
doc.xpath('//dt/text()')
Or if you wish to use search:
doc.search('dt').xpath('text()')