XPath to get all text in element as one value, removing line breaks

Richard Ortega · Jun 13, 2012 · Viewed 14.5k times

I am trying to get all the text inside a node from the following markup, returned as a single value rather than as multiple nodes.

<p>
   "I love eating out."
   <br>
   <br>
   "This is my favorite restaurant."
   <br>
   "I will definitely be back"
</p>

Using '/p' returns all of the text, but with the line breaks included. Using '/p/text()' instead returns each run of text between tags as a separate value. The ideal return value would be:

"I love eating out. This is my favorite restaurant. I will definitely be back"

I've searched other questions but couldn't find anything close enough. Please note that in the current environment I am restricted to a single XPath query: I cannot post-process the result or set up any HTML pre-parsing. Specifically, I'm using the importXML function inside Google Docs.

Answer

Dimitre Novatchev · Jun 13, 2012

Use:

normalize-space(/)

When this XPath expression is evaluated, the string value of the document node (/) is produced first and then passed as the argument to the standard XPath function normalize-space().

By definition, normalize-space() returns its argument with leading and trailing whitespace removed and with every internal run of adjacent whitespace characters replaced by a single space character.

The evaluation of the above XPath expression results in:

"I love eating out." "This is my favorite restaurant." "I will definitely be back"
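The whitespace rule described above can be emulated outside an XPath engine. The following is a minimal Python sketch, not part of the original answer; the helper name normalize_space is hypothetical, and it assumes the XPath 1.0 definition of whitespace (space, tab, carriage return, line feed).

```python
import re

# XPath 1.0 treats only these four characters as whitespace.
_XPATH_WS = " \t\r\n"


def normalize_space(value: str) -> str:
    """Emulate XPath 1.0 normalize-space() on a plain string (hypothetical helper)."""
    # Strip leading and trailing whitespace first.
    stripped = value.strip(_XPATH_WS)
    # Then collapse every internal run of whitespace into one space.
    return re.sub(r"[ \t\r\n]+", " ", stripped)


raw = '\n   "I love eating out."\n   \n   "This is my favorite restaurant."\n'
print(normalize_space(raw))
# prints: "I love eating out." "This is my favorite restaurant."
```

The quotes survive because normalize-space() only touches whitespace.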

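To eliminate the quotes, we additionally use the translate() function (the double quote is written literally here; it must be escaped as &quot; only when the expression sits inside a double-quoted XML attribute, as in the XSLT verification below):

normalize-space(translate(/, '"', ''))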

The result of evaluating this expression is:

I love eating out. This is my favorite restaurant. I will definitely be back
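To make the character-removal step concrete, here is a rough Python sketch of how translate() behaves under the XPath 1.0 rules: each character of the second argument is replaced by the character at the same position in the third, and characters with no counterpart are deleted. The function name xpath_translate is an assumption made for illustration.

```python
def xpath_translate(value: str, from_chars: str, to_chars: str) -> str:
    """Emulate XPath 1.0 translate() on a plain string (hypothetical helper)."""
    mapping = {}
    for index, char in enumerate(from_chars):
        code = ord(char)
        if code in mapping:
            continue  # only the first occurrence in from_chars counts
        # A character with no counterpart in to_chars maps to None (deleted).
        mapping[code] = to_chars[index] if index < len(to_chars) else None
    return value.translate(mapping)


print(xpath_translate('"I love eating out."', '"', ''))
# prints: I love eating out.
```

With an empty third argument, every character listed in the second argument is simply dropped, which is how the quotes disappear.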

Finally, to have this result wrapped in quotes itself, we use the concat() function (with the double quote written literally, as a bare XPath expression requires):

concat('"',
       normalize-space(translate(/, '"', '')),
       '"'
       )

The evaluation of this XPath expression produces exactly the wanted result:

"I love eating out. This is my favorite restaurant. I will definitely be back"

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:value-of select=
   "concat('&quot;',
           normalize-space(translate(/,'&quot;', '')),
           '&quot;'
           )"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (corrected to be well-formed):

<p>
       "I love eating out."
       <br />
       <br />
       "This is my favorite restaurant."
       <br />
       "I will definitely be back"
</p>

the XPath expression is evaluated and the result of this evaluation is copied to the output:

"I love eating out. This is my favorite restaurant. I will definitely be back"
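For readers without an XSLT processor at hand, the same check can be approximated in Python with only the standard library. This is a sketch under stated assumptions: xml.etree.ElementTree does not evaluate XPath functions such as normalize-space(), so each step is emulated by hand, and the helper names (string_value, normalize_space, one_line_quoted) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The well-formed version of the document from the answer.
DOCUMENT = """<p>
       "I love eating out."
       <br />
       <br />
       "This is my favorite restaurant."
       <br />
       "I will definitely be back"
</p>"""


def string_value(xml_text: str) -> str:
    """Concatenate all descendant text, like the XPath string value of /."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


def normalize_space(value: str) -> str:
    """Emulate normalize-space(): trim, then collapse whitespace runs."""
    return re.sub(r"[ \t\r\n]+", " ", value.strip(" \t\r\n"))


def one_line_quoted(xml_text: str) -> str:
    """Mirror concat('"', normalize-space(translate(/, '"', '')), '"')."""
    without_quotes = string_value(xml_text).replace('"', "")
    return '"' + normalize_space(without_quotes) + '"'


print(one_line_quoted(DOCUMENT))
# prints: "I love eating out. This is my favorite restaurant. I will definitely be back"
```

This only confirms the logic of the expression; inside importXML the XPath expression itself still has to do all of the work.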