Extract substring using regex in Groovy

RicardoE · Jul 9, 2013 · Viewed 51.2k times

If I have the following pattern in some text:

def articleContent =  "<![CDATA[ Hellow World ]]>"

I would like to extract the "Hellow World" part, so I use the following code to match it:

def contentRegex = "<![CDATA[ /(.)*/ ]]>"
def contentMatcher = ( articleContent =~ contentRegex )
println contentMatcher[0]

However, I keep getting a null pointer exception because the regex doesn't seem to be working. What would be the correct regex for "any piece of text", and how do I collect it from a string?

Answer

tim_yates · Jul 9, 2013

Try:

def result = (articleContent =~ /<!\[CDATA\[(.+)]]>/)[ 0 ][ 1 ]
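
The pattern in the question does not match because its unescaped square brackets are read as a regex character class rather than as literal text. Escaping them and adding a capture group fixes that. Below is a slightly more defensive sketch of the same idea: it checks that a match was found before reading the group, uses a non-greedy group, and trims the surrounding spaces. The variable names `matcher` and `content` are only illustrative.

```groovy
def articleContent = "<![CDATA[ Hellow World ]]>"

// =~ builds a java.util.regex.Matcher; the slashy string avoids double-escaping backslashes
def matcher = (articleContent =~ /<!\[CDATA\[(.+?)\]\]>/)

// find() returns false when nothing matches, so no exception is thrown on bad input
def content = matcher.find() ? matcher.group(1).trim() : null

println content   // prints: Hellow World
```

Without the `trim()` call, group 1 would also include the spaces on either side of the text.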

However, I worry that you are planning to parse XML with regular expressions. If this CDATA section is part of a larger valid XML document, it is better to use an XML parser.
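
As a rough sketch of the parser approach, assuming the CDATA section sits inside a document like the one below: the `<article>` and `<content>` element names are made up for illustration and are not from the question. This uses `groovy.xml.XmlSlurper`, which is available from Groovy 3 onward; on older versions the class lives at `groovy.util.XmlSlurper`.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical document; element names are assumptions for this example
def xml = '<article><content><![CDATA[ Hellow World ]]></content></article>'

// The parser unwraps the CDATA section, so text() returns only its contents
def article = new XmlSlurper().parseText(xml)
def content = article.content.text().trim()

println content   // prints: Hellow World
```

The parser handles the CDATA delimiters itself, so no escaping or regex is needed.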