XML parsing in Ruby

Ricketyship · Jan 10, 2012 · Viewed 13.7k times

I am using Ruby's REXML parser to parse an XML file, but on a 64-bit AIX box with 64-bit Ruby I get the following error:

REXML::ParseException: #<REXML::ParseException: #<RegexpError: Stack overflow in regexp matcher:
/^<((?>(?:[\w:][\-\w\d.]*:)?[\w:][\-\w\d.]*))\s*((?>\s+(?:[\w:][\-\w\d.]*:)?[\w:][\-\w\d.]*\s*=\s*(["']).*?\3)*)\s*(\/)?>/mu>

The call that produces it looks something like this:

REXML::Document.new(File.open(actual_file_name, "r"))

Does anyone know how to solve this?

Answer

Niklas B. · Jan 10, 2012

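I've had several issues with REXML; it doesn't seem to be the most mature library. For XML parsing in Ruby I usually use Nokogiri, which should be faster and more stable than REXML. On MRI it wraps the libxml2 C library, so it doesn't tokenize the input with Ruby regular expressions the way REXML does (the regexp in your error message comes from REXML's own parser). After installing it with sudo gem install nokogiri, you can get a DOM instance with something like this: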

require 'nokogiri'

doc = File.open(actual_file_name, 'rb') { |f| Nokogiri.XML(f) }  # block form closes the file
# => #<Nokogiri::XML::Document:0xf1de34 name="document" [...] >

The documentation on the official webpage is also much better than that of REXML, IMHO.