Jsoup-like HTML parser for C++

Writwick · Jul 29, 2013 · Viewed 20.9k times

I have been writing some code in Java to extract data from web pages, and Jsoup was one of the best libraries to work with. Unfortunately, I now have to port the whole code to C/C++, and I cannot find any decent HTML parser for C++. Is there a Jsoup-like library for C++, or how can similar results be achieved?

[Currently I am using Curl to get the source of the pages and searching the internet for an HTML parser.]

Answer

ollo · Aug 7, 2013

Unfortunately, I guess there's no parser like Jsoup for C++.

Besides the libraries already mentioned here, there's a good overview of C++ (and some C) parsers here: Free C or C++ XML Parser Libraries

I used TinyXML-2 for (HTML) DOM parsing; it's a very small library (only 2 files) that runs on most operating systems, even non-desktop ones. Keep in mind that it is an XML parser, so the markup has to be well-formed.

LibXml

  • push and pull parser (DOM, SAX)
  • Validation
  • XPath and XPointer support
  • Cross-platform / good documentation

Apache Xerces

  • push and pull parser (DOM, SAX)
  • Validation
  • No XPath support (but a package for this?)
  • Cross-platform / good documentation

If you are on C++/CLI, check out NSoup, a Jsoup port for .NET.

Some more:

Maybe you can combine a DOM model / parser with a CSS selector matcher?
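To give an idea of what that combination could look like, here is a rough sketch. Everything in it is an assumption for illustration: `Node` is a hand-rolled stand-in for whatever element type your DOM parser exposes, and `parse_selector`, `matches`, `select_all`, `make_node` and `build_sample` are hypothetical names, not part of any library. It only understands compound selectors made of a tag, `#id` and `.class` (no combinators, attributes or pseudo-classes), so it is far from what Jsoup offers.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal DOM node. A real parser would give you its own
// element type; you would adapt matches() to read from that instead.
struct Node {
    std::string tag;
    std::string id;
    std::vector<std::string> classes;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

// One compound selector such as "a", "#main", "a.external" or "p.note.external".
struct SimpleSelector {
    std::string tag;
    std::string id;
    std::vector<std::string> classes;
};

// Splits the selector string on '#' and '.'; no validation is done.
SimpleSelector parse_selector(const std::string& s) {
    SimpleSelector sel;
    std::string* target = &sel.tag;  // characters go to the tag until '#' or '.'
    for (char c : s) {
        if (c == '#') {
            target = &sel.id;
        } else if (c == '.') {
            sel.classes.emplace_back();
            target = &sel.classes.back();
        } else {
            target->push_back(c);
        }
    }
    return sel;
}

// True if the node satisfies every part of the selector that is present.
bool matches(const Node& n, const SimpleSelector& sel) {
    if (!sel.tag.empty() && sel.tag != n.tag) return false;
    if (!sel.id.empty() && sel.id != n.id) return false;
    for (const auto& c : sel.classes) {
        if (std::find(n.classes.begin(), n.classes.end(), c) == n.classes.end())
            return false;
    }
    return true;
}

// Depth-first walk that collects matching nodes in document order.
void collect(const Node& n, const SimpleSelector& sel,
             std::vector<const Node*>& out) {
    if (matches(n, sel)) out.push_back(&n);
    for (const auto& child : n.children) collect(*child, sel, out);
}

// Returned pointers stay valid only as long as the tree itself is alive.
std::vector<const Node*> select_all(const Node& root, const std::string& selector) {
    std::vector<const Node*> out;
    collect(root, parse_selector(selector), out);
    return out;
}

// Small helper to build nodes by hand for the example below.
std::unique_ptr<Node> make_node(std::string tag, std::string id = "",
                                std::vector<std::string> classes = {},
                                std::string text = "") {
    auto n = std::make_unique<Node>();
    n->tag = std::move(tag);
    n->id = std::move(id);
    n->classes = std::move(classes);
    n->text = std::move(text);
    return n;
}

// Hand-built equivalent of:
// <div id="main"><a class="external">jsoup</a><a>home</a>
//   <p class="note external">hi</p></div>
std::unique_ptr<Node> build_sample() {
    auto root = make_node("div", "main");
    root->children.push_back(make_node("a", "", {"external"}, "jsoup"));
    root->children.push_back(make_node("a", "", {}, "home"));
    root->children.push_back(make_node("p", "", {"note", "external"}, "hi"));
    return root;
}
```

With the sample tree, `select_all(*root, "a.external")` should give you just the first link. In practice you would fill the tree from your parser's output instead of building it by hand, and descendant or child combinators would need a second pass on top of this.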