Java Wikitext Parser

No_name · Jul 23, 2012 · Viewed 7.4k times

Can anyone recommend a good, configurable wikitext parser with an easy-to-use API? I want to feed it data such as http://wikitravel.org/wiki/en/api.php?format=xml&action=parse&prop=wikitext&page=San%20Francisco, choose the sections I want, and output custom HTML for each type of element. Java is preferred, but a PHP or JavaScript solution that is compatible with nearly all (99%+) wikitext would be fine as well.

Answer

Christian · Jul 23, 2012

Sweble is probably the best Java parser for wikitext. It claims to be 100% compliant with wikitext, but I seriously doubt that. It parses wikitext into an abstract syntax tree, which you then have to process yourself (for example, by converting it to HTML).
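To give a rough idea of what "doing something with" an abstract syntax tree looks like, here is a minimal sketch of a tree walk that emits custom HTML per node type. The node classes (`Text`, `Bold`, `Section`) and the `render` method are hypothetical and invented for illustration; they are not Sweble's API, and a real parser's tree will have many more node types.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical mini-AST, NOT Sweble's node classes. */
class WikiAstSketch {

    /** Marker interface for every node in the sketch tree. */
    interface Node {}

    /** Plain text leaf. */
    static final class Text implements Node {
        final String value;
        Text(String value) { this.value = value; }
    }

    /** Bold span, i.e. '''...''' in wikitext. */
    static final class Bold implements Node {
        final List<Node> children;
        Bold(Node... children) { this.children = Arrays.asList(children); }
    }

    /** Section with a heading level (2 for ==Title==) and body nodes. */
    static final class Section implements Node {
        final int level;
        final String title;
        final List<Node> body;
        Section(int level, String title, Node... body) {
            this.level = level;
            this.title = title;
            this.body = Arrays.asList(body);
        }
    }

    /** Escapes the characters that are special in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders a list of nodes in order. */
    static String renderAll(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            sb.append(render(n));
        }
        return sb.toString();
    }

    /** Dispatches on node type; this is where per-element custom HTML goes. */
    static String render(Node node) {
        if (node instanceof Text) {
            return escape(((Text) node).value);
        }
        if (node instanceof Bold) {
            return "<b>" + renderAll(((Bold) node).children) + "</b>";
        }
        if (node instanceof Section) {
            Section s = (Section) node;
            String tag = "h" + s.level;
            return "<" + tag + ">" + escape(s.title) + "</" + tag + ">"
                    + "<p>" + renderAll(s.body) + "</p>";
        }
        throw new IllegalArgumentException("Unhandled node type: " + node.getClass());
    }

    public static void main(String[] args) {
        Node tree = new Section(2, "Eat",
                new Text("Try the "), new Bold(new Text("sourdough")), new Text("."));
        System.out.println(render(tree));
    }
}
```

Choosing only certain sections then amounts to filtering `Section` nodes by title before rendering. With a real parser you would replace these classes with the library's own node types and keep the same dispatch idea.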

There is a page on mediawiki.org that lists wikitext parsers in various programming languages. I don't think any of them handle 99%+ of wikitext, though. In general, parsing wikitext is a really complex problem: wikitext isn't even formally defined anywhere outside of the MediaWiki parser itself.