How to parse reStructuredText in Python?

zhangailin · Oct 14, 2012

Is there a module that can parse reStructuredText into a tree model?

Can docutils or Sphinx do this?

Answer

mbdevpl · Feb 10, 2018

I'd like to expand on Gareth Latty's answer. "What you probably want is the parser at docutils.parsers.rst" is a good starting point, but what comes next? Namely:

How to parse reStructuredText in Python?

Below is a complete answer for Python 3.6 and docutils 0.14:

import docutils.nodes
import docutils.parsers.rst
import docutils.utils
import docutils.frontend

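def parse_rst(text: str) -> docutils.nodes.document:
    parser = docutils.parsers.rst.Parser()
    # Build the default runtime settings that the reST parser expects.
    components = (docutils.parsers.rst.Parser,)
    settings = docutils.frontend.OptionParser(components=components).get_default_values()
    # Create an empty document; '<rst-doc>' is recorded as its source name (used in error reports).
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    # Parse the text, filling the document tree in place.
    parser.parse(text, document)
    return document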
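If you only need the tree and the default settings are fine, `docutils.core.publish_doctree()` is a shorter route to a `docutils.nodes.document`. Unlike the bare `parser.parse()` call above, it also applies docutils' standard transforms (for example, promoting a lone top-level section title to the document title), so the resulting tree can differ slightly. A minimal sketch with a made-up sample string, printing the tree with `pformat()`:

```python
import docutils.core
import docutils.nodes

# Illustrative input: a heading, emphasis and an embedded hyperlink.
SOURCE = """\
Heading
=======

A paragraph with *emphasis* and a `link <https://www.python.org/>`_.
"""

# publish_doctree() parses the text and applies the standard transforms,
# returning a docutils.nodes.document without writing any output format.
doctree = docutils.core.publish_doctree(SOURCE)

# pformat() renders the tree as indented pseudo-XML, one node per line.
print(doctree.pformat())
```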

The resulting document can then be processed with a visitor. For example, the class below prints every reference node in the document:

class MyVisitor(docutils.nodes.NodeVisitor):

    def visit_reference(self, node: docutils.nodes.reference) -> None:
        """Called for "reference" nodes."""
        print(node)

    def unknown_visit(self, node: docutils.nodes.Node) -> None:
        """Called for all other node types."""
        pass
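Printing is handy for exploring, but usually you want to collect data from the tree. A sketch of the same visitor pattern that gathers link targets instead; the class name `ReferenceCollector` and the sample text are illustrative, and it calls `publish_doctree()` in place of `parse_rst` only so the snippet runs on its own:

```python
import docutils.core
import docutils.nodes


class ReferenceCollector(docutils.nodes.NodeVisitor):
    """Collect the target URI of every external reference node (hypothetical helper)."""

    def __init__(self, document: docutils.nodes.document) -> None:
        super().__init__(document)
        self.uris = []  # refuri values, in document order

    def visit_reference(self, node: docutils.nodes.reference) -> None:
        # External links store their destination in the "refuri" attribute;
        # other references may carry "refid" or "refname" instead.
        uri = node.get('refuri')
        if uri is not None:
            self.uris.append(uri)

    def unknown_visit(self, node: docutils.nodes.Node) -> None:
        # Ignore every other node type instead of raising NotImplementedError.
        pass


text = ('See `Python <https://www.python.org/>`_ or '
        '`Docutils <https://docutils.sourceforge.io/>`_ for details.')
doc = docutils.core.publish_doctree(text)
collector = ReferenceCollector(doc)
doc.walk(collector)
print(collector.uris)
```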

Here's how to run it:

doc = parse_rst('spam spam lovely spam')
visitor = MyVisitor(doc)
doc.walk(visitor)
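Note that the sample string contains no links, so this run prints nothing. Also, `walk()` only calls `visit_*` methods; `walkabout()` additionally calls `depart_*` (or `unknown_departure`) after a node's children have been visited, which makes it easy to track depth in the tree. A sketch under that assumption; `OutlineVisitor` is a hypothetical name, and the input is the sample string with one link added:

```python
import docutils.core
import docutils.nodes


class OutlineVisitor(docutils.nodes.NodeVisitor):
    """Record each element's class name, indented by its depth (hypothetical helper)."""

    def __init__(self, document: docutils.nodes.document) -> None:
        super().__init__(document)
        self.depth = 0
        self.lines = []

    def unknown_visit(self, node: docutils.nodes.Node) -> None:
        # Called on the way down for every node without a visit_* method.
        # Text leaves are counted for depth but not listed.
        if isinstance(node, docutils.nodes.Element):
            self.lines.append('  ' * self.depth + node.__class__.__name__)
        self.depth += 1

    def unknown_departure(self, node: docutils.nodes.Node) -> None:
        # Called on the way back up; walkabout() triggers it, walk() does not.
        self.depth -= 1


doc = docutils.core.publish_doctree('spam spam `lovely <https://example.com/>`_ spam')
visitor = OutlineVisitor(doc)
doc.walkabout(visitor)
print('\n'.join(visitor.lines))
```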