Is there a Python module for converting RTF to plain text?

Tony · Aug 26, 2009 · Viewed 51.1k times

Ideally, I'd like a module or library that doesn't require superuser access to install; I have limited privileges in my working environment.

Answer

Brendon · Nov 30, 2009

I've been working on a library called Pyth, which can do this:

http://pypi.python.org/pypi/pyth/

Converting an RTF file to plaintext looks something like this:

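from pyth.plugins.rtf15.reader import Rtf15Reader
from pyth.plugins.plaintext.writer import PlaintextWriter

# Open in binary mode so the RTF bytes reach the parser unmodified,
# and close the file as soon as parsing is done.
with open('sample.rtf', 'rb') as rtf_file:
    doc = Rtf15Reader.read(rtf_file)

# write() returns a file-like buffer; getvalue() extracts its contents.
print(PlaintextWriter.write(doc).getvalue())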
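
As a rough sketch, the same steps could be wrapped in a reusable helper. The function name `rtf_to_text` is hypothetical and not part of Pyth; only the two imports and the `read`/`write` calls come from the example above. The suggested `pip install --user pyth` command is an assumption about your environment (pip available, and a Python version the package supports), offered because a per-user install does not need superuser access.

```python
def rtf_to_text(path):
    """Return the plain-text content of the RTF file at `path` using Pyth.

    Hypothetical helper, not part of Pyth itself.
    """
    # Import lazily so a missing Pyth install produces a readable hint
    # instead of failing when this module is first loaded.
    try:
        from pyth.plugins.rtf15.reader import Rtf15Reader
        from pyth.plugins.plaintext.writer import PlaintextWriter
    except ImportError:
        raise RuntimeError(
            "Pyth is not importable; a per-user install such as "
            "'pip install --user pyth' may work without superuser access"
        )

    # Binary mode keeps the RTF bytes unmodified on every platform.
    with open(path, 'rb') as rtf_file:
        doc = Rtf15Reader.read(rtf_file)

    # write() returns a file-like buffer; getvalue() extracts its contents.
    return PlaintextWriter.write(doc).getvalue()
```

Calling `rtf_to_text('sample.rtf')` would then return the converted text rather than printing it, which makes it easier to post-process or write to another file.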

Pyth can also generate RTF files, read and write XHTML, and generate documents from Python markup à la Nevow's stan; it also has limited experimental support for LaTeX and PDF output. Its RTF support is pretty robust: we use it in production to read RTF files generated by various versions of Word, OpenOffice, Mac TextEdit, EIOffice, and others.