Quickly Convert (.rtf|.doc) Files to Markdown Syntax with PHP

Sampson · Jun 25, 2009 · Viewed 26.4k times

I've been manually converting articles into Markdown syntax for a few days now, and it's getting rather tedious. Some of these are 3 or 4 pages long, with italics and other emphasized text throughout. Is there a faster way to convert (.rtf|.doc) files to clean Markdown syntax that I can take advantage of?

Answer

David · Sep 20, 2011

If you happen to be on a Mac, textutil does a good job of converting doc, docx, and rtf files to HTML, and pandoc does a good job of converting the resulting HTML to Markdown:

$ textutil -convert html file.doc -stdout | pandoc -f html -t markdown -o file.md

I have a script that I threw together a while back that tries to use textutil, pdf2html, and pandoc to convert whatever I throw at it to markdown.
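That script isn't shown here, so the following is only a rough sketch of the same idea, not David's actual code. It picks the first conversion stage by file extension and then hands the HTML to pandoc. The function name to_markdown is made up for illustration, the textutil branch assumes macOS, and the PDF step is left out because the exact pdf2html invocation isn't given above.

```shell
#!/bin/sh
# Sketch: convert documents to Markdown, choosing the first stage by extension.
# to_markdown is a hypothetical helper name, not part of textutil or pandoc.

to_markdown() {
    src=$1
    out="${src%.*}.md"    # article.rtf -> article.md

    case "$src" in
        *.doc|*.docx|*.rtf)
            # macOS only: textutil writes HTML to stdout, pandoc turns it into Markdown
            textutil -convert html "$src" -stdout |
                pandoc -f html -t markdown -o "$out"
            ;;
        *.html|*.htm)
            # Already HTML, so pandoc can read the file directly
            pandoc -f html -t markdown -o "$out" "$src"
            ;;
        *)
            echo "to_markdown: unsupported file type: $src" >&2
            return 1
            ;;
    esac
}

# Convert every file named on the command line
for f in "$@"; do
    to_markdown "$f" || echo "skipped: $f" >&2
done
```

Saved as, say, tomd.sh (another made-up name), it could be run over a whole folder with something like sh tomd.sh articles/*.rtf. The extension match is case-sensitive, so files ending in .DOC or .RTF would need extra patterns.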