How to convert HTML to Markdown while retaining non-markdown HTML tags?

Jeromy Anglim · Apr 27, 2013

I'd like to be able to take an existing HTML snippet and convert it to markdown. I've tried pandoc for this purpose:

pandoc test.html -o test.md

where test.html looked like this:

Hello

<!-- more -->

and some more text

<h2>some heading</h2>

The result was this:

Hello and some more text

some heading
------------

Thus, pandoc not only converts the tags that have a direct equivalent in markdown; it also removes tags that I would like to retain as HTML (e.g., HTML comments, iframe tags, and so on).

  • How can I convert HTML to markdown in a way that any tags that don't have an equivalent in markdown are retained as raw HTML?
  • More generally, how can I control how the HTML to markdown conversion is done?

In particular, I'd be interested in command-line program options. For example, perhaps there are options that can be supplied to pandoc.

Answer

Jeromy Anglim · Apr 27, 2013

After a bit more searching, I read about the --parse-raw option in a thread on table parsing.

With the --parse-raw option added, pandoc no longer seemed to strip the HTML tags that have no markdown equivalent.

pandoc test.html -o test.md --parse-raw
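
For anyone who wants to try this end to end, here is a minimal sketch that recreates the sample input from the question and runs the conversion. One assumption goes beyond the original answer: as far as I know, --parse-raw was removed in pandoc 2.0 in favour of the raw_html reader extension (-f html+raw_html), so the script falls back to that form if the older flag is rejected. The file names are the ones from the question, and the exact output will likely vary between pandoc versions.

```shell
#!/bin/sh
# Recreate the sample input from the question.
cat > test.html <<'EOF'
Hello

<!-- more -->

and some more text

<h2>some heading</h2>
EOF

# Convert only if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
  # Older pandoc (1.x): --parse-raw keeps HTML that has no markdown equivalent.
  # Assumption: on pandoc 2.0+ that flag no longer exists, so fall back to
  # enabling the raw_html extension on the HTML reader instead.
  pandoc test.html -o test.md --parse-raw 2>/dev/null ||
    pandoc -f html+raw_html -t markdown test.html -o test.md
  # Show the result so the retained raw HTML can be inspected.
  cat test.md
else
  echo "pandoc not found; skipping conversion" >&2
fi
```

Checking test.md for the <!-- more --> comment is a quick way to see whether the raw HTML survived on a given pandoc version.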