Preserve Line Breaks in Pandoc Markdown -> LaTeX Conversion

maxheld · Sep 26, 2014

I want to convert the following *.md file into proper LaTeX (*.tex).

Lorem *ipsum* something.
Does anyone know lorem by heart?

That would *sad* because there's always Google.

Expected Behavior / Resulting LaTeX from Pandoc

Lorem \emph{ipsum} something.
Does anyone know lorem by heart?

That would \emph{sad} because there's always Google.

Observed Behavior / Resulting LaTeX from Pandoc

Lorem \emph{ipsum} something. Does anyone know lorem by heart?

That would \emph{sad} because there's always Google.

Why do I care?

1. I'm transitioning a bigger git repo from Markdown to LaTeX, and I want a clean diff and history.
2. I actually like my LaTeX with one sentence per line, even though it does not matter for the typesetting.

How can I get Pandoc to do this?

P.S.: I am aware of the hard_line_breaks extension, but that only adds \\ between the first two lines and does not actually preserve my line breaks.

Answer

mb21 · Sep 29, 2014

Update

Since pandoc 1.16, this is possible:

pandoc --wrap=preserve
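For scripted conversions (for example, across a whole repository), here is a minimal Python sketch, assuming pandoc 1.16 or later is on the PATH. `pandoc_preserve_cmd` and `convert` are hypothetical helper names, and the `-f markdown -t latex` flags only make the conversion direction explicit.

```python
import shutil
import subprocess
from typing import List


def pandoc_preserve_cmd(src: str, dst: str) -> List[str]:
    """Build a pandoc command that converts Markdown to LaTeX while keeping
    the line breaks of the source (needs pandoc >= 1.16 for --wrap=preserve)."""
    return ["pandoc", "-f", "markdown", "-t", "latex",
            "--wrap=preserve", src, "-o", dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises if pandoc is missing or exits non-zero."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_preserve_cmd(src, dst), check=True)
```

Usage: `convert("input.md", "output.tex")`, where both file names are placeholders.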

Old answer

Since Pandoc converts the Markdown to an AST-like internal representation, your non-semantic line breaks are lost. So what you're looking for is not possible without some custom scripting (for example, running pandoc with --no-wrap and then post-processing the output to insert a line break wherever a dot is followed by a space).
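A minimal sketch of that post-processing step in Python, assuming the input is pandoc's unwrapped LaTeX output. The helper name `one_sentence_per_line` is hypothetical, and it goes slightly beyond the rule above by also treating `!` and `?` as sentence ends.

```python
import re

# A sentence end: '.', '!' or '?' followed by spaces/tabs and then more text
# on the same line. Requiring text afterwards avoids turning a trailing space
# into an extra blank line (which LaTeX would read as a paragraph break).
_SENTENCE_END = re.compile(r"([.!?])[ \t]+(?=\S)")


def one_sentence_per_line(latex: str) -> str:
    """Put each sentence of unwrapped LaTeX on its own line;
    existing line breaks and blank lines are left untouched."""
    return _SENTENCE_END.sub(r"\1\n", latex)
```

This is a plain regex heuristic that knows nothing about LaTeX, so it will also split after abbreviations such as "e.g. " and inside math or verbatim content. In pandoc 1.16 and later, --no-wrap is spelled --wrap=none.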

However, you can use the --columns NUMBER option to set the maximum number of characters per line. That won't give you one sentence per line, but it will wrap lines at NUMBER characters.
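To see why a fixed column width does not give one sentence per line, here is a rough illustration that uses Python's standard `textwrap` module as a stand-in for pandoc's own wrapping (the two algorithms are not identical). The width of 20 is an arbitrary value standing in for `--columns 20`, and `wrap_at` is a hypothetical helper name.

```python
import textwrap
from typing import List

# The first paragraph of the observed pandoc output, with no line breaks.
PARAGRAPH = "Lorem \\emph{ipsum} something. Does anyone know lorem by heart?"


def wrap_at(text: str, columns: int) -> List[str]:
    """Break text into lines of at most `columns` characters, roughly the
    way a fixed --columns setting would (approximation via textwrap)."""
    return textwrap.wrap(text, width=columns)
```

With `wrap_at(PARAGRAPH, 20)` the sentence boundary lands in the middle of a line ("something. Does"), and editing one word early in a paragraph can shift every following line break, which is the noisy diff the question wants to avoid.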