Replacing varying delimiters using sed or tr

Terra Ashley · May 30, 2013

I need to convert a list of IDs from using a delimiter consisting of , and/or \r\n or \n to using ,|. (essentially: s/[,\r\n]+/,\|/g without a trailing |)

Example input data:

123,456,789,012

or

123,
456
789,
012

and I need the resulting output to be 123,|456,|789,|012,: a comma ending each field, and a pipe separating them.

This seems really simple to do, but I'm quite stumped on how to manage this. I've tried ... quite a few ways, actually, but nothing seems to work. Here are a few examples:

  1. sed "s/[,\r\n]+/,\|/g" < filename does not match any of the delimiters.

  2. sed "s/(,|,?\r?\n?)/,\|/g" does not match anything either.

  3. tr -t "(,?(\r|\n)+)" ",\|" and tr -t "[,\r\n]+" ",\|" only replace ,

  4. tr "(,|\r?\n)" ",\|" works correctly with , but with ,\n and ,\r\n it replaces the matched characters with multiple bars. Ex: 123|||456|||789|||012|

  5. Getting more complex: sed ':a;N;$!ba;s/\n/,/g' (Taken from here) replaces \n correctly with , but does not work with \r\n. Replacing the \n with [,\r\n] simply returns the input.

I'm stumped. Can anyone offer some help or advice on this?

Answer

Jonathan Leffler · May 30, 2013

From your sample output, it seems that the output doesn't have a pipe at the end; you have , marking the end of each field, and | separating pairs of fields. For that specification, this works with tr and sed:

$ x="123,
> 456
> 789,
> 012"
$ echo "$x" | tr -s '\r\n' ',' | sed 's/,\(.\)/,|\1/g'
123,|456,|789,|012,
$

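The tr command replaces each newline and carriage return with a comma, and -s squeezes every resulting run of commas down to a single one. The sed command then looks for a comma followed by another character and inserts a | between them; the final comma has nothing after it, so it is left alone and no trailing | appears.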
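
As a rough sketch, the same pipeline can be wrapped in a small function and run against both input shapes from the question, including CRLF line endings. The name join_ids is made up for illustration; the function assumes the IDs arrive on standard input and that no field is empty (the squeeze would collapse an empty field such as 1,,2).

```shell
# Hypothetical helper (the name join_ids is an assumption, not from the answer).
# Reads IDs on stdin, writes the ",|"-delimited list on stdout.
join_ids() {
    # tr: turn every CR and LF into a comma, then squeeze each run of
    #     commas (original or translated) down to a single comma.
    # sed: insert a pipe after every comma that is followed by another
    #      character; the last comma has no successor and stays bare.
    tr -s '\r\n' ',' | sed 's/,\(.\)/,|\1/g'
}

# Single-line input, already comma-separated.
printf '123,456,789,012\n' | join_ids
echo

# Mixed delimiters with CRLF line endings.
printf '123,\r\n456\r\n789,\r\n012\r\n' | join_ids
echo
```

Both calls should print 123,|456,|789,|012,. Because the final newline is itself translated into the closing comma, the result may not end with a newline, which is why each call is followed by a bare echo.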