I need to convert a list of IDs from using a delimiter consisting of , and/or \r\n or \n to using ,| (essentially: s/[,\r\n]+/,\|/g without a trailing |).
Example input data:
123,456,789,012
or
123,
456
789,
012
and I need the resulting output to be 123,|456,|789,|012, (a comma ending each field, and a pipe separating them).
This seems really simple to do, but I'm quite stumped on how to manage this. I've tried ... quite a few ways, actually, but nothing seems to work. Here are a few examples:
sed "s/[,\r\n]+/,\|/g" < filename does not match any of the delimiters.
sed "s/(,|,?\r?\n?)/,\|/g" does not match anything either.
tr -t "(,?(\r|\n)+)" ",\|" and tr -t "[,\r\n]+" ",\|" only replace ,
tr "(,|\r?\n)" ",\|" works correctly with , but with ,\n and ,\r\n it replaces the matched characters with multiple bars. Ex: 123|||456|||789|||012|
Getting more complex: sed ':a;N;$!ba;s/\n/,/g' (Taken from here) replaces \n correctly with , but does not work with \r\n. Replacing the \n with [,\r\n] simply returns the input.
I'm stumped. Can anyone offer some help or advice on this?
From your sample output, it seems that the output doesn't have a pipe at the end; you have , marking the end of each field, and | separating pairs of fields. For that specification, this works with tr and sed:
$ x="123,
> 456
> 789,
> 012"
$ echo "$x" | tr -s '\r\n' ',' | sed 's/,\(.\)/,|\1/g'
123,|456,|789,|012,
$
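The tr command replaces newline and carriage return with comma, squeezing (-s) duplicates, so a run such as ,\r\n collapses to a single comma. The sed command looks for a comma followed by another character and inserts a | between them; the final comma has nothing after it, so it gets no pipe.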
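As a rough check, the same pipeline can be wrapped in a function and run against each input form from the question, including CRLF line endings. The name convert_ids is made up for this illustration. The sketch assumes a tr that pads the shorter second set with its last character (GNU and BSD tr both do), and it uses printf rather than echo so the \r and \n escapes are interpreted consistently.

```shell
# convert_ids is a hypothetical helper name, not part of any tool.
# Reads IDs on stdin and writes them as "id,|id,|...|id," on stdout.
convert_ids() {
    # tr -s: turn every CR and LF into a comma, then squeeze runs of
    #        commas, so ",\r\n" becomes a single ",".
    # sed:   insert a pipe after each comma that is followed by another
    #        character, i.e. after every comma except the last one.
    tr -s '\r\n' ',' | sed 's/,\(.\)/,|\1/g'
}

# Comma-only input on one line
printf '%s\n' "$(printf '123,456,789,012\n' | convert_ids)"

# Mixed ",\n" and "\n" delimiters
printf '%s\n' "$(printf '123,\n456\n789,\n012\n' | convert_ids)"

# Windows-style ",\r\n" and "\r\n" delimiters
printf '%s\n' "$(printf '123,\r\n456\r\n789,\r\n012\r\n' | convert_ids)"
```

Each of the three calls should print 123,|456,|789,|012, because the trailing newline of the input becomes the final comma. If the input has no trailing newline and no trailing comma, the last field would come out without its comma, so that case would need separate handling.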