I am trying to scrub some lists into a properly formatted CSV file for database import.
My starting file looks something like this, with what is supposed to be each "line" spanning multiple lines:
Mr. John Doe
Exclusively Stuff, 186
Caravelle Drive, Ponte Vedra
33487.
I created a `sed` script that cleans up the file (there's lots of "dirty" formatting like double spaces and spaces before/after commas). The problem is the ZIP code with the trailing period. I would like to replace that period with a newline, but I cannot get it to work.
The command that I use is:
sed -E -f scrub.sed test.txt
and the `scrub.sed` script is as follows:
:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\n |g
$!ba
What I get is
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487n
I figured that the ZIP code plus the period would be a great "delimiter" to use the substitution on, and while I can match it, I can't seem to tell sed to put a newline there.
Most of the things I found online are about replacing the newline with something else (usually deleting them), but not much on replacing with a newline. I did find this, but it didn't work: How to insert newline character after comma in `),(` with sed?
Is there something I am missing?
Update:
I edited my scrub.sed file, putting in the literal newline as instructed. It still doesn't work:
:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\
|g
$!ba
What I get is (everything on one line):
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487 Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907
My expected output should be:
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487
Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907
The `sed` on BSD does not support the `\n` representation of a newline (turning it into a literal `n`):
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\n next line/'
123n next line
GNU `sed` does support the `\n` representation:
$ echo "123." | gsed -E 's/([[:digit:]]*)\./\1\nnext line/'
123
next line
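If you are not sure which behaviour your `sed` has, a quick check along these lines can tell you. This is only a sketch, assuming a POSIX shell; the variable names and the message are made up for illustration.

```shell
# Hypothetical check: does this sed turn \n in the replacement into a newline?
nl='
'
out=$(printf '123.\n' | sed 's/\./\nX/')

if [ "$out" = "123${nl}X" ]; then
  sed_newline=supported      # GNU-style behaviour
else
  sed_newline=unsupported    # e.g. BSD sed, which would print 123nX
fi
printf '%s\n' "newline escape in replacement: $sed_newline"
```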
Alternatives are:
Use a single-character delimiter that you then translate into a newline with `tr`:
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1|next line/' | tr '|' '\n'
123
next line
Or use an escaped literal new line in your sed script:
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\
next line/'
123
next line
Or define a variable that holds a newline:
POSIX:
nl='
'
BASH / zsh / others that support ANSI C quoting:
nl=$'\n'
And then use `sed` with appropriate quoting and escapes to insert the literal newline:
$ echo "123." | sed 's/\./'"\\${nl}"'next line/'
123
next line
Or use `awk`:
$ echo "123." | awk '/^[[:digit:]]+\./{sub(/\./,"\nnext line")} 1'
123
next line
Or use GNU `sed`, which supports `\n`.
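One more thing to watch for in the script from the question: because `s|\n| |g` sits inside the `:a` ... `$!ba` loop, it runs again on every iteration and turns the newline you just inserted back into a space, which would explain why the updated script still prints everything on one line. A minimal sketch of a reordered script follows; it reads the whole file into the pattern space first and only then substitutes. The file names `sample.txt` and `scrub-fixed.sed` and the sample records are made up for illustration, and the trailing comma after each name is an assumption about the real data.

```shell
# Hypothetical sample data, modelled on the question (two records).
printf '%s\n' \
  'Mr. John Doe,' 'Exclusively Stuff, 186' 'Caravelle Drive, Ponte Vedra' '33487.' \
  'Mrs. Jane Smith,' 'Props and Stuff, 123' 'Main Drive, Jacksonville' '33690.' \
  > sample.txt

# Reordered script: slurp every line first, then clean up once.
cat > scrub-fixed.sed <<'EOF'
:a
$!{
N
ba
}
s|\n| |g
s|[[:space:]]+| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|[[:space:]]([0-9]{5})\.[[:space:]]*|,FL,\1\
|g
s|\n$||
EOF

# The escaped literal newline in the replacement works with both BSD and GNU sed.
sed -E -f scrub-fixed.sed sample.txt
```

With that sample input this should print one record per line, each ending in `,FL,` and the five-digit ZIP code. The final `s|\n$||` only drops the newline left behind after the last record. Since the whole file is held in the pattern space, this suits modestly sized files.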