I am working on a copy command to copy the contents of a tab delimited file with a header to a table in postgres. I have used copy before so I know how it works but I have a question regarding how I can get around some data being missing in the file.
I have premade the table to have the same column names as the values in the header.
Some of the "columns" from the file that I received are just blank. I have put fake data in just to get the command working but this is something that is going to be automated weekly. I was wondering if I am just going to have to tell the people to make sure their data is correct or if there is a way to tell the copy command to input a NULL in the place of blank data.
I thought I might have to place in blank tabs and that would act as a null but I wasn't sure if that was the best course of action.
Documentation for COPY
specifies, that default value for NULL
s is \N
. So changing it to be empty as @a_horse_with_no_name suggest is the way to go.
Another way is to preprocess the file using any other tools. I quite often do the following in my scripts:
psql dbname <<EOSQL
COPY tab FROM stdin WITH (FORMAT 'csv', DELIMITER E'\t', NULL 'NULL');
$(printf "1\t2\t3\n4\t\t6\n7\t8\t\n\t10\t11\n"| \
sed -e 's/\(\t\|^\)\(\t\|$\)/\1NULL\2/g')
\.
EOSQL
Instead of printf
you can cat your file. Use whatever manipulations required.
Note: I'm on OSX, so sed
syntax may vary for you.
Also, in the current PostgreSQL CommitFest there's a patch to add generic Pre- and Post- processors to the COPY
command. Here you can find the whole discussion.