I need to load data from several source systems into a Postgres database. Currently, I first write the data to a temporary CSV file and then load it into Postgres with a COPY FROM query. All of this is done in Python.
The code looks like this:
import psycopg2

table_name = 'products'
temp_file = 'C:\\Users\\username\\tempfile.csv'
# connect() takes keyword arguments (or a single DSN string)
db_conn = psycopg2.connect(host=host, port=port, user=user,
                           password=password, dbname=database)
cursor = db_conn.cursor()
# the server-side file path must appear as a quoted SQL string literal
query = "COPY " + table_name + " FROM '" + temp_file + "' WITH NULL AS '';"
cursor.execute(query)
db_conn.commit()
I want to avoid the step of writing to the intermediate file. Instead, I would like to write the data to an in-memory Python object and then load it into the Postgres database using the same COPY FROM approach.
I am aware of the technique of using psycopg2's copy_from method, which copies data from a StringIO object into Postgres. However, I cannot use psycopg2, so I don't want my COPY FROM step to depend on that library. I want it to be a plain Postgres query that any other Postgres driver can run as well.
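For reference, the psycopg2-specific approach I'm trying to avoid looks roughly like this (reusing db_conn from the snippet above; the sample rows are made up):

import io

buffer = io.StringIO("1\tWidget\t9.99\n2\tGadget\t4.50\n")  # data built in memory
cursor = db_conn.cursor()
# copy_from streams tab-separated rows from any file-like object
cursor.copy_from(buffer, 'products', null='')
db_conn.commit()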
Please advise a better way of doing this without writing to an intermediate file.
You could call the psql command-line tool from your script (e.g. using subprocess.call) and leverage its \copy command, piping the output of one instance to the input of another, avoiding a temp file:
psql -X -h from_host -U user -c "\copy from_table to stdout" | psql -X -h to_host -U user -c "\copy to_table from stdin"
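A minimal sketch of driving that pipeline from Python (using subprocess.Popen/run rather than subprocess.call so both exit codes can be checked; the hosts, users, and table names are placeholders):

import subprocess

# stream the source table to stdout...
src = subprocess.Popen(
    ["psql", "-X", "-h", "from_host", "-U", "user", "-d", "from_db",
     "-c", r"\copy from_table to stdout"],
    stdout=subprocess.PIPE,
)
# ...and pipe it straight into the destination's stdin
subprocess.run(
    ["psql", "-X", "-h", "to_host", "-U", "user", "-d", "to_db",
     "-c", r"\copy to_table from stdin"],
    stdin=src.stdout,
    check=True,
)
src.stdout.close()
# check the source side explicitly; its errors are easy to miss (see the caveat below)
if src.wait() != 0:
    raise RuntimeError("source psql exited with an error")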
This assumes the table exists in the destination database. If not, a separate command would first need to create it.
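For example, pg_dump can carry just the table definition across (same placeholder hosts and names):

pg_dump -h from_host -U user --schema-only -t from_table from_db | psql -X -h to_host -U user -d to_db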
Also, note that one caveat of this method is that errors from the first psql call can get swallowed by the piping process.
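Applied to your original use case, the same idea lets you drop the source-side psql entirely and feed data from an in-memory Python object straight into the destination's stdin; a sketch, with made-up rows and placeholder connection details:

import subprocess

rows = "1,Widget,9.99\n2,Gadget,4.50\n"  # built in memory, never written to disk
subprocess.run(
    ["psql", "-X", "-h", "to_host", "-U", "user", "-d", "database",
     "-c", r"\copy products from stdin with (format csv, null '')"],
    input=rows.encode(),
    check=True,
)

Since \copy is executed by psql itself rather than the server, this stays a plain COPY-style operation that doesn't depend on any particular Python driver.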