How to read and insert bytea columns using psycopg2?

amphibient picture amphibient · Oct 14, 2016 · Viewed 11k times · Source

I am working on a Python script to replicate some Postgresql tables from one environment to another (which does a little more than pg_dump). It works except when I am copying a table that has bytea data type.

I read the source table data in memory, then I dump the memory in the target database with concatenated inserts.

Here is my method that produces an insert statement:

def generateInsert(self, argCachedRow):

    colOrd = 0;

    valClauseList = []

    hasBinary = False

    for colData in argCachedRow:
        colOrd += 1
        colName = self.colOrdLookup.get(colOrd)
        col = self.colLookup.get(colName)
        dataType = col.dataType

        insVal = None

        if colData is not None:

            strVal = str(colData)
            if dataType.useQuote:

                if "'" in strVal:
                    strVal = strVal.replace("'", "''")
                insVal = "'%s'" % strVal
            else:
                if dataType.binary:
                    hasBinary = True
                    #insVal = psycopg2.Binary(colData)
                #else:

                insVal = strVal
        else:
            insVal = "NULL"

        valClauseList.append(insVal)

    valClause = ", ".join(valClauseList)

    if hasBinary:
        valClause = psycopg2.Binary(valClause)

    result = "INSERT INTO %s VALUES (%s)" % (self.name, valClause)

    return result

which works with every table that doesn't have binary data.

I also tried (intuitively) to wrap just the binary column data in psycopg2.Binary, which is the commented out line and then not do it to the whole row value list but that didn't work either.

Here is my simple DataType wrapper, which is loaded by reading Postgres' information_schema tables:

class DataType(object):
    def __init__(self, argDispName, argSqlName, argUseQuote, argBin):
        self.dispName = argDispName
        self.sqlName = argSqlName
        self.useQuote = argUseQuote
        self.binary = argBin

How do I read and insert bytea columns using psycopg2?

Answer

illright picture illright · Oct 14, 2016

If you have this database structure:

CREATE TABLE test (a bytea,
                   b int,
                   c text)

then inserting binary data into the request can be done like so, without any wrappers:

bin_data = b'bytes object'
db = psycopg2.connect(*args)  # DB-API 2.0
c = db.cursor()
c.execute('''INSERT INTO test VALUES (%s, %s, %s)''', (bin_data, 1337, 'foo'))
c.execute('''UPDATE test SET a = %s''', (bin_data + b'1',))

Then, when you query it:

c.execute('''SELECT a FROM test''')

You'll receive a memoryview, which is easily converted back to bytes:

mview = c.fetchone()
new_bin_data = bytes(mview)
print(new_bin_data)

Output: b'bytes object1'

Also, I'd suggest you not to assemble queries by string formatting. psycopg2's built-in parameter substitution is much more convenient and you don't have to worry about validating data to protect from SQL injections.