Storing images in bytea fields in a PostgreSQL database

Olayemi Odunayo picture Olayemi Odunayo · Jun 15, 2013 · Viewed 19.2k times · Source

I stored an image in a PostgreSQL database with column type bytea using PHP. The problem is every time I try to load the image in a browser it does not appear. The Firefox developer console says the image is either truncated or corrupt.

The PHP code:

//code for inserting into the database
if(array_key_exists('submit_pic', $_POST)){
$user=$_SESSION['name'];
if(isset($_FILES['thumbnail'])&&$_FILES['thumbnail']['size']>0){
$fi =  $_FILES['thumbnail']['tmp_name'];
$p=fopen($fi,'r');
$data=fread($p,filesize($fi));
$data=addslashes($data);
$dat= pg_escape_bytea($data); 
$q="update userinfo set image='{$dat}' where email='$user'";
$e=pg_query($q)or die(pg_last_error());

// code for retreving from database
require_once('conn.php');
session_start();
$user=$_SESSION['name'];
pg_query('SET bytea_output = "escape";');
$lquery ="select image from userinfo where email='$user'";
$lq = pg_query($lquery)or die(pg_last_error());
$lqq=pg_fetch_row($lq,'image');
header("conent-type:image");
echo pg_unescape_bytea($lqq[0]);

and i need to store the uploaded image in a database- i am actually using heroku thanks

Answer

Craig Ringer picture Craig Ringer · Jun 15, 2013

TL;DR:

Delete addslashes($data). It's redundant here.

Double-escaping .. twice

$data=fread($p,filesize($fi));
$data=addslashes($data);
$dat= pg_escape_bytea($data); 

You read the data in, escape it as if it were a string literal, then convert it to bytea octal or hex escapes. It could never work that way around even if pg_escape_bytea was sane, which it isn't.

PHP's pg_escape_bytea appears to double-escape the output so it can be inserted into a string literal. This is incredibly ugly, but there doesn't appear to be an alternative that doesn't do this double-escaping, so you can't seem to use parameterised statements for bytea in PHP. You should still do so for everything else.

In this case, simply removing the addslashes line for the data read in from the file is sufficient.

Test case showing that pg_escape_bytea double-escapes (and always uses the old, inefficient octal escapes, too):

<?php
# oh-the-horror.php
print pg_escape_bytea("Blah binary\x00\x01\x02\x03\x04 blah");
?>

Run:

php oh-the-horror.php

Result:

Blah binary\\000\\001\\002\\003\\004 blah

See the doubled backslashes? That's because it's assuming you're going to interpolate it into SQL as a string, which is extremely memory inefficient, ugly, and a very bad habit. You don't seem to get any alternative, though.

Among other things this means that:

pg_unescape_bytea(pg_escape_bytea("\x01\x02\x03"));

... produces the wrong result, since pg_unescape_bytea is not actually the reverse of pg_escape_bytea. It also makes it impossible to feed the output of pg_escape_bytea into pg_query_params as a parameter, you have to interpolate it in.

Decoding

If you're using a modern PostgreSQL, it probably sets bytea_output to hex by default. That means that if I write my data to a bytea field then fetch it back, it'll look something like this:

craig=> CREATE TABLE byteademo(x bytea);
CREATE TABLE
craig=> INSERT INTO byteademo(x) VALUES ('Blah binary\\000\\001\\002\\003\\004 blah');
INSERT 0 1
craig=> SELECT * FROM byteademo ;
                                     x                                      
----------------------------------------------------------------------------
 \x426c61682062696e6172795c3030305c3030315c3030325c3030335c30303420626c6168
(1 row)

"Um, what", you might say? It's fine, it's just PostgreSQL's slightly more compact hex representation of bytea. pg_unescape_bytea will handle it fine and produce the same raw bytes as output ... if you have a modern PHP and libpq. On older versions you'll get garbage and will need to set bytea_output to escape for pg_unescape_bytea to handle it.

What you should do instead

Use PDO.

It has sane(ish) support for bytea.

$sth = $pdo->prepare('INSERT INTO mytable(somecol, byteacol) VALUES (:somecol, :byteacol)');
$sth->bindParam(':somecol', 'bork bork bork');
$sth->bindParam(':byteacol', $thebytes, PDO::PARAM_LOB);
$sth->execute();

See:

You may also want to look in to PostgreSQL's lob (large object) support, which provides a streaming, seekable interface that's still fully transactional.

Now, on to my soap box

If PHP had a real distinction between "byte string" and "text string" types, you wouldn't even need pg_escape_bytea as the database driver could do it for you. None of this ugliness would be required. Unfortunately, there are no separate string and bytes types in PHP.

Please, use PDO with parameterised statements as much as possible.

Where you can't, at least use pg_query_params and parameterised statements. PHP's addslashes is not an alternative, it's inefficient, ugly, and doesn't understand database specific escaping rules. You still have to manually escape bytea if you're not using PDO for icky historical reasons, but everything else should go through parameterised statements.

For guidance on pg_query_params: