Convert PostgreSQL bytea-stored serialized-java-UUID to postgresql-UUID

FvHovell picture FvHovell · Jul 22, 2014 · Viewed 7.8k times · Source

One of our software-projects uses a PostgreSQL-table with a column 'guid' of type bytea.

This is used with hibernate 3.3.2.GA with PostgreSQL 8.4, which serializes the java UUID type using java object serialization. The result is a value like the following escape format bytea literal:

'\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002‌​J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\3‌​42\004M '

... which we cannot easily use in a query as select or condition to retrieve relevant rows.

Does anyone have a way to read or use the bytea-column in the select- or where-parts of a query (e.g. via psql or pgadmin3), without setting up some hibernate-query?

Answer

Craig Ringer picture Craig Ringer · Jul 22, 2014

Update: See edit to question, this answer applies to the commonplace 16-byte serializations of uuid; the question was amended to reflect java serialization.


Interesting problem. I landed up writing a simple C extension to do it efficiently, but it's probably more sensible to use the PL/Python version below.

Because uuid is a fixed sized type and bytea is varlena you can't just create cast ... as implicit to binary-coerce them, because the variable length field header would get in the way.

There's no built-in function for bytea input to return a uuid. It'd be a handy thing to have, but I don't think anyone's done it yet.

Simplest way

Update: There's actually a simple way to do this. bytea in hex form is actually a valid uuid literal once the \x is stripped off, because uuid_in accepts plain undecorated hex without - or {}. So just:

regress=> SET bytea_output = 'hex';
SET
regress=> SELECT CAST( substring(CAST (BYTEA '\x0FCC6350118D11E4A5597DE5338EB025' AS text) from 3) AS uuid);
              substring               
--------------------------------------
 0fcc6350-118d-11e4-a559-7de5338eb025
(1 row)

It involves a couple of string copies and a hex encode/decode cycle, but it'll be tons faster than any of the PL answers I suggested earlier, though slower than C.

Other options

Personally I recommend using PL/Perl or pl/pythonu. I'll follow up with an example.

Assuming your uuid is the hex-format bytea literal:

'\x0FCC6350118D11E4A5597DE5338EB025'

you could turn it into a uuid type with:

PL/Perl

create language plperlu;

create or replace function to_uuid(bytea) returns uuid language plperlu immutable as $$
use Data::UUID;
my $ug = new Data::UUID;
my $uuid = $ug->from_hexstring(substr($_[0],2));
return $ug->to_string($uuid);
$$
SET bytea_output = hex;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

PL/Python

It's probably faster and cleaner in Python because the PL/Python interface passes bytea as raw bytes not as hex strings:

CREATE LANGUAGE plpythonu;

CREATE or replace function to_uuid(uuidbytes bytea) 
RETURNS uuid LANGUAGE plpythonu IMMUTABLE 
AS $$
import uuid
return uuid.UUID(bytes=uuidbytes)
$$;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

In C, just for kicks. Ugly hack.

You can see the C extension module here.

But really, I mean it about it being ugly. If you want it done properly in C, it's best to actually patch PostgreSQL rather than use an extension.