Join tables using a value inside a JSONB column

Prachi Tripathi · Jul 9, 2015 · Viewed 18.4k times

There are two tables:

Authorized Contacts (auth_contacts):

(
userid varchar,
contacts jsonb
)

contacts holds an object whose "contact" key is an array of contacts with attributes {contact_id, type}

discussion:

(
contact_id varchar,
discussion_id varchar,
discussion_details jsonb
)

The table auth_contacts has at least 100k records. Making contacts a non-JSONB type does not seem appropriate to me, as it would double or triple the number of records.

Sample data for auth_contacts:

userid  | contacts
'11111' | '{"contact": [{"type": "type_a", "contact_id": "1-A-12"}
                      , {"type": "type_b", "contact_id": "1-A-13"}]}'
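To reproduce the setup, a minimal sketch could look like the following. Column types are taken from the descriptions above; constraints and indexes are left out, and the single discussion row is made up purely for illustration.

```sql
-- Tables as described in the question (no constraints or indexes yet).
CREATE TABLE auth_contacts (
   userid   varchar
 , contacts jsonb
);

CREATE TABLE discussion (
   contact_id         varchar
 , discussion_id      varchar
 , discussion_details jsonb
);

-- The sample row from the question.
INSERT INTO auth_contacts (userid, contacts)
VALUES ('11111'
      , '{"contact": [{"type": "type_a", "contact_id": "1-A-12"}
                    , {"type": "type_b", "contact_id": "1-A-13"}]}');

-- Hypothetical discussion row, invented so that a join returns something.
INSERT INTO discussion (contact_id, discussion_id, discussion_details)
VALUES ('1-A-12', 'd-1', '{}');
```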

The discussion table has about 5 million records.

I want to join discussion.contact_id (a relational column) with contact_id, which is a key of the JSON objects inside the array of JSON objects in auth_contacts.contacts.

One very crude way is:

SELECT *
FROM discussion d 
JOIN (SELECT userid, JSONB_OBJECT_KEYS(a.contacts) AS auth_contact
      FROM auth_contacts a) AS contacts
      ON (d.contact_id = contacts.auth_contact::text)

What this does is build a userid vs. contact id table at runtime (in the inner SQL), which is what I was trying to avoid by choosing the JSONB data type. For a user with many records, this query takes 26+ seconds, which is not good at all. I tried a few other ways: PostgreSQL 9.4: Aggregate / Join table on JSON field id inside array

But there should be a cleaner and better way which would be as simple as JOIN d.contact_id = contacts -> contact -> contact_id? When I try this, it doesn't yield any results.

When searching the net this seems to be a pretty cumbersome task?

Answer

Erwin Brandstetter · Jul 10, 2015

Proof of concept

Your "crude way" doesn't actually work. Here is another crude way that does:

SELECT *
FROM  auth_contacts a
    , jsonb_to_recordset(a.contacts->'contact') AS c(contact_id text)
JOIN  discussion d USING (contact_id);
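The same idea can be spelled out with an explicit LATERAL join and jsonb_array_elements(), which may be easier to read. This is only a sketch of an equivalent formulation, not a faster one; the alias c(elem) is a name picked for illustration.

```sql
-- Unnest the "contact" array once per auth_contacts row,
-- then join each element's contact_id to discussion.
SELECT a.userid, d.*
FROM   auth_contacts a
CROSS  JOIN LATERAL jsonb_array_elements(a.contacts->'contact') AS c(elem)
JOIN   discussion d ON d.contact_id = c.elem->>'contact_id';
```

Note that ->> returns text, so the comparison with the varchar column discussion.contact_id needs no explicit cast.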

As has been commented, you can also formulate a join condition with the contains operator @>:

SELECT *
FROM   auth_contacts a
JOIN   discussion d ON a.contacts->'contact'
                    @> json_build_array(json_build_object('contact_id', d.contact_id))::jsonb

Rather use JSON creation functions than string concatenation to build the right operand. It looks cumbersome, but it will actually be very fast if supported by a functional GIN index with the jsonb_path_ops operator class:

CREATE INDEX auth_contacts_contacts_gin_idx ON auth_contacts
USING  gin ((contacts->'contact') jsonb_path_ops);
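To check whether the index is actually considered, a containment lookup for a single contact can be run through EXPLAIN. This is a sketch using the contact id from the sample data; whether the planner picks the index depends on table size and statistics, so a small test table may still show a sequential scan.

```sql
-- The left operand must match the indexed expression (contacts->'contact').
-- The untyped literal on the right is resolved to jsonb by the @> operator.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.userid
FROM   auth_contacts a
WHERE  a.contacts->'contact' @> '[{"contact_id": "1-A-12"}]';
```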

Proper solution

This is all fascinating to play with, but the problem here is the relational model. Your claim:

hence making it non JSONB type is not appropriate according as it would double or triple the amount of records.

is the opposite of what's right. It's nonsense to wrap IDs you need for joining tables into a JSON document type. Normalize your table with a many-to-many relationship and implement all IDs you are working with inside the DB as separate columns with appropriate data type. Basics:
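As a rough sketch of what such a normalized layout could look like: the table name user_contact and the index name are hypothetical, "type" is assumed to belong to the user/contact pairing, and foreign keys to user and contact tables are left out because those tables are not shown in the question.

```sql
-- One row per (user, contact) pair instead of a JSON array per user.
CREATE TABLE user_contact (
   userid     varchar NOT NULL
 , contact_id varchar NOT NULL
 , type       text
 , PRIMARY KEY (userid, contact_id)
);

-- One-time migration from the JSONB column.
-- Fails if a user lists the same contact_id twice or userid is NULL.
INSERT INTO user_contact (userid, contact_id, type)
SELECT a.userid, c.contact_id, c.type
FROM   auth_contacts a
     , jsonb_to_recordset(a.contacts->'contact') AS c(contact_id text, type text);

-- Plain btree index to support the join on the big table.
CREATE INDEX discussion_contact_id_idx ON discussion (contact_id);

-- The join is now an ordinary relational join.
SELECT uc.userid, d.*
FROM   user_contact uc
JOIN   discussion   d USING (contact_id)
WHERE  uc.userid = '11111';
```

With two or three contacts per user this table would hold a few hundred thousand narrow rows, which is small next to the 5 million rows in discussion.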