There are two tables:
Authorized Contacts (auth_contacts):
(
userid varchar
contacts jsonb
)
contacts contains an array of contacts with attributes {contact_id, type}.
discussion:
(
contact_id varchar
discussion_id varchar
discussion_details jsonb
)
The table auth_contacts has at least 100k records, so making it a non-JSONB type is not appropriate, as it would double or triple the number of records.
Sample data for auth_contacts:
userid | contacts
'11111' | '{"contact": [{"type": "type_a", "contact_id": "1-A-12"}, {"type": "type_b", "contact_id": "1-A-13"}]}'
The discussion table has 5 million odd records.
I want to join discussion.contact_id (a relational column) with the contact_id stored in a JSON object inside the array of JSON objects in auth_contacts.contacts.
One very crude way is:
SELECT *
FROM discussion d
JOIN (SELECT userid, JSONB_OBJECT_KEYS(a.contacts) AS auth_contact
FROM auth_contacts a) AS contacts
ON (d.contact_id = contacts.auth_contact::text)
What this actually does is create (in the inner SQL) a userid vs. contact id table at runtime, which is what I was trying to avoid and why I went for the JSONB data type. For a user with a large number of records, this query takes 26+ seconds, which is not good at all.
I tried a few other ways: PostgreSQL 9.4: Aggregate / Join table on JSON field id inside array
But there should be a cleaner and better way which would be as simple as
JOIN d.contact_id = contacts -> contact -> contact_id?
When I try this, it doesn't yield any results.
When searching the net this seems to be a pretty cumbersome task?
Your "crude way" doesn't actually work. Here is another crude way that does:
SELECT *
FROM auth_contacts a
, jsonb_to_recordset(a.contacts->'contact') AS c(contact_id text)
JOIN discussion d USING (contact_id);
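To see what the set-returning function in the FROM list produces, here is a minimal, self-contained sketch that runs it against the sample row from the question instead of the real table. The VALUES list and the extra type column are only there for illustration; CROSS JOIN LATERAL spells out what the comma does implicitly for a function in the FROM list.

```sql
-- Sample row from the question, inlined so the query runs without any tables.
SELECT a.userid, c.contact_id, c.type
FROM  (
   VALUES ('11111'::varchar
         , '{"contact": [{"type": "type_a", "contact_id": "1-A-12"}
                       , {"type": "type_b", "contact_id": "1-A-13"}]}'::jsonb)
   ) AS a(userid, contacts)
-- Expand the JSON array into one row per element, keeping only the listed keys.
CROSS  JOIN LATERAL jsonb_to_recordset(a.contacts->'contact')
                 AS c(contact_id text, type text);
```

This returns two rows for user 11111, one per array element (1-A-12 / type_a and 1-A-13 / type_b). That derived set is what then gets joined to discussion on contact_id.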
As has been commented, you can also formulate a join condition with the contains operator @>:
SELECT *
FROM auth_contacts a
JOIN discussion d ON a.contacts->'contact'
@> json_build_array(json_build_object('contact_id', d.contact_id))::jsonb
Use JSON creation functions rather than string concatenation to build the right operand. This looks cumbersome, but will actually be very fast if supported with a functional jsonb_path_ops GIN index:
CREATE INDEX auth_contacts_contacts_gin_idx ON auth_contacts
USING gin ((contacts->'contact') jsonb_path_ops);
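As a quick sanity check of the containment condition itself, the following sketch evaluates it against the sample document from the question, with no tables involved. The id '9-Z-99' is made up to show a non-match, and the explicit ::text casts are only a precaution for literal arguments.

```sql
SELECT contacts->'contact'
       @> json_build_array(json_build_object('contact_id', '1-A-12'::text))::jsonb AS has_1_a_12
     , contacts->'contact'
       @> json_build_array(json_build_object('contact_id', '9-Z-99'::text))::jsonb AS has_9_z_99
FROM  (
   VALUES ('{"contact": [{"type": "type_a", "contact_id": "1-A-12"}
                       , {"type": "type_b", "contact_id": "1-A-13"}]}'::jsonb)
   ) AS t(contacts);
```

The first column is true and the second is false: an array contains another array if every element of the right-hand array is contained in some element of the left-hand one, so the extra "type" key in the stored objects does not get in the way.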
This is all fascinating to play with, but the problem here is the relational model. Your claim:
hence making it non JSONB type is not appropriate according as it would double or triple the amount of records.
is the opposite of what's right. It's nonsense to wrap IDs you need for joining tables into a JSON document type. Normalize your table with a many-to-many relationship and implement all IDs you are working with inside the DB as separate columns with an appropriate data type.
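A minimal sketch of what that normalization could look like, assuming the auth_contacts and discussion tables from the question already exist. The table name auth_contact_link and the index name are hypothetical, and the migration assumes every array element has a contact_id and that no user lists the same contact twice.

```sql
-- Hypothetical link table: one row per (user, contact) pair.
CREATE TABLE auth_contact_link (
   userid     varchar NOT NULL
 , contact_id varchar NOT NULL
 , type       text
 , PRIMARY KEY (userid, contact_id)
);

-- One-time migration: unnest the JSON array into plain rows.
INSERT INTO auth_contact_link (userid, contact_id, type)
SELECT a.userid, c.contact_id, c.type
FROM   auth_contacts a
     , jsonb_to_recordset(a.contacts->'contact') AS c(contact_id text, type text);

-- Plain btree index to support the join from the discussion side.
CREATE INDEX discussion_contact_id_idx ON discussion (contact_id);

-- The join is now an ordinary equality join on indexed columns.
SELECT d.*
FROM   auth_contact_link l
JOIN   discussion d USING (contact_id)
WHERE  l.userid = '11111';
```

If separate user and contact tables exist, the two columns of the link table would normally reference them with foreign keys. The link table has more rows than auth_contacts, but each row is small and the join no longer has to take JSON documents apart at query time.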