Query jsonb column containing array of JSON objects

brg picture brg · Jul 31, 2016 · Viewed 7.6k times · Source

I use PostgreSQL 9.5 and Rails 5. I want to query the jsonb column shown below that holds an array of JSON objects to return all the JSON array element containing {"kind":"person"} and also perform a count.
The SQL I use is shown below the json data. Running the query just returns an empty array.

I have tried the queries suggested here and here.

This is what my jsonb data looks like:

   '[
        {"kind":"person", "filter_term":"56","selected_attr":"customer"},
        {"kind":"email", "filter_term":"marketer","selected_attr":"job_title"}
      ]'

I want one of the sql query to return:

                             data
----------------------------------------------------------------------
 '{"kind":"person", "filter_term":"56","selected_attr":"customer"}'
(1 row)

and another query to return array back so that I can call count on it in my app and also loop over it to create forms:

 data
----------------------------------------------------------------------
 '[{"kind":"person", "filter_term":"56","selected_attr":"customer"}]'
 (1 row)

I tried this SQL query:

 "SELECT * FROM \"segments\" WHERE (payload @> '[{\"kind\":\"person\"}]')"

I also tried this query:

  "SELECT payload FROM segments WHERE payload @> '[{\"kind\":\"person\"}]'::jsonb;"

Here is a 3rd query:

 "SELECT * FROM segments s WHERE s.payload->'\"#{a}\"' @> '[{\"kind\":\"person\"}]';"

The model:

class Segment < ApplicationRecord
 store_accessor :payload,:kind, :filter_term, :selected_model_name, :selected_attr, :limit, :selected_operator
end

The migration:

create_table "segments", force: :cascade do |t|

  t.jsonb    "payload",    default: "[]", null: false
  t.index ["payload"], name: "index_segments_on_payload", using: :gin

end

Answer

Erwin Brandstetter picture Erwin Brandstetter · Aug 1, 2016

Assuming this table definition:

CREATE TABLE segments (segments_id serial PRIMARY KEY, payload jsonb);

With JSON values like this:

INSERT INTO segments (payload)
VALUES ('[
            {
                "kind": "person",
                "limit": "1",
                "filter_term": "56",
                "selected_attr": "customer",
                "selected_operator": "less_than"
            },
            {
                "kind": "email",
                "filter_term": "marketer",
                "selected_attr": "job_title",
                "selected_operator": "equals"
            }
        ]'
   );
  • You want to return elements of a JSON array that contain the key/value pair "kind":"person" (not a nested JSON object {"kind":"person"}) - and count array elements as well as table rows (there may be multiple matching array elements per row).

Solutions

To get the count of rows containing a qualifying jsonb value in column segments:

SELECT count(*)
FROM   segments s
WHERE  s.payload @> '[{"kind":"person"}]';

To get all qualifying JSON array elements (being JSON objects themselves) - plus the total count of elements (may be greater than above count at the same time:

SELECT j.*
FROM   segments s
JOIN   LATERAL jsonb_array_elements(s.payload) j(elem) ON j.elem @> '{"kind":"person"}'
WHERE  s.payload @> '[{"kind":"person"}]';

Returns:

elem
------------------------------------------------------------
{"kind": "person", "limit": "1", "filter_term": "56", ... }

To get all at once:

SELECT j.*, count(*) OVER () AS ct_elem, s.ct_rows
FROM  (
   SELECT payload, count(*) OVER () AS ct_rows
   FROM   segments
   WHERE  payload @> '[{"kind":"person"}]'
   ) s
JOIN   LATERAL jsonb_array_elements(s.payload) j(elem) ON j.elem @> '{"kind":"person"}';

Returns (for a table with more entries):

elem                      | ct_elem | ct_rows
--------------------------+---------+---------
{"kind": "person",  ... } | 4       | 3
{"kind": "person",  ... } | 4       | 3
...

But I think you really want this:

SELECT a.*
     , sum(ct_elem_row) OVER () AS ct_elem_total
     , count(*)         OVER () AS ct_rows
FROM   segments s
JOIN   LATERAL (
   SELECT json_agg(j.elem) AS filtered_payload, count(*) AS ct_elem_row
   FROM   jsonb_array_elements(s.payload) j(elem)
   WHERE  j.elem @> '{"kind":"person"}'
   ) a ON ct_elem_row > 0
WHERE  s.payload @> '[{"kind":"person"}]';

Returns (for a table with more entries):

filtered_payload                                     | ct_elem_row | ct_elem_total | ct_rows
-----------------------------------------------------+-------------+---------------+---------
[{"kind": "person", ... }]                           | 1           | 4             | 3
[{"kind": "person", ... }]                           | 1           | 4             | 3
[{"kind": "person", ... }, {"kind": "person", ... }] | 2           | 4             | 3

This identifies matching rows, then select matching array elements and builds an array per row with only those. Plus counts.

For best performance you would have a jsonb_path_ops GIN index like:

CREATE INDEX segments_path_ops_gin_idx ON segments 
USING  gin (payload jsonb_path_ops);

(But a more generic index to serve more different queries may be a better choice.)

Related:

Terminology

We are dealing with a JSON object containing a JSON array, saved as Postgres jsonb data type - a "JSON array" for short, but not an "array of JSON".