Union query distinct on one column

Marek M. picture Marek M. · Jul 29, 2017 · Viewed 8.5k times · Source

I would like results from the second query to override results from the first query:

SELECT "panel_restaurants_restaurant"."id",
       "panel_restaurants_restaurant"."name",
       "panel_restaurants_restaurant"."logo",
       "panel_restaurants_restaurantfeatures"."currency" AS "currency",
       ST_DistanceSphere(location, ST_GeomFromText('POINT(0.0 0.0)',4326)) AS "distance",
       "panel_meals_meal"."id" AS "meal_id",
       "panel_meals_meal"."status" AS "meal_status",
       "panel_meals_meal"."available_count" AS "available_dishes",
       "panel_meals_meal"."discount_price" AS "discount_price",
       "panel_meals_meal"."normal_price" AS "normal_price",
       "panel_meals_meal"."collection_from" AS "pickup_from",
       "panel_meals_meal"."collection_to" AS "pickup_to",
       "panel_meals_meal"."description" AS "meal_description"
FROM "panel_restaurants_restaurant"
INNER JOIN "panel_restaurants_restaurantfeatures" ON (
    "panel_restaurants_restaurantfeatures"."restaurant_id" = "panel_restaurants_restaurant"."id")
LEFT OUTER JOIN "panel_meals_meal" ON ("panel_restaurants_restaurant"."id" = "panel_meals_meal"."restaurant_id"
                AND "panel_meals_meal"."status" = 0
                AND (
                ("panel_meals_meal"."collection_from" AT TIME ZONE 'Europe/Warsaw')::date = DATE 'today' OR
                ("panel_meals_meal"."collection_from" AT TIME ZONE 'Europe/Warsaw')::date = DATE 'tomorrow'
                )
                AND "panel_meals_meal"."collection_to" > '2017-07-29 19:33:47.992075+00:00'
                AND "panel_meals_meal"."available_count" > 0)
WHERE "panel_restaurants_restaurant"."status" = 2
UNION
SELECT "panel_restaurants_restaurant"."id",
       "panel_restaurants_restaurant"."name",
       "panel_restaurants_restaurant"."logo",
       "panel_restaurants_restaurantfeatures"."currency" AS "currency",
       ST_DistanceSphere(location, ST_GeomFromText('POINT(0.0 0.0)',4326)) AS "distance",
       "panel_meals_meal"."id" AS "meal_id",
       "panel_meals_meal"."status" AS "meal_status",
       "panel_meals_meal"."initial_count" AS "available_dishes",
       "panel_meals_meal"."discount_price" AS "discount_price",
       "panel_meals_meal"."normal_price" AS "normal_price",
       "panel_meals_meal"."collection_from" AS "pickup_from",
       "panel_meals_meal"."collection_to" AS "pickup_to",
       "panel_meals_meal"."description" AS "meal_description"
FROM "panel_restaurants_restaurant"
INNER JOIN "panel_restaurants_restaurantfeatures" ON (
       "panel_restaurants_restaurantfeatures"."restaurant_id" = "panel_restaurants_restaurant"."id")
LEFT OUTER JOIN "panel_meals_meal" ON (
    "panel_restaurants_restaurant"."id" = "panel_meals_meal"."restaurant_id" AND
    "panel_meals_meal"."status" = 0)
INNER JOIN "panel_meals_mealrepeater" ON (
    "panel_meals_mealrepeater"."meal_id" = "panel_meals_meal"."id")
WHERE "panel_restaurants_restaurant"."status" = 2    AND "panel_meals_mealrepeater"."saturday" = true
ORDER BY distance ASC

For example - the first query may return nulls for what comes from the panel_meals_meal table, but the second one will return something - in that situation I will have the same values for id, name, logo, currency, distance and different values (nulls returned from the first query, and something from the other one) for all the other ones.

So the question is - how do I make this UNION distinct on a certain range of columns (actually only one would suffice - id)?

Answer

joanolo picture joanolo · Jul 29, 2017

You can do what you want by means of a FULL OUTER JOIN instead of a UNION, and use COALESCE to your advantage.

I simplify your scenario to concentrate on the FULL OUTER JOIN part:

This are the tables (think of them as the result of your first SELECT before whe UNION, and the second SELECT after said UNION):

CREATE TABLE table_a
(
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT,
    logo TEXT
) ;
CREATE TABLE table_b
(
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT,
    logo TEXT
) ;

These are the data we have in them:

INSERT INTO 
   table_a
   (id, name, logo)
VALUES
   (1, 'Name1-A', 'Logo1-A'),
   (2, NULL, NULL),
   (3, 'Name3-A', NULL),
   (4, NULL, 'Logo4-A'),
   (5, 'Name5-only-in-A', NULL);

INSERT INTO 
   table_b
   (id, name, logo)
VALUES
   (1, 'Name1-B', 'Logo1-B'),
   (2, 'Name2-B', NULL),
   (3, 'Name3-B', 'Logo3-B'),
   (4, 'Name4-B', 'Logo4-B'),
   (6, 'Name6-only-in-B', 'Logo6-B');

The query you're looking for is done by joining in such a way that you retrieve all the rows from both table_a and table_b. Then, you use:

SELECT
    id, 
    COALESCE(a.name, b.name) AS name,
    COALESCE(a.logo, b.logo) AS logo
FROM
    table_a AS a
    FULL OUTER JOIN table_b AS b USING(id) 
ORDER BY
    id ;
id | name            | logo   
-: | :-------------- | :------
 1 | Name1-A         | Logo1-A
 2 | Name2-B         | null   
 3 | Name3-A         | Logo3-B
 4 | Name4-B         | Logo4-A
 5 | Name5-only-in-A | null   
 6 | Name6-only-in-B | Logo6-B

dbfiddle here


In your case, substitute table_a AS a by your full first (SELECT ...) AS a, and the same for b. I've assumed id are your primary keys.

References: