Say I run the following query on psql:
> select a.c1, b.c2 into temp_table from db.A as a inner join db.B as b
> on a.x = b.x limit 10;
I get the following message:
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'c1' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
DISTRIBUTED BY
column? temp_table
stored? Is it stored on my client or on the server?DISTRIBUTED BY is how Greenplum determines which segment will store each row. Because Greenplum is an MPP database in most production databases you will have multiple segment servers. You want to make sure that the Distribution column is the column you will join on usaly.
temp_table is a table that will be created for you on the Greenplum cluster. If you haven't set search_path to something else it will be in the public schema.