I am using the 'preview' Google DataProc Image 1.1 with Spark 2.0.0. To complete one of my operations I have to compute a Cartesian product. Since version 2.0.0 there is a Spark configuration parameter (spark.sql.crossJoin.enabled) that prohibits Cartesian products by default, so an exception is thrown. How can I set spark.sql.crossJoin.enabled=true, preferably by using an initialization action?
Spark >= 3.0
spark.sql.crossJoin.enabled is true by default (SPARK-28621).
Spark >= 2.1
You can use crossJoin:
df1.crossJoin(df2)
It makes your intention explicit and keeps the more conservative configuration in place to protect you from unintended cross joins.
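For illustration, here is a minimal, self-contained sketch of the explicit form. The object name, the helper explicitProduct, the column names, and the local[*] master are assumptions made for this example and are not part of the original setup; it assumes Spark 2.1 or later on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names and data are illustrative only.
object CrossJoinSketch {

  // Builds two tiny DataFrames and returns their explicit Cartesian product.
  def explicitProduct(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df1 = Seq(1, 2, 3).toDF("a")   // 3 rows
    val df2 = Seq("x", "y").toDF("b")  // 2 rows
    // crossJoin states the intent directly, so spark.sql.crossJoin.enabled
    // does not need to be changed for this call.
    df1.crossJoin(df2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cross-join-sketch")
      .master("local[*]") // assumption: local run, not a Dataproc cluster
      .getOrCreate()

    val product = explicitProduct(spark)
    product.show()
    // Every row of df1 is paired with every row of df2: 3 * 2 rows.
    println(s"rows: ${product.count()}")

    spark.stop()
  }
}
```

Because the result has as many rows as the product of the two input sizes, it is worth checking both counts before running this on large inputs.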
Spark 2.0
SQL properties can be set dynamically at runtime with the RuntimeConfig.set method, so you should be able to call
spark.conf.set("spark.sql.crossJoin.enabled", true)
whenever you want to explicitly allow a Cartesian product.
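As a rough sketch of how that could look in a standalone application: the object name, the helper implicitProduct, the sample data, and the local[*] master are assumptions for illustration. The property is set both when building the session and at runtime only to show the two places it can go; either one should be enough.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names and data are illustrative only.
object CrossJoinConfigSketch {

  // Allows implicit Cartesian products for this session, then joins
  // two DataFrames without a join condition.
  def implicitProduct(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Runtime setting, as described above.
    spark.conf.set("spark.sql.crossJoin.enabled", true)
    val df1 = Seq(1, 2, 3).toDF("a")
    val df2 = Seq("x", "y").toDF("b")
    // A join with no condition is a Cartesian product; on Spark 2.x this
    // is rejected unless the property above is enabled.
    df1.join(df2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cross-join-config-sketch")
      .master("local[*]") // assumption: local run, not a Dataproc cluster
      // Alternative: set the property once when the session is created.
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()

    implicitProduct(spark).show()
    spark.stop()
  }
}
```

Setting the property for the whole session removes the safeguard for every query in it, so scoping it to the job that needs it keeps accidental cross joins easier to spot.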