How to implement "Cross Join" in Spark?

Shawn Guo · Jul 21, 2014 · Viewed 12.7k times

We plan to move Apache Pig code to the new Spark platform.

Pig has a "Bag/Tuple/Field" concept and behaves similarly to a relational database. Pig provides support for CROSS/INNER/OUTER joins.

For a cross join, Pig offers the CROSS operator: alias = CROSS alias, alias [, alias …] [PARTITION BY partitioner] [PARALLEL n];

But as we move to the Spark platform, I couldn't find a counterpart for CROSS in the Spark API. Is there an equivalent?

Answer

Daniel Darabos · Jul 21, 2014

It is oneRDD.cartesian(anotherRDD).
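
To make that concrete, here is a minimal sketch in Scala. It assumes a local master (`local[*]`) so it can run standalone; the object name, the helper `crossPairs`, and the sample data are made up for illustration and are not from the question. `cartesian` pairs every element of the first RDD with every element of the second, so the result holds (size of first) × (size of second) elements.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CrossJoinExample {
  // Hypothetical helper: builds two small RDDs and returns their cross product.
  def crossPairs(sc: SparkContext): Array[(String, Int)] = {
    val letters = sc.parallelize(Seq("a", "b"))
    val numbers = sc.parallelize(Seq(1, 2, 3))
    // cartesian yields an RDD[(String, Int)] with one pair per combination.
    letters.cartesian(numbers).collect()
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick standalone run.
    val conf = new SparkConf().setAppName("CrossJoinExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Prints the 2 x 3 = 6 pairs; the order of the pairs is not guaranteed.
      crossPairs(sc).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Since the output grows as the product of the two input sizes, it can get expensive quickly on large RDDs, so it may be worth filtering each side before calling `cartesian`.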