How does createOrReplaceTempView work in Spark?

Abir Chokraborty picture Abir Chokraborty · May 16, 2017 · Viewed 104.8k times · Source

I am new to Spark and Spark SQL.

How does createOrReplaceTempView work in Spark?

If we register an RDD of objects as a table will spark keep all the data in memory?

Answer

Garren S picture Garren S · May 17, 2017

createOrReplaceTempView creates (or replaces if that view name already exists) a lazily evaluated "view" that you can then use like a hive table in Spark SQL. It does not persist to memory unless you cache the dataset that underpins the view.

scala> val s = Seq(1,2,3).toDF("num")
s: org.apache.spark.sql.DataFrame = [num: int]

scala> s.createOrReplaceTempView("nums")

scala> spark.table("nums")
res22: org.apache.spark.sql.DataFrame = [num: int]

scala> spark.table("nums").cache
res23: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]

scala> spark.table("nums").count
res24: Long = 3

The data is cached fully only after the .count call. Here's proof it's been cached:

Cached nums temp view/table

Related SO: spark createOrReplaceTempView vs createGlobalTempView

Relevant quote (comparing to persistent table): "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore." from https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables

Note : createOrReplaceTempView was formerly registerTempTable