Spark Dataset 2.0 provides two functions, createOrReplaceTempView and createGlobalTempView. I am not able to understand the basic difference between the two.
According to the API documentation:
createOrReplaceTempView: The lifetime of this temporary view is tied to the [[SparkSession]] that was used to create this Dataset.

So, when I call sparkSession.close(), the defined view will be destroyed. Is that true?
createGlobalTempView: The lifetime of this temporary view is tied to this Spark application.

When will this type of view be destroyed? Any example, like sparkSession.close()?
The answer to your questions is basically understanding the difference between a Spark application and a Spark session.
A Spark application can run many jobs and can consist of more than one session; it lives as long as its underlying SparkContext.

A SparkSession, on the other hand, is associated with a Spark application: every session created within that application shares the same SparkContext, but each one keeps its own catalog of temporary views.
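To illustrate that relationship, here is a minimal sketch (the app name and master are placeholders, meant for spark-shell or a small driver program): two sessions created inside one application share a single SparkContext, so stopping it ends the application for both.

import org.apache.spark.sql.SparkSession

// One application (one SparkContext), two sessions
val spark = SparkSession.builder.appName("AppVsSession").master("local").getOrCreate()
val other = spark.newSession()

// Both sessions run inside the same application and share its SparkContext
println(spark.sparkContext eq other.sparkContext) // true

// Stopping the shared SparkContext ends the application for both sessions
spark.stop()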
Global temporary views were introduced in the Spark 2.1.0 release. This feature is useful when you want to share data among different sessions and keep it alive until your application ends. Please see a short sample I wrote to illustrate the use of createTempView and createGlobalTempView:
import org.apache.spark.sql.SparkSession

object NewSessionApp {

  def main(args: Array[String]): Unit = {

    val logFile = "data/README.md" // Should be some file on your system
    val spark = SparkSession
      .builder
      .appName("Simple Application")
      .master("local")
      .getOrCreate()

    val logData = spark.read.textFile(logFile).cache()

    // Global temp views are registered in the reserved global_temp database
    logData.createGlobalTempView("logdata")
    // Regular temp views are scoped to the session that created them
    spark.range(1).createTempView("foo")

    // Within the same session the foo view exists
    println("""spark.catalog.tableExists("foo") = """ + spark.catalog.tableExists("foo"))
    // spark.catalog.tableExists("foo") = true

    // For a new session the foo view does not exist
    val newSpark = spark.newSession()
    println("""newSpark.catalog.tableExists("foo") = """ + newSpark.catalog.tableExists("foo"))
    // newSpark.catalog.tableExists("foo") = false

    // Both sessions can access the logdata view through global_temp
    spark.sql("SELECT * FROM global_temp.logdata").show()
    newSpark.sql("SELECT * FROM global_temp.logdata").show()

    spark.stop()
  }
}
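If you want to get rid of a view before the session or application ends, the Catalog API also lets you drop it explicitly. A small sketch, assuming the view names from the example above and placed before spark.stop():

    // Drop the session-scoped view; only the owning session is affected
    spark.catalog.dropTempView("foo")

    // Drop the global view; it disappears for every session in this application
    spark.catalog.dropGlobalTempView("logdata")

Apart from explicit drops, a global temporary view is destroyed automatically when the Spark application terminates, i.e. when the underlying SparkContext is stopped.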