Top "Apache-spark-2.0" questions

Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark, use the tag [apache-spark].

What are the various join types in Spark?

I looked at the docs, and they say the following join types are supported: Type of join to perform. Default …

scala apache-spark apache-spark-sql spark-dataframe apache-spark-2.0
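
A minimal sketch of the join-type strings Spark 2.0's Dataset.join accepts (underscores are ignored when the string is parsed, so "left_outer" and "leftouter" are equivalent); the sample frames are made up:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("join-types").master("local[*]").getOrCreate()
    import spark.implicits._

    val left  = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "l")
    val right = Seq((1, "x"), (3, "y"), (4, "z")).toDF("id", "r")

    // Join-type strings accepted in Spark 2.0; aliases such as "left"
    // and "full" map to the same plans as "leftouter" and "outer".
    Seq("inner", "outer", "left_outer", "right_outer", "leftsemi", "leftanti")
      .foreach { jt =>
        println(s"=== $jt ===")
        left.join(right, Seq("id"), jt).show()
      }
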
Reading CSV files with quoted fields containing embedded commas

I am reading a CSV file in PySpark as follows: df_raw=spark.read.option("header","true").csv(csv_path) …

csv apache-spark pyspark apache-spark-sql apache-spark-2.0
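
Spark 2.0's built-in CSV reader already treats commas inside double-quoted fields as data; the usual extra tweak is the escape option for quoted fields that contain literal quote characters. A sketch (Scala here for consistency with the other examples; the PySpark reader takes the same options, and the path is a placeholder):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-quotes").master("local[*]").getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("quote", "\"")    // default: fields wrapped in " may contain commas
      .option("escape", "\"")   // treat "" inside a quoted field as a literal quote
      .csv("/path/to/file.csv")

    df.show(truncate = false)
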
Spark Parquet partitioning: large number of files

I am trying to leverage Spark partitioning. I was trying to do something like data.write.partitionBy("key").parquet("/location") …

apache-spark spark-dataframe rdd apache-spark-2.0 bigdata
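
The usual cause is that every shuffle task writes one file for each key it happens to hold; repartitioning by the partition column first routes each key to a single task, yielding one file per output directory. A sketch with made-up data and a placeholder path:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-partitioning").master("local[*]").getOrCreate()
    import spark.implicits._

    val data = Seq((1, "a"), (1, "b"), (2, "c")).toDF("key", "value")

    data
      .repartition($"key")   // collapse each key into one shuffle partition
      .write
      .partitionBy("key")    // layout: /tmp/location/key=.../part-*.parquet
      .parquet("/tmp/location")
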
How to create a SparkSession from an existing SparkContext

I have a Spark application which uses the new Spark 2.0 API with SparkSession. I am building this application on top of …

scala apache-spark apache-spark-2.0
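
SparkSession has no public constructor taking a SparkContext in 2.0, but the builder's getOrCreate() reuses whatever context is already running. A minimal sketch:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SparkSession

    // Pre-existing context, e.g. created by legacy code
    val sc = new SparkContext(new SparkConf().setAppName("legacy").setMaster("local[*]"))

    // getOrCreate() picks up the active SparkContext instead of starting a new one
    val spark = SparkSession.builder().config(sc.getConf).getOrCreate()

    assert(spark.sparkContext eq sc)   // same underlying context
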
Dynamically bind a variable/parameter in Spark SQL?

How do I bind a variable in Apache Spark SQL? For example: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) …

scala apache-spark apache-spark-sql apache-spark-2.0
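
Spark SQL (as of 2.0) has no true bind-variable or prepared-statement mechanism; the common workarounds are string interpolation (plain text substitution, so only for trusted values) or keeping the parameter in the DataFrame API. A sketch with a made-up temp view:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("bind-vars").master("local[*]").getOrCreate()
    import spark.implicits._

    Seq((1, "alice"), (2, "bob")).toDF("id", "name").createOrReplaceTempView("users")

    val minId = 2
    spark.sql(s"SELECT * FROM users WHERE id >= $minId").show()   // interpolation, not binding

    spark.table("users").filter($"id" >= minId).show()            // parameter stays in Scala
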
Timeout exception in Apache Spark during program execution

I am running a Bash script on a Mac. This script calls a Spark method written in Scala for a …

scala apache-spark spark-graphx apache-spark-2.0
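
Future-timeout errors during long stages are often mitigated by raising spark.network.timeout while keeping spark.executor.heartbeatInterval well below it; the values below are illustrative, not tuned:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("timeout-tuning")
      .master("local[*]")
      .config("spark.network.timeout", "600s")           // default 120s
      .config("spark.executor.heartbeatInterval", "60s") // default 10s
      .getOrCreate()
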
How to traverse/iterate a Dataset in Spark Java?

I am trying to traverse a Dataset to do some string similarity calculations like Jaro-Winkler or cosine similarity. I …

java apache-spark iterator apache-spark-2.0 apache-spark-dataset
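
Two common traversal patterns, sketched in Scala for consistency with the other examples (the Java Dataset API is analogous, taking a MapFunction plus an explicit Encoder); the toy similarity below stands in for a real Jaro-Winkler implementation:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("iterate-ds").master("local[*]").getOrCreate()
    import spark.implicits._

    val ds = Seq(("kitten", "sitting"), ("spark", "shark")).toDS()

    // Distributed traversal: map runs on the executors
    val scored = ds.map { case (a, b) => (a, b, if (a == b) 1.0 else 0.0) }
    scored.show()

    // Driver-side traversal without collecting the whole Dataset at once
    val it = ds.toLocalIterator()
    while (it.hasNext) println(it.next())
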
Apache Spark vs Apache Spark 2

What improvements does Apache Spark 2 bring compared to Apache Spark, from an architecture perspective, from an application point of view, or …

apache-spark apache-spark-2.0
Spark join raises "Detected cartesian product for INNER join"

I have a DataFrame and I want to add, for each row, new_col=max(some_column0) grouped by some …

pyspark spark-dataframe apache-spark-2.0
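
Attaching a per-group max via a self-join is what trips the cartesian-product check; a window function does the same thing with no join at all. A sketch (Scala for consistency with the other examples; the column names are made up):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.max

    val spark = SparkSession.builder().appName("group-max").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("grp", "some_column0")

    // Per-group max on every row, no self-join involved
    df.withColumn("new_col", max($"some_column0").over(Window.partitionBy($"grp"))).show()
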
How to use Dataset to groupBy

I have a request to use an RDD to do so: val test = Seq(("New York", "Jack"), ("Los Angeles", "Tom"), ("Chicago", "…

apache-spark dataset apache-spark-2.0
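
The typed counterpart of an RDD groupBy is groupByKey plus mapGroups. A sketch echoing the excerpt's data (the third pair is truncated in the question, so a placeholder name is used):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("ds-groupby").master("local[*]").getOrCreate()
    import spark.implicits._

    val test = Seq(("New York", "Jack"), ("Los Angeles", "Tom"), ("Chicago", "???"))
    val ds = test.toDS()

    // Group by city, then fold each group's names into one string
    ds.groupByKey { case (city, _) => city }
      .mapGroups { (city, rows) => (city, rows.map(_._2).mkString(", ")) }
      .show(truncate = false)
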