Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark use the tag [apache-spark].
I looked at the docs, and they say the following join types are supported: Type of join to perform. Default …
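The join-type strings accepted by `Dataset.join` in Spark 2.0 include `"inner"`, `"outer"`/`"full_outer"`, `"left_outer"`, `"right_outer"`, and `"left_semi"`. The semantics of the two most common ones can be illustrated in plain Python over two keyed lists (keys and values here are made up for illustration):

```python
# Hypothetical keyed rows; right_map indexes the right side by join key.
left = [("a", 1), ("b", 2)]
right = [("a", "x"), ("c", "y")]
right_map = dict(right)

# "inner": keep only keys present on both sides.
inner = [(k, v, right_map[k]) for k, v in left if k in right_map]

# "left_outer": keep every left row, padding missing right values with None
# (Spark would emit null).
left_outer = [(k, v, right_map.get(k)) for k, v in left]

print(inner)       # [('a', 1, 'x')]
print(left_outer)  # [('a', 1, 'x'), ('b', 2, None)]
```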
[scala] [apache-spark] [apache-spark-sql] [spark-dataframe] [apache-spark-2.0]

I am reading a CSV file in PySpark as follows: df_raw=spark.read.option("header","true").csv(csv_path) …
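One thing worth noting about `csv()` in Spark 2.0: unless you also set `option("inferSchema","true")`, every column comes back typed as a string. The same behaviour can be sketched with Python's stdlib `csv` module (the sample data is made up):

```python
import csv
import io

# Hypothetical CSV content standing in for the file at csv_path.
raw = "name,age\nAlice,34\nBob,28\n"

# DictReader treats the first line as a header, like option("header","true").
rows = list(csv.DictReader(io.StringIO(raw)))

# rows[0]["age"] == "34" -- still a string, no type inference happened.
print(rows[0]["name"], rows[0]["age"])
```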
[csv] [apache-spark] [pyspark] [apache-spark-sql] [apache-spark-2.0]

I am trying to leverage Spark partitioning. I was trying to do something like data.write.partitionBy("key").parquet("/location") …
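`partitionBy("key")` writes one `key=value` subdirectory per distinct key (Hive-style layout), so readers can later prune partitions on that column. A minimal sketch of that directory layout in plain Python, with made-up data:

```python
import os
import tempfile

rows = [("a", 1), ("a", 2), ("b", 3)]  # hypothetical (key, value) records
base = tempfile.mkdtemp()

for key, value in rows:
    # Hive-style partition directory, e.g. .../key=a/
    part_dir = os.path.join(base, "key={}".format(key))
    os.makedirs(part_dir, exist_ok=True)
    with open(os.path.join(part_dir, "part-00000"), "a") as f:
        f.write("{}\n".format(value))

print(sorted(os.listdir(base)))  # ['key=a', 'key=b']
```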
[apache-spark] [spark-dataframe] [rdd] [apache-spark-2.0] [bigdata]

I have a Spark application which uses the new Spark 2.0 API with SparkSession. I am building this application on top of …
[scala] [apache-spark] [apache-spark-2.0]

How do I bind a variable in Apache Spark SQL? For example: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) …
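As far as the 2.0-era API goes, `sql()` on SQLContext/HiveContext takes a plain string and has no JDBC-style bind variables, so the usual workaround is to substitute the value into the query text before submitting it (or to stay in the DataFrame API). A sketch in Python, with a hypothetical table and parameter:

```python
# Hypothetical parameter; "sales" is a made-up table name.
year = 2016

# int() guards against injecting arbitrary text for this numeric parameter.
query = "SELECT * FROM sales WHERE year = {}".format(int(year))

print(query)  # SELECT * FROM sales WHERE year = 2016
```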
[scala] [apache-spark] [apache-spark-sql] [apache-spark-2.0]

I am running a Bash script on macOS. This script calls a Spark method written in Scala for a …
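A Bash wrapper for a Scala Spark job normally shells out to spark-submit; the class name, jar path, and master URL below are placeholders, not values from the question:

```shell
#!/bin/bash
set -euo pipefail

# Placeholders: replace the class, jar, and master with your own values.
spark-submit \
  --class com.example.MyJob \
  --master "local[2]" \
  /path/to/my-job.jar "$@"
```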
[scala] [apache-spark] [spark-graphx] [apache-spark-2.0]

I am trying to traverse a Dataset to do some string-similarity calculations like Jaro-Winkler or cosine similarity. I …
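Jaro-Winkler itself is small enough to self-contain, which helps if you want to call it inside a map over the Dataset rather than pull in a library. A plain-Python reference version (translating it to Java or Scala for the job is mechanical):

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(n1, n2) // 2 - 1          # max offset that still counts as a match
    m1 = [False] * n1
    m2 = [False] * n2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, n2)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions among matched characters.
    k = transpositions = 0
    for i in range(n1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions // 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3.0

def jaro_winkler(s1, s2, p=0.1):
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (capped at 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```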
[java] [apache-spark] [iterator] [apache-spark-2.0] [apache-spark-dataset]

What improvements does Apache Spark 2 bring compared to Apache Spark? From an architecture perspective? From an application point of view? Or …
[apache-spark] [apache-spark-2.0]

I have a DataFrame and I want to add, for each row, new_col=max(some_column0) grouped by some …
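What this asks for is a window aggregate: in PySpark 2.0 that is `max(...).over(Window.partitionBy(...))`, which attaches the per-group maximum to every row instead of collapsing the group. The semantics in plain Python, with made-up rows:

```python
# Hypothetical (group_key, some_column0) rows.
rows = [("a", 1), ("a", 5), ("b", 2)]

# First pass: compute the max per group key.
group_max = {}
for key, value in rows:
    group_max[key] = max(group_max.get(key, value), value)

# Every row keeps its own value and gains the max of its group.
with_new_col = [(key, value, group_max[key]) for key, value in rows]

print(with_new_col)  # [('a', 1, 5), ('a', 5, 5), ('b', 2, 2)]
```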
[pyspark] [spark-dataframe] [apache-spark-2.0]

I have a request to use an RDD to do so: val test = Seq(("New York", "Jack"), ("Los Angeles", "Tom"), ("Chicago", "…
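Grouping those (city, name) pairs by key — what `groupByKey` does on an RDD — can be sketched in plain Python (the third pair is cut off in the question, so only the two complete ones are used):

```python
pairs = [("New York", "Jack"), ("Los Angeles", "Tom")]

# Collect the names for each city, preserving encounter order.
grouped = {}
for city, name in pairs:
    grouped.setdefault(city, []).append(name)

print(grouped)  # {'New York': ['Jack'], 'Los Angeles': ['Tom']}
```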
[apache-spark] [dataset] [apache-spark-2.0]