Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce.
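A minimal sketch of the abstraction (a hypothetical local session, not taken from the definition above): transformations build up a lineage lazily, computed partitions can be kept in memory, and lost partitions are recomputed from lineage rather than restored from replicas.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rddIntro").master("local[*]").getOrCreate()

// An RDD: an immutable, partitioned collection distributed over the cluster.
val numbers = spark.sparkContext.parallelize(1 to 1000000)

// Transformations are only recorded as lineage; nothing runs yet.
val squares = numbers.map(n => n.toLong * n)

// cache() keeps computed partitions in memory for reuse; if an executor
// is lost, Spark recomputes the missing partitions from lineage.
squares.cache()

println(squares.reduce(_ + _)) // the first action triggers the computation
```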
How can I convert an RDD (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]) to a DataFrame org.…
[scala] [apache-spark] [apache-spark-sql] [rdd]
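A minimal sketch of one way to do this in Spark 2.x: `createDataFrame` accepts an `RDD[Row]` plus an explicit schema, since `Row` carries no type information of its own. The column names and local session setup are illustrative, not taken from the truncated question.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().appName("rowsToDf").master("local[*]").getOrCreate()

// Hypothetical RDD[Row]; in practice this would come from earlier processing.
val rowRdd = spark.sparkContext.parallelize(Seq(Row(1, "alice"), Row(2, "bob")))

// createDataFrame(RDD[Row], StructType) needs the schema spelled out,
// because Row itself is untyped.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))

val df = spark.createDataFrame(rowRdd, schema)
df.show()
```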
According to Learning Spark: "Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an …"
[apache-spark] [distributed-computing] [rdd]
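This is the classic repartition() vs. coalesce() comparison; a small sketch of the behavioral difference, assuming a local session with illustrative names:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partitions").master("local[*]").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 8)

// repartition() always triggers a full shuffle and can increase or
// decrease the partition count.
val wider = rdd.repartition(16)

// coalesce() with shuffle = false (the default) only merges existing
// partitions, avoiding a shuffle, so it can only decrease the count.
val narrower = rdd.coalesce(2)

println(s"${wider.getNumPartitions}, ${narrower.getNumPartitions}") // 16, 2
```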
I have a text file on HDFS and I want to convert it to a DataFrame in Spark. I …
[scala] [apache-spark] [dataframe] [apache-spark-sql] [rdd]
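One common pattern, sketched under assumptions the truncated question does not confirm: the path hdfs:///data/people.txt and the comma-separated id,name layout below are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("textToDf").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical record layout; adjust to the actual file.
case class Person(id: Int, name: String)

val df = spark.sparkContext
  .textFile("hdfs:///data/people.txt")       // placeholder HDFS path
  .map(_.split(","))                          // assumes comma-separated fields
  .map(fields => Person(fields(0).trim.toInt, fields(1).trim))
  .toDF()                                     // case class gives column names and types

df.printSchema()
```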
I'm just wondering what is the difference between an RDD and a DataFrame (Spark 2.0.0 DataFrame is a mere type alias for …
[dataframe] [apache-spark] [apache-spark-sql] [rdd] [apache-spark-dataset]
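A short illustration of the practical difference: an RDD is a distributed collection of JVM objects that Spark treats as opaque, while a DataFrame attaches a schema that the Catalyst optimizer can inspect. The data below is made up.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rddVsDf").master("local[*]").getOrCreate()
import spark.implicits._

// RDD: Spark sees opaque tuples; this filter is a black-box function.
val rdd = spark.sparkContext.parallelize(Seq(("alice", 30), ("bob", 25)))
val adultsRdd = rdd.filter { case (_, age) => age >= 18 }

// DataFrame: named, typed columns; Catalyst can analyze and optimize
// the expression instead of treating it as arbitrary code.
val df = rdd.toDF("name", "age")
val adultsDf = df.filter($"age" >= 18)

adultsDf.explain() // shows the optimized physical plan
```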
What's the difference between an RDD's map and mapPartitions method? And does flatMap behave like map or like mapPartitions? Thanks. (…
[performance] [scala] [apache-spark] [rdd]
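A sketch showing the calling conventions side by side: map and flatMap are invoked per element (flatMap may emit zero or more outputs per element), while mapPartitions is invoked once per partition with an Iterator.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mapVsMapPartitions").master("local[*]").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 10, numSlices = 2)

// map: the function runs once per element.
val doubled = rdd.map(_ * 2)

// mapPartitions: the function runs once per partition and receives an
// Iterator, so per-partition setup (e.g. opening a connection) happens
// only once per partition.
val summedPerPartition = rdd.mapPartitions(iter => Iterator.single(iter.sum))

// flatMap is per-element like map, but each element may yield
// zero or more outputs.
val expanded = rdd.flatMap(x => Seq(x, -x))

println(summedPerPartition.collect().mkString(",")) // "15,40" for 2 partitions
```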
In Spark version 1.2.0 one could use subtract with 2 SchemaRDDs to end up with only the different content from the first …
[apache-spark] [dataframe] [rdd]
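SchemaRDD was folded into the DataFrame API in Spark 1.3; on DataFrames, the set difference that subtract provided is `except`. A sketch on made-up data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("setDifference").master("local[*]").getOrCreate()
import spark.implicits._

val first  = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
val second = Seq((2, "b"), (3, "c")).toDF("id", "value")

// except() keeps rows of `first` that do not appear in `second`,
// playing the role RDD.subtract did for SchemaRDDs.
first.except(second).show() // leaves only (1, "a")
```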
How to give more column conditions when joining two DataFrames. For example I want to run the following: val Lead_…
[apache-spark] [apache-spark-sql] [rdd]
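Multiple join conditions can be combined with && inside a single Column expression. The table and column names below are placeholders for the question's truncated `val Lead_…` snippet:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("multiColJoin").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical tables; names are illustrative only.
val leads  = Seq((1, "2015-01-01", 100)).toDF("id", "date", "amount")
val orders = Seq((1, "2015-01-01", "open")).toDF("id", "date", "status")

// Several equality conditions combined into one join expression.
val joined = leads.join(
  orders,
  leads("id") === orders("id") && leads("date") === orders("date"),
  "inner"
)

joined.show()
```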
In terms of RDD persistence, what are the differences between cache() and persist() in Spark?
[apache-spark] [distributed-computing] [rdd]
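For RDDs, cache() is simply persist() at the default MEMORY_ONLY storage level; persist() additionally lets you pick any other StorageLevel. A minimal sketch:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("cacheVsPersist").master("local[*]").getOrCreate()

// cache() is shorthand for persist(StorageLevel.MEMORY_ONLY) on an RDD.
val a = spark.sparkContext.parallelize(1 to 1000)
a.cache()

// persist() lets you choose the storage level explicitly, e.g. spill
// to disk when the partitions do not fit in memory.
val b = spark.sparkContext.parallelize(1 to 1000)
b.persist(StorageLevel.MEMORY_AND_DISK)
```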
Trying to read a file located in S3 using spark-shell: scala> val myRdd = sc.textFile("s3n://myBucket/myFile1.…
[java] [scala] [apache-spark] [rdd] [hortonworks-data-platform]
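A hedged sketch, assuming the Hadoop AWS connector (hadoop-aws and its AWS SDK dependency) is on the classpath: s3n:// is the legacy scheme from the question, and s3a:// is its successor in current Hadoop builds. Bucket, object name, and credentials are placeholders.

```scala
// Inside spark-shell, `sc` already exists; credentials can also come from
// environment variables or instance roles instead of being set here.
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

// Placeholder bucket/object; the question's s3n:// URI would need the
// matching legacy fs.s3n.* credential keys instead.
val myRdd = sc.textFile("s3a://myBucket/myFile1.txt")
println(myRdd.count())
```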
I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, …
[python] [apache-spark] [mapreduce] [pyspark] [rdd]
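The question is tagged pyspark, but keeping with the Scala used elsewhere on this page, a sketch of the usual approach: reduceByKey merges all values sharing a key, combining locally on each partition before the shuffle, much like a MapReduce combiner. The data and key names are made up.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("combineByKey").master("local[*]").getOrCreate()

// Hypothetical pairs with the question's (K, V1), (K, V2), ... shape.
val pairs = spark.sparkContext.parallelize(Seq(("k", 1), ("k", 2), ("j", 5)))

// reduceByKey merges values per key, combining on each partition
// before shuffling.
val merged = pairs.reduceByKey(_ + _)

println(merged.collect().mkString(", ")) // (k,3), (j,5)
```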