Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce.
I'm working through these two concepts right now and would like some clarity. Working from the command line, I've …
apache-spark pyspark rdd
I'm looking for a way to split an RDD into two or more RDDs. The closest I've seen is Scala …
apache-spark pyspark rdd
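Since a single transformation cannot return two RDDs, the usual answers are randomSplit for a random partition, or one filter per output RDD. A minimal PySpark sketch, assuming an existing SparkContext and a hypothetical numbers RDD:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
numbers = sc.parallelize(range(10))

# Random split: returns a list of RDDs weighted as requested.
train, test = numbers.randomSplit([0.8, 0.2], seed=42)

# Deterministic split: one filter per output RDD; caching the parent
# avoids recomputing it for each filtered child.
numbers.cache()
evens = numbers.filter(lambda x: x % 2 == 0)
odds = numbers.filter(lambda x: x % 2 != 0)
```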
How would you perform basic joins in Spark using Python? In R you could use merge() to do this. What …
python join apache-spark pyspark rdd
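For pair RDDs, join and its outer variants play the role of R's merge() on a shared key. A minimal PySpark sketch with two hypothetical pair RDDs:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# join() matches on the first element of each (key, value) tuple.
x = sc.parallelize([("a", 1), ("b", 4)])
y = sc.parallelize([("a", 2), ("a", 3)])

x.join(y).collect()           # [('a', (1, 2)), ('a', (1, 3))]  inner join
x.leftOuterJoin(y).collect()  # also keeps ('b', (4, None))
```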
I am trying to leverage Spark partitioning. I was trying to do something like data.write.partitionBy("key").parquet("/location") …
apache-spark spark-dataframe rdd apache-spark-2.0 bigdata
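A minimal sketch of that write, assuming a small hypothetical DataFrame with a key column; /tmp/partitioned is a hypothetical output path, and the repartition call is an optional step that co-locates each key's rows so every output directory is written by fewer tasks:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
data = spark.createDataFrame(
    [("a", 1), ("a", 2), ("b", 3)], ["key", "value"]
)

(data.repartition("key")            # optional: fewer small files per key
     .write
     .partitionBy("key")            # one directory per distinct key value
     .mode("overwrite")
     .parquet("/tmp/partitioned"))  # hypothetical output path
```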
I prefer Python over Scala. But since Spark is natively written in Scala, I was expecting my code to run …
scala performance apache-spark pyspark rdd
Is there a way to concatenate the datasets of two different RDDs in Spark? The requirement is: I create two intermediate …
scala apache-spark apache-spark-sql distributed-computing rdd
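RDDs can be concatenated with union, which appends one dataset to the other without deduplicating; the same method exists in both the Scala and Python APIs. A PySpark sketch with two hypothetical RDDs:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

first = sc.parallelize([1, 2, 3])
second = sc.parallelize([4, 5])

# union() keeps duplicates; follow with .distinct() if set semantics
# are wanted.
combined = first.union(second)
combined.collect()  # [1, 2, 3, 4, 5]
```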
I have a simple line: line = "Hello, world". I would like to convert it to an RDD with only one …
python apache-spark pyspark distributed-computing rdd
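The usual trick is to wrap the string in a one-element list before calling parallelize, since parallelize distributes the elements of a collection. A minimal sketch, assuming an existing SparkContext:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

line = "Hello, world"

# A bare string would be split into characters, because a string is
# itself a collection; wrapping it in a list yields one element.
rdd = sc.parallelize([line])
rdd.collect()  # ['Hello, world']
```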
The Spark research paper has proposed a new distributed programming model over classic Hadoop MapReduce, claiming simplification and vast …
apache-spark rdd directed-acyclic-graphs
I want to create a DataFrame from a list of strings that could match an existing schema. Here is my code. …
scala apache-spark dataframe rdd union-all
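One common approach, sketched here in PySpark rather than the question's Scala, is to parse each string into a tuple shaped like the target schema and pass both to createDataFrame; the two-field schema below is hypothetical, standing in for the question's existing one:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema standing in for the question's existing one.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])

# Parse each raw string into a tuple matching the schema's shape.
rows = ["alice,30", "bob,25"]
parsed = [(name, int(age)) for name, age in (r.split(",") for r in rows)]

df = spark.createDataFrame(parsed, schema)
df.show()
```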
I am new to Spark and Scala. I'm confused about the way the reduceByKey function works in Spark. Suppose we …
scala apache-spark rdd
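For intuition: reduceByKey applies a two-argument function to pairs of values that share a key, first within each partition and then across partitions, which is why the function must be associative (and commutative). A minimal PySpark sketch whose semantics match the Scala API, using a hypothetical word-count RDD:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

words = sc.parallelize([("cat", 1), ("dog", 1), ("cat", 1)])

# The lambda receives two *values* that share a key, never the key
# itself; Spark combines map-side within each partition before
# shuffling, so the function must be associative.
counts = words.reduceByKey(lambda a, b: a + b)
counts.collect()  # [('cat', 2), ('dog', 1)]
```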