Popular "rdd" questions | Page 6

There is no isEmpty method on RDDs, so what is the most efficient way of testing whether an RDD …

scala apache-spark rdd
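
The efficiency concern here is avoiding a scan of every element just to learn whether at least one exists. Below is a minimal plain-Python sketch of that idea. It assumes a generator as a stand-in for a lazily computed RDD. In Spark, the analogous comparison would be checking the result of `rdd.take(1)` instead of calling `rdd.count()`. The names `records`, `is_empty`, and `is_empty_by_count` are hypothetical helpers, not Spark APIs.

```python
from itertools import islice

def records(n, log):
    """Lazily produce n records, logging each one that is actually computed.

    Hypothetical stand-in for an RDD whose elements are computed on demand.
    """
    for i in range(n):
        log.append(i)
        yield i

def is_empty(iterable):
    """Return True if the iterable yields nothing, pulling at most one element.

    Mirrors a take(1)-style check: only the first element is ever requested.
    """
    return len(list(islice(iterable, 1))) == 0

def is_empty_by_count(iterable):
    """Count-based check: correct, but consumes every element first."""
    return sum(1 for _ in iterable) == 0

# Compare how much work each check does on a 1,000-element source.
log_take, log_count = [], []
print(is_empty(records(1000, log_take)), len(log_take))             # False 1
print(is_empty_by_count(records(1000, log_count)), len(log_count))  # False 1000
print(is_empty(records(0, [])))                                     # True
```

Both checks give the same answer. The count-based one forces all 1,000 records to be produced, while the take-one check stops after the first.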

I am using Spark 1.0.1 to process a large amount of data. Each row contains an ID number, some with duplicate …

apache-spark filter rdd

I am new to Spark. Can someone please clear up my doubt? Let's assume the following is my code: a = sc.textFile(…

apache-spark pyspark rdd

I have an RDD whose elements are of type (Long, String). For some reason, I want to save the whole …

scala apache-spark hdfs rdd bigdata

I have the following table as an RDD:

Key  Value
1    y
1    y
1    y
1    n
1    n
2    y
2    n
2    n

I want …

python apache-spark rdd

What will happen with large files in these cases? 1) Spark gets a location from the NameNode for the data. Will Spark stop …

apache-spark rdd partition

Is it possible to pass extra arguments to the mapping function in PySpark? Specifically, I have the following code recipe: …

python apache-spark pyspark rdd
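
Because `rdd.map` accepts any one-argument callable, extra arguments are commonly bound either with a lambda that closes over them or with `functools.partial`. Below is a minimal sketch of both approaches. It assumes Python's built-in `map` as a local stand-in for `rdd.map` so it runs without a Spark cluster. `scale_and_shift`, `factor`, and `offset` are hypothetical names, not taken from the truncated question.

```python
from functools import partial

def scale_and_shift(x, factor, offset):
    """Hypothetical mapping function that needs two extra arguments."""
    return x * factor + offset

data = [1, 2, 3]

# Option 1: a lambda closes over the extra arguments.
# With PySpark this would read: rdd.map(lambda x: scale_and_shift(x, 10, 1))
via_lambda = list(map(lambda x: scale_and_shift(x, 10, 1), data))

# Option 2: functools.partial binds the extra arguments by keyword,
# leaving a one-argument callable whose positional argument is x.
via_partial = list(map(partial(scale_and_shift, factor=10, offset=1), data))

print(via_lambda)   # [11, 21, 31]
print(via_partial)  # [11, 21, 31]
```

In PySpark the bound values travel to the executors together with the serialized function, so they need to be picklable.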

The Spark documentation shows how to create a DataFrame from an RDD, using Scala case classes to infer a schema. …

scala apache-spark dataframe apache-spark-sql rdd

I understand that the partitionBy function partitions my data. If I use rdd.partitionBy(100), it will partition my data by key …

python apache-spark pyspark partitioning rdd
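
The underlying question is how key/value records get assigned to those 100 partitions. Below is a simplified plain-Python model of hash partitioning by key, in which each pair goes to partition `partition_func(key) % num_partitions`. It assumes Python's built-in `hash` as the partition function, whereas PySpark's `partitionBy` defaults to its own `portable_hash`. `hash_partition` is a hypothetical helper, not a Spark API.

```python
from collections import defaultdict

def hash_partition(pairs, num_partitions, partition_func=hash):
    """Group (key, value) pairs into buckets by partition_func(key) % num_partitions.

    Simplified model of key-based hash partitioning; only non-empty
    partitions appear in the returned dict.
    """
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition_func(key) % num_partitions].append((key, value))
    return dict(buckets)

pairs = [(1, "a"), (2, "b"), (101, "c"), (1, "d"), (3, "e")]
parts = hash_partition(pairs, 100)
for index in sorted(parts):
    print(index, parts[index])
# 1 [(1, 'a'), (101, 'c'), (1, 'd')]
# 2 [(2, 'b')]
# 3 [(3, 'e')]
```

In this model, all pairs with the same key land in the same partition, and different keys can share one (1 and 101 here). With only a few distinct keys, most of the 100 partitions stay empty.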

In my Spark UI, what does "skipped" mean?

apache-spark rdd
