Top "Apache-spark" questions

Apache Spark is an open-source distributed data processing engine written in Scala. It provides a unified API and distributed datasets for both batch and streaming processing.

Provide schema while reading csv file as a dataframe

I am trying to read a csv file into a dataframe. I know what the schema of my dataframe should …

scala apache-spark dataframe apache-spark-sql spark-csv
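
For the question above, a minimal sketch of supplying a schema up front, assuming Spark 2.x with a SparkSession named spark and a hypothetical two-column layout (the real column names and file path would come from your data):

    import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

    // Hypothetical schema; replace with the columns your CSV actually has.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))

    val df = spark.read
      .option("header", "true")   // keep if the file has a header row
      .schema(schema)             // supply the schema instead of inferring it
      .csv("/path/to/file.csv")   // placeholder path
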
How do I skip a header from CSV files in Spark?

Suppose I give three files paths to a Spark context to read and each file has a schema in the …

scala csv apache-spark
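
One common way to drop header lines when reading raw text through the SparkContext, sketched here under the assumption that all the files share the same header row:

    // `sc` is the SparkContext; the glob path is a placeholder.
    val rdd = sc.textFile("/path/to/files/*.csv")
    val header = rdd.first()                        // the header line of the first file
    val data = rdd.filter(line => line != header)   // drop every identical header line
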
How to select the first row of each group?

I have a DataFrame generated as follows: df.groupBy($"Hour", $"Category") .agg(sum($"value") as "TotalValue") .sort($"Hour".asc, $"TotalValue".…

sql scala apache-spark dataframe apache-spark-sql
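
A sketch of one standard answer, a window function that ranks rows within each Hour and keeps the top one; it assumes the aggregated DataFrame from the excerpt is bound to a val named aggregated with columns Hour, Category and TotalValue:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    val w = Window.partitionBy(col("Hour")).orderBy(col("TotalValue").desc)

    val firstPerGroup = aggregated
      .withColumn("rn", row_number().over(w))   // rank rows within each Hour
      .filter(col("rn") === 1)                  // keep only the top row per group
      .drop("rn")
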
Is there a way to take the first 1000 rows of a Spark Dataframe?

I am using the randomSplit function to get a small portion of a dataframe to use for dev purposes and I …

scala apache-spark
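
If the goal is simply the first 1000 rows rather than a random sample, a sketch assuming a DataFrame named df:

    val first1000Df = df.limit(1000)    // still a (distributed) DataFrame
    val first1000Arr = df.take(1000)    // an Array[Row] collected to the driver
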
spark-submit: add multiple jars to the classpath

I am trying to run a Spark program where I have multiple jar files; if I had only one jar …

submit apache-spark classpath
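
One way to pass several jars, sketched as a spark-submit command line with placeholder class and jar names; --jars takes a comma-separated list:

    # Placeholder class and jar names; the extra jars are comma-separated.
    spark-submit --class com.example.Main --jars lib/dep1.jar,lib/dep2.jar target/app.jar
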
How to check Spark Version

I want to check the Spark version in CDH 5.7.0. I have searched on the internet but am not able to figure it out. …

apache-spark hadoop cloudera
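
A couple of ways to check the version, sketched assuming an available SparkContext sc (or a SparkSession named spark on 2.x); from a terminal, spark-shell --version and spark-submit --version print it as well:

    println(sc.version)       // from any Spark application or the shell
    // println(spark.version) // Spark 2.x SparkSession equivalent
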
Spark: subtract two DataFrames

In Spark version 1.2.0 one could use subtract with 2 SchemaRDDs to end up with only the different content from the first …

apache-spark dataframe rdd
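
On DataFrames the analogous operation is except, sketched here for two hypothetical DataFrames df1 and df2 that share a schema:

    // Rows of df1 that do not appear in df2 (set difference, like RDD.subtract).
    val onlyInDf1 = df1.except(df2)
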
Spark specify multiple column conditions for dataframe join

How do I specify multiple column conditions when joining two DataFrames? For example, I want to run the following: val Lead_…

apache-spark apache-spark-sql rdd
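
A sketch of joining on several columns at once, using hypothetical DataFrames leads and utm with made-up key column names (the question's own val Lead_… is truncated above):

    // Combine equality conditions with && in the join expression.
    val joined = leads.join(utm,
      leads("source") === utm("source") &&
      leads("campaign") === utm("campaign"),
      "inner")

    // When the key columns share names, a Seq avoids duplicate columns in the result:
    // val joined = leads.join(utm, Seq("source", "campaign"))
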
Spark: how to run a Spark file from spark-shell

I am using CDH 5.2. I am able to use spark-shell to run the commands. How can I run the file(…

scala apache-spark cloudera-cdh cloudera-manager
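
Two common ways to run a file of Spark/Scala statements without packaging it for spark-submit, sketched with a placeholder path:

    // From inside spark-shell: load and execute a script.
    :load /path/to/script.scala

    // Or start the shell with the script as an init file:
    // spark-shell -i /path/to/script.scala
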
What is the difference between cache and persist?

In terms of RDD persistence, what are the differences between cache() and persist() in Spark?

apache-spark distributed-computing rdd
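
A sketch of the relationship between the two, assuming a SparkContext sc: cache() is just persist() with the default storage level, which is MEMORY_ONLY for RDDs, while persist() also accepts an explicit level:

    import org.apache.spark.storage.StorageLevel

    val a = sc.parallelize(1 to 100)
    a.cache()                                // same as a.persist(StorageLevel.MEMORY_ONLY)

    val b = sc.parallelize(1 to 100)
    b.persist(StorageLevel.MEMORY_AND_DISK)  // spills partitions to disk if memory is tight
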