Top "Spark-dataframe" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Save Spark dataframe as dynamic partitioned table in Hive

I have a sample application that reads CSV files into a dataframe. The dataframe can be stored to …

hadoop apache-spark hive apache-spark-sql spark-dataframe
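
A common way to get dynamic Hive partitions from a dataframe write is partitionBy plus saveAsTable on a Hive-enabled session. A minimal PySpark sketch, assuming a hypothetical input path, database mydb, table events, and partition column date:

    from pyspark.sql import SparkSession

    # Hive support must be enabled on the session for Hive-backed tables.
    spark = (SparkSession.builder
             .appName("dynamic-partitions")
             .enableHiveSupport()
             .getOrCreate())

    # Let Hive derive partitions from the data instead of a static spec.
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    df = spark.read.csv("/path/to/input.csv", header=True, inferSchema=True)

    # partitionBy takes each row's "date" value as its partition at write time.
    df.write.mode("append").partitionBy("date").saveAsTable("mydb.events")
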
PySpark: How to fillna values in dataframe for specific columns?

I have the following sample DataFrame:

    a    | b    | c
    1    | 2    | 4
    0    | null | null
    null | 3    | 4

And I want to replace null values only …

apache-spark pyspark spark-dataframe
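
In PySpark, DataFrame.fillna takes a subset argument that restricts the replacement to the named columns. A minimal sketch reproducing the question's sample data:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fillna-subset").getOrCreate()

    df = spark.createDataFrame(
        [(1, 2, 4), (0, None, None), (None, 3, 4)],
        ["a", "b", "c"])

    # Only nulls in "b" and "c" are replaced; the null in "a" is untouched.
    df.fillna(0, subset=["b", "c"]).show()
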
Overwrite specific partitions in spark dataframe write method

I want to overwrite specific partitions in Spark rather than all of them. I am trying the following command: df.write.orc(…

apache-spark apache-spark-sql spark-dataframe
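
Since Spark 2.3 the writer can replace only the partitions present in the dataframe via the partitionOverwriteMode setting; on earlier versions the usual workaround is to write into each partition's directory directly. A sketch under the 2.3+ assumption, with a hypothetical date partition column and output path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partition-overwrite").getOrCreate()

    # "dynamic" overwrites only the partitions found in df,
    # not the whole table directory.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    df = spark.createDataFrame(
        [(1, "2017-01-01"), (2, "2017-01-02")], ["id", "date"])

    (df.write
       .mode("overwrite")
       .partitionBy("date")
       .orc("/path/to/table"))
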
AttributeError: 'DataFrame' object has no attribute 'map'

I wanted to convert the Spark data frame using the code below: from pyspark.mllib.clustering import KMeans …

python apache-spark pyspark spark-dataframe apache-spark-mllib
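
The error comes from Spark 2.x, where the Python DataFrame no longer exposes map(); going through the underlying RDD restores it. A minimal sketch with made-up two-column data for KMeans:

    from pyspark.sql import SparkSession
    from pyspark.mllib.clustering import KMeans

    spark = SparkSession.builder.appName("df-map").getOrCreate()

    df = spark.createDataFrame([(1.0, 2.0), (5.0, 6.0), (1.5, 2.5)], ["x", "y"])

    # DataFrame.map was removed in 2.0; map over df.rdd instead.
    points = df.rdd.map(list)

    model = KMeans.train(points, k=2, seed=1)
    print(model.clusterCenters)
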
What are the various join types in Spark?

I looked at the docs, and they say the following join types are supported: Type of join to perform. Default …

scala apache-spark apache-spark-sql spark-dataframe apache-spark-2.0
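
For reference, the join strings Spark 2.x accepts are inner, cross, outer/full/full_outer, left/left_outer, right/right_outer, left_semi, and left_anti. A small sketch with made-up data showing two of them:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("join-types").getOrCreate()

    left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "l"])
    right = spark.createDataFrame([(2, "x"), (3, "y")], ["id", "r"])

    # left_outer keeps every left row; left_anti keeps only the
    # left rows with no match on the right (left_semi keeps the matches).
    left.join(right, "id", "left_outer").show()
    left.join(right, "id", "left_anti").show()
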
Unpacking a list to select multiple columns from a spark data frame

I have a spark data frame df. Is there a way of sub-selecting a few columns using a list …

apache-spark apache-spark-sql spark-dataframe
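
select takes varargs, so a Python list of column names can be unpacked with *; recent Spark versions also accept the list directly. A minimal sketch with a hypothetical column list:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("select-list").getOrCreate()

    df = spark.createDataFrame([(1, 2, 3)], ["a", "b", "c"])
    cols = ["a", "b"]

    # Unpack the list into select's varargs; df.select(cols) also works.
    df.select(*cols).show()
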
Add column sum as new column in PySpark dataframe

I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a …

python apache-spark pyspark spark-dataframe
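
One common approach is to build a single Column expression that adds the columns together and attach it with withColumn. A sketch assuming hypothetical numeric columns a, b, c:

    from functools import reduce
    from operator import add
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("row-sum").getOrCreate()

    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a", "b", "c"])
    num_cols = ["a", "b", "c"]

    # reduce folds the columns into one expression: col(a) + col(b) + col(c).
    df.withColumn("total", reduce(add, [F.col(c) for c in num_cols])).show()
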
Reading DataFrame from partitioned parquet file

How do I read a partitioned parquet file with a condition as a dataframe? This works fine: val dataframe = sqlContext.read.parquet("file:///home/msoproj/…

scala apache-spark parquet spark-dataframe
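
Pointing the reader at the base directory lets Spark discover the partition columns from the directory layout, and filters on those columns prune whole directories. A PySpark sketch (the question is Scala, but the API is analogous), assuming a hypothetical layout /path/to/table/year=2016/...:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-partitioned").getOrCreate()

    # Read the base directory; "year" becomes a column via partition discovery.
    df = spark.read.parquet("/path/to/table")

    # The filter on the partition column is pushed down, so only the
    # year=2016 directories are scanned.
    df.where(df.year == 2016).show()
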
Apache Spark: dealing with case statements

I am transforming SQL code to PySpark code and came across some SQL statements. I don't know how …

apache-spark pyspark spark-dataframe rdd pyspark-sql
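
The usual PySpark translation of a SQL CASE expression is a chain of when calls ended by otherwise; F.expr can also take the CASE text verbatim. A sketch with a made-up score column:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("case-when").getOrCreate()

    df = spark.createDataFrame([(95,), (85,), (70,)], ["score"])

    # SQL: CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END
    df.withColumn(
        "grade",
        F.when(F.col("score") >= 90, "A")
         .when(F.col("score") >= 80, "B")
         .otherwise("C")).show()
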
Spark 2.1.0 session config settings (PySpark)

I am trying to overwrite the Spark session/Spark context default configs, but it is picking up the entire node/cluster resources. …

python apache-spark pyspark spark-dataframe
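
A commonly suggested pattern is to stop whatever session the shell or notebook already created and rebuild one with explicit resource caps, since many of these settings only take effect when the application starts. A sketch with hypothetical values:

    from pyspark.sql import SparkSession

    # Stop the session a shell/notebook may have created for us...
    SparkSession.builder.getOrCreate().stop()

    # ...then build a new one with capped resources.
    spark = (SparkSession.builder
             .appName("capped-resources")
             .config("spark.executor.memory", "2g")
             .config("spark.executor.cores", "2")
             .config("spark.cores.max", "4")
             .getOrCreate())
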