Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I have a sample application working to read from CSV files into a DataFrame. The DataFrame can be stored to …
Tags: hadoop, apache-spark, hive, apache-spark-sql, spark-dataframe
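The excerpt cuts off before the storage target. A minimal PySpark sketch, assuming (given the hive tag) the goal is to persist the CSV-backed DataFrame as a Hive table; the input path and table name are hypothetical:

    from pyspark.sql import SparkSession

    # Build a session with Hive support so saveAsTable writes to the metastore
    spark = (SparkSession.builder
             .appName("csv-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    # header/inferSchema are standard CSV reader options; path is made up
    df = spark.read.csv("/data/input.csv", header=True, inferSchema=True)

    # Persist as a managed Hive table; Parquet is the default table format
    df.write.mode("overwrite").saveAsTable("my_db.my_table")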
I have the following sample DataFrame:

    a    | b    | c
    1    | 2    | 4
    0    | null | null
    null | 3    | 4

And I want to replace null values only …
Tags: apache-spark, pyspark, spark-dataframe
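The question is truncated, but fillna takes a subset argument that restricts replacement to named columns. A sketch using the sample data above, assuming nulls should be replaced only in a and b:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, 2, 4), (0, None, None), (None, 3, 4)], ["a", "b", "c"])

    # Replace nulls with 0, but only in columns a and b; c keeps its nulls
    filled = df.fillna(0, subset=["a", "b"])
    filled.show()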
I want to overwrite specific partitions instead of all in Spark. I am trying the following command: df.write.orc(…
Tags: apache-spark, apache-spark-sql, spark-dataframe
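The write call is truncated. One hedged sketch: since Spark 2.3, setting spark.sql.sources.partitionOverwriteMode to dynamic makes mode("overwrite") replace only the partitions present in the DataFrame; the toy data, partition column dt, and output path are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", "2024-01-01")], ["value", "dt"])

    # "dynamic" overwrites only the partitions present in df instead of
    # truncating everything under the output path first (Spark 2.3+)
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    df.write.mode("overwrite").partitionBy("dt").orc("/data/events")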
I wanted to convert the Spark data frame for use with MLlib KMeans, using the code below: from pyspark.mllib.clustering import KMeans …
Tags: python, apache-spark, pyspark, spark-dataframe, apache-spark-mllib
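The code excerpt stops at the import. A sketch of the usual conversion, assuming the aim is to feed the DataFrame to MLlib's RDD-based KMeans; the toy data and column names x/y are assumptions:

    from pyspark.sql import SparkSession
    from pyspark.mllib.clustering import KMeans
    from pyspark.mllib.linalg import Vectors

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0, 1.0), (9.0, 8.0), (1.2, 0.9)], ["x", "y"])

    # KMeans.train expects an RDD of vectors, not a DataFrame,
    # so convert each Row into a dense vector first
    rdd = df.rdd.map(lambda row: Vectors.dense([row.x, row.y]))
    model = KMeans.train(rdd, k=2, maxIterations=10)
    print(model.clusterCenters)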
I looked at the docs and they say the following join types are supported: Type of join to perform. Default …
Tags: scala, apache-spark, apache-spark-sql, spark-dataframe, apache-spark-2.0
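The quoted doc text is cut off. For reference, a small sketch exercising a few of the supported how= values; the sample frames are invented, and although the question is Scala-tagged, the same join types apply in PySpark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "l"])
    right = spark.createDataFrame([(1, "x"), (3, "y")], ["id", "r"])

    # how= accepts inner (the default), outer/full, left, right,
    # left_semi and left_anti, plus cross
    left.join(right, on="id", how="inner").show()
    left.join(right, on="id", how="left_outer").show()
    left.join(right, on="id", how="left_anti").show()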
I have a Spark data frame df. Is there a way of sub-selecting a few columns using a list …
Tags: apache-spark, apache-spark-sql, spark-dataframe
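Presumably the list holds column names. A sketch under that assumption: select accepts a plain Python list (or the unpacked *list form); the frame and list contents are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 2, 3)], ["a", "b", "c"])

    wanted = ["a", "c"]
    # select accepts a list of column names directly,
    # or equivalently the unpacked form df.select(*wanted)
    df.select(wanted).show()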
I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a …
Tags: python, apache-spark, pyspark, spark-dataframe
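The sentence is truncated; a common version of this question asks for a column holding the row-wise sum of the numeric columns. A sketch under that assumption, with invented data:

    from functools import reduce
    from operator import add

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a", "b", "c"])

    # Build the expression col("a") + col("b") + col("c") from the column list
    total = reduce(add, [F.col(c) for c in df.columns])
    df.withColumn("total", total).show()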
How do I read partitioned parquet with a condition as a dataframe? This works fine: val dataframe = sqlContext.read.parquet("file:///home/msoproj/…
Tags: scala, apache-spark, parquet, spark-dataframe
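The path is truncated. A sketch of the usual approach, in PySpark rather than the question's Scala: read the root of the partitioned layout and filter on the partition column, and Spark prunes the scan to the matching directories; the path and column day are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Filtering on a partition column triggers partition pruning,
    # so only the day=2024-01-01 directories are actually read
    df = (spark.read.parquet("/data/table")
          .filter(F.col("day") == "2024-01-01"))
    df.show()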
I am transforming SQL code to PySpark code and came across some SQL statements. I don't know how …
Tags: apache-spark, pyspark, spark-dataframe, rdd, pyspark-sql
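One hedged bridge while porting: register the DataFrame as a temporary view and run the original SQL unchanged, then translate to the DataFrame API statement by statement. The view name and query are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

    # Run the legacy SQL as-is against a view of the DataFrame;
    # each statement can then be ported to the DataFrame API piecemeal
    df.createOrReplaceTempView("people")
    spark.sql("SELECT id, name FROM people WHERE id > 1").show()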
I am trying to overwrite the Spark session/Spark context default configs, but it is picking up the entire node/cluster resources. …
Tags: python, apache-spark, pyspark, spark-dataframe
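A hedged sketch: resource configs such as executor memory and cores bind when the context is created, so getOrCreate() on an already-running session silently keeps the old values; stop the active session first. The specific values here (2g, 2 cores, 4 instances) are placeholders:

    from pyspark.sql import SparkSession

    # getOrCreate() reuses a running session and ignores new resource
    # configs, so stop any active session before rebuilding
    active = SparkSession.getActiveSession()
    if active:
        active.stop()

    spark = (SparkSession.builder
             .config("spark.executor.memory", "2g")
             .config("spark.executor.cores", "2")
             .config("spark.executor.instances", "4")
             .getOrCreate())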