Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I have a sample application working to read from CSV files into a DataFrame. The DataFrame can be stored to …
Tags: hadoop, apache-spark, hive, apache-spark-sql, spark-dataframe
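The excerpt cuts off before the storage target. A minimal PySpark sketch, assuming (given the hive tag) the goal is to persist the CSV-backed DataFrame as a Hive table; the input path and table name are hypothetical:

    from pyspark.sql import SparkSession

    # Build a session with Hive support so saveAsTable writes to the metastore
    spark = (SparkSession.builder
             .appName("csv-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    # header/inferSchema are standard CSV reader options; path is made up
    df = spark.read.csv("/data/input.csv", header=True, inferSchema=True)

    # Persist as a managed Hive table; Parquet is the default table format
    df.write.mode("overwrite").saveAsTable("my_db.my_table")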
I have the following sample DataFrame:

    a    | b    | c
    1    | 2    | 4
    0    | null | null
    null | 3    | 4

And I want to replace null values only …
Tags: apache-spark, pyspark, spark-dataframe
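The question is truncated, but fillna takes a subset argument that restricts replacement to named columns. A sketch using the sample data above, assuming nulls should be replaced only in a and b:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, 2, 4), (0, None, None), (None, 3, 4)], ["a", "b", "c"])

    # Replace nulls with 0, but only in columns a and b; c keeps its nulls
    filled = df.fillna(0, subset=["a", "b"])
    filled.show()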
I want to overwrite specific partitions instead of all in Spark. I am trying the following command: df.write.orc(…
Tags: apache-spark, apache-spark-sql, spark-dataframe
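The write call is truncated. One hedged sketch: since Spark 2.3, setting spark.sql.sources.partitionOverwriteMode to dynamic makes mode("overwrite") replace only the partitions present in the DataFrame; the toy data, partition column dt, and output path are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", "2024-01-01")], ["value", "dt"])

    # "dynamic" overwrites only the partitions present in df instead of
    # truncating everything under the output path first (Spark 2.3+)
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    df.write.mode("overwrite").partitionBy("dt").orc("/data/events")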
I wanted to convert the Spark data frame for use with MLlib KMeans, using the code below: from pyspark.mllib.clustering import KMeans …
Tags: python, apache-spark, pyspark, spark-dataframe, apache-spark-mllib
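The code excerpt stops at the import. A sketch of the usual conversion, assuming the aim is to feed the DataFrame to MLlib's RDD-based KMeans; the toy data and column names x/y are assumptions:

    from pyspark.sql import SparkSession
    from pyspark.mllib.clustering import KMeans
    from pyspark.mllib.linalg import Vectors

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0, 1.0), (9.0, 8.0), (1.2, 0.9)], ["x", "y"])

    # KMeans.train expects an RDD of vectors, not a DataFrame,
    # so convert each Row into a dense vector first
    rdd = df.rdd.map(lambda row: Vectors.dense([row.x, row.y]))
    model = KMeans.train(rdd, k=2, maxIterations=10)
    print(model.clusterCenters)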
I looked at the docs and they say the following join types are supported: Type of join to perform. Default …
Tags: scala, apache-spark, apache-spark-sql, spark-dataframe, apache-spark-2.0
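The quoted doc text is cut off. For reference, a small sketch exercising a few of the supported how= values; the sample frames are invented, and although the question is Scala-tagged, the same join types apply in PySpark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "l"])
    right = spark.createDataFrame([(1, "x"), (3, "y")], ["id", "r"])

    # how= accepts inner (the default), outer/full, left, right,
    # left_semi and left_anti, plus cross
    left.join(right, on="id", how="inner").show()
    left.join(right, on="id", how="left_outer").show()
    left.join(right, on="id", how="left_anti").show()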
I have a Spark data frame df. Is there a way of sub-selecting a few columns using a list …
Tags: apache-spark, apache-spark-sql, spark-dataframe
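Presumably the list holds column names. A sketch under that assumption: select accepts a plain Python list (or the unpacked *list form); the frame and list contents are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 2, 3)], ["a", "b", "c"])

    wanted = ["a", "c"]
    # select accepts a list of column names directly,
    # or equivalently the unpacked form df.select(*wanted)
    df.select(wanted).show()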
I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a …
Tags: python, apache-spark, pyspark, spark-dataframe
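The sentence is truncated; a common version of this question asks for a column holding the row-wise sum of the numeric columns. A sketch under that assumption, with invented data:

    from functools import reduce
    from operator import add

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a", "b", "c"])

    # Build the expression col("a") + col("b") + col("c") from the column list
    total = reduce(add, [F.col(c) for c in df.columns])
    df.withColumn("total", total).show()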
How do I read partitioned parquet with a condition as a dataframe? This works fine: val dataframe = sqlContext.read.parquet("file:///home/msoproj/…
Tags: scala, apache-spark, parquet, spark-dataframe
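The path is truncated. A sketch of the usual approach, in PySpark rather than the question's Scala: read the root of the partitioned layout and filter on the partition column, and Spark prunes the scan to the matching directories; the path and column day are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Filtering on a partition column triggers partition pruning,
    # so only the day=2024-01-01 directories are actually read
    df = (spark.read.parquet("/data/table")
          .filter(F.col("day") == "2024-01-01"))
    df.show()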
I am transforming SQL code to PySpark code and came across some SQL statements. I don't know how …
Tags: apache-spark, pyspark, spark-dataframe, rdd, pyspark-sql
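One hedged bridge while porting: register the DataFrame as a temporary view and run the original SQL unchanged, then translate to the DataFrame API statement by statement. The view name and query are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

    # Run the legacy SQL as-is against a view of the DataFrame;
    # each statement can then be ported to the DataFrame API piecemeal
    df.createOrReplaceTempView("people")
    spark.sql("SELECT id, name FROM people WHERE id > 1").show()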
I am trying to overwrite the Spark session/Spark context default configs, but it is picking up the entire node/cluster resources. …
Tags: python, apache-spark, pyspark, spark-dataframe
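A hedged sketch: resource configs such as executor memory and cores bind when the context is created, so getOrCreate() on an already-running session silently keeps the old values; stop the active session first. The specific values here (2g, 2 cores, 4 instances) are placeholders:

    from pyspark.sql import SparkSession

    # getOrCreate() reuses a running session and ignores new resource
    # configs, so stop any active session before rebuilding
    active = SparkSession.getActiveSession()
    if active:
        active.stop()

    spark = (SparkSession.builder
             .config("spark.executor.memory", "2g")
             .config("spark.executor.cores", "2")
             .config("spark.executor.instances", "4")
             .getOrCreate())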