Top "Pyspark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

How to change dataframe column names in pyspark?

I come from a pandas background and am used to reading data from CSV files into a dataframe and then simply …

python apache-spark pyspark pyspark-sql
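
A minimal sketch of the two usual approaches, assuming a DataFrame df already exists (the column names here are illustrative):

    # rename a single column
    df = df.withColumnRenamed("old_name", "new_name")

    # rename every column at once by passing the full list of new names
    df = df.toDF("col_a", "col_b", "col_c")
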
Convert pyspark string to date format

I have a pyspark dataframe with a string date column in the format MM-dd-yyyy and I am attempting to …

apache-spark pyspark apache-spark-sql pyspark-sql
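
A minimal sketch using to_date with an explicit format string; the column name date_str is an assumption, not taken from the question:

    from pyspark.sql import functions as F

    # parse "12-25-2021"-style strings into a proper DateType column
    df = df.withColumn("date", F.to_date(F.col("date_str"), "MM-dd-yyyy"))
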
Show distinct column values in pyspark dataframe: python

Please suggest a pyspark dataframe alternative for pandas' df['col'].unique(). I want to list all the unique values in …

pyspark pyspark-sql
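
A minimal sketch of the closest pyspark equivalents, assuming the column of interest is called col:

    # show the distinct values without collecting them to the driver
    df.select("col").distinct().show()

    # or collect them into a plain Python list, roughly like pandas' unique()
    values = [row["col"] for row in df.select("col").distinct().collect()]
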
Join two data frames, select all columns from one and some columns from the other

Let's say I have a Spark data frame df1 with several columns (among which is the column 'id') and a data frame …

pyspark pyspark-sql
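
A minimal sketch, assuming df1 and df2 share an id column and that extra_col is a stand-in for whichever df2 columns are wanted:

    # keep every column of df1 plus selected columns of df2
    joined = df1.join(df2, on="id", how="left")
    result = joined.select(df1["*"], df2["extra_col"])
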
Pyspark: Filter dataframe based on multiple conditions

I want to filter the dataframe according to the following conditions: firstly (d < 5) and secondly (value of col2 not equal …

sql filter pyspark apache-spark-sql pyspark-sql
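
A minimal sketch, assuming the columns are named d and col2 and that 2 stands in for the excluded value; conditions are combined with & and each must be parenthesised:

    from pyspark.sql import functions as F

    filtered = df.filter((F.col("d") < 5) & (F.col("col2") != 2))
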
Split Spark Dataframe string column into multiple columns

I've seen various people suggesting that DataFrame.explode is a useful way to do this, but it results in more …

apache-spark pyspark apache-spark-sql spark-dataframe pyspark-sql
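
A minimal sketch using split plus getItem, which adds new columns rather than new rows the way explode does; the column names and delimiter are assumptions:

    from pyspark.sql import functions as F

    # split "a,b,c"-style strings and fan the pieces out into separate columns
    parts = F.split(F.col("full"), ",")
    df = (df.withColumn("first", parts.getItem(0))
            .withColumn("second", parts.getItem(1))
            .withColumn("third", parts.getItem(2)))
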
How to find count of Null and Nan values for each column in a PySpark dataframe efficiently?

    import numpy as np

    df = spark.createDataFrame(
        [(1, 1, None), (1, 2, float(5)), (1, 3, np.nan), (1, 4, None),
         (1, 5, float(10)), (1, 6, float('nan')), (1, 6, float('nan'))],
        ('session', "timestamp1", "id2")) …

apache-spark pyspark apache-spark-sql pyspark-sql
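
A minimal sketch that counts nulls and NaNs for every column in a single pass over the data; note that isnan only applies to numeric columns, as in the sample above:

    from pyspark.sql import functions as F

    counts = df.select([
        F.count(F.when(F.isnan(c) | F.col(c).isNull(), c)).alias(c)
        for c in df.columns
    ])
    counts.show()
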
How to get name of dataframe column in pyspark?

In pandas, this can be done with column.name, but how can I do the same when it's a column of a Spark …

pyspark pyspark-sql
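
A minimal sketch: a pyspark Column object does not expose a .name the way a pandas Series does, but the DataFrame itself does:

    df.columns        # list of all column names
    df.schema.names   # the same list, taken from the schema
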
Trim string column in PySpark dataframe

I'm a beginner in Python and Spark. After creating a DataFrame from a CSV file, I would like to know how I …

apache-spark pyspark apache-spark-sql trim pyspark-sql
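
A minimal sketch using the built-in trim function, assuming the string column is called name:

    from pyspark.sql import functions as F

    # strip leading and trailing whitespace from the column
    df = df.withColumn("name", F.trim(F.col("name")))
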
Apache Spark: dealing with case statements

I am transforming SQL code to PySpark code and came across some SQL statements. I don't know how …

apache-spark pyspark spark-dataframe rdd pyspark-sql
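
A minimal sketch of how a SQL CASE WHEN usually translates to the DataFrame API; the score/category columns and thresholds are purely illustrative:

    from pyspark.sql import functions as F

    df = df.withColumn(
        "category",
        F.when(F.col("score") >= 90, "high")
         .when(F.col("score") >= 50, "medium")
         .otherwise("low"),
    )

    # the original CASE expression can also be kept verbatim via F.expr("CASE WHEN ... END")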