Top "pyspark" questions

The Spark Python API (PySpark) exposes the Apache Spark programming model to Python.

How to change dataframe column names in pyspark?

I come from a pandas background and am used to reading data from CSV files into a dataframe and then simply …

python apache-spark pyspark pyspark-sql
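
A minimal sketch of the usual answers, assuming a Spark 2.x SparkSession and a toy DataFrame (the column names here are placeholders, not from the question):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

    # rename a single column
    df2 = df.withColumnRenamed("val", "value")

    # rename every column at once, pandas-style
    df3 = df.toDF("new_id", "new_value")

    df2.printSchema()
    df3.printSchema()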
Load CSV file with Spark

I'm new to Spark and I'm trying to read CSV data from a file with Spark. Here's what I am …

python csv apache-spark pyspark
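
One way this is typically handled, sketched under the assumption of Spark 2.x and a placeholder path data.csv:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # built-in CSV reader (Spark 2.x+); header and schema inference are optional
    df = spark.read.csv("data.csv", header=True, inferSchema=True)

    # the older RDD route often seen in Spark 1.x answers
    rdd = spark.sparkContext.textFile("data.csv").map(lambda line: line.split(","))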
How do I add a new column to a Spark DataFrame (using PySpark)?

I have a Spark DataFrame (using PySpark 1.5.1) and would like to add a new column. I've tried the following without …

python apache-spark dataframe pyspark apache-spark-sql
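
A short illustration of the usual fix, assuming a SparkSession and made-up column names; the key point is that withColumn needs a Column expression (or lit() for a constant), not a plain Python value or list:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 2.0), (2, 3.0)], ["id", "x"])

    # derive a new column from an existing one
    df = df.withColumn("x_doubled", F.col("x") * 2)

    # add a constant column
    df = df.withColumn("source", F.lit("example"))

    df.show()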
Filter Pyspark dataframe column with None value

I'm trying to filter a PySpark dataframe that has None as a row value: df.select('dt_mvmt').distinct().collect() […

python apache-spark dataframe pyspark apache-spark-sql
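
A sketch of the usual answer, using a throwaway two-row DataFrame; the point is that comparing a column to None does not behave like plain Python, so isNull()/isNotNull() are used instead:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("2017-01-01",), (None,)], ["dt_mvmt"])

    df.filter(F.col("dt_mvmt").isNotNull()).show()   # keep non-null rows
    df.filter(F.col("dt_mvmt").isNull()).show()      # keep only null rows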
Spark DataFrame groupBy and sort in the descending order (pyspark)

I'm using PySpark (Python 2.7.9 / Spark 1.3.1) and have a dataframe GroupObject which I need to filter & sort in the descending …

python apache-spark dataframe pyspark apache-spark-sql
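
A compact sketch of the aggregate-then-orderBy pattern the answers converge on, with placeholder data and the Spark 2.x-style API:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 5)], ["group", "amount"])

    # aggregate first, then order by the aggregated column, descending
    (df.groupBy("group")
       .agg(F.sum("amount").alias("total"))
       .orderBy(F.desc("total"))
       .show())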
how to change a Dataframe column from String type to Double type in pyspark

I have a dataframe with a column of String type. I want to change the column type to Double type in PySpark. …

python apache-spark dataframe pyspark apache-spark-sql
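
A minimal sketch, assuming a SparkSession and an invented column called amount; cast() accepts either a DataType instance or the type name as a string:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("1.5",), ("2.25",)], ["amount"])

    df = df.withColumn("amount", F.col("amount").cast(DoubleType()))
    # equivalently: .cast("double")
    df.printSchema()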
How to delete columns in pyspark dataframe

>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, …

apache-spark apache-spark-sql pyspark
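
A short sketch against a made-up DataFrame reusing the column names from the excerpt; drop() returns a new DataFrame without the named column(s), and passing several columns at once needs Spark 2.0+:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "2016001", 100)], ["id", "julian_date", "user_id"])

    df_one = df.drop("julian_date")             # drop a single column
    df_two = df.drop("julian_date", "user_id")  # drop several (Spark 2.0+)
    df_one.printSchema()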
Convert pyspark string to date format

I have a PySpark dataframe with a date column stored as a string in MM-dd-yyyy format, and I am attempting to …

apache-spark pyspark apache-spark-sql pyspark-sql
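
One hedged sketch, assuming Spark 2.2+ where to_date() takes an explicit format string (older releases typically go through unix_timestamp instead); the sample value is invented:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("12-25-2017",)], ["date_str"])

    df = df.withColumn("date", F.to_date("date_str", "MM-dd-yyyy"))
    df.printSchema()
    df.show()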
Convert spark DataFrame column to python list

I work on a dataframe with two columns, mvv and count.

+---+-----+
|mvv|count|
+---+-----+
| 1 | 5 |
| 2 | 9 |
| 3 | 3 |
| 4 | 1 |

I would like …

python apache-spark pyspark spark-dataframe
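
A sketch reproducing the excerpt's small mvv/count table and showing two common ways to pull one column back as a Python list:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 5), (2, 9), (3, 3), (4, 1)], ["mvv", "count"])

    # collect() returns Row objects; extract the field from each one
    mvv_list = [row.mvv for row in df.select("mvv").collect()]

    # or flatten via the underlying RDD
    mvv_list_rdd = df.select("mvv").rdd.flatMap(lambda x: x).collect()

    print(mvv_list, mvv_list_rdd)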
show distinct column values in pyspark dataframe: python

Please suggest a PySpark dataframe alternative to pandas df['col'].unique(). I want to list out all the unique values in …

pyspark pyspark-sql
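
A minimal sketch with invented data; select().distinct() plays the role of pandas unique(), and collect() brings the values back to the driver if an actual Python list is wanted:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a",), ("b",), ("a",)], ["col"])

    df.select("col").distinct().show()

    unique_vals = [row["col"] for row in df.select("col").distinct().collect()]
    print(unique_vals)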