Top "PySpark" questions

The Spark Python API (PySpark) exposes the Apache Spark programming model to Python.

Filtering DataFrame using the length of a column

I want to filter a DataFrame using a condition related to the length of a column. This question might be …

python apache-spark dataframe pyspark apache-spark-sql
How to convert Spark RDD to pandas dataframe in ipython?

I have an RDD and I want to convert it to a pandas DataFrame. I know that to convert an RDD …

python pandas ipython pyspark rdd
How do I read a parquet in PySpark written from Spark?

I am using two Jupyter notebooks to do different things in an analysis. In my Scala notebook, I write some …

python scala apache-spark pyspark data-science-experience
How to flatten a struct in a Spark dataframe?

I have a dataframe with the following structure:
|-- data: struct (nullable = true)
|    |-- id: long (nullable = true)
|    |-- keyNote: …

java apache-spark pyspark apache-spark-sql
pyspark : NameError: name 'spark' is not defined

I am copying the pyspark.ml example from the official documentation website: http://spark.apache.org/docs/latest/api/python/…

apache-spark machine-learning pyspark distributed-computing apache-spark-ml
Add an empty column to Spark DataFrame

As mentioned in many other locations on the web, adding a new column to an existing DataFrame is not straightforward. …

python apache-spark dataframe pyspark apache-spark-sql
spark dataframe drop duplicates and keep first

Question: in pandas, when dropping duplicates you can specify which columns to keep. Is there an equivalent in Spark DataFrames? …

dataframe apache-spark pyspark apache-spark-sql duplicates
AttributeError: 'DataFrame' object has no attribute 'map'

I wanted to convert the spark data frame to add using the code below: from pyspark.mllib.clustering import KMeans …

python apache-spark pyspark spark-dataframe apache-spark-mllib
Viewing the content of a Spark Dataframe Column

I'm using Spark 1.3.1. I am trying to view the values of a Spark DataFrame column in Python. With a Spark …

python apache-spark dataframe pyspark
Pyspark: Parse a column of json strings

I have a pyspark dataframe consisting of one column, called json, where each row is a unicode string of json. …

python json apache-spark pyspark