Top "Pyspark" questions

The Spark Python API (PySpark) exposes the Apache Spark programming model to Python.

How to add a constant column in a Spark DataFrame?

I want to add a column to a DataFrame with some arbitrary value (the same for each row). …

python apache-spark dataframe pyspark apache-spark-sql
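
A minimal sketch of the usual approach, using pyspark.sql.functions.lit (the DataFrame, column name, and constant value are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

# lit() wraps a literal so the same value is attached to every row
df_with_const = df.withColumn("constant_col", lit(42))
df_with_const.show()
```
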
Concatenate two PySpark dataframes

I'm trying to concatenate two PySpark DataFrames that each have some columns the other does not: from pyspark.sql.…

python apache-spark pyspark
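
A minimal sketch of one common approach; unionByName(..., allowMissingColumns=True) requires Spark 3.1+, and the DataFrames below are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([(1, "a")], ["id", "only_in_df1"])
df2 = spark.createDataFrame([(2, "b")], ["id", "only_in_df2"])

# Spark 3.1+: align columns by name and fill the missing ones with nulls
combined = df1.unionByName(df2, allowMissingColumns=True)

# Older Spark versions: add the missing columns explicitly before the union
# (the cast should match the other frame's column type; string here is illustrative)
for c in set(df2.columns) - set(df1.columns):
    df1 = df1.withColumn(c, lit(None).cast("string"))
for c in set(df1.columns) - set(df2.columns):
    df2 = df2.withColumn(c, lit(None).cast("string"))
combined_legacy = df1.unionByName(df2)

combined.show()
```
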
Spark Kill Running Application

I have a running Spark application that occupies all the cores, so my other applications won't be allocated any …

apache-spark yarn pyspark
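
As a sketch, assuming YARN: a running application is usually stopped from the command line with the YARN CLI, while standard Spark resource settings can keep a single job from monopolizing the cluster (the values below are illustrative):

```python
from pyspark.sql import SparkSession

# Cap the resources this application takes so other apps can still be scheduled
spark = (
    SparkSession.builder
    .appName("resource-capped-app")
    .config("spark.executor.instances", "2")   # limit number of executors on YARN
    .config("spark.executor.cores", "2")       # cores per executor
    .getOrCreate()
)

# To stop an already-running application from the shell, the YARN CLI is typically used:
#   yarn application -list
#   yarn application -kill <application_id>

# Or stop the current application programmatically:
spark.stop()
```
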
Join two data frames, select all columns from one and some columns from the other

Let's say I have a Spark data frame df1 with several columns (among them the column 'id') and a data frame …

pyspark pyspark-sql
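
A minimal sketch using aliases so each frame's columns can still be referenced after the join (the DataFrames, join key, and selected columns are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([(1, "x")], ["id", "a"])
df2 = spark.createDataFrame([(1, "y", "z")], ["id", "b", "c"])

# Keep every column of df1 and only column 'b' from df2
result = (
    df1.alias("l")
       .join(df2.alias("r"), on="id", how="left")
       .select("l.*", "r.b")
)
result.show()
```
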
Best way to get the max value in a Spark dataframe column

I'm trying to figure out the best way to get the largest value in a Spark dataframe column. Consider the …

python apache-spark pyspark apache-spark-sql
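
A minimal sketch of the usual aggregation-based approach (column name and data are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (5,), (3,)], ["value"])

# Aggregate on the executors, then pull the single result back to the driver
max_value = df.agg(F.max("value").alias("max_value")).first()["max_value"]
print(max_value)  # 5
```
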
Importing PySpark in the Python shell

This is a copy of someone else's question on another forum that was never answered, so I thought I'd re-ask …

python apache-spark pyspark
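
A minimal sketch, assuming either a pip-installed pyspark or an existing Spark installation reachable via the third-party findspark helper (the path below is a placeholder):

```python
# Option 1: pip install pyspark, after which the import works directly.
# Option 2: point Python at an existing Spark installation, e.g. with findspark.
import findspark
findspark.init("/opt/spark")  # placeholder path; omit the argument to use $SPARK_HOME

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.version)
```
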
PySpark 2.0: the size or shape of a DataFrame

I am trying to find out the size/shape of a DataFrame in PySpark. I do not see a single …

dataframe size pyspark shape
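
A minimal sketch: PySpark has no single .shape attribute, so row and column counts are obtained separately (the sample DataFrame is illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

n_rows = df.count()          # triggers a job over the data
n_cols = len(df.columns)     # metadata only, no job
print((n_rows, n_cols))
```
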
How to loop through each row of a DataFrame in PySpark

E.g. sqlContext = SQLContext(sc); sample = sqlContext.sql("select Name, age, city from user"); sample.show(). The above statement prints …

apache-spark dataframe for-loop pyspark apache-spark-sql
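
A minimal sketch of the common options, using an illustrative DataFrame in place of the truncated example above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sample = spark.createDataFrame(
    [("Alice", 30, "Paris"), ("Bob", 25, "Lyon")],
    ["Name", "age", "city"],
)

# Small results only: collect() brings every row to the driver
for row in sample.collect():
    print(row["Name"], row["age"], row["city"])

# Large DataFrames: stay distributed and map over rows instead
names = sample.rdd.map(lambda row: row["Name"]).collect()

# Or iterate lazily without loading everything onto the driver at once
for row in sample.toLocalIterator():
    print(row)
```
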
PySpark: withColumn() with two conditions and three outcomes

I am working with Spark and PySpark. I am trying to achieve the result equivalent to the following pseudocode: df = …

apache-spark hive pyspark apache-spark-sql hiveql
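
A minimal sketch using chained when/otherwise, the usual way to express two conditions and three outcomes (column names and conditions are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 10), (2, 20), (3, 30)], ["a", "b"])

# Two conditions, three outcomes, evaluated in order
df = df.withColumn(
    "outcome",
    F.when((df.a == 1) & (df.b == 10), "first")
     .when((df.a == 2) | (df.b > 25), "second")
     .otherwise("third"),
)
df.show()
```
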
Spark Dataframe distinguish columns with duplicated name

As far as I know, in a Spark DataFrame multiple columns can have the same name, as shown below …

python apache-spark dataframe pyspark apache-spark-sql
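
A minimal sketch: aliasing each DataFrame before the join keeps the duplicated column names addressable (names and data are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([(1, "x")], ["id", "value"])
df2 = spark.createDataFrame([(1, "y")], ["id", "value"])

# Alias each side so the two 'value' columns stay distinguishable after the join
joined = df1.alias("a").join(df2.alias("b"), on="id")
result = joined.select(
    "id",
    F.col("a.value").alias("value_a"),
    F.col("b.value").alias("value_b"),
)
result.show()
```
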