Top "pyspark-sql" questions

Apache Spark SQL is Spark's module for SQL and structured data processing; Spark itself is a fast, general-purpose cluster computing system.

Cannot find col function in pyspark

In pyspark 1.6.2, I can import the col function with from pyspark.sql.functions import col, but when I try to look …

python apache-spark pyspark apache-spark-sql pyspark-sql
Applying a Window function to calculate differences in pySpark

I am using pySpark, and have set up my dataframe with two columns representing a daily asset price as follows: …

pyspark spark-dataframe window-functions pyspark-sql
Median / quantiles within PySpark groupBy

I would like to calculate group quantiles on a Spark dataframe (using PySpark). Either an approximate or exact result would …

apache-spark pyspark apache-spark-sql pyspark-sql
Pyspark - Load file: Path does not exist

I am a newbie to Spark. I'm trying to read a local csv file within an EMR cluster. The file …

apache-spark pyspark emr amazon-emr pyspark-sql
How to set the number of partitions/nodes when importing data into Spark

Problem: I want to import data into Spark EMR from S3 using: data = sqlContext.read.json("s3n://.....") Is there …

sql apache-spark database-partitioning pyspark-sql
Py4JJavaError: An error occurred while calling

I am new to PySpark. I have been writing my code with a test sample. Once I run the code …

python pyspark pyspark-sql py4j
Trying to connect to Oracle from Spark

I am trying to connect Spark to Oracle and want to pull data from some tables via SQL queries. But …

apache-spark-sql pyspark-sql oracleclient
How to spark-submit a python file in spark 2.1.0?

I am currently running spark 2.1.0. I have worked most of the time in the PySpark shell, but I need to spark-submit …

apache-spark pyspark apache-spark-sql pyspark-sql spark-submit
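Unlike the interactive shell, a script goes through the `spark-submit` launcher so the cluster configuration applies. A sketch of the invocation; the master URL, file names, and arguments are placeholders:

```shell
# Submit a PySpark script in Spark 2.1; python alone would skip the
# cluster config that spark-submit wires up.
spark-submit \
  --master local[4] \
  --py-files helpers.zip \
  my_script.py arg1 arg2
```

`--py-files` ships extra Python modules to the executors; omit it when the script is self-contained.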
Difference between createOrReplaceTempView and registerTempTable

I am new to Spark and was trying out a few commands in Spark SQL using Python when I came across …

apache-spark pyspark apache-spark-sql pyspark-sql sparkr
Passing Array to Python Spark Lit Function

Let's say I have a numpy array a that contains the numbers 1-10. So a is [1 2 3 4 5 6 7 8 9 10]. Now, I also have …

python apache-spark pyspark literals pyspark-sql