Top "Spark-dataframe" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Convert spark DataFrame column to python list

I work on a dataframe with two columns, mvv and count.

+---+-----+
|mvv|count|
+---+-----+
|  1|    5|
|  2|    9|
|  3|    3|
|  4|    1|

I would like …

python apache-spark pyspark spark-dataframe
How to filter out a null value from a Spark dataframe

I created a dataframe in spark with the following schema:

root
 |-- user_id: long (nullable = false)
 |-- event_id: …

scala apache-spark apache-spark-sql spark-dataframe
Converting Pandas dataframe into Spark dataframe error

I'm trying to convert a Pandas DF into a Spark one. DF head:

10000001,1,0,1,12:35,OK,10002,1,0,9,f,NA,24,24,0,3,9,0,0,1,1,0,0,4,543
10000001,2,0,1,12:36,OK,10002,1,0,9,f,NA,24,24,0,3,9,2,1,1,3,1,3,2,611
10000002,1,0,4,12:19,PA,10003,1,1,7,f,NA,74,74,0,2,15,2,0,2,3,1,2,2,691
…

python pandas apache-spark spark-dataframe
Updating a dataframe column in spark

Looking at the new spark dataframe api, it is unclear whether it is possible to modify dataframe columns. How would …

python apache-spark pyspark apache-spark-sql spark-dataframe
Split Spark Dataframe string column into multiple columns

I've seen various people suggesting that Dataframe.explode is a useful way to do this, but it results in more …

apache-spark pyspark apache-spark-sql spark-dataframe pyspark-sql
Pyspark: display a spark data frame in a table format

I am using pyspark to read a parquet file like below:

my_df = sqlContext.read.parquet('hdfs://myPath/myDB.db/…

python pandas pyspark spark-dataframe
How does createOrReplaceTempView work in Spark?

I am new to Spark and Spark SQL. How does createOrReplaceTempView work in Spark? If we register an RDD of …

apache-spark apache-spark-sql spark-dataframe
Spark RDD to DataFrame python

I am trying to convert the Spark RDD to a DataFrame. I have seen the documentation and example where the …

python apache-spark pyspark spark-dataframe
Take n rows from a spark dataframe and pass to toPandas()

I have this code:

l = [('Alice', 1),('Jim',2),('Sandra',3)]
df = sqlContext.createDataFrame(l, ['name', 'age'])
df.withColumn('age2', df.age + 2).…

python apache-spark-sql spark-dataframe
How to import multiple csv files in a single load?

Suppose I have a defined schema for loading 10 CSV files in a folder. Is there a way to automatically load …

apache-spark apache-spark-sql spark-dataframe