Top "apache-spark" questions

Apache Spark is an open-source distributed data-processing engine written in Scala. It provides a unified API and distributed datasets for both batch and streaming processing.

Spark DataFrame groupBy and sort in the descending order (pyspark)

I'm using pyspark (Python 2.7.9 / Spark 1.3.1) and have a dataframe GroupObject which I need to filter and sort in the descending …

python apache-spark dataframe pyspark apache-spark-sql
Write single CSV file using spark-csv

I am using https://github.com/databricks/spark-csv and am trying to write a single CSV, but I am not able to, …

scala csv apache-spark spark-csv
How to print the contents of RDD?

I'm attempting to print the contents of a collection to the Spark console. I have a type: linesWithSessionId: org.apache.…

scala apache-spark
How to change a Dataframe column from String type to Double type in pyspark

I have a dataframe with a column of String type, and I want to change the column type to Double in PySpark. …

python apache-spark dataframe pyspark apache-spark-sql
How to show full column content in a Spark Dataframe?

I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the …

apache-spark dataframe spark-csv output-formatting
Spark - repartition() vs coalesce()

According to Learning Spark: "Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an …

apache-spark distributed-computing rdd
How to export a table dataframe in PySpark to csv?

I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object …

python apache-spark dataframe apache-spark-sql export-to-csv
How to delete columns in pyspark dataframe

>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, …

apache-spark apache-spark-sql pyspark
Convert pyspark string to date format

I have a pyspark dataframe with a string date column in the format MM-dd-yyyy, and I am attempting to …

apache-spark pyspark apache-spark-sql pyspark-sql
Extract column values of Dataframe as List in Apache Spark

I want to convert a string column of a data frame to a list. What I can find from the …

scala apache-spark apache-spark-sql