Apache Spark is an open-source distributed data processing engine, written in Scala, that provides users with a unified API and distributed datasets for both batch and streaming processing.
I'm using PySpark (Python 2.7.9 / Spark 1.3.1) and have a DataFrame GroupObject which I need to filter and sort in descending …
Tags: python, apache-spark, dataframe, pyspark, apache-spark-sql

I am using https://github.com/databricks/spark-csv and am trying to write a single CSV, but I am not able to; …
Tags: scala, csv, apache-spark, spark-csv

I'm attempting to print the contents of a collection to the Spark console. I have a type: linesWithSessionId: org.apache.…
Tags: scala, apache-spark

I have a DataFrame with a column of type String, and I want to change the column type to Double in PySpark. …
Tags: python, apache-spark, dataframe, pyspark, apache-spark-sql

I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the …
Tags: apache-spark, dataframe, spark-csv, output-formatting

According to Learning Spark: "Keep in mind that repartitioning your data is a fairly expensive operation." Spark also has an …
Tags: apache-spark, distributed-computing, rdd

I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object …
Tags: python, apache-spark, dataframe, apache-spark-sql, export-to-csv

>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, …
Tags: apache-spark, apache-spark-sql, pyspark

I have a PySpark DataFrame with a string date column in the format MM-dd-yyyy, and I am attempting to …
Tags: apache-spark, pyspark, apache-spark-sql, pyspark-sql

I want to convert a string column of a DataFrame to a list. What I can find from the …
Tags: scala, apache-spark, apache-spark-sql