Top "apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

How can I change column types in Spark SQL's DataFrame?

Suppose I'm doing something like: val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "cars.csv", "header" …

scala apache-spark apache-spark-sql
Spark - load CSV file as DataFrame?

I would like to read a CSV file in Spark, convert it to a DataFrame, and store it in HDFS with …

scala apache-spark hadoop apache-spark-sql hdfs
How do I add a new column to a Spark DataFrame (using PySpark)?

I have a Spark DataFrame (using PySpark 1.5.1) and would like to add a new column. I've tried the following without …

python apache-spark dataframe pyspark apache-spark-sql
Concatenate columns in Apache Spark DataFrame

How do we concatenate two columns in an Apache Spark DataFrame? Is there any function in Spark SQL which we …

sql apache-spark dataframe apache-spark-sql
How to convert an RDD object to a DataFrame in Spark

How can I convert an RDD (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]) to a Dataframe org.…

scala apache-spark apache-spark-sql rdd
Filter PySpark dataframe column with None value

I'm trying to filter a PySpark dataframe that has None as a row value: df.select('dt_mvmt').distinct().collect() […

python apache-spark dataframe pyspark apache-spark-sql
How to sort by column in descending order in Spark SQL?

I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in descending …

scala apache-spark apache-spark-sql
Spark DataFrame groupBy and sort in the descending order (pyspark)

I'm using PySpark (Python 2.7.9 / Spark 1.3.1) and have a dataframe GroupObject which I need to filter and sort in the descending …

python apache-spark dataframe pyspark apache-spark-sql
How to change a DataFrame column from String type to Double type in PySpark

I have a dataframe with a column of String type. I want to change the column type to Double in PySpark. …

python apache-spark dataframe pyspark apache-spark-sql
How to export a table dataframe in PySpark to csv?

I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object …

python apache-spark dataframe apache-spark-sql export-to-csv