Top "apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Trim string column in PySpark dataframe

I'm a beginner with Python and Spark. After creating a DataFrame from a CSV file, I would like to know how I …

apache-spark pyspark apache-spark-sql trim pyspark-sql
How to aggregate values into collection after groupBy?

I have a dataframe with schema as such: [visitorId: string, trackingIds: array<string>, emailIds: array<string>] …

scala apache-spark apache-spark-sql
Spark: access first n rows - take vs limit

I want to access the first 100 rows of a Spark DataFrame and write the result back to a CSV …

apache-spark apache-spark-sql limit
How to pivot Spark DataFrame?

I am starting to use Spark DataFrames and I need to be able to pivot the data to create multiple …

scala apache-spark dataframe apache-spark-sql pivot
How to connect to a remote Hive server from Spark

I'm running Spark locally and want to access Hive tables located in a remote Hadoop cluster. I'm …

apache-spark hive apache-spark-sql spark-thriftserver
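The usual ingredients are `enableHiveSupport()` plus pointing the session at the remote metastore via `hive.metastore.uris`. A configuration sketch, not runnable without a live metastore; the host and port are hypothetical:

```python
from pyspark.sql import SparkSession

# Substitute your cluster's metastore host and port.
spark = (SparkSession.builder
         .appName("remote-hive")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

# The first metastore call happens when a Hive query runs:
spark.sql("show databases").show()
```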
Querying Spark SQL DataFrame with complex types

How can I query an RDD with complex types such as maps and arrays? For example, when I was writing this …

sql scala apache-spark dataframe apache-spark-sql
Spark: How to translate count(distinct(value)) to the DataFrame API

I'm trying to compare different ways to aggregate my data. This is my input data with two elements (page, visitor): (PAG1,…

count apache-spark distinct dataframe apache-spark-sql
How to write unit tests in Spark 2.0+?

I've been trying to find a reasonable way to test SparkSession with the JUnit testing framework. While there seem to …

scala unit-testing apache-spark junit apache-spark-sql
How to use Column.isin with list?

val items = List("a", "b", "c")
sqlContext.sql("select c1 from table")
  .filter($"c1".isin(items))
  .collect
  .foreach(println)
The …

scala apache-spark apache-spark-sql
How to convert a dataframe to dataset in Apache Spark in Scala?

I need to convert my DataFrame to a Dataset, and I used the following code: val final_df = Dataframe.withColumn( "…

scala apache-spark apache-spark-sql apache-spark-encoders
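Typed Datasets exist only in the Scala/Java APIs, so this one is sketched in Scala: the usual route is a case class matching the schema plus `as[...]`, with `spark.implicits._` in scope to supply the Encoder. The case class and column names here are made up:

```scala
// Hypothetical case class matching the DataFrame's schema.
case class Person(name: String, age: Long)

import spark.implicits._  // brings the Encoder[Person] into scope

// df is assumed to have columns (name: string, age: bigint).
val ds: Dataset[Person] = df.as[Person]
```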