Top "Apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Reading CSV files with quoted fields containing embedded commas

I am reading a CSV file in PySpark as follows: df_raw = spark.read.option("header", "true").csv(csv_path) …

csv apache-spark pyspark apache-spark-sql apache-spark-2.0
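A minimal sketch of the usual fix, written in Scala (the DataFrameReader options are identical in PySpark): setting both the quote and escape options to the double-quote character makes the parser keep commas inside quoted fields. The file path here is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("QuotedCsvExample")
  .master("local[*]")
  .getOrCreate()

// With quote and escape both set to '"', commas inside double-quoted
// fields are treated as data rather than as delimiters.
val df = spark.read
  .option("header", "true")
  .option("quote", "\"")
  .option("escape", "\"")
  .csv("/path/to/file.csv") // hypothetical path

df.show(truncate = false)
```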
What does the "spark.yarn.executor.memoryOverhead" setting control?

Should the value of spark.yarn.executor.memoryOverhead in a Spark job on YARN be allocated to the app or just …

apache-spark apache-spark-sql spark-streaming apache-spark-mllib
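For context: spark.yarn.executor.memoryOverhead is off-heap memory that YARN reserves on top of the executor heap, so the container request is roughly executor memory plus the overhead; by default it is max(384 MiB, 10% of executor memory), and in Spark 2.3+ the setting was renamed spark.executor.memoryOverhead. A small Scala sketch of setting it explicitly (the values are purely illustrative):

```scala
import org.apache.spark.sql.SparkSession

// The overhead value is interpreted in MiB; 1024 here is illustrative.
val spark = SparkSession.builder()
  .appName("MemoryOverheadExample")
  .config("spark.executor.memory", "4g")
  .config("spark.yarn.executor.memoryOverhead", "1024")
  .getOrCreate()
```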
org.apache.spark.sql.AnalysisException: cannot resolve given input columns

exitTotalDF.filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015").groupBy("exiturl").agg(first("accid"), first("segment"), $"exiturl", sum("session"), …

scala apache-spark dataframe apache-spark-sql spark-jobserver
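A likely cause: after groupBy().agg(), only the grouping columns and the aggregate expressions survive, so referencing any other original column fails to resolve. A sketch of the aliasing fix, assuming a DataFrame df with these columns and an existing SparkSession named spark:

```scala
import org.apache.spark.sql.functions.{first, sum}
import spark.implicits._ // assumes an existing SparkSession named spark

val exitTotalDF = df // df: hypothetical DataFrame with these columns
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl") // exiturl survives as the grouping column
  .agg(
    first("accid").alias("accid"),     // alias each aggregate so later
    first("segment").alias("segment"), // code can resolve the names
    sum("session").alias("session")
  )
```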
How to change the case of a whole column to lowercase?

I want to change the case of a whole column to lowercase in a Spark Dataset. Desired input: +------+--------------------+ |ItemID| Category name| +…

java apache-spark apache-spark-sql apache-spark-dataset
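A minimal sketch in Scala (the question is tagged java, but the Dataset API has the same shape): functions.lower rewrites every value, and withColumn with the existing name overwrites the column in place. Here ds is a hypothetical Dataset holding the column shown above.

```scala
import org.apache.spark.sql.functions.{col, lower}

// Overwrites "Category name" with its lowercased values; note that
// withColumn on a typed Dataset returns an untyped DataFrame.
val result = ds.withColumn("Category name", lower(col("Category name")))
result.show()
```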
How to convert a Spark Dataset of Rows into strings?

I have written code to access a Hive table using Spark SQL. Here is the code: SparkSession spark = SparkSession.builder().…

java string apache-spark apache-spark-sql apache-spark-dataset
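One common approach, sketched in Scala rather than Java: map each Row to a string with mkString, relying on the implicit Encoder[String] from spark.implicits._. df is a hypothetical DataFrame read from the Hive table.

```scala
import org.apache.spark.sql.{Dataset, Row}
import spark.implicits._ // assumes a SparkSession named spark; supplies Encoder[String]

// Render each Row as one comma-separated string.
val strings: Dataset[String] = df.map((row: Row) => row.mkString(","))
strings.show(truncate = false)
```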
How to use collect_set and collect_list functions in windowed aggregation in Spark 1.6?

In Spark 1.6.0 / Scala, is there a way to get collect_list("colC") or collect_set("colC").over(Window.partitionBy("colA").…

scala apache-spark apache-spark-sql apache-spark-1.6
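As far as I know, Spark 1.6 only supports collect_list/collect_set as window functions through Hive (i.e. with a HiveContext); native support arrived in 2.0. A sketch of the usual workaround, aggregating per key and joining back, assuming a DataFrame df with columns colA and colC:

```scala
import org.apache.spark.sql.functions.collect_list

// Aggregate the per-key list once, then join it back onto every row,
// which mimics collect_list(...).over(Window.partitionBy("colA")).
val lists  = df.groupBy("colA").agg(collect_list("colC").alias("colC_list"))
val result = df.join(lists, Seq("colA"))
```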
Exploding nested Struct in Spark dataframe

I'm working through a Databricks example. The schema for the DataFrame looks like: > parquetDF.printSchema root |-- department: struct (…

scala apache-spark apache-spark-sql distributed-computing databricks
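A brief sketch: explode() applies to array and map columns, while a struct column such as department is flattened simply by selecting its fields. parquetDF is the DataFrame from the question; the array column name in the comment is hypothetical.

```scala
import spark.implicits._ // assumes an existing SparkSession named spark

// "department.*" expands every field of the struct into a top-level column.
val flat = parquetDF.select($"department.*")
flat.printSchema()

// If the schema also held an array-of-structs column, explode it first
// (needs org.apache.spark.sql.functions.explode; column name hypothetical):
// parquetDF.select(explode($"employees").as("e")).select($"e.*")
```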
Get the size/length of an array column

I'm new to Scala programming and this is my question: how do I count the number of strings in each row? …

scala apache-spark apache-spark-sql databricks
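The built-in size() function answers this directly; a small sketch with hypothetical DataFrame and column names:

```scala
import org.apache.spark.sql.functions.{col, size}

// size() returns the element count of an array (or map) column.
val withLen = df.withColumn("numStrings", size(col("arrayCol")))
withLen.show()
```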
Automatically and elegantly flatten a DataFrame in Spark SQL

All, Is there an elegant and accepted way to flatten a Spark SQL table (Parquet) with columns that are of …

scala apache-spark apache-spark-sql
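One well-known approach is to walk the schema recursively and build a select list that replaces each struct with its leaf fields; a sketch, assuming a DataFrame df:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Recursively collect a Column for every leaf field, addressing nested
// fields as "parent.child" so select() can reach inside each struct.
def flattenSchema(schema: StructType, prefix: String = null): Array[Column] =
  schema.fields.flatMap { f =>
    val name = if (prefix == null) f.name else s"$prefix.${f.name}"
    f.dataType match {
      case st: StructType => flattenSchema(st, name)
      case _              => Array(col(name))
    }
  }

val flatDF = df.select(flattenSchema(df.schema): _*)
```

Note that the output columns keep only the leaf names, so colliding names across structs may need aliasing afterwards.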
Concatenating datasets of different RDDs in Apache Spark using Scala

Is there a way to concatenate the datasets of two different RDDs in Spark? The requirement is: I create two intermediate …

scala apache-spark apache-spark-sql distributed-computing rdd
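If "concatenate" means appending the rows of one dataset to the other, union is the usual answer; a sketch with hypothetical rdd1 and rdd2 of the same element type:

```scala
// union appends the partitions of both RDDs; it neither sorts nor
// deduplicates, and the element types must match.
val combined = rdd1.union(rdd2) // rdd1, rdd2: hypothetical RDD[MyRecord]

// The DataFrame equivalent, for identical schemas:
// val combinedDF = df1.unionAll(df2) // renamed union() in Spark 2.x
```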