Top "Apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

How to define a schema for a custom type in Spark SQL?

The following example code tries to put some case objects into a DataFrame. The code includes the definition of a …

scala apache-spark apache-spark-sql case-class
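A minimal sketch of one common workaround, assuming a spark-shell session (so `spark` is in scope) and a hypothetical `Status` type standing in for the question's case objects: Spark cannot derive an encoder for arbitrary Scala types, so one option is to fall back to an opaque Kryo-encoded binary column.

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical custom type standing in for the question's case objects.
sealed trait Status
case object Active extends Status
case object Inactive extends Status

// No built-in encoder exists for this type, so use Kryo binary encoding.
implicit val statusEncoder: Encoder[Status] = Encoders.kryo[Status]

val ds = spark.createDataset(Seq[Status](Active, Inactive))
ds.printSchema() // root |-- value: binary (nullable = true)
```

The trade-off is that the column is a single opaque binary blob, so it cannot be queried field-by-field; modeling the type as a case class instead lets Spark derive a proper structured schema.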
Dropping a nested column from Spark DataFrame

I have a DataFrame with the schema root |-- label: string (nullable = true) |-- features: struct (nullable = true) | |-- feat1: …

scala apache-spark dataframe apache-spark-sql apache-spark-ml
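On Spark 3.1+ this is a one-liner via `Column.dropFields`; on older versions the struct has to be rebuilt without the unwanted field. A sketch against the question's `df`, where `feat2` is a hypothetical remaining field:

```scala
import org.apache.spark.sql.functions.{col, struct}

// Spark 3.1+: drop the nested field directly.
val cleaned = df.withColumn("features", col("features").dropFields("feat1"))

// Older versions: rebuild the struct from only the fields you want to keep
// ("feat2" is a hypothetical remaining field).
val rebuilt = df.withColumn("features", struct(col("features.feat2").as("feat2")))
```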
Using Spark to write a parquet file to s3 over s3a is very slow

I'm trying to write a parquet file out to Amazon S3 using Spark 1.6.1. The small parquet that I'm generating is ~2…

scala amazon-s3 apache-spark apache-spark-sql parquet
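Slow writes to S3 over s3a on that stack are usually caused by the rename-based output committer rather than by Parquet itself. A sketch of two common mitigations for the Spark 1.6 / Hadoop 2.x era (app name and values are illustrative, not tuned):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Speculative tasks can commit the same output twice on S3; disable them.
val conf = new SparkConf()
  .setAppName("parquet-to-s3")
  .set("spark.speculation", "false")
val sc = new SparkContext(conf)

// The v2 commit algorithm skips the slow final rename pass of v1
// (renames on S3 are copy-then-delete, so they dominate write time).
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
```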
to_date fails to parse date in Spark 3.0

I am trying to parse a date using to_date(), but I get the following exception: SparkUpgradeException: You may get a …

apache-spark pyspark apache-spark-sql spark3
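The exception comes from Spark 3.0's switch to a new DateTimeFormatter-based parser. Shown in Scala, though the same settings apply from PySpark; the column name and pattern are hypothetical:

```scala
import org.apache.spark.sql.functions.{col, to_date}

// Option 1: restore the pre-3.0 parsing behaviour session-wide.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

// Option 2: fix the pattern for the new parser, e.g. year-of-era 'yyyy'
// instead of week-based-year 'YYYY'.
val parsed = df.withColumn("dt", to_date(col("date_str"), "dd-MM-yyyy"))
```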
How to create an empty DataFrame in Spark

I have a set of Avro based hive tables and I need to read data from them. As Spark-SQL uses …

scala apache-spark apache-spark-sql avro spark-avro
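A minimal sketch of one approach: pair an explicit StructType with an empty RDD[Row]. The schema here is a hypothetical stand-in for the Avro table's:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical schema standing in for the Avro-based Hive table's.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("name", StringType, nullable = true)
))

// An empty DataFrame that still carries the full schema.
val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
empty.printSchema()
```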
Why is Presto faster than Spark SQL?

Why is Presto faster than Spark SQL? Also, what is the difference between Presto and Spark SQL in their computing architectures …

apache-spark-sql presto
How to keep a Spark executor from getting lost and its YARN container from being killed due to the memory limit?

I have the following code, which fires hiveContext.sql() most of the time. My task is to create …

memory apache-spark apache-spark-sql yarn executors
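YARN kills a container when heap plus off-heap usage exceeds executor memory plus overhead, so raising the overhead is the usual first step. A sketch with illustrative, untuned values; note the property name differs by Spark version:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-query-job")
  .config("spark.executor.memory", "4g")
  .config("spark.executor.memoryOverhead", "1g")           // Spark 2.3+
  // .config("spark.yarn.executor.memoryOverhead", "1024") // pre-2.3, in MB
  .enableHiveSupport()
  .getOrCreate()
```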
Spark SQL saveAsTable is not compatible with Hive when partition is specified

This is kind of an edge case: when saving a partitioned Parquet table in Spark SQL, #schema definition final StructType schema = DataTypes.createStructType(…

hive apache-spark-sql partitioning parquet
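saveAsTable on a partitioned DataFrame records Spark-specific partition metadata that Hive does not read back. One common workaround, with hypothetical table and column names: create the table through Hive-compatible DDL first, then load it with insertInto:

```scala
// Create the partitioned table with Hive-compatible DDL up front.
spark.sql("""
  CREATE TABLE IF NOT EXISTS events (id BIGINT, payload STRING)
  PARTITIONED BY (dt STRING)
  STORED AS PARQUET
""")

// insertInto resolves columns by position, so the partition column
// must come last in the DataFrame.
df.write.mode("overwrite").insertInto("events")
```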
Why does SparkContext randomly close, and how do you restart it from Zeppelin?

I am working in Zeppelin writing Spark SQL queries, and sometimes I suddenly start getting this error (without having changed any code): …

apache-spark pyspark apache-spark-sql apache-zeppelin
How to aggregate over rolling time window with groups in Spark

I have some data that I want to group by a certain column, then aggregate a series of fields based …

sql apache-spark pyspark apache-spark-sql window-functions
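The built-in window function composes with ordinary group keys, which covers most rolling-aggregation cases. A sketch against the question's data with hypothetical column names and window sizes:

```scala
import org.apache.spark.sql.functions.{avg, col, window}

// 7-day windows over the "ts" timestamp column, sliding one day at a time,
// computed separately for each value of the "group" column.
val rolled = df
  .groupBy(col("group"), window(col("ts"), "7 days", "1 day"))
  .agg(avg(col("value")).as("avg_value"))
```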