Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
The following example code tries to put some case objects into a DataFrame. The code includes the definition of a …
Tags: scala, apache-spark, apache-spark-sql, case-class

I have a DataFrame with the schema:

root
 |-- label: string (nullable = true)
 |-- features: struct (nullable = true)
 |    |-- feat1: …
Tags: scala, apache-spark, dataframe, apache-spark-sql, apache-spark-ml

I'm trying to write a Parquet file out to Amazon S3 using Spark 1.6.1. The small Parquet file that I'm generating is ~2…
Tags: scala, amazon-s3, apache-spark, apache-spark-sql, parquet

I am trying to parse a date using to_date(), but I get the following exception: SparkUpgradeException: You may get a …
Tags: apache-spark, pyspark, apache-spark-sql, spark3

I have a set of Avro-based Hive tables and I need to read data from them. As Spark SQL uses …
Tags: scala, apache-spark, apache-spark-sql, avro, spark-avro

Why is Presto faster than Spark SQL? And what is the difference between Presto and Spark SQL in their computing architectures …
Tags: apache-spark-sql, presto

I have the following code, which fires hiveContext.sql() most of the time. My task is that I want to create …
Tags: memory, apache-spark, apache-spark-sql, yarn, executors

A kind of edge case: when saving a Parquet table in Spark SQL with a partition,

// schema definition
final StructType schema = DataTypes.createStructType(…
Tags: hive, apache-spark-sql, partitioning, parquet

I am working in Zeppelin writing Spark SQL queries, and sometimes I suddenly start getting this error (after not changing the code): …
Tags: apache-spark, pyspark, apache-spark-sql, apache-zeppelin

I have some data that I want to group by a certain column, then aggregate a series of fields based …
Tags: sql, apache-spark, pyspark, apache-spark-sql, window-functions