Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
Short version of the question! Consider the following snippet (assuming spark is already set to some SparkSession): from pyspark.sql …
Tags: python, apache-spark, pyspark, apache-spark-sql, apache-spark-ml

I'm using pyspark, loading a large CSV file into a dataframe with spark-csv, and as a pre-processing step I need …
Tags: python, apache-spark, pyspark, apache-spark-sql, user-defined-functions

Spark now offers predefined functions that can be used in dataframes, and it seems they are highly optimized. My original …
Tags: performance, apache-spark, pyspark, apache-spark-sql, user-defined-functions

I made a simple UDF to convert or extract some values from a time field in a temptable in Spark. …
Tags: scala, apache-spark, apache-spark-sql, apache-zeppelin

I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can …
Tags: python, apache-spark, apache-spark-sql, parquet, snappy

I have a dataframe in Spark in which one of the columns contains an array. Now, I have written a …
Tags: arrays, apache-spark, pyspark, apache-spark-sql, user-defined-functions

I have data in a parquet file which has 2 fields: object_id: String and alpha: Map<>. It is …
Tags: scala, apache-spark, dataframe, apache-spark-sql, apache-spark-dataset

When running a Spark job on a cluster past a certain data size (~2.5 GB) I am getting either "Job cancelled because SparkContext …
Tags: scala, apache-spark, yarn, apache-spark-sql

I am using Spark SQL (I mention that it is in Spark in case that affects the SQL syntax - …
Tags: sql, apache-spark, apache-spark-sql, hiveql

I'm trying to read the messages from Kafka (version 10) in Spark and trying to print them. import spark.implicits._ val …
Tags: scala, apache-spark-sql, spark-streaming