Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I have a very large dataset that is loaded in Hive. It consists of about 1.9 million rows and 1450 columns. I …
python apache-spark dataframe pyspark apache-spark-sql

What are the differences between Apache Spark SQLContext and HiveContext? Some sources say that since the HiveContext is a superset …
apache-spark hive apache-spark-sql

I have a DataFrame generated as follows: df.groupBy($"Hour", $"Category").agg(sum($"value").alias("TotalValue")).sort($"Hour".asc, $"TotalValue".…
scala apache-spark apache-spark-sql spark-dataframe parquet

As a simplified example, I have a dataframe "df" with columns "col1,col2" and I want to compute a row-wise …
python apache-spark pyspark apache-spark-sql

I am trying to use the Spark Dataset API but I am having some issues doing a simple join. Let's …
scala apache-spark apache-spark-sql apache-spark-dataset

I'm wondering how I can achieve the following in Spark (PySpark). Initial DataFrame:
+--+---+
|id|num|
+--+---+
|4 |9.0|
+--+…
python apache-spark dataframe pyspark apache-spark-sql

I want to parse the date columns in a DataFrame, and for each date column, the resolution for the date …
scala apache-spark apache-spark-sql user-defined-functions

I am new to Spark SQL. We are migrating data from SQL Server to Databricks. I am using Spark SQL. …
apache-spark-sql datediff databricks

I am trying to test how to write data in HDFS 2.7 using Spark 2.1. My data is a simple sequence of …
scala apache-spark apache-spark-sql parquet

I am trying to run random forest classification using the Spark ML API but I am having issues with creating …
scala apache-spark apache-spark-sql apache-spark-mllib