Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I am reading a CSV file in PySpark as follows: df_raw=spark.read.option("header","true").csv(csv_path) …
csv apache-spark pyspark apache-spark-sql apache-spark-2.0
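The excerpt is truncated, but a common follow-up to this exact read is that every column comes back as a string. A minimal sketch (in Scala, like most questions in this list) that also asks Spark to infer column types; the inferSchema option and the file path are my additions, not the asker's code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-read")
      .getOrCreate()

    // "header" treats the first line as column names; "inferSchema"
    // samples the data to guess column types (without it, every
    // column is read as string).
    val dfRaw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/path/to/file.csv") // placeholder path

    dfRaw.printSchema()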
Should the value of spark.yarn.executor.memoryOverhead in a Spark job with YARN be allocated to the app or just …
apache-spark apache-spark-sql spark-streaming apache-spark-mllib
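For context: spark.yarn.executor.memoryOverhead is the off-heap allowance YARN adds on top of each executor's heap, so the container ends up sized at roughly spark.executor.memory plus the overhead. A hedged sketch of setting both explicitly; the sizes are illustrative, and in Spark 2.3+ the key was renamed to spark.executor.memoryOverhead:

    import org.apache.spark.sql.SparkSession

    // The overhead is charged per executor and comes out of the YARN
    // container, not the JVM heap. If unset, Spark defaults it to
    // max(384 MB, 10% of executor memory).
    val spark = SparkSession.builder()
      .appName("overhead-demo")
      .config("spark.executor.memory", "4g")                // heap per executor
      .config("spark.yarn.executor.memoryOverhead", "1024") // off-heap, in MB
      .getOrCreate()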
exitTotalDF
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl")
  .agg(first("accid"), first("segment"), $"exiturl", sum("session"), …
scala apache-spark dataframe apache-spark-sql spark-jobserver
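The snippet is cut off, but one thing worth noting is that the grouping key is already carried into the output, so the bare $"exiturl" inside agg is redundant at best. A hedged sketch of the same shape with aliased aggregates (exitTotalDF and an in-scope SparkSession named spark are assumed):

    import org.apache.spark.sql.functions.{first, sum}
    import spark.implicits._

    val exitUrlTotals = exitTotalDF
      .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
      .groupBy("exiturl") // the grouping key is included in the output automatically
      .agg(
        first("accid").as("accid"),
        first("segment").as("segment"),
        sum("session").as("sessions")
      )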
I want to change the case of a whole column to lowercase in a Spark Dataset. Desired input:
+------+--------------------+
|ItemID|       Category name|
+…
java apache-spark apache-spark-sql apache-spark-dataset
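A minimal sketch using the built-in lower function; the column names come from the excerpt, the sample rows are invented, and I'm using Scala although the question is tagged java:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, lower}

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "Fruit Snacks"), (2, "Dairy Products")).toDF("ItemID", "Category name")

    // lower() lowercases the string values; reusing the same column
    // name overwrites the column in place.
    val lowered = df.withColumn("Category name", lower(col("Category name")))
    lowered.show()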
I have written code to access a Hive table using Spark SQL. Here is the code:
SparkSession spark = SparkSession
    .builder()
    .…
java string apache-spark apache-spark-sql apache-spark-dataset
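The builder chain is truncated; a hedged sketch of what a Hive-enabled session typically looks like in Spark 2.x (written in Scala rather than the question's Java; the warehouse path and table name are placeholders):

    import org.apache.spark.sql.SparkSession

    // enableHiveSupport() is what connects the session to the Hive
    // metastore; without it spark.sql() only sees Spark's own catalog.
    val spark = SparkSession.builder()
      .appName("hive-access")
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // placeholder
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SELECT * FROM some_db.some_table LIMIT 10").show() // hypothetical table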
In Spark 1.6.0 / Scala, is there a way to get collect_list("colC") or collect_set("colC").over(Window.partitionBy("colA").…
scala apache-spark apache-spark-sql apache-spark-1.6
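As far as I recall, the windowed form only started working in Spark 2.0; on 1.6 the usual workaround is to aggregate per key and join back (and collect_list itself needed a HiveContext there). A hedged sketch of both, with made-up data:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.collect_list
    import spark.implicits._ // assumes an existing SparkSession named spark

    val df = Seq(("a", 1, "x"), ("a", 2, "y"), ("b", 1, "z")).toDF("colA", "colB", "colC")

    // Spark 2.0+: collect_list is accepted as a window function.
    val w = Window.partitionBy("colA").orderBy("colB")
    val windowed = df.withColumn("cs", collect_list("colC").over(w))

    // Spark 1.6 workaround: aggregate per key, then join back onto the rows.
    val joined = df.join(df.groupBy("colA").agg(collect_list("colC").as("cs")), Seq("colA"))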
I'm working through a Databricks example. The schema for the DataFrame looks like:
> parquetDF.printSchema
root
 |-- department: struct (…
scala apache-spark apache-spark-sql distributed-computing databricks
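The schema is cut off at the struct, but nested struct fields can be addressed with dot paths. A hedged sketch (parquetDF is the asker's DataFrame; the field names under department are invented):

    import spark.implicits._ // assumes an existing SparkSession named spark

    // Dot paths reach into the struct; the aliases give the flattened
    // columns readable names.
    val flat = parquetDF.select(
      $"department.id".as("department_id"),    // hypothetical field
      $"department.name".as("department_name") // hypothetical field
    )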
I'm new to Scala programming and this is my question: how do I count the number of strings in each row? …
scala apache-spark apache-spark-sql databricks
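The question is truncated, but if "number of strings" means whitespace-separated tokens per row, split plus size does it; the column name and sample data are assumptions:

    import org.apache.spark.sql.functions.{col, size, split}
    import spark.implicits._ // assumes an existing SparkSession named spark

    val df = Seq("spark sql rocks", "hello world").toDF("text")

    // split() turns each line into an array of tokens; size() counts them.
    val counted = df.withColumn("tokens", size(split(col("text"), "\\s+")))
    counted.show()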
All, is there an elegant and accepted way to flatten a Spark SQL table (Parquet) with columns that are of …
scala apache-spark apache-spark-sql
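Assuming "columns that are of …" ends in nested StructType, the usual approach is a recursive walk over the schema that aliases a.b.c to a_b_c. A sketch under that assumption; arrays and maps would need explode() or key access instead:

    import org.apache.spark.sql.{Column, DataFrame}
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.StructType

    // Emits one flat column per leaf field of every struct.
    def flattenStructs(df: DataFrame): DataFrame = {
      def leaves(schema: StructType, prefix: String): Seq[Column] =
        schema.fields.toSeq.flatMap { f =>
          val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
          f.dataType match {
            case st: StructType => leaves(st, path)
            case _              => Seq(col(path).as(path.replace(".", "_")))
          }
        }
      df.select(leaves(df.schema, ""): _*)
    }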
Is there a way to concatenate datasets of two different RDDs in Spark? The requirement is: I create two intermediate …
scala apache-spark apache-spark-sql distributed-computing rdd
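"Concatenate" can mean rows or columns, and the excerpt does not say which; a hedged sketch of both on plain RDDs (union stacks rows, zip pairs elements positionally):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val rdd1 = sc.parallelize(Seq(1, 2, 3))
    val rdd2 = sc.parallelize(Seq(4, 5, 6))

    // Row-wise: one RDD after the other (keeps duplicates; distinct() to dedup).
    val stacked = rdd1.union(rdd2)

    // Column-wise: both RDDs need the same number of partitions and the
    // same number of elements per partition.
    val paired = rdd1.zip(rdd2)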