Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I have a Spark SQL DataFrame with data and what I'm trying to get is all the rows preceding …
sql apache-spark pyspark apache-spark-sql window-functions
I have used registerTempTable in Apache Spark using Zeppelin, as below: val hvacText = sc.textFile("...") case class Hvac(date: String, time: String, …
scala apache-spark apache-spark-sql apache-zeppelin
Assume df1 and df2 are two DataFrames in Apache Spark, computed using two different mechanisms, e.g., Spark SQL vs. …
scala apache-spark dataframe apache-spark-sql rdd
I have a table of two string type columns (username, friend) and for each username, I want to collect all …
apache-spark aggregate-functions apache-spark-sql
I was wondering if there is some way to specify a custom aggregation function for spark dataframes over multiple columns. …
scala apache-spark dataframe apache-spark-sql orc
I know how to write a UDF in Spark SQL: def belowThreshold(power: Int): Boolean = { return power < -40 } sqlContext.…
scala apache-spark apache-spark-sql aggregate-functions user-defined-functions
When I'm trying to do the same thing in my code as mentioned below dataframe.map(row => { val …
scala apache-spark apache-spark-sql apache-spark-dataset apache-spark-encoders
I am running this query in the Spark shell but it gives me an error: sqlContext.sql( "select sal from samplecsv where …
sql apache-spark subquery apache-spark-sql
I'm just wondering what is the difference between an RDD and DataFrame (Spark 2.0.0 DataFrame is a mere type alias for …
apache-spark apache-spark-sql rdd apache-spark-dataset
I am trying to convert a column which contains Array[String] to String, but I consistently get this error org.…
scala apache-spark dataframe apache-spark-sql user-defined-functions