Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
Let's say I have a rather large dataset in the following form: data = sc.parallelize([('Foo',41,'US',3), ('Foo',39,'UK',1), ('Bar',57,…
apache-spark apache-spark-sql pyspark

Looking at the new Spark DataFrame API, it is unclear whether it is possible to modify DataFrame columns. How would …
python apache-spark pyspark apache-spark-sql spark-dataframe

I want to select a column that equals a certain value. I am doing this in Scala and having …
scala apache-spark dataframe apache-spark-sql

I'm just wondering what the difference is between an RDD and a DataFrame (Spark 2.0.0 DataFrame is a mere type alias for …
dataframe apache-spark apache-spark-sql rdd apache-spark-dataset

I've seen various people suggest that Dataframe.explode is a useful way to do this, but it results in more …
apache-spark pyspark apache-spark-sql spark-dataframe pyspark-sql

I would like to modify the cell values of a DataFrame column (Age) where currently it is blank and I …
python apache-spark dataframe pyspark apache-spark-sql

I am trying to read a CSV file into a DataFrame. I know what the schema of my DataFrame should …
scala apache-spark dataframe apache-spark-sql spark-csv

I have a DataFrame generated as follows: df.groupBy($"Hour", $"Category").agg(sum($"value") as "TotalValue").sort($"Hour".asc, $"TotalValue".…
sql scala apache-spark dataframe apache-spark-sql

How can I specify more than one column condition when joining two DataFrames? For example, I want to run the following: val Lead_…
apache-spark apache-spark-sql rdd

I am new to Spark and Spark SQL. How does createOrReplaceTempView work in Spark? If we register an RDD of …
apache-spark apache-spark-sql spark-dataframe