Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I've successfully created a row_number() partitionBy in Spark using Window, but would like to sort it in descending order, …
Tags: python, apache-spark, pyspark, apache-spark-sql, window-functions

I am using Spark 1.3.0 with the Python API. While transforming huge dataframes, I cache many DFs for faster execution; df1.cache() …
Tags: apache-spark, apache-spark-sql, spark-streaming

There's a DataFrame in pyspark with data as below:

    user_id  object_id  score
    user_1   object_1   3
    user_1   object_1   1
    user_1   object_2   2

…
Tags: python, apache-spark, dataframe, pyspark, apache-spark-sql

In pyspark 1.6.2, I can import the col function with from pyspark.sql.functions import col, but when I try to look …
Tags: python, apache-spark, pyspark, apache-spark-sql, pyspark-sql

Once I have got in Spark some Row class, either from a DataFrame or Catalyst, I want to convert it to a …
Tags: scala, apache-spark, apache-spark-sql

I have a DF with huge parseable metadata as a single string column in a DataFrame; let's call it …
Tags: scala, apache-spark, dataframe, apache-spark-sql, user-defined-functions

The seemingly simple code below throws the following error: Traceback (most recent call last): File "/home/nirmal/process.py", line 165, …
Tags: python, apache-spark, pyspark, apache-spark-sql, user-defined-functions

I would like to repartition / coalesce my data so that it is saved into one Parquet file per partition. I …
Tags: apache-spark, apache-spark-sql

I'm trying to load an SVM file and convert it to a DataFrame so I can use the ML module (…
Tags: python, apache-spark, pyspark, apache-spark-sql, rdd

I know we can load a parquet file using Spark SQL and using Impala, but am wondering if we can do the …
Tags: hadoop, hive, apache-spark-sql, hiveql, parquet