Top "Apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Spark SQL Row_number() PartitionBy Sort Desc

I've successfully created a row_number() partitionBy in Spark using Window, but would like to sort this by descending, …

python apache-spark pyspark apache-spark-sql window-functions
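A minimal PySpark sketch of a descending sort inside a window, assuming the Spark 2.x+ SparkSession API; the data and column names are hypothetical:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: several scores per user.
df = spark.createDataFrame(
    [("u1", 3), ("u1", 1), ("u2", 5), ("u2", 2)],
    ["user_id", "score"],
)

# Partition by user and order each partition by score, descending.
w = Window.partitionBy("user_id").orderBy(col("score").desc())
df.withColumn("rank", row_number().over(w)).show()
```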
Drop spark dataframe from cache

I am using Spark 1.3.0 with the Python API. While transforming huge dataframes, I cache many DFs for faster execution; df1.cache() …

apache-spark apache-spark-sql spark-streaming
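The counterpart to cache() is unpersist(), which releases the cached blocks. A minimal sketch, shown with the 2.x SparkSession API and a hypothetical DataFrame:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df1 = spark.range(1000)  # hypothetical DataFrame
df1.cache()              # mark for caching
df1.count()              # an action materializes the cache
df1.unpersist()          # drop the DataFrame from the cache
```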
Retrieve top n in each group of a DataFrame in pyspark

There's a DataFrame in pyspark with data as below:

user_id  object_id  score
user_1   object_1   3
user_1   object_1   1
user_1   object_2   2
…

python apache-spark dataframe pyspark apache-spark-sql
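One common approach is row_number() over a per-group window, followed by a filter on the rank. A sketch with the sample data above and an assumed n = 2:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("user_1", "object_1", 3), ("user_1", "object_1", 1), ("user_1", "object_2", 2)],
    ["user_id", "object_id", "score"],
)

# Rank rows within each user_id by score, highest first, then keep the top 2.
w = Window.partitionBy("user_id").orderBy(col("score").desc())
(df.withColumn("rn", row_number().over(w))
   .where(col("rn") <= 2)
   .drop("rn")
   .show())
```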
Cannot find col function in pyspark

In pyspark 1.6.2, I can import the col function with from pyspark.sql.functions import col, but when I try to look …

python apache-spark pyspark apache-spark-sql pyspark-sql
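The import does work at runtime: col is one of the functions that pyspark.sql.functions generates dynamically (from a name list, in 1.x releases), so IDEs and static analyzers may fail to resolve it even though Python finds it. A quick check:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col  # resolved at runtime, even if an IDE flags it

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (2,)], ["x"])
df.where(col("x") > 1).show()
```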
How to convert Row of a Scala DataFrame into case class most efficiently?

Once I have a Row in Spark, from either a DataFrame or Catalyst, I want to convert it to a …

scala apache-spark apache-spark-sql
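In Scala, Spark 1.6+ handles this through the Dataset encoder, df.as[MyCaseClass]. As a hedged PySpark analogue in the language used elsewhere on this page, each Row can be unpacked into a (hypothetical) dataclass via asDict():

```python
from dataclasses import dataclass
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

@dataclass
class User:  # hypothetical record type, playing the role of a Scala case class
    name: str
    age: int

df = spark.createDataFrame([("alice", 30), ("bob", 25)], ["name", "age"])

# Row.asDict() maps field names to values; ** unpacks them into the dataclass.
users = [User(**row.asDict()) for row in df.collect()]
```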
Derive multiple columns from a single column in a Spark DataFrame

I have a DataFrame with huge parseable metadata as a single string column; let's call it …

scala apache-spark dataframe apache-spark-sql user-defined-functions
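A common pattern is a UDF that returns a struct, followed by a select that expands the struct's fields into top-level columns. A PySpark sketch assuming a hypothetical "k1=...;k2=..." metadata format:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("k1=a;k2=b",)], ["metadata"])

# The struct schema declares the derived columns.
schema = StructType([
    StructField("k1", StringType()),
    StructField("k2", StringType()),
])

@udf(returnType=schema)
def parse(s):
    d = dict(kv.split("=") for kv in s.split(";"))  # hypothetical parser
    return d.get("k1"), d.get("k2")

# "parsed.*" expands the struct into separate columns.
df.withColumn("parsed", parse(col("metadata"))).select("metadata", "parsed.*").show()
```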
TypeError: got an unexpected keyword argument

The seemingly simple code below throws the following error: Traceback (most recent call last): File "/home/nirmal/process.py", line 165, …

python apache-spark pyspark apache-spark-sql user-defined-functions
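The excerpt is truncated, but one frequent cause of this error with PySpark UDFs is calling the wrapped function with keyword arguments; the wrapper only accepts positional Column arguments. A hedged sketch with hypothetical names:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2)], ["a", "b"])

add = udf(lambda x, y: x + y, IntegerType())

df.select(add(col("a"), col("b"))).show()   # works: positional arguments
# df.select(add(x=col("a"), y=col("b")))    # raises: got an unexpected keyword argument
```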
DataFrame partitionBy to a single Parquet file (per partition)

I would like to repartition / coalesce my data so that it is saved into one Parquet file per partition. I …

apache-spark apache-spark-sql
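Repartitioning by the partition column before the write sends all rows for a given partition value to one task, so each output directory holds a single Parquet file. A sketch with hypothetical column and path names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("2024-01-01", 1), ("2024-01-01", 2), ("2024-01-02", 3)],
    ["dt", "value"],
)

# One shuffle partition per distinct dt -> one file per output directory.
(df.repartition("dt")
   .write
   .partitionBy("dt")
   .parquet("/tmp/out"))  # hypothetical output path
```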
'PipelinedRDD' object has no attribute 'toDF' in PySpark

I'm trying to load an SVM file and convert it to a DataFrame so I can use the ML module (…

python apache-spark pyspark apache-spark-sql rdd
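toDF() is not defined on RDDs themselves; constructing a SQLContext (1.x) or SparkSession (2.x+) is what patches it onto the RDD class. A minimal sketch:

```python
from pyspark.sql import Row, SparkSession

# Creating the session installs toDF() on RDDs; without it,
# 'PipelinedRDD' object has no attribute 'toDF'.
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([Row(label=1.0, feature=0.5)])
rdd.toDF().show()
```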
Can we load Parquet file into Hive directly?

I know we can load a Parquet file using Spark SQL or Impala, but am wondering if we can do the …

hadoop hive apache-spark-sql hiveql parquet
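Hive can read Parquet natively, so one option is an external table declared over the existing files, with no load step through Spark or Impala. A sketch issued from PySpark with Hive support; the table name, schema, and path are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS my_table (id INT, name STRING)
    STORED AS PARQUET
    LOCATION '/data/parquet/my_table'
""")
```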