The Spark Python API (PySpark) exposes the Apache Spark programming model to Python.
I'd like to perform some basic stemming on a Spark DataFrame column by replacing substrings. What's the quickest way to …
python apache-spark pyspark
I'm trying to run PySpark on my MacBook Air. When I try starting it up I get the error: Exception: …
java python macos apache-spark pyspark
In pandas, this can be done by column.name. But how do I do the same when it's a column of Spark …
pyspark pyspark-sql
I am trying to convert the Spark RDD to a DataFrame. I have seen the documentation and example where the …
python apache-spark pyspark spark-dataframe
I need to use the (rdd.)partitionBy(npartitions, custom_partitioner) method that is not available on the DataFrame. All of …
python apache-spark pyspark
I have a large pyspark.sql.dataframe.DataFrame and I want to keep (so filter) all rows where the URL …
python apache-spark pyspark apache-spark-sql
I'm trying to get the path to spark.worker.dir for the current SparkContext. If I explicitly set it as …
apache-spark config pyspark
I am using CDH 5.5. I have a table created in the Hive default database and am able to query it from the …
hive pyspark
I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, …
python apache-spark mapreduce pyspark rdd
I am using Spark 1.3 and would like to join on multiple columns using the Python interface (SparkSQL). The following works: I …
python apache-spark join pyspark apache-spark-sql