Apache Spark is an open-source distributed data processing engine written in Scala. It provides a unified API and distributed datasets for both batch and streaming processing.

Looking at the new spark dataframe api, it is unclear whether it is possible to modify dataframe columns. How would …
python apache-spark pyspark apache-spark-sql spark-dataframe

I'm using spark 1.4.0-rc2 so I can use python 3 with spark. If I add export PYSPARK_PYTHON=python3 to my .…
apache-spark pyspark

I want to select a column that equals to a certain value. I am doing this in scala and having …
scala apache-spark dataframe apache-spark-sql

I'm just wondering what is the difference between an RDD and DataFrame (Spark 2.0.0 DataFrame is a mere type alias for …
dataframe apache-spark apache-spark-sql rdd apache-spark-dataset

What's the difference between an RDD's map and mapPartitions method? And does flatMap behave like map or like mapPartitions? Thanks. (…
performance scala apache-spark rdd

I've seen various people suggesting that Dataframe.explode is a useful way to do this, but it results in more …
apache-spark pyspark apache-spark-sql spark-dataframe pyspark-sql

True ... it has been discussed quite a lot. However there is a lot of ambiguity and some of the answers …
java scala apache-spark jar spark-submit

I installed Spark using the AWS EC2 guide and I can launch the program fine using the bin/pyspark script …
python scala apache-spark hadoop pyspark

I would like to modify the cell values of a dataframe column (Age) where currently it is blank and I …
python apache-spark dataframe pyspark apache-spark-sql

I'm not able to run a simple spark job in Scala IDE (Maven spark project) installed on Windows 7 Spark core …
eclipse scala apache-spark