Apache Spark is an open-source distributed data processing engine, written in Scala, that provides a unified API and distributed datasets for both batch and stream processing.
Can someone explain to me the difference between map and flatMap and what is a good use case for each? …
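As a rough illustration of the map versus flatMap question above, here is a minimal plain-Python sketch of the semantics. It is not Spark code: `map_like` and `flat_map_like` are hypothetical helper names that model on ordinary lists what PySpark's `RDD.map` and `RDD.flatMap` do on distributed data, and the sample lines are invented.

```python
from itertools import chain


def map_like(func, items):
    """Model of map: exactly one output element per input element."""
    return [func(item) for item in items]


def flat_map_like(func, items):
    """Model of flatMap: func returns an iterable for each element,
    and the results are flattened by one level."""
    return list(chain.from_iterable(func(item) for item in items))


# Invented sample data, standing in for the lines of a text file.
lines = ["hello world", "apache spark"]

# map keeps one result per input line, so the output is a list of lists.
split_per_line = map_like(str.split, lines)

# flatMap merges the per-line results into a single list of words.
all_words = flat_map_like(str.split, lines)

print(split_per_line)  # [['hello', 'world'], ['apache', 'spark']]
print(all_words)       # ['hello', 'world', 'apache', 'spark']
```

Under this model, map fits one-to-one transformations such as parsing or converting each record, while flatMap fits cases where each input yields zero or more outputs, such as tokenizing lines into words.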
apache-spark

I'm trying to convert a Pandas DataFrame into a Spark one. DF head:
10000001,1,0,1,12:35,OK,10002,1,0,9,f,NA,24,24,0,3,9,0,0,1,1,0,0,4,543
10000001,2,0,1,12:36,OK,10002,1,0,9,f,NA,24,24,0,3,9,2,1,1,3,1,3,2,611
10000002,1,0,4,12:19,PA,10003,1,1,7,f,NA,74,74,0,2,15,2,0,2,3,1,2,2,691 …
python pandas apache-spark spark-dataframe

I'm trying to concatenate two PySpark dataframes, each of which has some columns that the other lacks: from pyspark.sql.…
python apache-spark pyspark

I've started using Spark SQL and DataFrames in Spark 1.4.0. I want to define a custom partitioner on DataFrames, in Scala, …
scala apache-spark dataframe apache-spark-sql partitioning

I have a running Spark application that occupies all the cores, so my other applications won't be allocated any …
apache-spark yarn pyspark

I am trying to convert a column which is in String format to Date format using the to_date function …
apache-spark apache-spark-sql

I'm trying to figure out the best way to get the largest value in a Spark dataframe column. Consider the …
python apache-spark pyspark apache-spark-sql

This is a copy of someone else's question on another forum that was never answered, so I thought I'd re-ask …
python apache-spark pyspark

How can I increase the memory available for Apache Spark executor nodes? I have a 2 GB file that is suitable …
memory apache-spark

I have constructed two dataframes. How can we join multiple Spark dataframes? For example: PersonDf, ProfileDf with a common column …
scala apache-spark dataframe apache-spark-sql