Apache Spark is an open-source distributed data processing engine written in Scala. It provides a unified API and distributed datasets for both batch and streaming processing.

Suppose I'm doing something like: val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "cars.csv", "header" …
scala apache-spark apache-spark-sql

I would like to read a CSV in spark and convert it as DataFrame and store it in HDFS with …
scala apache-spark hadoop apache-spark-sql hdfs

I come from pandas background and am used to reading data from CSV files into a dataframe and then simply …
python apache-spark pyspark pyspark-sql

I'm new to Spark and I'm trying to read CSV data from a file with Spark. Here's what I am …
python csv apache-spark pyspark

I have a Spark DataFrame (using PySpark 1.5.1) and would like to add a new column. I've tried the following without …
python apache-spark dataframe pyspark apache-spark-sql

How do we concatenate two columns in an Apache Spark DataFrame? Is there any function in Spark SQL which we …
sql apache-spark dataframe apache-spark-sql

How can I convert an RDD (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]) to a Dataframe org.…
scala apache-spark apache-spark-sql rdd

I'm trying to filter a PySpark dataframe that has None as a row value: df.select('dt_mvmt').distinct().collect() […
python apache-spark dataframe pyspark apache-spark-sql

I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in descending …
scala apache-spark apache-spark-sql

My cluster: 1 master, 11 slaves, each node has 6 GB memory. My settings: spark.executor.memory=4g, Dspark.akka.frameSize=512 Here is …
out-of-memory apache-spark