Top "Apache-spark" questions

Apache Spark is an open-source distributed data processing engine, written in Scala, that provides a unified API and distributed datasets for both batch and stream processing.

How to set up Spark on Windows?

I am trying to set up Apache Spark on Windows. After searching a bit, I understand that the standalone mode is …

windows apache-spark
Convert Spark DataFrame column to Python list

I work on a dataframe with two columns, mvv and count.

+---+-----+
|mvv|count|
+---+-----+
|  1|    5|
|  2|    9|
|  3|    3|
|  4|    1|

I would like …

python apache-spark pyspark spark-dataframe
How to overwrite the output directory in Spark

I have a Spark Streaming application that produces a dataset every minute. I need to save/overwrite the results …

apache-spark
How to stop INFO messages displaying on the Spark console?

I'd like to stop the various messages that appear in the Spark shell. I tried to edit the log4j.properties …

apache-spark log4j spark-submit
Renaming column names of a DataFrame in Spark Scala

I am trying to convert all the headers / column names of a DataFrame in Spark-Scala. As of now I come …

scala apache-spark dataframe apache-spark-sql
How to load local file in sc.textFile, instead of HDFS

I'm following the great Spark tutorial, so at 46m:00s I'm trying to load the README.md but fail to …

scala apache-spark
Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

Getting strange behavior when calling a function outside of a closure: when the function is in an object, everything works; when …

scala apache-spark serialization
How to filter out a null value from a Spark DataFrame

I created a dataframe in Spark with the following schema:

root
 |-- user_id: long (nullable = false)
 |-- event_id: …

scala apache-spark apache-spark-sql spark-dataframe
How to save DataFrame directly to Hive?

Is it possible to save a DataFrame in Spark directly to Hive? I have tried converting the DataFrame to an RDD and …

scala apache-spark hive apache-spark-sql
How to add a constant column in a Spark DataFrame?

I want to add a column in a DataFrame with some arbitrary value (that is the same for each row). …

python apache-spark dataframe pyspark apache-spark-sql