Apache Spark is an open-source distributed data processing engine written in Scala. It provides a unified API and distributed datasets for both batch and streaming processing.
I am trying to set up Apache Spark on Windows. After searching a bit, I understand that the standalone mode is …
windows apache-spark

I work on a DataFrame with two columns, mvv and count.

+---+-----+
|mvv|count|
+---+-----+
| 1 | 5 |
| 2 | 9 |
| 3 | 3 |
| 4 | 1 |

I would like …
python apache-spark pyspark spark-dataframe

I have a Spark Streaming application that produces a dataset every minute. I need to save/overwrite the results …
apache-spark

I'd like to stop the various messages that appear in the Spark shell. I tried to edit the log4j.properties …
apache-spark log4j spark-submit

I am trying to convert all the headers / column names of a DataFrame in Spark-Scala. As of now I come …
scala apache-spark dataframe apache-spark-sql

I'm following the great Spark tutorial, and at 46m:00s I'm trying to load the README.md but fail to …
scala apache-spark

Getting strange behavior when calling a function outside of a closure: when the function is in an object, everything is working; when …
scala apache-spark serialization

I created a DataFrame in Spark with the following schema:

root
 |-- user_id: long (nullable = false)
 |-- event_id: …
scala apache-spark apache-spark-sql spark-dataframe

Is it possible to save a DataFrame in Spark directly to Hive? I have tried converting the DataFrame to an RDD and …
scala apache-spark hive apache-spark-sql

I want to add a column in a DataFrame with some arbitrary value (that is the same for each row). …
python apache-spark dataframe pyspark apache-spark-sql