Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams.
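For orientation, here is the canonical word-count example from the Spark Streaming programming guide, counting words from a text socket (e.g. one fed by nc -lk 9999):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))   // 1-second batch interval

    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()             // start receiving and processing
    ssc.awaitTermination()  // block until the job is stopped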
I am fairly new to Spark. I tried searching but I couldn't get a proper solution. I have installed Hadoop 2.7.2 …
hadoop apache-spark yarn spark-streaming
I would like to convert a DStream into an array, list, etc. so I can then translate it to JSON …
scala apache-spark spark-streaming dstream
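A hedged sketch of one common approach: inside foreachRDD, RDD.collect() materializes each micro-batch as a local array on the driver, which can then be turned into JSON. The hand-rolled string formatting below is purely illustrative; a real job would use a JSON library:

    import org.apache.spark.streaming.dstream.DStream

    def batchesAsJson(stream: DStream[String]): Unit =
      stream.foreachRDD { rdd =>
        // collect() pulls the whole batch to the driver -- only safe for small batches
        val batch: Array[String] = rdd.collect()
        val json = batch.map(s => "\"" + s + "\"").mkString("[", ",", "]")
        println(json)
      }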
I have, I believe, a relatively common use case for spark streaming: I have a stream of objects that I …
java scala apache-spark spark-streaming broadcast
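The excerpt is cut off, but given the broadcast tag, a frequent pattern behind questions like this is enriching each streamed record against small reference data shipped to executors as a broadcast variable. A sketch under that assumption (the lookup contents and function name are made up):

    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.dstream.DStream

    // assumption: small, static reference data to enrich the stream with
    def enrich(ssc: StreamingContext, stream: DStream[String]): DStream[(String, Int)] = {
      val lookup = Map("a" -> 1, "b" -> 2)           // made-up contents
      val bcast = ssc.sparkContext.broadcast(lookup) // shipped once to each executor
      stream.map(key => (key, bcast.value.getOrElse(key, -1)))
    }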
Is it possible to pipe a Spark RDD to Python? I need a Python library to do some calculation on …
python scala apache-spark pyspark spark-streaming
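Yes, via RDD.pipe, which streams each partition's elements through an external process over stdin/stdout; the Python script path below is hypothetical:

    import org.apache.spark.rdd.RDD

    def scoreWithPython(rdd: RDD[String]): RDD[String] =
      // each element is written to the script's stdin as a line;
      // each line the script prints becomes an element of the result
      rdd.pipe("python3 /path/to/score.py")   // hypothetical script

For a DStream, the same call can be applied per batch with stream.transform(_.pipe(...)).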
I have been trying to run spark-shell in YARN client mode, but I am getting a lot of ClosedChannelException errors. …
hadoop apache-spark spark-streaming yarn
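The excerpt omits the stack trace, so this is a guess rather than a diagnosis: a commonly reported cause of ClosedChannelException in YARN client mode is the NodeManager killing executors that exceed their memory limits. One frequently suggested mitigation, sketched below:

    import org.apache.spark.SparkConf

    // assumption: executors are being killed for exceeding YARN memory limits
    val conf = new SparkConf()
      .set("spark.yarn.executor.memoryOverhead", "1024") // MB of extra off-heap headroom
    // the yarn-site.xml alternative is setting yarn.nodemanager.vmem-check-enabled to false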
I'm using Spark Streaming 2.1. I'd like to refresh some cached tables (loaded by a Spark-provided DataSource like Parquet, MySQL, or …
apache-spark apache-spark-sql spark-streaming
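In Spark 2.x the Catalog API can invalidate a cached table so the next read reloads it; a sketch assuming an existing SparkSession and a made-up table name:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    // invalidate cached data and metadata for the table
    spark.catalog.refreshTable("my_table")       // table name is made up
    // or re-cache explicitly through SQL
    spark.sql("UNCACHE TABLE my_table")
    spark.sql("CACHE TABLE my_table")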
Is it possible to limit the size of the batches returned by the Kafka consumer for Spark Streaming? I am …
apache-spark apache-kafka spark-streaming kafka-consumer-api
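There is no direct per-batch cap for the direct Kafka stream, but batch size can be bounded by limiting the per-partition ingest rate (records per batch ≈ rate × partitions × batch interval). Both keys below are standard Spark configuration properties:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // hard cap on records pulled per Kafka partition per second
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      // adaptive rate control driven by observed batch processing times
      .set("spark.streaming.backpressure.enabled", "true")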
Will rdd1.join(rdd2) cause a shuffle to happen if rdd1 and rdd2 have the same partitioner?
apache-spark spark-streaming rdd
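Short answer: no new shuffle stage for the join itself, because matching keys already live in the same partitions. A sketch of the co-partitioned case (the data is made up):

    import org.apache.spark.{HashPartitioner, SparkContext}

    def copartitionedJoin(sc: SparkContext): Unit = {
      val part = new HashPartitioner(4)
      // partitionBy shuffles once up front and records the partitioner on the RDD
      val a = sc.parallelize(Seq(("k1", 1), ("k2", 2))).partitionBy(part).cache()
      val b = sc.parallelize(Seq(("k1", "x"), ("k2", "y"))).partitionBy(part).cache()
      // identical partitioners on both sides: join reuses the existing partitioning
      a.join(b).collect().foreach(println)
    }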
In Spark Streaming, every batch interval of data always generates one and only one RDD, so why do we use foreachRDD() …
apache-spark spark-streaming
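True, each batch interval yields exactly one RDD; foreachRDD is simply the generic output operation that hands that RDD (and its batch time) to driver-side code, where arbitrary actions or writes to external systems happen. A minimal sketch:

    import org.apache.spark.streaming.dstream.DStream

    def sinkEachBatch(stream: DStream[String]): Unit =
      stream.foreachRDD { (rdd, time) =>
        // runs on the driver once per batch, with that batch's single RDD
        println(s"batch $time: ${rdd.count()} records")
      }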
I have a Spark Streaming job which has been running continuously. How do I stop the job gracefully? I have …
apache-spark spark-streaming
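A sketch of the two standard mechanisms: an explicit stop with stopGracefully = true so queued batches finish first, or the shutdown-hook property so a plain SIGTERM does the same:

    import org.apache.spark.streaming.StreamingContext

    def shutdownGracefully(ssc: StreamingContext): Unit =
      // blocks until already-received data has been processed, then stops everything
      ssc.stop(stopSparkContext = true, stopGracefully = true)

    // alternatively, set this on the SparkConf before start():
    //   conf.set("spark.streaming.stopGracefullyOnShutdown", "true")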