Top "Spark-streaming" questions

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams.

Spark job running on a YARN cluster fails with java.io.FileNotFoundException: File does not exist, even though the file exists on the master node

I am fairly new to Spark. I tried searching but I couldn't find a proper solution. I have installed Hadoop 2.7.2 …

hadoop apache-spark yarn spark-streaming
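
The usual explanation for this symptom (not shown in the truncated excerpt) is that on YARN the executors, and in cluster mode the driver too, run on arbitrary cluster nodes, so a path that exists only on the master's local filesystem cannot be resolved. A minimal sketch of the common fix, reading from HDFS instead; the path and app name are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HdfsInput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-input"))
    // A path like "file:///home/user/data.txt" exists on only one machine;
    // an hdfs:// URI is visible from every node in the cluster.
    val lines = sc.textFile("hdfs:///user/someuser/data.txt") // hypothetical path
    println(lines.count())
    sc.stop()
  }
}
```

Small local files can alternatively be shipped to every container with spark-submit's --files option.
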
For each RDD in a DStream, how do I convert it to an array or some other typical Java data type?

I would like to convert a DStream into an array, list, etc. so I can then translate it to JSON …

scala apache-spark spark-streaming dstream
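
One common approach is foreachRDD combined with collect(), which pulls each micro-batch back to the driver as a plain array. A minimal sketch, assuming an existing DStream[String] named `stream`; this only suits batches small enough to fit in driver memory:

```scala
import org.apache.spark.streaming.dstream.DStream

def toArrays(stream: DStream[String]): Unit = {
  stream.foreachRDD { rdd =>
    // collect() materializes the batch on the driver as an ordinary Array.
    val batch: Array[String] = rdd.collect()
    // From here a Java List (or any JSON library) is a plain conversion away.
    val asList: java.util.List[String] = java.util.Arrays.asList(batch: _*)
    println(s"received ${asList.size} records")
  }
}
```
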
How can I update a broadcast variable in spark streaming?

I have, I believe, a relatively common use case for Spark Streaming: I have a stream of objects that I …

java scala apache-spark spark-streaming broadcast
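
Broadcast variables themselves are immutable, so a workaround often suggested for this question is to unpersist the old broadcast and publish a fresh one between batches. A sketch of that pattern; `loadReferenceData` is a hypothetical stand-in for wherever the new values come from:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object BroadcastWrapper {
  @volatile private var instance: Broadcast[Map[String, String]] = _

  private def loadReferenceData(): Map[String, String] =
    Map("placeholder" -> "data") // hypothetical; load from a DB or file in practice

  def getInstance(sc: SparkContext): Broadcast[Map[String, String]] = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.broadcast(loadReferenceData())
    }
    instance
  }

  // "Updating" a broadcast = dropping the old one and broadcasting anew.
  def update(sc: SparkContext): Unit = synchronized {
    if (instance != null) instance.unpersist(blocking = false)
    instance = sc.broadcast(loadReferenceData())
  }
}
```

A driver-side check inside foreachRDD (or a timer) would call update(), while tasks read getInstance(sc).value as usual.
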
How to use both Scala and Python in the same Spark project?

Is it possible to pipe a Spark RDD to Python? I need a Python library to do some calculation on …

python scala apache-spark pyspark spark-streaming
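
For mixing the two languages, RDD.pipe is one existing mechanism: it forks an external command on each executor, writes one element per line to the process's stdin, and returns the process's stdout lines as a new RDD[String]. A sketch, assuming an existing SparkContext `sc`; `compute.py` is a hypothetical script that must be present on (or shipped to) every worker:

```scala
val input = sc.parallelize(Seq("1", "2", "3"))

// Each executor runs `python3 compute.py`, feeding its partition's elements
// to the script's stdin and collecting the script's stdout as strings.
val piped = input.pipe("python3 compute.py")
piped.collect().foreach(println)
```
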
How can I find the reason for ClosedChannelExceptions with spark-shell in YARN client mode?

I have been trying to run spark-shell in YARN client mode, but I am getting a lot of ClosedChannelException errors. …

hadoop apache-spark spark-streaming yarn
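
A cause frequently reported for this symptom is YARN killing executor containers that exceed their memory allowance, after which the driver's connections to them surface as ClosedChannelExceptions; the container logs retrieved with `yarn logs -applicationId <appId>` usually confirm it. A sketch of the typical first mitigation, raising the off-heap overhead; the values are placeholders:

```scala
import org.apache.spark.SparkConf

// Extra off-heap headroom per executor, in MB. This key applies to Spark 2.2
// and earlier; from Spark 2.3 on it is spelled spark.executor.memoryOverhead.
val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024")
  .set("spark.executor.memory", "4g")
```
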
How to refresh a table and do it concurrently?

I'm using Spark Streaming 2.1. I'd like to refresh some cached tables (loaded by a Spark-provided DataSource like Parquet, MySQL or …

apache-spark apache-spark-sql spark-streaming
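
Spark's catalog API does expose a refresh operation for this: Catalog.refreshTable invalidates the cached data and metadata for a table, and the next action re-reads the source lazily. A minimal sketch, assuming an existing SparkSession `spark` and a hypothetical table name:

```scala
// Invalidate cached data/metadata; the next query re-reads the source files.
spark.catalog.refreshTable("my_table")

// Equivalent SQL form:
spark.sql("REFRESH TABLE my_table")
```
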
Limit Kafka batches size when using Spark Streaming

Is it possible to limit the size of the batches returned by the Kafka consumer for Spark Streaming? I am …

apache-spark apache-kafka spark-streaming kafka-consumer-api
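
For the direct Kafka stream, two well-known settings bound how much a batch can contain. A sketch; the rate value is a placeholder:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Caps each Kafka partition at N records per second, so one batch holds at
  // most N * (number of partitions) * (batch interval in seconds) records.
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // Lets Spark adapt the ingest rate to the observed processing speed.
  .set("spark.streaming.backpressure.enabled", "true")
```
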
Does a join of co-partitioned RDDs cause a shuffle in Apache Spark?

Will rdd1.join(rdd2) cause a shuffle if rdd1 and rdd2 have the same partitioner?

apache-spark spark-streaming rdd
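
The short answer is no, provided the partitioning has already been materialized: when both sides report the same partitioner, join matches partitions one-to-one as a narrow dependency instead of reshuffling. A sketch, assuming an existing SparkContext `sc`:

```scala
import org.apache.spark.HashPartitioner

val p = new HashPartitioner(8)

// partitionBy shuffles once, up front; caching keeps that layout around.
val rdd1 = sc.parallelize(Seq(1 -> "a", 2 -> "b")).partitionBy(p).cache()
val rdd2 = sc.parallelize(Seq(1 -> "x", 2 -> "y")).partitionBy(p).cache()

// Same partitioner on both sides: the join is a narrow dependency, no shuffle.
val joined = rdd1.join(rdd2)
println(joined.partitioner) // Some(org.apache.spark.HashPartitioner@...)
```
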
What's the meaning of the DStream.foreachRDD function?

In Spark Streaming, every batch interval always generates one and only one RDD, so why do we use foreachRDD() …

apache-spark spark-streaming
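
The point of foreachRDD is that it is the generic output operation: its closure runs on the driver once per batch interval and hands you that batch's RDD, while anything invoked on the RDD inside it runs distributed on the executors. A sketch, assuming an existing DStream[String] named `stream`:

```scala
stream.foreachRDD { rdd =>
  // Driver side, once per batch interval.
  println(s"batch with ${rdd.getNumPartitions} partitions")

  rdd.foreachPartition { records =>
    // Executor side, once per partition; the usual place to open a
    // connection and push records to an external sink.
    records.foreach(_ => ()) // placeholder sink
  }
}
```
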
How do I stop a Spark Streaming job?

I have a Spark Streaming job which has been running continuously. How do I stop the job gracefully? I have …

apache-spark spark-streaming
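
StreamingContext.stop takes a stopGracefully flag that lets in-flight batches finish before shutdown, and spark.streaming.stopGracefullyOnShutdown ties the same behaviour to a JVM shutdown hook. A sketch; the app name and interval are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("graceful-stop-demo")
  // Finish in-flight batches when the JVM receives a shutdown signal.
  .set("spark.streaming.stopGracefullyOnShutdown", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// ... define the streaming computation, then:
ssc.start()
ssc.awaitTermination()

// Or, from another thread, stop explicitly and gracefully:
// ssc.stop(stopSparkContext = true, stopGracefully = true)
```
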