Top "Spark-streaming" questions

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams.

Spark Streaming + Kafka: SparkException: Couldn't find leader offsets for Set

I'm trying to set up Spark Streaming to get messages from a Kafka queue. I'm getting the following error: py4j.protocol.…

apache-spark apache-kafka spark-streaming
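This error typically means the driver reached a broker for metadata but could not resolve a leader for the topic partitions, often because the broker advertises an address that is unreachable from the Spark driver and executors. A common first check is the advertised address in the broker config; a hedged fragment of `server.properties` (hostname and port are placeholders, not taken from the question):

```properties
# Advertise an address that the Spark driver and executors can actually resolve
advertised.host.name=broker1.example.com
advertised.port=9092
```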
How to specify multiple dependencies using --packages for spark-submit?

I have the following as the command line to start a Spark Streaming job. spark-submit --class com.biz.test \ --packages \ …

apache-spark hbase spark-streaming
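The detail this question usually turns on: `--packages` takes a single comma-separated list of Maven coordinates (`groupId:artifactId:version`), not repeated flags or space-separated values. A sketch of such a submit line (class name, coordinates, and jar name are illustrative placeholders, not from the question):

```shell
# Multiple dependencies go in ONE --packages argument, separated by commas
spark-submit \
  --class com.example.MyStreamingJob \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.3,org.apache.hbase:hbase-client:1.2.0 \
  my-job.jar
```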
How to save the latest offset that Spark consumed to ZK or Kafka and read it back after restart

I am using Kafka 0.8.2 to receive data from AdExchange, then I use Spark Streaming 1.4.1 to store the data in MongoDB. My …

apache-spark apache-kafka spark-streaming kafka-consumer-api
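With the Kafka 0.8 direct stream used in Spark 1.4.x there is no receiver and no ZooKeeper bookkeeping, so the application must persist offsets itself: read the `OffsetRange`s off each RDD via `HasOffsetRanges` after processing, write them to ZK (or any store), and pass them back as `fromOffsets` to `KafkaUtils.createDirectStream` on restart. The Spark half needs a cluster, but the store-and-restore half is plain serialization; a minimal Python sketch of that half (the JSON layout and helper names are assumptions, not an official API):

```python
import json

def offsets_to_blob(offset_ranges):
    """Serialize (topic, partition, untilOffset) triples to a string for ZK/Kafka."""
    return json.dumps([
        {"topic": t, "partition": p, "until": u} for (t, p, u) in offset_ranges
    ])

def blob_to_from_offsets(blob):
    """Rebuild the fromOffsets map {(topic, partition): offset} after restart."""
    return {(d["topic"], d["partition"]): d["until"] for d in json.loads(blob)}

# Example round trip (values are illustrative)
blob = offsets_to_blob([("ads", 0, 42), ("ads", 1, 17)])
restored = blob_to_from_offsets(blob)
```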
Spark ssc.textFileStream is not streaming any files from a directory

I am trying to execute the code below using Eclipse (with a Maven conf) with 2 workers, each with 2 cores, or also …

filesystems apache-spark spark-streaming data-stream
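Context for the question above: `textFileStream` only picks up files that appear in the monitored directory after the stream starts (and whose modification time falls inside the current batch window), which is the usual reason it appears to stream nothing. The detection amounts to a snapshot diff per batch, which can be sketched without Spark (a plain-Python analogy, not the Spark implementation):

```python
def new_files(previous_snapshot, current_listing):
    """Return files present now but absent from the last snapshot, plus the new snapshot."""
    current = set(current_listing)
    return sorted(current - previous_snapshot), current

# Batch 1: a.txt already existed before the stream started, so it is never emitted
seen = {"a.txt"}
# Batch 2: b.txt is atomically moved into the directory; only it is picked up
picked, seen = new_files(seen, ["a.txt", "b.txt"])
```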
What is the exact difference between Spark transform in DStream and map?

I am trying to understand transform on Spark DStream in Spark Streaming. I know that transform is much more powerful compared …

apache-spark spark-streaming
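The short answer to the question above: `map` is element-wise (it transforms each record inside every batch RDD), while `transform` is batch-wise (it hands you each interval's RDD as a whole, so you can apply any RDD-to-RDD operation, such as a join with a static RDD or a sort, that plain `map` cannot express). Modeling each batch as a Python list makes the distinction concrete (a plain-Python analogy, not the pyspark API):

```python
# A DStream modeled as a sequence of batches (each batch ~ one RDD)
batches = [[3, 1, 2], [5, 4]]

# map: applies a function to every element of every batch
mapped = [[x * 10 for x in batch] for batch in batches]

# transform: applies an arbitrary function to each batch as a whole,
# enabling whole-batch operations like sorting that map cannot do
transformed = [sorted(batch) for batch in batches]
```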
Adding custom jars to pyspark in jupyter notebook

I am using the Jupyter notebook with Pyspark with the following docker image: Jupyter all-spark-notebook Now I would like to …

python-3.x apache-kafka pyspark spark-streaming jupyter-notebook
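A common way to pull extra jars into a pyspark kernel in the all-spark-notebook image is to set `PYSPARK_SUBMIT_ARGS` before the SparkContext is created; note that the value must end with `pyspark-shell`. A hedged sketch (the Kafka package coordinate is an illustrative assumption, not from the question):

```shell
# Set before the notebook creates its SparkContext; note the trailing pyspark-shell
export PYSPARK_SUBMIT_ARGS="--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell"
```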
Amazon s3a returns 400 Bad Request with Spark

For checkpointing purposes I am trying to set up an Amazon S3 bucket as the checkpoint location. val checkpointDir = "s3a://bucket-name/…

amazon-web-services amazon-s3 apache-spark hdfs spark-streaming
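A 400 Bad Request from s3a very often means the bucket lives in a region that only accepts AWS Signature Version 4 (e.g. eu-central-1) while the client signs with V2. The usual remedy is to point `fs.s3a.endpoint` at the region-specific endpoint and enable V4 signing in the driver and executor JVMs; a hedged config sketch (region, endpoint, and jar name are placeholders):

```shell
spark-submit \
  --conf spark.hadoop.fs.s3a.endpoint=s3.eu-central-1.amazonaws.com \
  --conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  --conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  my-job.jar
```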
Spark: processing multiple Kafka topics in parallel

I am using Spark 1.5.2. I need to run a Spark Streaming job with Kafka as the streaming source. I need to …

apache-spark apache-kafka spark-streaming
How to set and get static variables from Spark?

I have a class like this: public class Test { private static String name; public static String getName() { return name; } public …

java apache-spark spark-streaming
Spark Scala: get data back from rdd.foreachPartition

I have some code like this: println("\nBEGIN Last Revs Class: "+ distinctFileGidsRDD.getClass) val lastRevs = distinctFileGidsRDD. foreachPartition(iter => { SetupJDBC(…

scala apache-spark spark-streaming scalikejdbc
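`foreachPartition` is an action that returns `Unit`, so anything assigned from it (like `lastRevs` above) carries no data back; the RDD way to produce values per partition is `mapPartitions` followed by `collect`. The distinction is really iterator-in-iterator-out versus side-effect-only, which can be shown without a cluster (a plain-Python analogy, not the Spark API):

```python
def foreach_partition(partitions, f):
    """Side-effect only: applies f to each partition iterator, returns None."""
    for part in partitions:
        f(iter(part))

def map_partitions(partitions, f):
    """Produces values per partition: f must return an iterable."""
    return [list(f(iter(part))) for part in partitions]

partitions = [[1, 2], [3, 4]]

# foreach-style: the result is None; any values computed inside are lost
result_foreach = foreach_partition(partitions, lambda it: sum(it))

# mapPartitions-style: results flow back to the caller
result_map = map_partitions(partitions, lambda it: [sum(it)])
```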