Top "Spark-streaming" questions

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams.
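A minimal skeleton of such a job in Scala (the local master, batch interval, and socket source below are illustrative choices, not requirements):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSkeleton {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingSkeleton")
    // Micro-batch interval of 5 seconds
    val ssc = new StreamingContext(conf, Seconds(5))

    // Any input source works here; a socket stream is the simplest to try
    ssc.socketTextStream("localhost", 9999).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```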

Spark streaming with Kafka - createDirectStream vs createStream

We have been using Spark Streaming with Kafka for a while, and until now we have been using the createStream method …

apache-spark apache-kafka spark-streaming
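For reference, a direct-stream sketch against the spark-streaming-kafka-0-10 integration; the broker address, group id, and topic below are placeholders, and a StreamingContext named ssc is assumed:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Consumer configuration; all values here are placeholders
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest"
)

// Direct stream: no receiver, one RDD partition per Kafka partition
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Seq("example-topic"), kafkaParams)
)

stream.map(record => record.value).print()
```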
SBT Test Error: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream

Getting the below exception when I tried to run unit tests for my Spark Streaming code with SBT on Windows, using ScalaTest. …

scala apache-spark sbt spark-streaming scalatest
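This NoSuchMethodError is typically a classpath conflict between the old net.jpountz.lz4:lz4 artifact (pulled in transitively by older kafka-clients) and the relocated org.lz4:lz4-java that newer Spark builds use. A sketch of one common resolution in build.sbt; the versions shown are assumptions and should match your actual Spark/Kafka setup:

```scala
// build.sbt (sketch): keep a single lz4 implementation on the test classpath
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.4.8" % "provided",
  ("org.apache.kafka" % "kafka-clients" % "2.0.0")
    .exclude("net.jpountz.lz4", "lz4")  // old coordinates clash with Spark's lz4-java
)

// Pin the relocated artifact so every dependency resolves to the same class files
dependencyOverrides += "org.lz4" % "lz4-java" % "1.4.0"
```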
Kafka topic partitions to Spark streaming

I have some use cases that I would like clarified, about Kafka topic partitioning -> spark …

apache-spark apache-kafka spark-streaming
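Worth knowing for this question: with the direct approach there is a one-to-one mapping between Kafka topic partitions and the partitions of each batch's RDD, and the mapping can be inspected via the offset ranges. A sketch, assuming a direct stream named stream from the kafka-0-10 integration:

```scala
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

// Each RDD partition corresponds to exactly one Kafka topic-partition;
// the offset ranges make that mapping explicit.
stream.foreachRDD { rdd =>
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"topic=${r.topic} partition=${r.partition} offsets=[${r.fromOffset}, ${r.untilOffset})")
  }
}
```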
How to pass data from Kafka to Spark Streaming?

I am trying to pass data from Kafka to Spark Streaming. This is what I've done so far: Installed both …

apache-spark apache-kafka spark-streaming kafka-python
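Once a stream is wired up (for example the direct stream from the sketch above), consuming the data is just DStream transformations plus starting the context. A minimal end-to-end check, assuming stream and ssc as before:

```scala
// Pull message values out of the Kafka records and run a word count
// to verify data is actually flowing from Kafka into Spark Streaming.
val words = stream.map(_.value).flatMap(_.split("\\s+"))
val counts = words.map((_, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination()
```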
'Connection Refused' error while running Spark Streaming on local machine

I know there are already many threads on 'spark streaming connection refused' issues, but most of these are for Linux …

scala apache-spark spark-streaming
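"Connection refused" from the socket receiver usually just means nothing is listening on the target port when the job starts; the receiver connects out, it does not open a server socket itself. A sketch of the usual local setup; the port is arbitrary:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Start a listener first, e.g. in another terminal:
//   nc -lk 9999     (Linux/macOS)
//   ncat -lk 9999   (Windows, from the Nmap package)
// Without it, the receiver's connect fails with "connection refused".
val conf = new SparkConf()
  .setMaster("local[2]")  // at least 2 threads: one receiver + one for processing
  .setAppName("SocketDemo")
val ssc = new StreamingContext(conf, Seconds(5))

ssc.socketTextStream("localhost", 9999).print()
ssc.start()
ssc.awaitTermination()
```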
Spark streaming checkpoints for DStreams

In Spark Streaming it is possible (and mandatory if you're going to use stateful operations) to set the StreamingContext to …

apache-spark spark-streaming checkpointing
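Recovery from a DStream checkpoint goes through StreamingContext.getOrCreate, and the whole DStream graph has to be defined inside the factory function. A sketch; the checkpoint path, interval, and source are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "/tmp/streaming-checkpoint"  // use HDFS/S3 in production

def createContext(): StreamingContext = {
  val conf = new SparkConf().setMaster("local[2]").setAppName("CheckpointDemo")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)  // mandatory for stateful operations

  // Define the full DStream graph here; on restart it is rebuilt
  // from the checkpoint data, not by re-running this function.
  ssc.socketTextStream("localhost", 9999).print()
  ssc
}

// First run calls the factory; after a crash the context is restored
// from the checkpoint directory instead.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```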
wordCounts.dstream().saveAsTextFiles("LOCAL FILE SYSTEM PATH", "txt"); does not write to file

I am trying to write a JavaPairRDD to a file on the local system. Code below: JavaPairDStream<String, Integer> wordCounts = words.…

apache-spark streaming pyspark spark-streaming hadoop-streaming
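One detail that often explains the "nothing written" symptom: saveAsTextFiles creates one directory per batch, named <prefix>-<batch time in ms>.<suffix>, and on a cluster those directories appear on the executors' filesystems, not the driver's. A Scala sketch, assuming wordCounts is the pair DStream from the question:

```scala
// Each batch becomes a directory such as file:///tmp/wordcounts-1500000000000.txt
// containing part-NNNNN files; an explicit file:// URI avoids writing to
// whatever default filesystem (e.g. HDFS) the cluster is configured with.
wordCounts.saveAsTextFiles("file:///tmp/wordcounts", "txt")
```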
Spark Dataframe validating column names for parquet writes (scala)

I'm processing events using DataFrames converted from a stream of JSON events, which eventually get written out as Parquet …

apache-spark apache-spark-sql spark-streaming spark-dataframe parquet
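Background for this one: Spark's Parquet writer rejects column names containing any of the characters " ,;{}()\n\t=", so a common workaround is to sanitize the schema before the write. A sketch; replacing with underscores is a choice, not a requirement:

```scala
import org.apache.spark.sql.DataFrame

// Rename every column, replacing the characters Parquet rejects with "_".
def sanitizeForParquet(df: DataFrame): DataFrame =
  df.columns.foldLeft(df) { (acc, name) =>
    acc.withColumnRenamed(name, name.replaceAll("[ ,;{}()\\n\\t=]", "_"))
  }

// Usage, assuming `events` is the DataFrame built from the JSON stream:
// sanitizeForParquet(events).write.parquet("/tmp/events-parquet")
```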
How to create Spark RDD from an iterator?

To make it clear, I am not looking for an RDD from an array/list like List<Integer> list = …

apache-spark spark-streaming
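sc.parallelize needs a materialized Seq, so an iterator cannot be wrapped lazily into an RDD; one workaround is to pull the iterator in batches and union the per-batch RDDs, which bounds driver memory by the batch size. A sketch; the batch size is arbitrary:

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Materialize the iterator in chunks of `batchSize`, parallelize each
// chunk, then union the chunks into a single RDD.
def rddFromIterator[T: ClassTag](sc: SparkContext, it: Iterator[T],
                                 batchSize: Int = 10000): RDD[T] = {
  val parts = it.grouped(batchSize).map(batch => sc.parallelize(batch))
  sc.union(parts.toSeq)
}
```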