Top "Spark-streaming" questions

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams.

Use Spring together with Spark

I'm developing a Spark Application and I'm used to Spring as a Dependency Injection Framework. Now I'm stuck with the …

java spring apache-spark spark-streaming
How to optimize shuffle spill in Apache Spark application

I am running a Spark Streaming application with 2 workers. The application has a join and a union operation. All the batches …

apache-spark spark-streaming apache-spark-1.4
Spark DataFrame: does groupBy after orderBy maintain that order?

I have a Spark 2.0 dataframe example with the following structure:

id, hour, count
id1, 0, 12
id1, 1, 55
..
id1, 23, 44
id2, 0, 12
id2, 1, 89
..
id2, 23, 34
etc. …

scala apache-spark apache-spark-sql spark-streaming spark-dataframe
Queries with streaming sources must be executed with writeStream.start();

I'm trying to read messages from Kafka (version 10) in Spark and print them. import spark.implicits._ val …

scala apache-spark-sql spark-streaming
Error: Could not find or load main class org.test.spark.streamExample

I was trying to run a basic Spark Streaming sample in Scala IDE, but I am getting the error below: Error: Could …

spark-streaming scala-ide
Could not parse Master URL: 'spark:http://localhost:18080'

When I try to run my code, it throws this exception: Exception in thread "main" org.apache.spark.SparkException: Could …

java twitter spark-streaming
How do I delete files in an HDFS directory after reading them using Scala?

I use fileStream to read files in an HDFS directory from Spark (streaming context). In case my Spark shuts down …

scala hadoop apache-spark spark-streaming
How to fix Connection reset by peer message from apache-spark?

I keep getting the following exception very frequently, and I wonder why this is happening. After researching, I found …

apache-spark spark-streaming
How to specify which Java version to use in spark-submit command?

I want to run a Spark Streaming application on a YARN cluster on a remote server. The default Java version …

java yarn spark-streaming