I know there are already many threads on 'spark streaming connection refused' issues, but most of them are on Linux or at least point to HDFS. I am running this on my local laptop on Windows.
I am running a very simple, basic Spark Streaming standalone application, just to see how streaming works. Nothing complex here:
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.SparkConf
object MyStream
{
def main(args:Array[String])
{
val sc = new StreamingContext(new SparkConf(),Seconds(10))
val mystreamRDD = sc.socketTextStream("localhost",7777)
mystreamRDD.print()
sc.start()
sc.awaitTermination()
}
}
I am getting the following error:
2015-07-25 18:13:07 INFO ReceiverSupervisorImpl:59 - Starting receiver
2015-07-25 18:13:07 INFO ReceiverSupervisorImpl:59 - Called receiver onStart
2015-07-25 18:13:07 INFO SocketReceiver:59 - Connecting to localhost:7777
2015-07-25 18:13:07 INFO ReceiverTracker:59 - Registered receiver for stream 0 from 192.168.19.1:11300
2015-07-25 18:13:08 WARN ReceiverSupervisorImpl:92 - Restarting receiver with delay 2000 ms: Error connecting to localhost:7777
java.net.ConnectException: Connection refused
I have tried using different port numbers, but it doesn't help: the receiver keeps retrying in a loop and keeps getting the same error. Does anyone have an idea?
Within the code for socketTextStream, Spark creates an instance of SocketInputDStream, which uses java.net.Socket:
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L73
java.net.Socket is a client socket, which means it expects a server to already be running at the address and port you specify. Unless you have some service running a server on port 7777 of your local machine, the error you are seeing is exactly what's expected.
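You can reproduce this outside Spark with a bare java.net.Socket. This is a minimal sketch (the object name ConnectDemo and the refused helper are my own, and it assumes nothing is listening on port 7777, as in the question):

```scala
import java.net.{ConnectException, Socket}

// A plain java.net.Socket is a client: if no server is listening on the
// target port, the connect attempt fails with ConnectException, exactly
// like the receiver's log above.
object ConnectDemo {
  def refused(host: String, port: Int): Boolean =
    try { new Socket(host, port).close(); false }
    catch { case _: ConnectException => true }

  def main(args: Array[String]): Unit =
    // Assumes nothing is listening on 7777, as in the question.
    println(if (refused("localhost", 7777)) "Connection refused" else "connected")
}
```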
To see what I mean, try the following (you may not need to set master or appName in your environment):
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.SparkConf
object MyStream
{
def main(args:Array[String])
{
val sc = new StreamingContext(new SparkConf().setMaster("local").setAppName("socketstream"),Seconds(10))
val mystreamRDD = sc.socketTextStream("bbc.co.uk",80)
mystreamRDD.print()
sc.start()
sc.awaitTermination()
}
}
This doesn't print any content, because the app doesn't speak HTTP to the BBC web server, but it does not get a connection refused exception.
To run a local server on Linux, I would use netcat with a simple command such as
cat data.txt | ncat -l -p 7777
I'm not sure what the best approach is on Windows. You could write another application that listens as a server on that port and sends some data.
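Since your streaming app is already on the JVM, one option that works on Windows too is a tiny java.net.ServerSocket program. This is only a sketch: the object name SimpleSocketServer, the serve helper, and the placeholder lines are my own, and port 7777 just matches the question.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

// Hypothetical helper: listen on `port`, wait for one client (e.g. the
// Spark socket receiver) to connect, send it each line, then close.
object SimpleSocketServer {
  def serve(port: Int, lines: Seq[String]): Unit = {
    val server = new ServerSocket(port)
    try {
      val socket = server.accept()                  // blocks until a client connects
      val out = new PrintWriter(socket.getOutputStream, true) // autoFlush each println
      lines.foreach(line => out.println(line))
      out.close()
      socket.close()
    } finally server.close()
  }

  def main(args: Array[String]): Unit =
    serve(7777, (1 to 10).map(i => s"line $i"))     // placeholder data on port 7777
}
```

Start this first, then start your streaming app; the receiver should connect and print the lines instead of getting connection refused.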