I am using Spark 1.5.2. I need to run a Spark Streaming job with Kafka as the streaming source, reading from multiple topics within Kafka and processing each topic differently.
I made the following observations, in case it's helpful for someone:
Creating multiple streams helps in two ways:

1. You don't need to apply a filter operation to process different topics differently.
2. You can read multiple streams in parallel (as opposed to one by one in the case of a single stream). To make the jobs actually run in parallel, there is an undocumented config parameter, spark.streaming.concurrentJobs.
So, I decided to create multiple streams and raise the concurrent-jobs limit accordingly:
sparkConf.set("spark.streaming.concurrentJobs", "4");