I am trying to share the work among multiple spouts. I receive one tuple/message at a time from an external source, and I want to run multiple instances of a spout. The main intention is to share the load and improve performance.
I can do this with a single spout, but I want to spread the load across multiple spouts. I cannot work out the logic for spreading the load, because the offset of the messages is not known until a particular spout finishes consuming its part (i.e. based on the buffer size set).
Can anyone shed some light on how to work out the logic/algorithm?
Thanks in advance for your time.
I used the code with parallelism_hint = 5, i.e.
builder.setSpout("spout", new KafkaSpout(cfg), 5);
I tested it by flooding each partition with 800 MB of data, and it took ~22 sec to finish reading.
Again, I ran the code with parallelism_hint = 1, i.e.
builder.setSpout("spout", new KafkaSpout(cfg), 1);
Now it took slightly longer, ~23 sec. Why?
According to the Storm docs, setSpout() is declared as follows:
public SpoutDeclarer setSpout(java.lang.String id,
IRichSpout spout,
java.lang.Number parallelism_hint)
where,
parallelism_hint - is the number of tasks that should be assigned to execute this spout. Each task will run on a thread in a process somewhere around the cluster.
I came across a discussion on storm-user that covers something similar.
Read "Relationship between Spout parallelism and number of kafka partitions".
2 things to note while using kafka-spout for storm
So suppose the number of Kafka partitions per host is configured as 1 and the number of hosts is 2. Even if we set the spout parallelism to 10, the maximum value that is respected will only be 2, which is the total number of partitions.
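To make the point above concrete, here is a minimal plain-Java sketch (no Storm dependency) of how partitions could be spread over spout tasks. It assumes a simple modulo-style assignment, which is an illustration only and not necessarily what the kafka-spout does internally. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Storm or the kafka-spout.
class PartitionAssignmentSketch {

    // Partitions handled by one task, assuming a modulo-style assignment:
    // task i takes partitions i, i + totalTasks, i + 2 * totalTasks, ...
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int totalPartitions) {
        List<Integer> mine = new ArrayList<>();
        for (int p = taskIndex; p < totalPartitions; p += totalTasks) {
            mine.add(p);
        }
        return mine;
    }

    // Number of tasks that end up with at least one partition to read.
    static int activeTasks(int totalTasks, int totalPartitions) {
        int active = 0;
        for (int t = 0; t < totalTasks; t++) {
            if (!partitionsForTask(t, totalTasks, totalPartitions).isEmpty()) {
                active++;
            }
        }
        return active;
    }

    public static void main(String[] args) {
        int totalPartitions = 2 * 1; // 2 hosts x 1 partition per host
        int totalTasks = 10;         // spout parallelism hint
        for (int t = 0; t < totalTasks; t++) {
            System.out.println("task " + t + " -> partitions "
                    + partitionsForTask(t, totalTasks, totalPartitions));
        }
        System.out.println("active tasks: " + activeTasks(totalTasks, totalPartitions));
    }
}
```

Under this assumption, with 2 partitions and 10 tasks only tasks 0 and 1 get a partition. The other eight tasks have nothing to read, so the extra parallelism adds no throughput.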
How to specify the number of partitions in the kafka-spout?
List<HostPort> hosts = new ArrayList<HostPort>();
hosts.add(new HostPort("localhost",9092));
SpoutConfig objConfig=new SpoutConfig(new KafkaConfig.StaticHosts(hosts, 4), "spoutCaliber", "/kafkastorm", "discovery");
As you can see, brokers can be added using hosts.add, and the number of partitions (per host) is specified as 4 in the new KafkaConfig.StaticHosts(hosts, 4) call.
How to specify the parallelism hint in the kafka-spout?
builder.setSpout("spout", spout,4);
You can specify it when adding your spout to the topology with the setSpout method. Here 4 is the parallelism hint.
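One way to choose the value passed to setSpout is to cap the requested hint at the total partition count. This assumes, as described earlier, that tasks beyond the number of partitions get no work. A minimal sketch in plain Java follows; the class and method names are hypothetical, not part of the Storm or kafka-spout API.

```java
// Hypothetical helper; not part of Storm or the kafka-spout.
class SpoutParallelismSketch {

    // Total partitions for a StaticHosts-style config: hosts x partitions per host.
    static int totalPartitions(int hostCount, int partitionsPerHost) {
        return hostCount * partitionsPerHost;
    }

    // Cap the requested hint at the partition count, since (by assumption)
    // any extra spout tasks would have no partition to read from.
    static int usefulParallelism(int requestedHint, int hostCount, int partitionsPerHost) {
        return Math.min(requestedHint, totalPartitions(hostCount, partitionsPerHost));
    }

    public static void main(String[] args) {
        // One broker (localhost:9092) with 4 partitions, as in the snippet above.
        int hint = usefulParallelism(10, 1, 4);
        System.out.println("parallelism hint to pass to setSpout: " + hint);
    }
}
```

For the configuration above (one host, 4 partitions per host) this yields 4, which matches the hint used in builder.setSpout("spout", spout,4).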
More links that might help
Understanding-the-parallelism-of-a-Storm-topology
what-is-the-task-in-twitter-storm-parallelism
Disclaimer: I am new to both Storm and Java, so please edit or add to this wherever required.