"parallelism hint" is used in storm to parallelise a running storm topology. I know there are concepts like worker process, executor and tasks. Would it make sense to make the parallelism hint as big as possible so that your topologies are parallelised as much as possible?
My question is How to find a perfect parallelism hint number for my storm topologies. Is it depending on the scale of my storm cluster or it's more like a topology/job specific setting, it varies from one topology to another? or it depends on both?
Adding to what @Chiron explained:
"parallelism hint" is used in storm to parallelise a running storm topology
Actually, in Storm the term parallelism hint
is used to specify the initial number of executors (threads) of a component (spout or bolt), e.g.
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
The above statement tells Storm to allot 2 executor threads initially (this can be changed at run time). Similarly,
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4)
the setNumTasks(4)
call tells Storm to run 4 associated tasks for this component (this number will stay the same throughout the lifetime of the topology). So in this case Storm will be running two tasks per executor. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
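To make those numbers concrete, here is a minimal sketch (plain Java, no Storm dependency; the class and the tasksPerExecutor helper are hypothetical names, not part of the Storm API) of the arithmetic behind the snippet above:

```java
// Sketch of how Storm spreads a component's tasks across its executors.
public class ParallelismMath {
    // Hypothetical helper: tasks each executor thread is responsible for,
    // assuming tasks divide evenly across executors.
    static int tasksPerExecutor(int numTasks, int numExecutors) {
        return numTasks / numExecutors;
    }

    public static void main(String[] args) {
        // setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4)
        System.out.println(tasksPerExecutor(4, 2)); // prints 2

        // Default case: numTasks equals numExecutors, one task per thread.
        System.out.println(tasksPerExecutor(2, 2)); // prints 1
    }
}
```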
Would it make sense to make the parallelism hint as big as possible so that your topologies are parallelised as much as possible
One key thing to note is that running more than one task per executor does not increase the level of parallelism, because an executor uses a single thread to process all of its tasks, i.e. tasks run serially on an executor.
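You can see this serial behaviour in a small sketch (plain Java, no Storm dependency; the class and method names are made up): modelling one executor as a single thread, the tasks submitted to it always finish one after another, never concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// One "executor" is a single thread, so its "tasks" run serially:
// adding tasks to the same executor adds no parallelism.
public class SerialTasks {
    static List<String> runTasksOnOneExecutor(int numTasks) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Only one thread ever touches this list, so no synchronization needed.
        List<String> completionOrder = new ArrayList<>();
        for (int task = 1; task <= numTasks; task++) {
            final int id = task;
            executor.submit(() -> completionOrder.add("task-" + id));
        }
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return completionOrder;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two tasks on one executor thread finish strictly in submission order.
        System.out.println(runTasksOnOneExecutor(2)); // prints [task-1, task-2]
    }
}
```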
The purpose of configuring more than one task per executor is that it makes it possible to change the number of executors (threads) at run time, using the rebalancing mechanism, while the topology is still running (remember that the number of tasks always stays the same throughout the life cycle of a topology).
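As a hedged sketch of that rebalancing step (the topology name mytopology is made up; the flags follow the Storm CLI's rebalance command):

```shell
# Redistribute a running topology: use 5 worker processes and give
# "green-bolt" 10 executors. The task count fixed at submit time is
# only spread across the new executors, never changed.
storm rebalance mytopology -n 5 -e green-bolt=10
```

The fixed task count is what gets redistributed, which is why it pays to set numTasks higher than the initial executor count if you expect to scale up later.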
Increasing the number of workers (each responsible for running one or more executors for one or more components) might also give you a performance benefit, but this is also relative, as I found in this discussion where nathanmarz says:
Having more workers might have better performance, depending on where your bottleneck is. Each worker has a single thread that passes tuples on to the 0mq connections for transfer to other workers, so if you're bottlenecked on CPU and each worker is dealing with lots of tuples, more workers will probably net you better throughput.
So basically there is no definitive answer to this; you should try different configurations based on your environment and topology design.