Storm topology configuration

blockcipher · Aug 5, 2013

How do you provide a custom configuration to a Storm topology? For example, if I have built a topology that connects to a MySQL cluster and I want to be able to change which servers it connects to without recompiling, how would I do that? My preference would be to use a config file, but my concern is that the file itself is not deployed to the cluster, so the topology won't be able to read it (unless my understanding of how a cluster works is flawed). The only way I've seen so far to pass configuration options into a Storm topology at runtime is via command-line parameters, but that gets messy once you have a good number of parameters.

One thought I did have is to use a shell script to read the file into a variable and pass that variable's contents to the topology as a string, but I'd like something cleaner if possible.

Has anyone else encountered this? If so, how did you solve it?

EDIT:

It appears I need to provide more clarification. My scenario is that I have a topology that I want to deploy in different environments without recompiling it. Normally, I'd create a config file containing things like database connection parameters and pass that in. I'd like to know how to do something like that in Storm.

Answer

veroxii · Feb 23, 2014

You can specify a configuration (typically via a YAML file) that you submit with your topology. In our own project we manage this with separate config files, one for development and one for production, which store our server, Redis, and database IPs, ports, etc. When we run the command that builds the jar and submits the topology to Storm, it includes the correct config file for the deployment environment. The bolts and spouts then simply read the settings they need from the stormConf map that Storm passes to a bolt's prepare() method (or a spout's open() method).
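The component side of this can be sketched roughly as follows. It is a Storm-free sketch so that it compiles with the JDK alone: `MySqlWriterBolt` and the `app.mysql.*` keys are hypothetical names, and in a real topology the class would extend Storm's `BaseRichBolt`, whose `prepare` also receives a `TopologyContext` and an `OutputCollector`. It shows reading custom settings out of the conf map once, at startup, failing fast on a missing required key and falling back to a default for an optional one.

```java
import java.util.Map;

// Hypothetical bolt-like class. A real bolt would extend BaseRichBolt and
// receive (Map stormConf, TopologyContext context, OutputCollector collector).
class MySqlWriterBolt {
    // Hypothetical custom keys submitted alongside the topology.
    static final String URL_KEY = "app.mysql.url";
    static final String POOL_SIZE_KEY = "app.mysql.pool.size";

    private String jdbcUrl;
    private int poolSize;

    // Mirrors the point where Storm hands the topology configuration to a bolt.
    public void prepare(Map<String, Object> stormConf) {
        Object url = stormConf.get(URL_KEY);
        if (url == null) {
            // A missing connection string is a deployment error, so fail fast.
            throw new IllegalStateException("missing config key " + URL_KEY);
        }
        jdbcUrl = url.toString();

        // Numbers may arrive as Long (e.g. after serialization) or as String
        // (e.g. from a .properties file), so accept either form.
        Object size = stormConf.getOrDefault(POOL_SIZE_KEY, 4);
        poolSize = (size instanceof Number)
                ? ((Number) size).intValue()
                : Integer.parseInt(size.toString().trim());
    }

    public String jdbcUrl() { return jdbcUrl; }

    public int poolSize() { return poolSize; }
}
```

Reading the values in `prepare()` rather than in the constructor matters in Storm: the bolt object is built on the submitting machine and serialized, while `prepare()` runs on the worker with the configuration that was submitted alongside the topology.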

From http://storm.apache.org/documentation/Configuration.html :

Every configuration has a default value defined in defaults.yaml in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using StormSubmitter. However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".

Storm 0.7.0 and onwards lets you override configuration on a per-bolt/per-spout basis.

You'll also see at http://nathanmarz.github.io/storm/doc/backtype/storm/StormSubmitter.html that submitJar and submitTopology are passed a map called conf.
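On the submitting side, the environment-specific file can be read on the client machine before submission, so it never needs to exist on the cluster; only the resulting map travels with the topology. The following is a JDK-only sketch that assumes a `.properties` file rather than the YAML the answer mentions (YAML would need a parser such as SnakeYAML); `ConfigLoader` and the `config/<env>.properties` layout are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper that builds the conf map which would then be passed to
// StormSubmitter.submitTopology(name, conf, topology).
final class ConfigLoader {
    private ConfigLoader() {}

    // Parses key=value lines into a mutable map; values stay Strings.
    static Map<String, Object> load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        Map<String, Object> conf = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            // Properties keeps trailing whitespace in values, so trim it.
            conf.put(key, props.getProperty(key).trim());
        }
        return conf;
    }

    // Picks config/<env>.properties relative to the working directory,
    // e.g. env = "dev" or "prod".
    static Map<String, Object> loadForEnvironment(String env) throws IOException {
        Path path = Paths.get("config", env + ".properties");
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return load(reader);
        }
    }
}
```

In the submitting `main`, the result could be copied into a `Config` (which is itself a `HashMap<String, Object>`) with `conf.putAll(ConfigLoader.loadForEnvironment(args[0]))` before calling `StormSubmitter.submitTopology`. Switching environments then means passing `dev` or `prod` on the command line instead of recompiling.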

Hope this gets you started.