I have been playing with Cloudera, where I define the number of nodes in the cluster before I start my job and then use Cloudera Manager to make sure everything is running.
I’m working on a new project that, instead of using Hadoop, uses message queues to distribute the work, but the results of the work are stored in HBase. I might launch 10 servers to process the job and store the results in HBase, but I’m wondering: if I later decide to add a few more worker nodes, can I easily (read: programmatically) make them automatically connect to the running cluster so they can locally add to the cluster’s HBase/HDFS?
Is this possible, and if so, what would I need to learn in order to do it?