We have a computationally intensive service that performs a number of transformations. It is a largely CPU-bound process. Essentially, a message broker sends messages to the processing service via Thrift.
We have several processing services, each running a different algorithm, and each message is routed to one or more of them. Both our message volumes and the demands of the processing algos vary (e.g. if we get many messages that contain XYZ, they are sent to algo 1; otherwise they go to algo 2).
We would like to expand this into something that is horizontally scalable, with multiple nodes running the processing algos. Depending on the message load, our Thrift requests should be sent to different servers (assume every server can run an instance of each of Algo 1 to 3). For example, if we are getting a large number of messages for Algo 1, two servers run Algo 1 and the third server handles requests for the other two algos (Algo 2 & 3).
So the system looks like this:
Client ----Request------+
                        |
       +--------------------------------+
       | Coord & Load Balancer Service  |  ... like ZooKeeper
       +--------------------------------+
                        |
                 <------|------>   Route messages to servers...
                        |
Server1:            Server2:            Server3:
Algo1 instance      Algo1 instance      Algo2 instance
                                        Algo3 instance
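To make the routing in the diagram concrete, here is a minimal Java sketch of what the coordinator would have to do. It is only an illustration under assumptions: the class name AlgoRouter, the server names, the "XYZ" rule, and the round-robin policy are all hypothetical, and the assignments are held in memory rather than read from ZooKeeper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router: maps each algo to the servers currently assigned to it. */
class AlgoRouter {
    private final Map<String, List<String>> serversByAlgo = new HashMap<>();
    private final Map<String, AtomicInteger> counters = new HashMap<>();

    /** Assign a server to an algo (assumption: in a real setup this comes from shared config). */
    synchronized void assign(String algo, String server) {
        serversByAlgo.computeIfAbsent(algo, k -> new ArrayList<>()).add(server);
        counters.computeIfAbsent(algo, k -> new AtomicInteger());
    }

    /** Content-based choice of algo; the "XYZ" rule mirrors the example in the question. */
    static String algoFor(String message) {
        return message.contains("XYZ") ? "algo1" : "algo2";
    }

    /** Round-robin over the servers assigned to the algo chosen for this message. */
    synchronized String route(String message) {
        String algo = algoFor(message);
        List<String> servers = serversByAlgo.get(algo);
        if (servers == null || servers.isEmpty()) {
            throw new IllegalStateException("no server assigned to " + algo);
        }
        int i = Math.floorMod(counters.get(algo).getAndIncrement(), servers.size());
        return servers.get(i);
    }
}

class AlgoRouterDemo {
    public static void main(String[] args) {
        AlgoRouter router = new AlgoRouter();
        router.assign("algo1", "server1");
        router.assign("algo1", "server2");
        router.assign("algo2", "server3");
        System.out.println(router.route("payload with XYZ")); // server1
        System.out.println(router.route("another XYZ one"));  // server2
        System.out.println(router.route("plain payload"));    // server3
    }
}
```

The open question is how the assignments passed to assign(...) would be kept in sync across nodes and adjusted as load shifts.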
All processes are written in Java.
So how easy would something like this be to set up using ZooKeeper? I know that as we add or change algos we can easily use ZooKeeper to handle the config side of things (i.e. servers listen for algo updates or additions and serve them as configured), but how do we manage the load-balancing aspect?
Cheers!
You guys probably want something like Norbert from LinkedIn: http://sna-projects.com/norbert/ It uses persistent peer-to-peer communication between clients and servers, and uses ZooKeeper for service registry and out-of-band signaling. Pretty cool stuff. It lets you just fire up another processing node to help handle requests during high load.
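As a rough illustration of the registry idea (not Norbert's API, and not the ZooKeeper client API), here is a small in-memory Java sketch. NodeRegistry, join, leave and pick are hypothetical names; in a real deployment the list of live nodes would be driven by ZooKeeper, for example by ephemeral znodes that vanish when a node's session ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a ZooKeeper-backed service registry (illustration only). */
class NodeRegistry {
    private final CopyOnWriteArrayList<String> liveNodes = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    /** A node announces itself, as it would by registering in ZooKeeper on startup. */
    void join(String node) {
        liveNodes.addIfAbsent(node);
    }

    /** A node disappears, as it would when its registration is removed. */
    void leave(String node) {
        liveNodes.remove(node);
    }

    /** Clients pick the next live node round-robin from a consistent snapshot. */
    String pick() {
        List<String> snapshot = new ArrayList<>(liveNodes);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no live processing nodes");
        }
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }
}

class NodeRegistryDemo {
    public static void main(String[] args) {
        NodeRegistry registry = new NodeRegistry();
        registry.join("node-a");
        registry.join("node-b");
        System.out.println(registry.pick()); // node-a
        System.out.println(registry.pick()); // node-b
        registry.join("node-c");             // extra capacity added under load
        registry.leave("node-a");            // a node drops out
        System.out.println(registry.pick()); // node-b or node-c, never node-a
    }
}
```

The point of the sketch is that clients never hold a fixed server list: a newly started node begins receiving requests as soon as it registers, and a dead one stops as soon as its registration goes away.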
/ Jonas