Load balancer and SolrCloud

kee · Mar 20, 2014

I am wondering how a load balancer can be set up on top of SolrCloud, or whether a load balancer is not needed at all.

If the former, do the shard leaders need to be added to the load balancer? Then what happens if a shard leader changes for some reason? Or is it better to add all machines in the cluster (including replicas) to the load balancer?

If the latter, I guess a CNAME needs to point to the SolrCloud cluster, and it should use round-robin DNS?
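Round-robin DNS, like a simple round-robin load balancer, hands out a fixed list of addresses in rotation. The Java sketch below is a minimal illustration under stated assumptions: `RoundRobinSelector` is a hypothetical class and the node addresses are made up. Note that the selector knows nothing about which node is currently a shard leader, so a static list of leaders would go stale after a leader change.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector: rotates through a fixed list of node
// addresses, the way a simple load balancer or round-robin DNS would.
final class RoundRobinSelector {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSelector(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = List.copyOf(nodes);
    }

    // Returns the next node in turn; floorMod keeps the index valid
    // even after the counter overflows and becomes negative.
    String next() {
        int i = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        // Hypothetical addresses; the list carries no leader/replica information.
        RoundRobinSelector lb = new RoundRobinSelector(
                List.of("10.0.0.1:8983", "10.0.0.2:8983", "10.0.0.3:8983"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.next());
        }
    }
}
```

Running `RoundRobinDemo` prints the three addresses in order and then wraps back to the first one.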

Any advice based on actual SolrCloud operational experience would be really appreciated.

Answer

ymonad · Mar 20, 2014

Usually SolrCloud is used in combination with ZooKeeper, and the client uses CloudSolrServer to access SolrCloud.

A query is handled in the following flow.

Note that I have read only part of the Solr source code, so there are a lot of guesses here. Also, what I read was the source code of Solr 4.1, so it might be outdated.

  1. ZooKeeper holds the IPAddress:Port list of all SolrCloud servers.
  2. (Client Side) The CloudSolrServer instance retrieves the list of servers from ZooKeeper.
  3. (Client Side) The CloudSolrServer instance chooses one of the SolrCloud servers at random and sends the query to it. (Also, does LBHttpSolrServer choose the server in round-robin fashion?)
  4. (Server Side) The SolrCloud server that received the query randomly chooses one replica of each shard (one server per shard) from the server list and forwards the query to each chosen server. (Note that every SolrCloud server holds the server list, which it receives from ZooKeeper.)
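The four steps above can be simulated in plain Java. This is a minimal sketch, not Solr's implementation: `ClusterState`, `QueryFlow` and the host names are hypothetical, and the ZooKeeper lookup is replaced by an in-memory map from shard name to the servers holding a replica of that shard.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical in-memory stand-in for what ZooKeeper holds (step 1):
// shard name -> addresses of the servers holding a replica of that shard.
final class ClusterState {
    private final Map<String, List<String>> replicasByShard;

    ClusterState(Map<String, List<String>> replicasByShard) {
        this.replicasByShard = new LinkedHashMap<>(replicasByShard);
    }

    Map<String, List<String>> replicasByShard() {
        return replicasByShard;
    }

    // Every server in the cluster, regardless of leader/replica role.
    List<String> allServers() {
        List<String> all = new ArrayList<>();
        for (List<String> replicas : replicasByShard.values()) {
            for (String server : replicas) {
                if (!all.contains(server)) {
                    all.add(server);
                }
            }
        }
        return all;
    }
}

final class QueryFlow {
    private final ClusterState state;
    private final Random random;

    QueryFlow(ClusterState state, Random random) {
        this.state = state;
        this.random = random;
    }

    // Steps 2-3 (client side): pick any server at random as the entry point.
    String chooseEntryServer() {
        List<String> servers = state.allServers();
        return servers.get(random.nextInt(servers.size()));
    }

    // Step 4 (server side): the entry server picks one replica per shard at
    // random; it would then forward the query to each chosen server.
    Map<String, String> chooseReplicaPerShard() {
        Map<String, String> chosen = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : state.replicasByShard().entrySet()) {
            List<String> replicas = e.getValue();
            chosen.put(e.getKey(), replicas.get(random.nextInt(replicas.size())));
        }
        return chosen;
    }
}

class QueryFlowDemo {
    public static void main(String[] args) {
        Map<String, List<String>> shards = new LinkedHashMap<>();
        shards.put("shard1", List.of("host1:8983", "host2:8983"));
        shards.put("shard2", List.of("host3:8983", "host4:8983"));
        QueryFlow flow = new QueryFlow(new ClusterState(shards), new Random());
        System.out.println("entry server: " + flow.chooseEntryServer());
        System.out.println("per-shard targets: " + flow.chooseReplicaPerShard());
    }
}
```

Because both choices are random, the output differs between runs; the point is that no step depends on which replica happens to be the leader.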

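An update is handled in a similar manner, except that it is forwarded to the leader of the shard that owns the document, and the leader then propagates it to all replicas of that shard.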
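The update path can be sketched the same way, assuming the usual SolrCloud behaviour in which the receiving server forwards the document to the leader of the owning shard, which then copies it to that shard's other replicas. `Shard`, `UpdateFlow` and the server names are hypothetical, and `shardFor` uses `String.hashCode()` modulo the shard count purely as a stand-in; Solr's own document router assigns hash ranges to shards and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one shard: its current leader plus the other replicas.
record Shard(String leader, List<String> replicas) {}

final class UpdateFlow {
    private final List<Shard> shards;

    UpdateFlow(List<Shard> shards) {
        this.shards = List.copyOf(shards);
    }

    // Stand-in routing (NOT Solr's real router): hash the id onto a shard index.
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), shards.size());
    }

    // Returns, in order, the servers an update passes through:
    // the server the client picked, the shard leader (if different),
    // then each replica the leader copies the update to.
    List<String> route(String entryServer, String docId) {
        Shard target = shards.get(shardFor(docId));
        List<String> path = new ArrayList<>();
        path.add(entryServer);
        if (!entryServer.equals(target.leader())) {
            path.add(target.leader()); // forwarded to the shard leader
        }
        path.addAll(target.replicas()); // leader propagates to its replicas
        return path;
    }
}

class UpdateFlowDemo {
    public static void main(String[] args) {
        UpdateFlow flow = new UpdateFlow(List.of(
                new Shard("host1:8983", List.of("host2:8983")),
                new Shard("host3:8983", List.of("host4:8983"))));
        System.out.println(flow.route("host2:8983", "doc-42"));
    }
}
```

Since the leader is looked up at routing time, a leader change only means a different entry in the shard model; the client can keep sending updates to any server.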

Note that in SolrCloud there is little difference between a leader and a replica, so we can send a query or an update to any of the servers; it is automatically forwarded to the appropriate servers.

In short, load balancing is done on both the client side and the server side, so you don't need to worry about it.