I'm trying to tweak our system status check to see the state of the Solr nodes in our SolrCloud. I'm facing the following problems:
We send a query to each of the Solr nodes separately. If we get a response and the status of the response is 0, we assume the node is running. Unfortunately, we've seen cases in which the node is recovering or even down and select queries are still handled.
In hope to prevent this, we've added a check which sends a ping request to solr. If the status returned by this is request reads 'OK' we assume the node is up. Unfortunately even with this request, if the node is recovering or down, this check won't fail.
My question is: What is the correct way to check the status of a node in SolrCloud?
If you are using a SolrCloud, it's recommended to maintain an explicit zookeeper ensemble as well. Because zookeeper ensemble maintains the SolrCloud's current status of each node and each shard wise. This status is actually get reflected from the SolrCloud admin window.
/clusterstate.json
to view the SolrCloud status.This (clusterstate.json) json file holds the SolrCloud status information. Now if you are running an explicit zookeeper ensemble, following are the steps to get SolrCloud status.
./zkCli.sh -server ZK_IP:ZK_PORT
(E.g ./zkCli.sh -server localhost:2181)get /clusterstate.json
You'll find the SolrCloud status.
Note : ZK_IP - The HOST IP where zoopeeper is running. ZK_PORT - Zookeeper's client port.