HBase client: ConnectionLoss for /hbase error

CharlesS · May 27, 2011

I'm going completely crazy:

I installed Hadoop and HBase, and everything is running:

/opt/jdk1.6.0_24/bin/jps
23261 ThriftServer
22582 QuorumPeerMain
21969 NameNode
23500 Jps
23021 HRegionServer
22211 TaskTracker
22891 HMaster
22117 SecondaryNameNode
21779 DataNode
22370 Main
22704 JobTracker

This is a pseudo-distributed environment. The hbase shell works and returns correct results for 'list' and for 'status':

hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.1-cdh3u0, r, Fri Mar 25 16:10:51 PDT 2011

hbase(main):001:0> status
1 servers, 0 dead, 8.0000 average load

Connecting via Ruby and Thrift works fine: we add data, it gets into the system, and we can query and scan it.

However, when connecting with Java:

groovy> import org.apache.hadoop.hbase.HBaseConfiguration 
groovy> import org.apache.hadoop.hbase.client.HBaseAdmin 
groovy> conf = HBaseConfiguration.create() 
groovy> conf.set("hbase.master","127.0.0.1:60000"); 
groovy> hbase = new HBaseAdmin(conf); 

Exception thrown

org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:84)

I've been trying to find the cause, but I really have no clue at all. Everything seems to be correctly installed.

netstat -lnp|grep 60000
tcp6       0      0 :::60000                :::*                    LISTEN      22891/java  

Looks fine as well.

# telnet localhost 60000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

It connects, then drops the connection if you type anything and press Enter (not sure if that's expected; Thrift on 9090 does the same).

Can anyone help me?

Answer

Cosmin Lehene · May 30, 2011

This is a ZooKeeper (ZK) error. The HBase client tries to read the /hbase node from ZooKeeper and fails.
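One quick way to check whether ZooKeeper itself answers from the client machine is its `ruok` four-letter command, which a healthy server answers with `imok`. The sketch below is only an illustration: it assumes ZooKeeper listens on the default client port 2181 (the question does not show the port), it needs Java 7 or later, and the class and method names (`ZkProbe`, `fourLetterWord`, `isZkAlive`) are made up for this example. On newer ZooKeeper releases the four-letter commands may have to be enabled explicitly on the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkProbe {

    /** Sends a ZooKeeper four-letter command and returns the raw reply. */
    static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // The server closes the connection after replying, so read until EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return reply.toString("US-ASCII");
        }
    }

    /** Returns true if the server answers "ruok" with "imok"; false on any I/O error. */
    static boolean isZkAlive(String host, int port) {
        try {
            return "imok".equals(fourLetterWord(host, port, "ruok", 3000).trim());
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port default to localhost:2181 (an assumption, not taken from the question).
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " alive=" + isZkAlive(host, port));
    }
}
```

If this reports the server as not alive from the machine running the Java client, the problem is reaching ZooKeeper at all rather than anything specific to HBaseAdmin.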

You can get a ZK dump from the HBase master web interface. You should see all the connections to ZK and figure out if something is exhausting them.
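Besides the master web interface, ZooKeeper's `cons` four-letter command also lists the open client connections. Below is a rough sketch for tallying them per client address. The line shape it parses (`/<address>:<port>[...](...)`) is what I'd expect from a 3.3-era server, so treat the regex as an assumption and adjust it to what your server prints; the class name `ZkConnectionCount` is made up for this example, and it needs Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZkConnectionCount {

    // Assumed connection line shape: " /<address>:<port>[<n>](queued=...,...)".
    // The address part may itself contain colons (IPv6), so match up to the last
    // ":<digits>[" on the line.
    private static final Pattern CLIENT = Pattern.compile("^\\s*/([^\\[\\s]+):\\d+\\[");

    /** Counts connection lines per client address; lines that do not match are ignored. */
    static Map<String, Integer> countByClient(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CLIENT.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Reads the `cons` output from standard input.
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        for (Map.Entry<String, Integer> e : countByClient(lines).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

After compiling, something like `echo cons | nc localhost 2181 | java ZkConnectionCount` would feed it the live list (assuming `nc` is installed and ZooKeeper is on port 2181). A single address holding a count close to the configured limit is the thing to look for.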

Before diving into anything else, you could try restarting your ZK cluster to see if that fixes the problem. (It's strange to see this with a single client.)

HBase has a setting to increase the number of connections ZK accepts from a single client host. It's

hbase.zookeeper.property.maxClientCnxns

The default number of connections was changed in a few recent updates (see below); the hbase-default.xml file lists all the default configurations. You can override the value in your hbase-site.xml file (under the HBase conf dir) and raise it to 100 or more. But make sure you're not masking the real problem this way: you shouldn't see this error with a single client.
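As a sketch, the override in hbase-site.xml would look roughly like this; the value 100 is just the example figure mentioned above, not a tuned recommendation:

```xml
<configuration>
  <!-- Raise the per-host connection limit of the ZooKeeper that HBase manages. -->
  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>100</value>
  </property>
</configuration>
```

Note that the hbase.zookeeper.property.* settings are handed to a ZooKeeper that HBase itself starts. If ZooKeeper runs as a separate service, the equivalent is maxClientCnxns in its own zoo.cfg, and either way a restart is needed for the change to take effect.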

We've had a similar situation, but it was happening during heavy operations from map-reduce jobs, after upgrading to HBase-0.90.

Here are a couple of issues related to your problem:

If you still can't figure it out, send an email to the hbase-users list or join the #hbase channel on freenode and ask live questions.