Writing to hbase table from python (happybase)

BenjiMan · Feb 11, 2015

I am running a map-reduce job and now I want to enter values into hbase. I stream values from the map-reduce job over stdin and have a python script that inserts (puts) rows over happybase.
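A minimal sketch of the kind of stdin loader described above. It assumes each input line is tab-separated as `row_key<TAB>value` and that everything goes into a single column `f1:`. The helper names `parse_line` and `load_lines` and the line format are assumptions for illustration, not the actual script. It targets Python 2.7, where `str` is a byte string; on Python 3 the keys and values would likely need encoding to bytes first.

```python
import sys


def parse_line(line):
    """Split one 'row_key<TAB>value' line (assumed input format)."""
    row_key, value = line.rstrip("\n").split("\t", 1)
    return row_key, value


def load_lines(table, lines, column="f1:", batch_size=1000):
    """Put every parsed line into `table`; return the number of rows written."""
    count = 0
    # table.batch() buffers mutations and sends them in chunks of batch_size.
    with table.batch(batch_size=batch_size) as batch:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            row_key, value = parse_line(line)
            batch.put(row_key, {column: value})
            count += 1
    return count


def main():
    # Imported here so the helpers above can be exercised without happybase.
    import happybase
    connection = happybase.Connection("localhost")
    try:
        print(load_lines(connection.table("t1"), sys.stdin))
    finally:
        connection.close()
```

Calling `main()` at the end of the script would read the reducer output from stdin and write it to the table `t1`.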

I am running into several problems when doing the put from Python. The most recent one seems to be a library compatibility issue, as far as I understand it: the error log shows a problem with iteritems. The happybase manual refers to additional Python libraries required for sorted queries, which are no longer necessary as of Python 2.7 (I am running 2.7.6).

Did anyone encounter similar problems? Can they be easily fixed or would you recommend using a different interface?

More details

I have hadoop (2.6.0) and hbase (0.98.10 - 2/5/2015) installed and running in stand-alone configuration. They are started up. I can interface with hbase over the shell, create tables, put in values, and scan them.

I can scan and print tables from python over happybase, which shows at least that the connection works. But put always fails. This short example shows the problem:

For the sake of this example, my table is called t1 (created in the hbase shell). It has one column family, f1.

hbase(main)> create 't1','f1'
hbase(main)> put 't1','1','f1','hello'

Now python:

>>> import happybase
>>> connection = happybase.Connection('localhost')
>>> table = connection.table('t1')
>>> print(table.row('1')) # {'f1:': 'hello'}
>>> table.put('2',{'f1','hey'}) # fails, see log

Even more details:

Thrift is running.

# hbase thrift start -threadpool


hduser@box> hbase -version

java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

Error log:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-56dab4cd31ef> in <module>()
----> 1 table.put('2',{'f1','hey'})

/usr/local/lib/python2.7/dist-packages/happybase/table.pyc in put(self, row, data, timestamp, wal)
    437         """
    438         with self.batch(timestamp=timestamp, wal=wal) as batch:
--> 439             batch.put(row, data)
    440 
    441     def delete(self, row, columns=None, timestamp=None, wal=True):

/usr/local/lib/python2.7/dist-packages/happybase/batch.pyc in put(self, row, data, wal)
     81                 value=value,
     82                 writeToWAL=wal)
---> 83             for column, value in data.iteritems())
     84 
     85         self._mutation_count += len(data)

AttributeError: 'set' object has no attribute 'iteritems'

Answer

wouter bolsterlee · May 3, 2015

Happybase author here.

This line in your code contains an error:

>>> table.put('2',{'f1','hey'}) # fails, see log

{'f1', 'hey'} is a set literal, but put expects a dict. That is why the traceback complains that a 'set' object has no attribute 'iteritems'. I assume you meant this?

>>> table.put('2',{'f1': 'hey'})
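
The difference can be checked without HBase at all. The sketch below shows the two literals side by side and adds a small guard that fails early with a clearer message. `checked_put` is a hypothetical wrapper for illustration, not part of happybase.

```python
try:
    from collections.abc import Mapping  # Python 3
except ImportError:
    from collections import Mapping  # Python 2.7

wrong = {'f1', 'hey'}   # comma: a set with two elements
right = {'f1': 'hey'}   # colon: a dict mapping column name to value

print(type(wrong).__name__)  # set
print(type(right).__name__)  # dict


def checked_put(table, row, data):
    """Hypothetical wrapper: reject non-dict data before calling table.put."""
    if not isinstance(data, Mapping):
        raise TypeError(
            "data must be a dict of {column: value}, got %s"
            % type(data).__name__)
    table.put(row, data)
```

With this guard, `checked_put(table, '2', {'f1', 'hey'})` raises a TypeError that names the offending type instead of failing inside the batch code.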