I am running a map-reduce job and now want to write its values into HBase. I stream the values from the map-reduce job over stdin to a Python script that inserts (puts) rows via happybase.
I am running into several problems doing the put from Python. The most recent one seems to be a library compatibility issue, as far as I understand it: the error log shows problems with iteritems. The happybase manual mentions additional Python libraries required for sorted queries, but these are not needed from Python 2.7 onward (I am running 2.7.6).
Did anyone encounter similar problems? Can they be easily fixed or would you recommend using a different interface?
More details
I have hadoop (2.6.0) and hbase (0.98.10 - 2/5/2015) installed and running in stand-alone configuration. They are started up. I can interface with hbase over the shell, create tables, put in values, and scan them.
I can scan and print tables from python over happybase, which shows at least that the connection works. But put always fails. This short example shows the problem:
For the sake of this example, my table is called t1 (created in the hbase shell). It has one column family, f1.
hbase(main)> create 't1','f1'
hbase(main)> put 't1','1','f1','hello'
Now python:
>>> import happybase
>>> connection = happybase.Connection('localhost')
>>> table = connection.table('t1')
>>> print(table.row('1')) # {'f1:': 'hello'}
>>> table.put('2',{'f1','hey'}) # fails, see log
Even more details:
Thrift is running.
# hbase thrift start -threadpool
hduser@box> hbase -version
java version "1.8.0_31" Java(TM) SE Runtime Environment (build 1.8.0_31-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
Error log:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-56dab4cd31ef> in <module>()
----> 1 table.put('2',{'f1','hey'})
/usr/local/lib/python2.7/dist-packages/happybase/table.pyc in put(self, row, data, timestamp, wal)
437 """
438 with self.batch(timestamp=timestamp, wal=wal) as batch:
--> 439 batch.put(row, data)
440
441 def delete(self, row, columns=None, timestamp=None, wal=True):
/usr/local/lib/python2.7/dist-packages/happybase/batch.pyc in put(self, row, data, wal)
81 value=value,
82 writeToWAL=wal)
---> 83 for column, value in data.iteritems())
84
85 self._mutation_count += len(data)
AttributeError: 'set' object has no attribute 'iteritems'
Happybase author here.
This line in your code contains an error:
>>> table.put('2',{'f1','hey'}) # fails, see log
The {'f1', 'hey'} is a set literal, while you should pass a dict instead. I assume you meant this?
>>> table.put('2',{'f1': 'hey'})
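To make the difference concrete, here is a minimal sketch that runs without an HBase connection. It shows that a comma inside the braces builds a set while a colon builds a dict. The checked_put helper is hypothetical (it is not part of happybase) and only assumes that the table object has a put(row, data) method, as in the traceback above.

```python
# A comma inside braces builds a set; a colon builds a dict.
as_set = {'f1', 'hey'}
as_dict = {'f1': 'hey'}

print(type(as_set).__name__)   # set
print(type(as_dict).__name__)  # dict


def checked_put(table, row, data):
    """Hypothetical wrapper: reject non-dict data before calling table.put().

    `table` is assumed to be any object with a put(row, data) method,
    such as the one returned by connection.table('t1').
    """
    if not isinstance(data, dict):
        raise TypeError(
            'data must be a dict of {column: value}, got %s'
            % type(data).__name__)
    table.put(row, data)
```

With a guard like this, the original call would fail right away with a TypeError that names the set, rather than with the AttributeError about iteritems raised deeper inside batch.put.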