HBase shell scan bytes to string conversion

Yuri Levinsky picture Yuri Levinsky · Jul 18, 2013 · Viewed 16k times · Source

I would like to scan hbase table and see integers as strings (not their binary representation). I can do the conversion but have no idea how to write scan statement by using Java API from hbase shell:

org.apache.hadoop.hbase.util.Bytes.toString(
  "\x48\x65\x6c\x6c\x6f\x20\x48\x42\x61\x73\x65".to_java_bytes)

 org.apache.hadoop.hbase.util.Bytes.toString("Hello HBase".to_java_bytes)

I will be very happy to have examples of scan, get that searching binary data (long's) and output normal strings. I am using hbase shell, not JAVA.

Answer

Lorand Bendig picture Lorand Bendig · Jul 21, 2013

HBase stores data as byte arrays (untyped). Therefore if you perform a table scan data will be displayed in a common format (escaped hexadecimal string), e.g:
"\x48\x65\x6c\x6c\x6f\x20\x48\x42\x61\x73\x65" -> Hello HBase

If you want to get back the typed value from the serialized byte array you have to do this manually. You have the following options:

  • Java code (Bytes.toString(...))
  • hack the to_string function in $HBASE/HOME/lib/ruby/hbase/table.rb : replace toStringBinary with toInt for non-meta tables
  • write a get/scan JRuby function which converts the byte array to the appropriate type

Since you want it HBase shell, then consider the last option:
Create a file get_result.rb :

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Result;
import java.util.ArrayList;

# Simple function equivalent to scan 'test', {COLUMNS => 'c:c2'}
def get_result()
  htable = HTable.new(HBaseConfiguration.new, "test")
  rs = htable.getScanner(Bytes.toBytes("c"), Bytes.toBytes("c2"))
  output = ArrayList.new
  output.add "ROW\t\t\t\t\t\tCOLUMN\+CELL"
  rs.each { |r| 
    r.raw.each { |kv|
      row = Bytes.toString(kv.getRow)
      fam = Bytes.toString(kv.getFamily)
      ql = Bytes.toString(kv.getQualifier)
      ts = kv.getTimestamp
      val = Bytes.toInt(kv.getValue)
      output.add " #{row} \t\t\t\t\t\t column=#{fam}:#{ql}, timestamp=#{ts}, value=#{val}"
    }
  }
  output.each {|line| puts "#{line}\n"}
end

load it in the HBase shell and use it:

require '/path/to/get_result'
get_result

Note: modify/enhance/fix the code according to your needs