Using Scan in HBase with start row, end row and a filter

Andrea · Aug 23, 2012 · Viewed 14.3k times

I need to use a Scan in HBase to scan all rows that meet certain criteria, which is why I will use a filter (actually a compound filter list that includes two SingleColumnValueFilters). My row keys are structured like this:

a.b.x|1|1252525  
a.b.x|1|2373273  
a.b.x|1|2999238  
...  
a.b.x|2|3000320  
a.b.x|2|4000023  
...  
a.b.y|1|1202002  
a.b.y|1|1778949  
a.b.y|1|2738273  

As an additional requirement, I need to iterate over only those rows whose row key starts with "a.b.x|1".

Now, the questions:

  1. If I add a PrefixFilter to my filter list, does the scanner still scan all rows (applying the filter to each of them)?
  2. If I instantiate the Scan with a startRow (the prefix) and the filter list (without the PrefixFilter), I understand that the scan starts from the given row prefix. So, assuming I use "a.b.x." as startRow, will the scan also cover the a.b.y rows?
  3. What is the behaviour if I use new Scan(startRow, endRow) and then setFilter? In other words: what about the missing constructor Scan(byte[] start, byte[] end, Filter filter)?

Thanks in advance
Andrea

Answer

srav · Nov 1, 2012

Row keys in HBase are sorted lexicographically, so all the "a.b.x|1" rows come before the "a.b.x|2" rows, and so on. Because row keys are stored as byte arrays and compared byte by byte, be careful with variable-length row keys and with keys that mix different character classes. For your requirement, something along these lines should work:

// Create a scan with start and stop row keys (start row is inclusive, stop row is exclusive)
Scan scan = new Scan(Bytes.toBytes("a.b.x|1"), Bytes.toBytes("a.b.x|2"));

// Attach the column filters you already have (e.g. your filter list) to the scan
scan.setFilter(colFilter);

// Then get a scanner and iterate through the results
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result result = scanner.next(); result != null; result = scanner.next()) {
        // Use the result object
    }
} finally {
    scanner.close(); // release the scanner's server-side resources
}

update: ToBytes should be toBytes
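
To make the ordering caveat concrete, here is a minimal sketch in plain Java (no HBase dependency) that mimics the unsigned, byte-by-byte comparison used for row keys and checks which keys fall in a start-inclusive, stop-exclusive range. The class RowKeyRangeDemo, its methods, and the key "a.b.x|10|5000000" are hypothetical and exist only for illustration; they are not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: it only imitates row key ordering, it does not talk to HBase.
class RowKeyRangeDemo {

    // Compare two keys byte by byte, treating each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    // Derive an exclusive stop row from a prefix by incrementing its last byte.
    // Assumption: the prefix is non-empty and its last byte is not 0xFF.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        stop[stop.length - 1]++;
        return stop;
    }

    // Keep the keys with start <= key < stop (start inclusive, stop exclusive).
    static List<String> select(List<String> keys, byte[] start, byte[] stop) {
        List<String> selected = new ArrayList<>();
        for (String key : keys) {
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            if (compare(k, start) >= 0 && compare(k, stop) < 0) {
                selected.add(key);
            }
        }
        return selected;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "a.b.x|1|1252525",
                "a.b.x|1|2999238",
                "a.b.x|10|5000000", // hypothetical key with a two-digit middle field
                "a.b.x|2|3000320",
                "a.b.y|1|1202002");

        // Range used in the answer: also picks up the "a.b.x|10|..." key.
        System.out.println(select(keys, bytes("a.b.x|1"), bytes("a.b.x|2")));

        // Tighter range: include the trailing separator in the prefix.
        byte[] prefix = bytes("a.b.x|1|");
        System.out.println(select(keys, prefix, stopRowForPrefix(prefix)));
    }
}
```

With these sample keys, the range from "a.b.x|1" to "a.b.x|2" also selects the hypothetical "a.b.x|10|5000000" row, because "a.b.x|1" is a byte prefix of it. Including the trailing "|" in the start row and deriving the stop row from that prefix narrows the selection to the two "a.b.x|1|" rows. The same derived byte arrays could be passed as the start and stop rows of a Scan.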