We are using HBase to store data that is sqooped from Oracle to HDFS. We designed the row key as a byte array, built as a composite key: (Md5(schema name).getBytes() + Md5(date (format = yyyy-mm-dd)).getBytes() + ByteBuffer.allocate(8).putLong(pkid).array()). Here PKID is a long value.
If I want to get all the rows for a particular schema and a particular date, can I query the HBase table using a start row and an end row, or is there another way to run a query like this?
When I store my row key as a string like user1_20130123, ..., user1_20130127, I am able to filter the table using:
scan 'TempTable', {
COLUMNS => ['CF:NAME'],
LIMIT => 10,
STARTROW => 'user1_20100101',
ENDROW => 'user1_20100115'
}
Here I get the rows for user1 within those dates. When I store the row key as described above, how can I query?
You have a problem with your rowkeys: if you hash the date, the keys no longer sort in chronological order, so you won't be able to use the date as a start/stop row for range scans.
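To see why the hash gets in the way, here is a rough sketch using only the JDK (no HBase dependency). It counts how many consecutive days still sort chronologically when the date is stored as an MD5 digest versus an 8-byte big-endian long. The class and method names (KeyOrderDemo, orderedPairs, and so on) are made up for illustration, and the UTC day boundaries are an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.function.Function;

// Hypothetical helper class, for illustration only.
final class KeyOrderDemo {
    // 16-byte MD5 digest of a string (UTF-8 encoding is an assumption).
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Midnight UTC of the given day as an 8-byte big-endian long.
    static byte[] epochMillisBytes(LocalDate d) {
        long millis = d.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return ByteBuffer.allocate(8).putLong(millis).array();
    }

    // Unsigned lexicographic comparison, the order in which HBase sorts row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Counts consecutive days whose encoded forms sort in chronological order.
    static int orderedPairs(LocalDate first, int days, Function<LocalDate, byte[]> encode) {
        int ordered = 0;
        for (int i = 0; i < days; i++) {
            LocalDate d = first.plusDays(i);
            if (compare(encode.apply(d), encode.apply(d.plusDays(1))) < 0) {
                ordered++;
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        LocalDate first = LocalDate.of(2013, 1, 1);
        System.out.println("hashed dates in order:   "
                + orderedPairs(first, 30, d -> md5(d.toString())) + " of 30");
        System.out.println("long timestamps in order: "
                + orderedPairs(first, 30, KeyOrderDemo::epochMillisBytes) + " of 30");
    }
}
```

The long encoding keeps every pair in order as long as the timestamps are non-negative, while the hashed dates end up scattered, which is why a start/stop row built from hashed dates cannot bracket a date range.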
Your rowkeys should be something like this:
[16B_schema_MD5_hash][8B_long_timestamp][8B_pkid]
Which you can query like this:
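import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
// If schemaNameMD5Hash is already the 16-byte digest (a byte[]), pass it to
// Bytes.add directly instead of wrapping it in Bytes.toBytes.
Scan myScan = new Scan(
    Bytes.add(Bytes.toBytes(schemaNameMD5Hash), Bytes.toBytes(startTimestamp)), // start row (inclusive)
    Bytes.add(Bytes.toBytes(schemaNameMD5Hash), Bytes.toBytes(stopTimestamp))   // stop row (exclusive)
);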
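For the "one schema, one date" case from the question, one possible way to build the keys and the scan bounds is sketched below with plain JDK classes. The names (RowKeys, rowKey, scanBound, startOfDayMillis) are hypothetical, and the sketch assumes UTC day boundaries, a UTF-8 encoded schema name, and non-negative timestamps and pkids so that the big-endian longs sort correctly as unsigned bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper class, for illustration only.
final class RowKeys {
    // 16-byte MD5 digest of the schema name.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Epoch milliseconds at 00:00 UTC of the given day (UTC is an assumption).
    static long startOfDayMillis(LocalDate day) {
        return day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Full row key: [16B schema hash][8B timestamp][8B pkid].
    static byte[] rowKey(String schema, long timestamp, long pkid) {
        return ByteBuffer.allocate(32)
                .put(md5(schema))
                .putLong(timestamp)
                .putLong(pkid)
                .array();
    }

    // Scan bound: [16B schema hash][8B timestamp], a prefix of the full key.
    static byte[] scanBound(String schema, long timestamp) {
        return ByteBuffer.allocate(24)
                .put(md5(schema))
                .putLong(timestamp)
                .array();
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2013, 1, 23);
        byte[] startRow = scanBound("user1", startOfDayMillis(day));
        byte[] stopRow = scanBound("user1", startOfDayMillis(day.plusDays(1)));
        System.out.println("start row bytes: " + startRow.length);
        System.out.println("stop row bytes:  " + stopRow.length);
    }
}
```

The resulting startRow and stopRow can be passed to the Scan shown above. Because the stop row is exclusive, using midnight of the following day covers the whole requested day without including any rows from the next one.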