Have been reading up on Hadoop and HBase lately, and came across this term-
HBase is an open-source, distributed, sparse, column-oriented store...
What do they mean by sparse? Does it have something to do with a sparse matrix? I am guessing it is a property of the type of data it can store efficiently, and hence, would like to know more about it.
In a regular database, rows are sparse but columns are not. When a row is created, storage is allocated for every column, irrespective of whether a value exists for that field (a field being storage allocated for the intersection of a row and and a column).
This allows fixed length rows greatly improving read and write times. Variable length data types are handled with an analogue of pointers.
Sparse columns will incur a performance penalty and are unlikely to save you much disk space because the space required to indicate NULL is smaller than the 64-bit pointer required for the linked-list style of chained pointer architecture typically used to implement very large non-contiguous storage.
Storage is cheap. Performance isn't.