There are probably a lot of similar questions, but they don't answer my scenario (or at least I'm not able to get the point).
I have, let's say, a table in HBase with 4 column families. The main reason is that each column family has a different (very different) VERSIONS attribute.
None of the columns in any family stores big data (such as full texts); values average about 1 KB (long identifiers, some short strings, integers and so on).
I need to access data in several ways: scan and get by column family, get all cells of a given row by version (specific version or a range), and last but not least: get the latest version of all columns of a given row.
So, in this scenario, what are the disadvantages of having 4 column families? Are reads less efficient because (when the row is not in memory) they operate on different store files?
There is a practical limit to the number of column families in HBase. Each column family has its own MemStore (a write cache that holds new data before it is written to HFiles), and when one MemStore is full, they all flush.
The more column families you add, the more MemStores are created and the more frequent MemStore flushes become, which degrades performance.
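To make the flush behaviour described above concrete, here is a minimal toy model in Python. It is not HBase code: `simulate_flushes`, the family names and the sizes are all made up for illustration, and it assumes the "one full MemStore flushes them all" rule stated above (some HBase versions may be configured with a more selective flush policy). It only counts how many store files each family would end up with.

```python
def simulate_flushes(writes, families, flush_size):
    """Toy model of MemStore flushing (hypothetical, not the HBase API).

    writes     -- iterable of (family, size) pairs, in arrival order
    families   -- names of the column families in the table
    flush_size -- size at which a single MemStore is considered full
    Returns a dict mapping each family to the number of store files written.
    """
    memstores = {f: 0 for f in families}    # bytes buffered per family
    store_files = {f: 0 for f in families}  # files flushed per family

    for family, size in writes:
        memstores[family] += size
        if memstores[family] >= flush_size:
            # One MemStore is full, so every MemStore is flushed,
            # including those holding only a little data.
            for f in families:
                if memstores[f] > 0:
                    store_files[f] += 1
                memstores[f] = 0
    return store_files


families = ["cf1", "cf2", "cf3", "cf4"]
# cf1 receives large writes, cf2 tiny ones, cf3 and cf4 nothing.
writes = [("cf1", 64), ("cf2", 1)] * 4
result = simulate_flushes(writes, families, flush_size=128)
print(result)
```

In this sketch `cf2` ends up with as many store files as `cf1` even though it buffered only a couple of units each time. Many small files per family mean more files to consult on reads and more compaction work, which is the cost the answer refers to.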