Why many refer to Cassandra as a Column oriented database?

Question 1

Why many refer to Cassandra as a Column oriented database?

cassandra nosql data-modeling column-oriented wide-column-store

cesare · Oct 22, 2012 · Viewed 28.7k times · Source

Answer

Answer

If you take a look at the Readme file at Apache Cassandra git repo, it says that,

Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns.

Column oriented or columnar databases are stored on disk column wise.

e.g: Table Bonuses table

   ID         Last    First   Bonus
   1          Doe     John    8000
   2          Smith   Jane    4000
   3          Beck    Sam     1000

In a row-oriented database management system, the data would be stored like this: 1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;
In a column-oriented database management system, the data would be stored like this:
1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;
Cassandra is basically a column-family store
Cassandra would store the above data as,

     "Bonuses" : {
           row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
           row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000}
           ...
     }

Read this for more details.

Question 2

Reading several papers and documents on internet, I found many contradictory information about the Cassandra data model. There are many which identify it as a column oriented database, other as a row-oriented and then who define it as a hybrid way of both.

According to what I know about how Cassandra stores file, it uses the *-Index.db file to access at the right position of the *-Data.db file where it is stored the bloom filter, column index and then the columns of the required row.

In my opinion, this is strictly row-oriented. Is there something I'm missing?

Why many refer to Cassandra as a Column oriented database?

Answer

Related questions