Why do many refer to Cassandra as a column-oriented database?

cesare · Oct 22, 2012 · Viewed 28.7k times

Reading several papers and documents on the internet, I found a lot of contradictory information about the Cassandra data model. Some identify it as a column-oriented database, others as row-oriented, and still others describe it as a hybrid of both.

As far as I know about how Cassandra stores files, it uses the *-Index.db file to seek to the right position in the *-Data.db file, where the bloom filter, the column index, and then the columns of the requested row are stored.
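To make that lookup path concrete, here is a minimal Python sketch of an index-plus-data scheme, using in-memory bytes instead of files. The layout, the length-prefixed encoding and the names `write_sstable` and `read_row` are hypothetical simplifications, not Cassandra's real SSTable format (which also holds bloom filters, column indexes, timestamps and more, and has changed between versions).

```python
import io
import struct

# Hypothetical, heavily simplified SSTable-like layout (NOT Cassandra's real format):
# "data" holds each row's columns contiguously; "index" maps row key -> byte offset.

def write_sstable(rows):
    """rows: dict of row_key -> dict of column_name -> str value."""
    data = io.BytesIO()
    index = {}
    for key in sorted(rows):
        index[key] = data.tell()                  # where this row starts in data
        cols = rows[key]
        data.write(struct.pack(">I", len(cols)))  # number of columns in the row
        for name in sorted(cols):                 # columns kept sorted by name
            for s in (name, cols[name]):
                b = s.encode("utf-8")
                data.write(struct.pack(">I", len(b)))  # length prefix
                data.write(b)
    return data.getvalue(), index

def read_row(data, index, key):
    """Use the index to seek straight to the row, then decode all its columns."""
    buf = io.BytesIO(data)
    buf.seek(index[key])
    (ncols,) = struct.unpack(">I", buf.read(4))
    row = {}
    for _ in range(ncols):
        parts = []
        for _ in range(2):                        # column name, then value
            (n,) = struct.unpack(">I", buf.read(4))
            parts.append(buf.read(n).decode("utf-8"))
        row[parts[0]] = parts[1]
    return row

data, index = write_sstable({
    "1": {"Last": "Doe", "First": "John", "Bonus": "8000"},
    "2": {"Last": "Smith", "First": "Jane", "Bonus": "4000"},
})
print(read_row(data, index, "2"))  # {'Bonus': '4000', 'First': 'Jane', 'Last': 'Smith'}
```

All columns of a row sit next to each other after a single seek, which is exactly the row-oriented access pattern the question describes.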

In my opinion, this is strictly row-oriented. Is there something I'm missing?

Answer

tharindu_DG · Aug 5, 2016

Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster.
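Partitioning like this is commonly done with consistent hashing over a token ring. Below is a minimal sketch, assuming MD5-based tokens (in the spirit of Cassandra's older RandomPartitioner; the current default is Murmur3Partitioner), one token per node, and no replication or virtual nodes. `TokenRing` and its methods are hypothetical names, not a Cassandra API.

```python
import bisect
import hashlib

# Sketch only: MD5 tokens, one token per node, no vnodes, no replication.

def token(value):
    """Map a string to a position on the ring (a 128-bit integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class TokenRing:
    def __init__(self, nodes):
        self.ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self.ring, (token(node), node))

    def remove_node(self, node):
        self.ring = [(t, n) for t, n in self.ring if n != node]

    def owner(self, partition_key):
        """A key belongs to the first node whose token is >= the key's token,
        wrapping around to the first node past the end of the ring."""
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_left(tokens, token(partition_key))
        return self.ring[i % len(self.ring)][1]

ring = TokenRing(["node-a", "node-b", "node-c"])
keys = [f"user{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

# Only keys that fall into node-d's new token range change owner.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to node-d")
```

Adding a node only reassigns the keys in that node's new range; every other key keeps its owner, which is what lets a cluster rebalance without reshuffling all the data.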

Row store means that, like relational databases, Cassandra organizes data by rows and columns.

  • Column-oriented (columnar) databases store data on disk column by column.

    e.g., the Bonuses table:

       ID         Last    First   Bonus
       1          Doe     John    8000
       2          Smith   Jane    4000
       3          Beck    Sam     1000
    
  • In a row-oriented database management system, the data would be stored like this: 1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

  • In a column-oriented database management system, the data would be stored like this:
    1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;

  • Cassandra is basically a column-family store.

  • Cassandra would store the above data roughly as:

     "Bonuses" : {
           row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
           row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000},
           ...
     }
  • Read this for more details.
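The row-oriented and column-oriented layouts of the Bonuses table given earlier can be reproduced with a short Python sketch. The function names `row_oriented` and `column_oriented` are illustrative, and plain strings stand in for on-disk pages.

```python
# The Bonuses table, one tuple per row: (ID, Last, First, Bonus).
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]

def row_oriented(rows):
    """Lay out record by record: all fields of a row are adjacent."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)

def column_oriented(rows):
    """Lay out field by field: all values of a column are adjacent."""
    return "".join(",".join(str(v) for v in col) + ";" for col in zip(*rows))

print(row_oriented(rows))     # 1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;
print(column_oriented(rows))  # 1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;

# In the columnar layout a single column (Bonus, the 4th) is one contiguous
# segment, so scanning it never touches the other columns.
bonus_segment = column_oriented(rows).split(";")[3]  # "8000,4000,1000"
```

`zip(*rows)` transposes the rows into columns, which is the only difference between the two layouts.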
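The nested "Bonuses" structure can be modeled as a map from row key to a map of column name to value. Below is a minimal sketch under that assumption, using plain Python dicts. `insert`, `get_row` and `get_column` are hypothetical helpers, and the model ignores timestamps, tombstones and everything else about the real storage engine. It shows why the answer calls Cassandra a row store: fetching a whole row is a single lookup by row key, while fetching one column for every row means walking all rows.

```python
from collections import defaultdict

# Hypothetical in-memory model of a column family:
# row key -> {column name -> value}. Not Cassandra's storage engine.
bonuses = defaultdict(dict)

def insert(cf, row_key, **cols):
    """Add or overwrite columns in one row."""
    cf[row_key].update(cols)

def get_row(cf, row_key):
    """One lookup by row key; columns returned sorted by name."""
    return dict(sorted(cf.get(row_key, {}).items()))

def get_column(cf, name):
    """Reading one column for every row has to visit every row's map."""
    return {key: cols[name] for key, cols in cf.items() if name in cols}

insert(bonuses, "row1", ID=1, Last="Doe", First="John", Bonus=8000)
insert(bonuses, "row2", ID=2, Last="Smith", First="Jane", Bonus=4000)
insert(bonuses, "row3", ID=3, Last="Beck", First="Sam", Bonus=1000)

print(get_row(bonuses, "row2"))
print(get_column(bonuses, "Bonus"))
```

Because each row is its own map, rows need not share the same set of columns. Grouping columns under a row key like this is what "column family" refers to in Cassandra; it does not imply the column-by-column disk layout described earlier.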