What is the byte size of common Cassandra data types, to be used when calculating partition disk usage?

nicgul · Oct 17, 2016

I am trying to calculate the partition size for each row in a table with an arbitrary number of columns and types, using a formula from the DataStax Academy Data Modeling Course.

In order to do that I need to know the "size in bytes" of some common Cassandra data types. I tried to google this, but I got a lot of differing suggestions, so I am puzzled.

The data types I would like to know the byte size of are:

  • A single Cassandra TEXT character (the answers I found on Google range from 2 to 4 bytes)
  • A Cassandra DECIMAL
  • A Cassandra INT (I suppose it is 4 bytes)
  • A Cassandra BIGINT (I suppose it is 8 bytes)
  • A Cassandra BOOLEAN (I suppose it is 1 byte, or is it a single bit?)

Any other considerations regarding data type sizes in Cassandra would of course also be appreciated.

Adding more info, since this seems to have caused confusion: I am only trying to estimate the "worst case disk usage" the data would occupy, without any compression or other optimizations done by Cassandra behind the scenes.

I am following the DataStax Academy course DS220 (see link at end) to implement the formula, and I will use the information from the answers here as variables in that formula.
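
For readers without the course material at hand, here is a minimal Python sketch of the partition size formula in the form it is commonly quoted from DS220. That form is an assumption here, since the formula itself is not reproduced in this post: the sizes of the partition key and static columns are counted once, the sizes of the regular and clustering columns are counted once per row, and roughly 8 bytes of metadata are added per stored cell value. The function name `partition_size_bytes` and its parameters are hypothetical, chosen only for illustration.

```python
def partition_size_bytes(partition_key_sizes, static_sizes, regular_sizes,
                         clustering_sizes, rows, cell_metadata_bytes=8):
    """Rough uncompressed size of one partition, in bytes.

    Each *_sizes argument is a list with one estimated byte size per column.
    cell_metadata_bytes is the assumed average per-cell overhead
    (8 bytes is the figure usually quoted with the formula).
    """
    n_columns = (len(partition_key_sizes) + len(static_sizes)
                 + len(regular_sizes) + len(clustering_sizes))
    n_primary_key = len(partition_key_sizes) + len(clustering_sizes)
    n_static = len(static_sizes)

    # Number of stored cell values: one per regular column per row,
    # plus one per static column for the whole partition.
    n_values = rows * (n_columns - n_primary_key - n_static) + n_static

    return (sum(partition_key_sizes)
            + sum(static_sizes)
            + rows * (sum(regular_sizes) + sum(clustering_sizes))
            + n_values * cell_metadata_bytes)


# Example: uuid partition key (16), timestamp clustering column (8),
# int (4) and bigint (8) regular columns, 1000 rows per partition.
estimate = partition_size_bytes([16], [], [4, 8], [8], rows=1000)
print(estimate)  # 36016
```

The result is only a design-time upper bound in the spirit of the question: it ignores compression and any storage-engine optimizations.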



James Fremen · Jan 18, 2017

From a pragmatic point of view, I think it is wise to get a back-of-the-envelope worst-case estimate up front, at design time, using the formulae in the DS220 course. The effect of compression varies with the algorithm and with patterns in the data. From DS220 and http://cassandra.apache.org/doc/latest/cql/types.html:

uuid: 16 bytes
timeuuid: 16 bytes
timestamp: 8 bytes
bigint: 8 bytes
counter: 8 bytes
double: 8 bytes
time: 8 bytes
inet: 4 bytes (IPv4) or 16 bytes (IPv6)
date: 4 bytes
float: 4 bytes
int: 4 bytes
smallint: 2 bytes
tinyint: 1 byte
boolean: 1 byte (I believe so, but I have no source for this)
ascii: requires an estimate of average # chars * 1 byte/char
text/varchar: requires an estimate of average # chars * (avg. # bytes/char for language)
map/list/set/blob: requires an estimate of the average content size
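
As a rough illustration of how the list above could be turned into per-column estimates, here is a small Python sketch. The names `FIXED_SIZE_BYTES`, `avg_bytes_per_char` and `column_size` are hypothetical helpers, not part of any Cassandra API. It assumes text values are stored as UTF-8, so the average bytes per character can be measured from representative sample strings.

```python
# Fixed-width sizes from the list above, in bytes.
# inet is left out because it is 4 or 16 bytes depending on the address family.
FIXED_SIZE_BYTES = {
    "uuid": 16, "timeuuid": 16,
    "timestamp": 8, "bigint": 8, "counter": 8, "double": 8, "time": 8,
    "date": 4, "float": 4, "int": 4,
    "smallint": 2, "tinyint": 1, "boolean": 1,
}


def avg_bytes_per_char(samples):
    """Average UTF-8 bytes per character over representative sample strings."""
    total_chars = sum(len(s) for s in samples)
    if total_chars == 0:
        raise ValueError("samples must contain at least one character")
    total_bytes = sum(len(s.encode("utf-8")) for s in samples)
    return total_bytes / total_chars


def column_size(cql_type, avg_chars=0, bytes_per_char=1.0):
    """Estimated size in bytes of one value of the given CQL type."""
    if cql_type in FIXED_SIZE_BYTES:
        return FIXED_SIZE_BYTES[cql_type]
    if cql_type == "ascii":
        return avg_chars  # 1 byte per character
    if cql_type in ("text", "varchar"):
        return avg_chars * bytes_per_char
    raise ValueError(f"no size rule for {cql_type!r}; supply your own estimate")


# Example: measure bytes/char from sample data, then size a 40-character text column.
ratio = avg_bytes_per_char(["naïve", "日本語"])
print(column_size("bigint"))
print(column_size("text", avg_chars=40, bytes_per_char=ratio))
```

Collections and blobs deliberately raise an error here, since they need an estimate specific to your data.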

Hope it helps.