I am trying to calculate the the partition size for each row in a table with arbitrary amount of columns and types using a formula from the Datastax Academy Data Modeling Course.
In order to do that I need to know the "size in bytes" for some common Cassandra data types. I tried to google this but I get a lot of suggestions so I am puzzled.
The data types I would like to know the byte size of are:
Any other considerations would of course also be appreciated regarding data types sizes in Cassandra.
Adding more info since it seems confusing to understand that I am only trying to estimate the "worst scenario disk usage" the data would occupy with out any compressions and other optimizations done by Cassandra behinds the scenes.
I am following the Datastax Academy Course DS220 (see link at end) and implement the formula and will use the info from answers here as variables in that formula.
https://academy.datastax.com/courses/ds220-data-modeling/physical-partition-size
I think, from a pragmatic point of view, that it is wise to get a back-of-the-envelope estimate of worst case using the formulae in the ds220 course up-front at design time. The effect of compression often varies depending on algorithms and patterns in the data. From ds220 and http://cassandra.apache.org/doc/latest/cql/types.html:
uuid: 16 bytes
timeuuid: 16 bytes
timestamp: 8 bytes
bigint: 8 bytes
counter: 8 bytes
double: 8 bytes
time: 8 bytes
inet: 4 bytes (IPv4) or 16 bytes (IPV6)
date: 4 bytes
float: 4 bytes
int 4 bytes
smallint: 2 bytes
tinyint: 1 byte
boolean: 1 byte (hopefully.. no source for this)
ascii: equires an estimate of average # chars * 1 byte/char
text/varchar: requires an estimate of average # chars * (avg. # bytes/char for language)
map/list/set/blob: an estimate
hope it helps