I've been looking into Datomic, and it looks really interesting. But while there seems to be very good information on how Datomic works technically, I have not seen much on how one should think about data modeling.
What are some best practices for data modeling in Datomic? Are there any good resources on the subject?
As Datomic is new and my experience with it is limited, this answer shouldn't be considered best practices in any way. Take this instead as an intro to Datomic for those with a relational background and a hankering for a more productive data store.
In Datomic, you model your domain data as Entities that possess Values for Attributes. Because a reference to another Entity can be the Value of an Attribute, you can model Relationships between Entities simply.
At first look, this isn't all that different from the way data is modeled in a traditional relational database. In SQL, table rows are Entities and a table's columns name Attributes that have Values. A Relationship is represented by a foreign key Value in one table row referencing the primary key Value of another table row.
This similarity is nice because you can just sketch out your traditional ER diagrams when modeling your domain. You can rely on relationships just like you would in a SQL database, but don't need to mess around with foreign keys since that's handled for you. Writes in Datomic are transactional and your reads are consistent. So you can separate your data into entities at whatever granularity feels right, relying on joins to provide the bigger picture. That's a convenience you lose with many NoSQL stores, where it's common to have BIG, denormalized entities to achieve some useful level of atomicity during updates.
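To make the entity/attribute/relationship idea concrete, here is a minimal schema sketch. The attribute names are hypothetical, and it assumes a reasonably recent Datomic release where attributes are installed as plain maps (older releases needed extra install bookkeeping on each attribute). The relationship is just an attribute whose value type is a reference:

```clojure
;; Hypothetical schema: these attribute names are illustrative only.
[{:db/ident       :person/name
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}

 ;; A relationship: the value of this attribute is another entity.
 {:db/ident       :person/employer
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}

 {:db/ident       :company/name
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}]
```

There is no foreign key column to declare on the "other side"; the reference attribute is the whole relationship.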
At this point, you're off to a good start. But Datomic is much more flexible than a SQL database.
Time is inherently part of all Datomic data, so there is no need to specifically include the history of your data as part of your data model. This is probably the most talked about aspect of Datomic.
In Datomic, your schema is not rigidly defined in the "rectangular shape" required by SQL. That is, an entity[1] can have whatever attributes it needs to satisfy your model. An entity need not have NULL or default values for attributes that don't apply to it. And you can add attributes to a particular, individual entity as you see fit.
So you can change the shape of individual entities over the course of time to be responsive to changes in your domain (or changes to your understanding of the domain). So what? This is not unlike Document Stores like MongoDB and CouchDB.
The difference is that with Datomic you can enact schema changes atomically over all affected entities. That is, you can issue a single transaction that updates the shape of all entities, based upon arbitrary domain logic written in your language[2], and it will execute without affecting readers until committed. I'm not aware of anything close to this sort of power in either the relational or document store spaces.
Your entities are not rigidly defined as "living in a single table" either. You decide what defines the "type" of an entity in Datomic. You could choose to be explicit and mandate that every entity in your model will have a :table attribute that connotes what "type" it is. Or your entities can conform to any number of "types" simply by satisfying the attribute requirements of each type.
For example, your model could mandate that:
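A Person requires the attributes :name, :ssn, :dob
An Employee requires the attributes :name, :title, :salary
A Resident requires the attributes :name, :address
A Member requires the attributes :id, :plan, :expiration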
Which means an entity like me:
{:name "Brian" :ssn "123-45-6789" :dob #inst "1976-09-15"
 :address "400 South State St, Chicago, IL 60605"
 :id 42 :plan "Basic" :expiration #inst "2012-05-01"}
can be inferred to be a Person, a Resident and a Member but NOT an Employee.
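The inference described above can be sketched in plain Clojure. This is only an illustration of the idea, not something Datomic does for you: the type-requirements map and the inferred-types function are hypothetical names, and an ordinary map stands in for a Datomic entity.

```clojure
(require '[clojure.set :as set])

;; Hypothetical "type" definitions: a type is just a set of required attributes.
(def type-requirements
  {:person   #{:name :ssn :dob}
   :employee #{:name :title :salary}
   :resident #{:name :address}
   :member   #{:id :plan :expiration}})

(defn inferred-types
  "Returns the set of types whose required attributes are all present on entity."
  [entity]
  (let [attrs (set (keys entity))]
    (set (for [[t required] type-requirements
               :when (set/subset? required attrs)]
           t))))

;; A plain map standing in for the entity from the example.
(def brian
  {:name "Brian" :ssn "123-45-6789" :dob #inst "1976-09-15"
   :address "400 South State St, Chicago, IL 60605"
   :id 42 :plan "Basic" :expiration #inst "2012-05-01"})

(inferred-types brian)
;; => #{:person :resident :member}
```

Adding :title and :salary to the map would make it an Employee as well, with no change to any table definition.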
Datomic queries are expressed in Datalog and can incorporate rules expressed in your own language, referencing data and resources that are not stored in Datomic. You can store Database Functions as first-class values inside of Datomic. These resemble Stored Procedures in SQL, but can be manipulated as values inside of a transaction and are also written in your language. Both of these features let you express queries and updates in a more domain-centric way.
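As a small illustration of what a Datalog query looks like, here is one written as plain Clojure data. It uses the un-namespaced attributes from the Person example above; the name person-query is hypothetical, and actually running it assumes the Datomic peer library and a database value, neither of which is shown.

```clojure
;; A Datalog query is plain data: a vector of keywords, symbols and clauses.
;; This one finds every entity that has all three Person attributes.
(def person-query
  '[:find ?e ?name
    :where
    [?e :name ?name]
    [?e :ssn _]
    [?e :dob _]])

;; With the Datomic peer library on the classpath and a database value `db`
;; (both assumed here), the query would be run as:
;;   (datomic.api/q person-query db)
```

Because the query is just a data structure, it can be built, inspected and composed with ordinary Clojure functions before it is ever sent to the database.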
Finally, the impedance mismatch between the OO and relational worlds has always frustrated me. Using a functional, data-centric language (Clojure) helps with that, but Datomic looks to provide a durable data store that doesn't require mental gymnastics to bridge from code to storage.
As an example, an entity fetched from Datomic looks and acts like a Clojure (or Java) map. It can be passed up to higher levels of an application without translation into an object instance or general data structure. Traversing that entity's relationships will fetch the related entities from Datomic lazily, but with the guarantee that they will be consistent with the original query, even in the face of concurrent updates. And those entities will appear to be plain old maps nested inside the first entity.
This makes data modeling more natural and much, much less of a fight in my opinion.
Conflicting attributes
The example above illustrates a potential pitfall in your model. What if you later decide that :id is also an attribute of an Employee? The solution is to organize your attributes into namespaces. So you would have both :member/id and :employee/id. Doing this ahead of time helps avoid conflict later on.
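A quick sketch of why namespacing helps, using a plain Clojure map as a stand-in for an entity (the values and the :person/name attribute are made up for illustration):

```clojure
;; Hypothetical entity that is both a Member and an Employee.
(def brian
  {:person/name "Brian"
   :member/id   42
   :employee/id "E-1001"})

;; The namespace is part of the keyword, so the two ids never collide.
(namespace :member/id)   ; => "member"
(:member/id brian)       ; => 42
(:employee/id brian)     ; => "E-1001"
```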
An attribute's definition can't be changed (yet)
Once you've defined an attribute in your Datomic schema (as a particular type, indexed or not, unique, etc.), you can't change that definition later. We're talking ALTER TABLE ALTER COLUMN in SQL parlance here. For now, you could create a replacement attribute with the right definition and move your existing data.
This may sound terrible, but it's not. Because transactions are serialized, you can submit one that creates the new attribute, copies your data to it, resolves conflicts and removes the old attribute. It will run without interference from other transactions and can take advantage of domain-specific logic in your native language to do its thing. It's essentially what an RDBMS is doing behind the scenes when you issue an ALTER TABLE, but you name the rules.
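The "copy the data" step can be sketched as a pure function that builds transaction data. The :db/add and :db/retract list forms are standard Datomic transaction data; everything else here is an assumption for illustration: the function name migration-tx, the attribute names, and the idea that the [entity-id value] pairs come from an earlier query. The sketch also assumes the replacement attribute has already been installed, and that the result would then be submitted with datomic.api/transact.

```clojure
(defn migration-tx
  "Builds transaction data that copies each value from old-attr to new-attr
  (after passing it through convert) and retracts the old value.
  rows is a sequence of [entity-id old-value] pairs."
  [old-attr new-attr convert rows]
  (vec (mapcat (fn [[e v]]
                 [[:db/add e new-attr (convert v)]
                  [:db/retract e old-attr v]])
               rows)))

;; Hypothetical example: move a string-typed :member/id
;; to a long-typed :member/number.
(migration-tx :member/id :member/number #(Long/parseLong %)
              [[101 "42"] [102 "7"]])
;; => [[:db/add 101 :member/number 42] [:db/retract 101 :member/id "42"]
;;     [:db/add 102 :member/number 7]  [:db/retract 102 :member/id "7"]]
```

The convert function is where your domain-specific rules live; conflict resolution would go there too.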
Don't be "a kid in a candy store"
Flexible schema doesn't mean no data model. I'd advise some upfront planning to model things in a sane way, same as you would for any other data store. Leverage Datomic's flexibility down the road when you have to, not just because you can.
Avoid storing large, constantly changing data
Datomic isn't a good data store for BLOBs or very large data that's constantly changing, because it keeps a historical record of previous values and there isn't a method to purge older versions (yet). This kind of thing is almost always a better fit for an object store like S3. Update: There is a way to disable history on a per-attribute basis. Update: There is now also a way to excise data; however, storing references to external objects rather than the objects themselves may still be the best approach to handling BLOBs. Compare this strategy with using byte arrays.
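A minimal schema sketch of the two options mentioned in the updates. The attribute names are hypothetical; :db/noHistory is the schema flag for disabling history on an attribute, and the second attribute simply stores a pointer to an object kept elsewhere (for example an S3 key). As before, this assumes a Datomic release that accepts attributes as plain maps.

```clojure
;; Hypothetical attributes: names are illustrative only.
[;; A frequently changing value whose past values are not worth keeping.
 {:db/ident       :sensor/last-reading
  :db/valueType   :db.type/double
  :db/cardinality :db.cardinality/one
  :db/noHistory   true}

 ;; A pointer to a large object stored outside Datomic.
 {:db/ident       :document/storage-key
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}]
```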