ACID

Datomic transactions are ACID: Atomic, Consistent, Isolated, and Durable. This document defines the four components of ACID, explains how Datomic works, and explores the implications of Datomic's approach.

Atomicity

Atomicity requires that each transaction is "all or nothing". If one part of a transaction fails, the entire transaction fails, and the database is left unchanged.

A Datomic transaction is written to durable storage in a single atomic write, so there is no possibility of partial work. Inside peer processes, the in-memory representation of a database is a pointer to a persistent data structure, and is also updated via a single atomic operation.

Consistency

Consistency ensures that any transaction will take the database from one valid state to another. Datomic makes the following consistency guarantees:

  • Every peer sees completed transactions as of a particular point in time, called a time basis.
  • The time basis of transactions is a global ordering of transactions for a particular system. Peers always see all transactions up to their time basis, in order, with no gaps.

Datomic also provides first class support for accessing the time basis of information:

  • Every fact in the database knows its time basis via the transaction component of a datom.
  • A database value knows its time basis via Database.basisT.
  • Peers can synchronize on a time basis via Connection.sync.

To correctly implement domain models, transactions need to be able to derive new facts based on existing facts (e.g. adding to a bank balance), and to enforce domain-specific functional constraints (e.g. a valid account must include name, email, and password hash). Datomic's transaction functions support arbitrary functional constraints within transactions, including both optimistic and pessimistic concurrency strategies.

Isolation

The Isolation property ensures that concurrent transactions result in the same system state that would result if the transactions were executed serially.

Datomic is a single-writer system. A single thread in a single process is responsible for writing transactions. The Isolation property follows automatically from this, because there are no concurrent transactions. Transactions are always executed serially.

Durability

Durability means that once a transaction has been committed, it has been recorded in durable storage. Datomic is fully durable–it always awaits acknowledgment from storage before reporting that a transaction is complete.

How It Works

Datomic uses storage engines to store blocks (not individual datoms). Datomic keeps two trees of datoms in block storage:

  • The index is updated periodically in the background and contains datoms sorted in various orders.
  • The log is updated as part of every transaction and contains datoms grouped by transaction and sorted by time.

Both these trees are stored as sets of immutable values, and are compatible with eventually-consistent storage.

Pointers to the roots of trees are stored in durable references (refs). These refs are not compatible with eventual consistency. Therefore, refs are always updated by conditional put operations, which can ensure consistency of updates.

Conditional put is implemented in different ways, depending on the storage being used:

  • Couchbase uses conditional put (called check-and-set).
  • DynamoDB uses conditional put.
  • Infinispan uses conditional put (called replaceWithVersion).
  • Riak uses Zookeeper.
  • Cassandra uses lightweight transactions.
  • SQL uses transactions.

Note that Datomic is tuned for efficient writes, so the details are more complex than this overview of the basic concepts. In particular, a Datomic system under load uses a combination of batching, transient data structures, and persistent data structures so that the average number of writes per transaction can be one, or even less than one.

Implications

Datomic provides strong consistency of the entire database, while requiring only eventual consistency for the majority of actual writes.

Most Datomic writes are of tree nodes. These writes are compatible with eventually consistent storage, because the semantics of immutable values are beautifully simple: In an immutable system with no updates, there are only two possibilities:

  • a value is present
  • a value is not present yet

A few writes require the stronger semantics of conditional put:

  • Conditional put of the log root pointer guarantees consistency at the transaction level.
  • Conditional put of the index root pointer guarantees atomic adoption of a new index.

Another way to understand this is to consider the failure mode introduced by an eventually consistent node that is not up-to-date yet. Datomic will always see a correct log pointer, which was placed via conditional put. If some of the tree nodes are not yet visible underneath that pointer, Datomic is consistent but partially unavailable, and will become fully available when eventually happens.