Datomic Cloud Architecture
Datomic's data model, based on immutable facts stored over time, enables a physical design that is fundamentally different from traditional RDBMSs. Instead of processing all requests in a single server component, Datomic distributes transactions, queries, indexing, and caching to provide high availability, horizontal scaling, and elasticity. Datomic also allows compute resources to be assigned to tasks dynamically, without any preassignment or sharding.
The durable elements managed by Datomic are called Storage Resources, including:
- the DynamoDB Transaction Log
- S3 storage of Indexes
- an EFS cache layer
- operational logs
- a VPC and subnets in which computational resources will run
These resources are retained even when no computational resources are active, so you can shut down all the active elements of Datomic while maintaining your data.
Datomic leverages the attributes of multiple AWS storage options to satisfy its semantic and performance characteristics. As indicated in the tables below, different AWS storage services provide different latencies, costs, and semantic behaviors.
Datomic utilizes a stratified approach to provide high performance, low cost, and strong reliability guarantees. Specifically:
- ACID semantics are ensured via conditional writes with DynamoDB
- S3 provides highly reliable low cost persistence
- EFS and EC2 instance SSD storage provide very fast local caching
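The role of conditional writes in the first point above can be illustrated with a minimal in-memory sketch. It is not Datomic's implementation and does not call DynamoDB; the `ConditionalStore` class, its `put_if_version` method, and the `log-tail` key are hypothetical names used only to show how a compare-and-set write lets exactly one of two competing writers succeed.

```python
import threading


class ConditionalStore:
    """In-memory stand-in for a key-value store with conditional writes.

    Hypothetical illustration only; not DynamoDB and not Datomic code.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as version 0."""
        with self._lock:
            return self._items.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write only if the stored version still equals expected_version."""
        with self._lock:
            current_version, _ = self._items.get(key, (0, None))
            if current_version != expected_version:
                return False  # another writer got there first
            self._items[key] = (current_version + 1, value)
            return True


store = ConditionalStore()
version, _ = store.get("log-tail")

# Two writers both read version 0; only the first conditional write succeeds.
first = store.put_if_version("log-tail", version, "tx-1")
second = store.put_if_version("log-tail", version, "tx-2")
print(first, second, store.get("log-tail"))
```

The losing writer learns that its view was stale and can re-read before retrying, which is what makes the write safe to use as a serialization point.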
|Role|Technology|
|---|---|
|Storage of Record|S3|
|Cache|Memory > SSD > EFS|
|Reliability|S3 + DDB + EFS|

|Technology|Properties|
|---|---|
|S3|low-cost, high reliability|
|EFS|durable cache survives restarts|
|Memory & SSD|speed|
This multi-layered persistence architecture provides high reliability, because data missing from any given layer can be recovered from deeper within the stack. The multi-level distributed cache also provides excellent cache locality and low latency.
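The recovery behavior described above can be sketched as a read-through lookup over ordered layers. This is an illustrative model under stated assumptions, not Datomic's code: the `Layer` class, the `read_through` function, and the `segment-42` key are hypothetical, and plain dictionaries stand in for memory, SSD, EFS, and S3.

```python
class Layer:
    """One storage layer: a name plus a dict standing in for its contents."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def read_through(layers, key):
    """Look for key from the fastest layer down; backfill faster layers on a hit.

    Returns (value, name_of_layer_that_had_it). Raises KeyError if no layer has it.
    """
    for depth, layer in enumerate(layers):
        if key in layer.data:
            value = layer.data[key]
            # Repopulate every faster layer that missed.
            for faster in layers[:depth]:
                faster.data[key] = value
            return value, layer.name
    raise KeyError(key)


# Ordered fastest to slowest, following the Cache and Storage of Record rows.
memory, ssd, efs, s3 = Layer("memory"), Layer("ssd"), Layer("efs"), Layer("s3")
layers = [memory, ssd, efs, s3]

s3.data["segment-42"] = b"index data"  # only the storage of record has it

first_read = read_through(layers, "segment-42")
second_read = read_through(layers, "segment-42")
print(first_read[1], second_read[1])
```

In this model the first read falls through to the deepest layer and the second is served from the fastest one, which is the locality benefit the layered design is aiming for.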
Every running system has a single primary compute stack which provides computational resources and a means to access those resources. A Primary Compute Stack consists of:
- a primary compute group dedicated to transactions, indexing, and caching
- Route53 and/or Network Load Balancer (NLB) endpoints
- a Bastion Server
The specific composition of the primary compute group is determined by your choice of Topology: either Solo or Production. The Datomic programming model is entirely the same in both Solo and Production Topologies.
The Solo Topology provides an inexpensive way to access Datomic's full programming model for development, testing, and personal projects. The Solo Topology includes Storage Resources plus a Primary Compute Stack with:
- a dedicated VPC
- a Route53 endpoint
- a single t2.small Node
- a Bastion Server
Data outlives code, and database systems often serve more than one application. Each application can have its own:
- computational requirements
- cacheable working set
A query group is an independent unit of computation and caching that is a distinct application deployment target. Each query group:
- extends the abilities of an existing production topology system
- is a deployment target for its own distinct application code
- has its own clustered nodes
- manages its own working set cache
- can elastically autoscale application reads without any up-front planning or sharding
Query groups deliver the entire semantic model of Datomic. In particular:
- Client code does not know or care whether it is talking to the primary compute group or to a query group.
- Query groups read Datomic data at memory speeds, just as the primary compute group does.
You can add, modify, or remove query groups at any time. For example, you might initially release a transactional application that uses only a primary compute group. Later, you might decide to split out multiple query groups:
- an autoscaling query group for transactional load
- a fixed query group with one large instance for analytic queries
- a fixed query group with a smaller instance for support
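A small model may help make the two properties above concrete: application code is the same whichever group serves it, and each group builds its own working set cache over shared storage. This is a hypothetical sketch, not the Datomic client API; `ComputeGroup`, `lookup_order_total`, and the group names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeGroup:
    """A group of nodes with its own cache, reading shared system storage."""
    name: str
    storage: dict                      # shared by every group in the system
    cache: dict = field(default_factory=dict)

    def read(self, key):
        # Each group warms its own cache with the working set it actually serves.
        if key not in self.cache:
            self.cache[key] = self.storage[key]
        return self.cache[key]


def lookup_order_total(group, order_id):
    """Application code: identical no matter which group it is handed."""
    return group.read(order_id)["total"]


storage = {"order-1": {"total": 30}, "order-2": {"total": 12}}

primary = ComputeGroup("primary", storage)
analytics = ComputeGroup("analytics-query-group", storage)
support = ComputeGroup("support-query-group", storage)

print(lookup_order_total(primary, "order-1"))
print(lookup_order_total(analytics, "order-2"))
print(sorted(primary.cache), sorted(analytics.cache), sorted(support.cache))
```

Because the caches are separate, a heavy analytic workload in one group does not evict the working set of another.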
A Datomic application manages deployments for software that you design to perform a group of related tasks or activities. Every Datomic compute group is associated with an application that can be used as follows:
- A Datomic ion is your application code, plus a tiny amount of configuration.
- The push operation creates a revision, packaging your ion so that it is ready for reproducible deployment.
- The deploy operation creates a deployment, installing a revision onto a compute group.
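The relationship between push, revision, and deploy can be sketched as a small data model. This is not the ion tooling itself: the `Revision` and `ComputeGroup` classes and the `push` and `deploy` functions are hypothetical, and deriving the revision id from a content hash is an assumption of this sketch, chosen to illustrate reproducibility.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """An immutable package of application code plus configuration."""
    rev_id: str


@dataclass
class ComputeGroup:
    """A deployment target that records which revisions were installed."""
    name: str
    deployed: list = field(default_factory=list)  # history of revision ids


def push(code: str, config: str) -> Revision:
    """Create a revision whose id is derived from its contents.

    Content-derived ids are an assumption of this sketch, chosen so the same
    input always yields the same revision.
    """
    digest = hashlib.sha256((code + "\n" + config).encode("utf-8")).hexdigest()
    return Revision(rev_id=digest[:12])


def deploy(revision: Revision, group: ComputeGroup) -> None:
    """Create a deployment by installing a revision onto a compute group."""
    group.deployed.append(revision.rev_id)


rev = push("application code v1", "config v1")
staging = ComputeGroup("staging")
production = ComputeGroup("production")

# The same revision can be installed on more than one compute group.
deploy(rev, staging)
deploy(rev, production)
print(staging.deployed == production.deployed)
```

Separating push from deploy means the artifact tested on one group is the same artifact installed on the next.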
Datomic Cloud is designed to be a complete solution for Clojure application development on AWS. In particular, you can:
- develop and test with realtime feedback at a local REPL
- rapidly deploy to AWS with no downtime
- reproducibly deploy across different development stages
- deploy multiple applications that share a common Datomic system
- elastically scale your entire application instead of many separate elements
- automatically generate AWS Lambda entry points without writing any Lambda code
Datomic is designed to follow AWS security best practices, including:
- All authorization is performed using AWS HMAC, with key transfer via S3, enabling access control governed by IAM roles.
- All data in Datomic is encrypted at rest using AWS KMS.
- All Datomic resources are isolated in a private VPC, with optional access through a network bastion.
- EC2 instances run in an IAM role configured for least privilege.
For security, all Datomic nodes run inside a dedicated VPC that is not accessible from the internet. To provide access from outside, for example for developers, you can configure a bastion server that is open to a range of IP addresses outside the Datomic VPC and forwards traffic to Datomic nodes.
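The idea of a bastion "open to a range of IP addresses" can be illustrated with a membership check against a CIDR block. This is only a sketch of the rule's logic, not how the bastion is configured in AWS; `ALLOWED_RANGE` and `bastion_allows` are hypothetical names, and the addresses come from blocks reserved for documentation.

```python
import ipaddress

# Hypothetical example range (a documentation-reserved block), not a real setting.
ALLOWED_RANGE = ipaddress.ip_network("203.0.113.0/24")


def bastion_allows(source_ip: str) -> bool:
    """Return True if source_ip falls inside the range the bastion is open to."""
    return ipaddress.ip_address(source_ip) in ALLOWED_RANGE


print(bastion_allows("203.0.113.25"))   # inside the range
print(bastion_allows("198.51.100.7"))   # outside the range
```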