Datomic Cloud Architecture

Datomic's data model, based on immutable facts stored over time, enables a physical design that is fundamentally different from that of a traditional RDBMS. Instead of processing all requests in a single server component, Datomic distributes transactions, queries, indexing, and caching to provide High Availability, horizontal scaling, and elasticity. Datomic also assigns compute resources to tasks dynamically, without any preassignment or sharding.

System

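A complete Datomic installation is called a System. A System consists of Storage Resources and Primary Compute Resources, plus optional Query Groups in the Production Topology.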

../images/system-elided.png

Storage Resources

The durable elements managed by Datomic are called Storage Resources, including:

  • the DynamoDB Transaction Log
  • S3 storage of Indexes
  • an EFS cache layer
  • operational logs
  • a VPC and subnets in which computational resources will run

These resources are retained even when no computational resources are active, so you can shut down all the active elements of Datomic while maintaining your data.

../images/storage-resources.png

How Datomic Uses Storage

Datomic combines several AWS storage services to meet its semantic and performance requirements. As indicated in the tables below, each service offers different latencies, costs, and semantic behaviors.

Datomic utilizes a stratified approach to provide high performance, low cost, and strong reliability guarantees. Specifically:

  • ACID semantics are ensured via conditional writes with DynamoDB
  • S3 provides highly reliable, low-cost persistence
  • EFS and EC2 instance SSD storage provide very fast local caching
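To make the first bullet concrete, here is a minimal sketch of how a conditional write (compare-and-swap) lets concurrent writers agree on a single ordering. It uses an in-memory stand-in rather than DynamoDB itself. The `ConditionalStore` class, the `append_entry` helper, and the `"log"` key are hypothetical names chosen for illustration and do not reflect Datomic's actual log layout. In DynamoDB, the equivalent primitive is a write that carries a condition expression.

```python
class ConditionalStore:
    """Toy stand-in for a key-value store that supports conditional writes."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def put_if(self, key, expected, new):
        """Write `new` only if the current value still equals `expected`."""
        if self._items.get(key) != expected:
            return False  # someone else wrote first; caller must re-read
        self._items[key] = new
        return True


def append_entry(store, key, entry, max_retries=10):
    """Append to a log (stored as a tuple) with an optimistic retry loop."""
    for _ in range(max_retries):
        current = store.get(key)
        updated = (current or ()) + (entry,)
        if store.put_if(key, current, updated):
            return updated
    raise RuntimeError("too much contention on " + key)


store = ConditionalStore()
append_entry(store, "log", "tx-1")
append_entry(store, "log", "tx-2")

# A writer holding a stale view of the log is rejected instead of
# silently overwriting tx-2.
stale_write_ok = store.put_if("log", ("tx-1",), ("tx-1", "tx-3"))
```

The point of the sketch is the rejected stale write: a writer can only succeed if its view of the log is current, so no committed entry is lost.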

Stratified Durability

Purpose           | Technology
ACID              | DynamoDB
Storage of Record | S3
Cache             | Memory > SSD > EFS
Reliability       | S3 + DDB + EFS

Technology        | Properties
DynamoDB          | low-latency CAS
S3                | low-cost, high reliability
EFS               | durable cache survives restarts
Memory & SSD      | speed

This multi-layered persistence architecture ensures high reliability, because data missing from any given layer can be recovered from deeper within the stack. The multi-level distributed cache also provides excellent cache locality and low latency.
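The lookup order described above (Memory > SSD > EFS, with S3 as the storage of record) can be sketched as a read-through cache. This is an illustrative model only, not Datomic's implementation. The `LayeredReader` class and the `"segment-42"` key are hypothetical, and each layer is modeled as a plain dictionary.

```python
class LayeredReader:
    """Read-through lookup over storage layers ordered fastest first."""

    def __init__(self, layers):
        # layers: list of (name, dict) pairs; the last one plays the role
        # of the storage of record and is assumed to hold every key.
        self.layers = layers

    def read(self, key):
        for depth, (name, data) in enumerate(self.layers):
            if key in data:
                value = data[key]
                # Backfill every faster layer so the next read is cheaper.
                for _, faster in self.layers[:depth]:
                    faster[key] = value
                return value, name
        raise KeyError(key)


memory, ssd, efs = {}, {}, {}
s3 = {"segment-42": b"index data"}
reader = LayeredReader(
    [("memory", memory), ("ssd", ssd), ("efs", efs), ("s3", s3)]
)

first = reader.read("segment-42")   # served from the deepest layer
second = reader.read("segment-42")  # now served from memory
```

Because the values in this sketch never change once written, the faster layers need no invalidation logic. A layer that loses its contents, for example memory after a restart, is simply refilled from a deeper one.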

Primary Compute Resources

Every running system has a single set of Primary Compute Resources which provide computational resources and a means to access those resources. Primary Compute Resources consist of:

  • Primary Compute Nodes dedicated to transactions, indexing, and caching.
  • Route53 and/or Application Load Balancer (ALB) endpoints
  • a Bastion Server

The specific composition of Primary Compute Resources is determined by your choice of Topology: either Solo or Production. The Datomic programming model is identical in both Topologies.

Solo Topology

../images/solo-topology-2.png

The Solo Topology provides an inexpensive way to access Datomic's full programming model for development, testing, and personal projects. The Solo Topology includes Storage Resources plus the following Primary Compute Resources:

  • a dedicated VPC
  • a Route53 endpoint
  • a single t2.small Node
  • a Bastion Server

Production Topology

../images/production-topology-2.png

The Production Topology includes Storage Resources plus the following Primary Compute Resources:

  • a Route53 endpoint
  • an Application Load Balancer for High Availability (HA)
  • two or more i3.large Nodes
  • a Bastion Server

The Production Topology also allows for the addition of optional Query Groups for different loads (e.g. transactional vs. analytic vs. developer queries). Query Groups can AutoScale to meet changes in demand.
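As a compact summary of the two lists above, the differences between the topologies can be written down as data. The `Topology` record and its field names are hypothetical and exist only for this sketch. The values come from the lists in this section, and nothing here corresponds to an actual Datomic or AWS configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    name: str
    node_type: str        # EC2 instance type of the primary compute nodes
    min_nodes: int        # smallest number of primary compute nodes
    load_balancer: bool   # fronted by an Application Load Balancer
    query_groups: bool    # optional Query Groups may be added


SOLO = Topology("solo", "t2.small", 1, False, False)
PRODUCTION = Topology("production", "i3.large", 2, True, True)


def is_highly_available(topology):
    """HA here means a load balancer in front of at least two nodes."""
    return topology.load_balancer and topology.min_nodes >= 2


ha_by_name = {t.name: is_highly_available(t) for t in (SOLO, PRODUCTION)}
```

Since the programming model does not differ between topologies, a record like this describes only operational properties, not anything application code would depend on.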

Nodes

Nodes are the computational resources for a Datomic System. Nodes provide

Security

Datomic is designed to follow AWS security best practices, including:

  • All authorization is performed using AWS HMAC, with key transfer via S3, enabling access control governed by IAM roles.
  • All data in Datomic is encrypted at rest using AWS KMS.
  • All Datomic resources are isolated in a private VPC, with optional access through a network bastion.
  • EC2 instances run in an IAM role configured for least privilege.

Bastion

For security, all Datomic nodes run inside a dedicated VPC that is not accessible from the internet. To provide access from outside the VPC, for example for developers, you can configure a bastion server that is open to a range of IP addresses outside the Datomic VPC and forwards traffic to Datomic nodes.
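The "range of IP addresses" rule amounts to a CIDR membership check. The sketch below shows that check using Python's standard `ipaddress` module. In a real deployment this filtering would be enforced by AWS network rules rather than application code. The `ALLOWED_RANGES` list and the `may_reach_bastion` function are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical allow-list; 203.0.113.0/24 is a documentation-only range.
ALLOWED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def may_reach_bastion(client_ip, allowed=ALLOWED_RANGES):
    """Return True if client_ip falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)


office_ok = may_reach_bastion("203.0.113.17")
stranger_ok = may_reach_bastion("198.51.100.5")
```

Keeping the allowed range as narrow as possible limits who can reach the bastion, and therefore who can reach the nodes behind it.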