Planning Your System
IMPORTANT - Datomic Cloud is configured with a set of well-tested default parameters for scaling and provisioning. You should NOT alter system settings (i.e. storage provisioning and scaling, changes to network or instance configurations, alterations to the default CloudFormation templates) without first contacting Datomic Support unless the changes are explicitly described in the Datomic Documentation.
Datomic can grow as your application(s) grow. This document walks you through a series of topologies to address specific objectives.
|Objective|Topology|
|---|---|
|moving to production|production|
|separate development stages|query groups for stages|
|elastic autoscaling|query groups for scaling|
|multiple applications, microservices|multiple applications|
|production isolation|separate production topologies|
|transaction functions|primary compute group|
In the descriptions below, each of the topologies builds on the previous one. If your system doesn't have all these objectives, simply omit the elements that are not relevant to you.
Naming Systems and Applications
System and application names cannot be changed, so choose good names. Points to consider:
- Datomic must concatenate various names to create unique AWS resource names. Keep names short, e.g. fewer than 24 characters.
- Multiple systems might serve the same primary database in different development stages, so you might name systems via the convention "[db]-[stage]", e.g. "inventory-dev", "inventory-staging" and "inventory-prod".
- A system can serve more than one application via different query groups, although one application is often primary. So you might name applications "[db](-[app])", e.g. "inventory" and "inventory-analytics".
- Datomic often prepends the string "datomic-[system]" to AWS resource names, so don't also name your system "datomic".
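The naming points above can be checked mechanically before you create any stacks. The following is a minimal Python sketch, not part of Datomic: the helper names (`system_name`, `application_name`, `check_name`) are hypothetical, and the 24-character threshold is taken from the guideline above rather than from any limit Datomic is known to enforce.

```python
from typing import List, Optional

# Assumption: threshold taken from the "fewer than 24 characters" guideline.
MAX_NAME_LENGTH = 24


def system_name(db: str, stage: str) -> str:
    """Build a system name using the "[db]-[stage]" convention."""
    return f"{db}-{stage}"


def application_name(db: str, app: Optional[str] = None) -> str:
    """Build an application name using the "[db](-[app])" convention."""
    return db if app is None else f"{db}-{app}"


def check_name(name: str) -> List[str]:
    """Return guideline violations for a name; an empty list means none found."""
    problems = []
    if len(name) >= MAX_NAME_LENGTH:
        problems.append(
            f"{name!r} has {len(name)} characters; keep it under {MAX_NAME_LENGTH}"
        )
    if name == "datomic":
        problems.append(
            'avoid "datomic"; AWS resource names often already start with "datomic-[system]"'
        )
    return problems
```

For example, `check_name(system_name("inventory", "prod"))` returns an empty list, while `check_name("datomic")` reports one problem.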
Getting Started with Solo
The Solo topology provides an inexpensive way to access Datomic's full programming model for development, testing, and personal projects.
The developer workflow for a Solo system serving a web application is shown below. You can use the client API to develop directly against the system, and can install application code via commit+push+deploy.
Solo serves the entire Datomic model from a single small compute node. When you need more capability, you can upgrade to Production.
Moving to Production
Like Solo, the Production topology provides the entire semantic model of Datomic. Compared to Solo, Production also provides:
- high availability (HA)
- more compute capacity
- more memory
Additional compute capacity and memory allow you to:
- create larger databases
- serve a larger number of databases
- serve more concurrent requests
- process queries and transactions faster
The developer workflow in a basic Production system is the same as with Solo:
Query Groups for Development Stages
Writing an application takes place in multiple overlapping stages, e.g. development, testing, staging, CI, and production.
In Datomic, you can create a separate query group for each development stage, as shown in the diagram below:
- When you create each new query group, you will have a chance to set its associated application name. Choose the same name used by the development system! All the resources associated with the same application share a color (green) in the diagram.
- You can then deploy a different revision of the application code to each group. In the diagram, production is running stable r53, while the other query groups are trying out newer versions.
- You can use parameters to let each group (and each developer client) work against different databases.
Note that each query group can be independently sized and scaled to balance cost with operational requirements. For example, the diagram shows the non-production stages using smaller t2.medium instances in an Auto Scaling Group limited to one instance per group.
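One way to keep track of the stages described above is a small planning table. The sketch below is illustrative Python, not a Datomic API: the `QueryGroup` record, the stage names, the "inventory" application name, and every revision other than r53 are hypothetical, and the production instance type and instance limit are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueryGroup:
    """Planning record for one query group (hypothetical, not a Datomic type)."""
    stage: str
    application: str
    instance_type: str
    max_instances: int
    revision: str


PLAN: List[QueryGroup] = [
    # Production stays on the stable revision mentioned in the text.
    QueryGroup("production", "inventory", "m5.large", 4, "r53"),  # type and limit assumed
    # Non-production stages: t2.medium, Auto Scaling Group capped at one instance.
    QueryGroup("staging", "inventory", "t2.medium", 1, "r55"),  # r55 is hypothetical
    QueryGroup("ci", "inventory", "t2.medium", 1, "r56"),  # r56 is hypothetical
]


def shares_one_application(groups: List[QueryGroup]) -> bool:
    """True when every group uses the same application name."""
    return len({g.application for g in groups}) == 1


def non_production(groups: List[QueryGroup]) -> List[QueryGroup]:
    """Return the groups for every stage except production."""
    return [g for g in groups if g.stage != "production"]
```

Running `shares_one_application(PLAN)` before creating the groups is a cheap way to catch a mistyped application name, since names cannot be changed afterwards.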
Query Groups for Elastic Autoscaling
As your application volume increases, you may want to increase the query and cache capacity of your system. With Datomic, you can create query groups that elastically auto-scale your entire application as needed.
To set this up:
- Create a new query group with the same application name used by the original system.
- Direct production traffic to this query group instead of the primary compute group. Although you are no longer directing traffic to the primary compute group, Datomic will automatically use it for transactions, indexing, and caching.
Note that you do not have to worry about separately scaling database reads, caches, and application processing: the query group scales all three.
If your application serves more than one kind of load, you can create multiple query groups (not shown in the diagram) to isolate the loads from each other, and to give each load its own working set cache.
Multiple Applications
Systems are often composed of more than one application. Each application can have its own:
- computational requirements
- cache-able working set
Imagine that your system began as a web application, and now you want to perform some interactive analysis on the data you have accumulated over several years. The analysis code is totally different from your web application, and you certainly don't want the analysis code to compete with the web application for compute resources. On the other hand, you would like your analysis code to work against live, up-to-the-minute data. To accomplish this:
- Create a new analysis query group, but this time, choose a new application name.
That's it! You now have a new deployment target, and a new query group to run your new application. The diagram below shows the second (analysis) application in yellow:
As with all query groups, you have an independent choice about size and scale. The diagram shows the analysis group having a larger i3.xlarge instance to support larger queries, but only a single instance since the analysis team does not share the web application's requirement for HA.
As with the original application, you could of course make additional query groups (not shown in the diagram) for other stages of the analysis application, e.g. dev/CI/production.
Separate Production Topology for Isolation
There are many reasons that you might want to totally isolate production from all other stages, for example:
- production may be operated by a different team, and secured by different IAM policy
- production may have sensitive data that developers should not access
To accomplish this, simply create a second production topology stack. When you create the second stack and its query groups (if any), make sure that you select the application name(s) you used in the first system:
As you can see in the picture, applications (and therefore reproducible deployments) can span multiple Datomic systems.
Primary Compute Group for Transaction Functions
The primary compute group of a system performs all transactions, while queries can be performed by any compute group (primary or query).
This puts custom transaction functions in a special category: because they run inside a transaction, they must be deployed to the primary compute group.
- In a single-application system, simply include the code for any transaction functions along with your application code.
- In a multi-application system, you can create a separate application containing the transaction functions needed by all the other applications.
Note: Transaction functions are pure functions, so you do not need to deploy them anywhere for testing. You can simply invoke them as ordinary code in your REPL or test suite.
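The testing idea in the note above can be illustrated in any language. The following is a Python analogy only, not how Datomic transaction functions are written or registered: `adjust_stock`, the dictionary standing in for a database value, and the tuple format standing in for transaction data are all hypothetical. It shows why purity makes deployment unnecessary for testing: the function reads a value, returns data, and changes nothing.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, int]          # hypothetical stand-in for a database value
Assertion = Tuple[str, str, int]   # hypothetical stand-in for transaction data


def adjust_stock(snapshot: Snapshot, sku: str, delta: int) -> List[Assertion]:
    """Pure function: compute the new stock count without mutating anything."""
    current = snapshot.get(sku, 0)
    updated = current + delta
    if updated < 0:
        # Rejecting the change by raising mirrors aborting a transaction.
        raise ValueError(f"stock for {sku!r} would drop to {updated}")
    return [(sku, "count", updated)]
```

Because the function depends only on its arguments, an ordinary unit test or REPL call with a hand-built snapshot exercises it fully.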
The picture below shows the latter scenario. In addition to the analysis applications, there is a new tx application that includes all transaction functions. A revision of this application is deployed to the primary compute group of each system.
Choosing EC2 Instance Sizes
Datomic runs only on certain EC2 instance sizes, selected and tested to meet the semantic and performance requirements of the system.
- The Solo topology is optimized for price, and runs only t2.small.
- The Production topology requires a fast SSD cache, and therefore runs on i3.large or i3.xlarge. i3.large is the default, and you should increase to i3.xlarge only for very high write volumes.
- Query groups serve more varied purposes, and therefore offer a larger set of choices: t2.medium, m5.large, i3.large, or i3.xlarge. t2.medium is the default, and is likely well-suited to non-production stages. m5.large adds compute and memory, and is a good fit for e.g. production web applications. The i3s add a local SSD cache, and are suitable for larger databases.
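The choices listed above can be captured in a small lookup table for use in planning scripts. This is a minimal Python sketch under stated assumptions: the `INSTANCE_TYPES` table and `choose_instance_type` helper are hypothetical names, and the table simply restates the instance sizes and defaults given in this section.

```python
from typing import Optional

# Allowed instance sizes and defaults, restated from the list above.
INSTANCE_TYPES = {
    "solo": {"allowed": ("t2.small",), "default": "t2.small"},
    "production": {"allowed": ("i3.large", "i3.xlarge"), "default": "i3.large"},
    "query-group": {
        "allowed": ("t2.medium", "m5.large", "i3.large", "i3.xlarge"),
        "default": "t2.medium",
    },
}


def choose_instance_type(kind: str, requested: Optional[str] = None) -> str:
    """Return the requested size if allowed for this kind, else the default.

    Raises ValueError when the requested size is not in the allowed set.
    """
    entry = INSTANCE_TYPES[kind]
    if requested is None:
        return entry["default"]
    if requested not in entry["allowed"]:
        raise ValueError(
            f"{requested} is not offered for {kind}; choose one of {entry['allowed']}"
        )
    return requested
```

For example, `choose_instance_type("query-group")` returns the default `t2.medium`, while `choose_instance_type("production", "m5.large")` raises `ValueError` because the Production topology requires an i3 instance.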