Planning Your System
Datomic can grow as your application(s) grow. This document walks you through a series of topologies to address specific objectives.
|Objective|Topology|
|---|---|
|moving to production|production|
|separate development stages|query groups for stages|
|elastic autoscaling|query groups for scaling|
|multiple applications, microservices|multiple applications|
|production isolation|separate production topologies|
|transaction functions|primary compute group|
In the descriptions below, each of the topologies builds on the previous one. If your system doesn't have all these objectives, simply omit the elements that are not relevant to you.
The Solo topology provides an inexpensive way to access Datomic's full programming model for development, testing, and personal projects. When you reach the limit of Solo's compute capacity, or you want High Availability, you can upgrade to Production.
The developer workflow for a Solo system serving a web application is shown below. You can use the client API to develop directly against the system, and can install application code via commit+push+deploy.
The smallest Production topology setup provides:
- the entire semantic model of Datomic
- high availability
- capacity for large databases
The developer workflow in a basic Production system is the same as with Solo:
Writing an application takes place in multiple overlapping stages, e.g. development, testing, staging, CI, and production.
In Datomic, you can create a separate query group for each development stage, as shown in the diagram below:
- When you create each new query group, you will have a chance to set its associated application name. Choose the same name used by the development system! All the resources associated with the same application share a color (green) in the diagram.
- You can then deploy a different revision of the application code to each group. In the diagram, production is running stable r53, while the other query groups are trying out newer versions.
- You can use parameters to let each group (and each developer client) work against different databases.
Note that each query group can be independently sized and scaled to balance cost with operational requirements. For example, the diagram shows the non-production stages using smaller t2.medium instances in an Auto Scaling Group limited to one instance per group.
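The per-stage setup described above can be pictured as a small table of settings. The Python sketch below is only a planning aid, not a Datomic API: the `QueryGroup` class, the application name `web`, the revisions `r54` and `r55`, and the database names are all hypothetical. The instance type and the one-instance limit are taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryGroup:
    """Hypothetical planning record for one query group (not a Datomic type)."""
    stage: str           # development stage served by this group
    app_name: str        # application name shared with the development system
    revision: str        # application revision deployed to this group
    instance_type: str   # EC2 instance type for the group
    max_instances: int   # upper bound of the group's Auto Scaling Group
    db_name: str         # database this stage works against (a parameter)


# Assumed values, for illustration only.
STAGES = [
    QueryGroup("ci", "web", "r55", "t2.medium", 1, "web-ci"),
    QueryGroup("staging", "web", "r54", "t2.medium", 1, "web-staging"),
]


def shared_app_name(groups):
    """Return the single application name used by all groups, or raise."""
    names = {g.app_name for g in groups}
    if len(names) != 1:
        raise ValueError(f"expected one application name, found {sorted(names)}")
    return names.pop()


print(shared_app_name(STAGES))
```

A check like `shared_app_name` catches the most likely planning mistake: creating a stage's query group under a different application name than the development system.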
As your application volume increases, you may want to increase the query and cache capacity of your system. With Datomic, you can create query groups that elastically autoscale your entire application as needed.
To set this up:
- Create a new query group with the same application name used by the original system.
- Direct production traffic to this query group, instead of the primary compute group. (Though you are no longer directing traffic to the primary compute group, Datomic will automatically use it for transactions, indexing, and caching).
Note that you do not have to worry about separately scaling database reads, caches, and application processing: the query group scales all three.
If your application serves more than one kind of load, you can create multiple query groups (not shown in the diagram) to isolate the loads from each other, and to give each load its own working set cache.
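One way to keep the division of labor straight: transactions always land on the primary compute group, while reads for a given load go through the query group you point that load at. The Python sketch below models only that rule. The group names and the load-to-group mapping are hypothetical, and none of this is Datomic client code.

```python
PRIMARY_GROUP = "primary"  # hypothetical name for the primary compute group

# Hypothetical mapping from kind of load to the query group that serves it.
QUERY_GROUP_FOR_LOAD = {
    "web": "web-query-group",
    "reports": "reports-query-group",
}


def serving_group(operation, load):
    """Return the name of the group that does the work for an operation."""
    if operation == "transact":
        # Transactions are always handled by the primary compute group.
        return PRIMARY_GROUP
    if operation == "query":
        # Each load reads through its own query group and working-set cache.
        return QUERY_GROUP_FOR_LOAD[load]
    raise ValueError(f"unknown operation: {operation!r}")
```

Giving each load its own entry in the mapping mirrors the isolation described above: a heavy reporting query cannot evict the web load's cached working set, because the two run on different groups.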
Systems are often composed of more than one application. Each application can have its own:
- computational requirements
- cacheable working set
Imagine that your system began as a web application, and now you want to perform some interactive analysis on the data you have accumulated over several years. The analysis code is totally different from your web application, and you certainly don't want the analysis code to compete with the web application for compute resources. On the other hand, you would like your analysis code to work against live, up-to-the-minute data. To accomplish this:
- Create a new analysis query group, but this time, choose a new application name.
That's it! You now have a new deployment target, and a new query group to run your new application. The diagram below shows the second (analysis) application in yellow:
As with all query groups, you have an independent choice about size and scale. The diagram shows the analysis group having a larger i3.xlarge instance to support larger queries, but only a single instance since the analysis team does not share the web application's requirement for HA.
As with the original application, you could of course make additional query groups (not shown in the diagram) for other stages of the analysis application, e.g. dev/CI/production.
There are many reasons that you might want to totally isolate production from all other stages, for example:
- production may be operated by a different team, and secured by different IAM policy
- production may have sensitive data that developers should not access
To accomplish this, simply create a second production topology stack. When you create the second stack and its query groups (if any), make sure that you select the application name(s) you used in the first system:
As you can see in the picture, applications (and therefore reproducible deployments) can span multiple Datomic systems.
The primary compute group of a system performs all transactions, while queries can be performed by any compute group (primary or query).
This puts custom transaction functions in a special category: because they run inside a transaction, they must be deployed to the primary compute group.
- In a single-application system, simply include the code for any transaction functions along with your application code.
- In a multi-application system, you can create a separate application containing the transaction functions needed by all the other applications.
Note: Transaction functions are pure functions, so you do not need to deploy them anywhere for testing. You can simply invoke them as ordinary code in your REPL or test suite.
The picture below shows the latter scenario. In addition to the web and analysis applications, there is a new tx application that includes all transaction functions. A revision of this tx application is deployed to the primary compute group of each system.
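Because transaction functions are pure, as noted above, they can be tested like any other function before they are deployed anywhere. The sketch below is a Python analogy only, not Datomic code: the `deposit` function, the dictionary standing in for a database value, and the shape of the returned data are all hypothetical. It shows the general pattern of a function that reads a database value plus arguments and returns data describing the change, without touching anything outside itself.

```python
def deposit(db, account_id, amount):
    """Pure function: read the database value, return data describing the change.

    `db` is a plain dict standing in for a database value (an assumption
    made for this illustration).
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    balance = db[account_id]["balance"]
    # Return new data rather than mutating `db`.
    return [{"id": account_id, "balance": balance + amount}]


# Exercise the function directly, with no deployment and no running system.
sample_db = {"acct-1": {"balance": 100}}
result = deposit(sample_db, "acct-1", 25)
assert result == [{"id": "acct-1", "balance": 125}]
assert sample_db == {"acct-1": {"balance": 100}}  # the input is unchanged
```

The second assertion is the important one for purity: the function's only effect is its return value, so the same inputs always produce the same result.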
Datomic runs on only certain EC2 instance sizes, selected and tested to meet the semantic and performance requirements of the system.
- The Solo topology is optimized for price, and runs only t2.small.
- The Production topology requires a fast SSD cache, and therefore runs on i3.large or i3.xlarge. i3.large is the default, and you should increase to i3.xlarge only for very high write volumes.
- Query groups serve more varied purposes, and therefore offer a larger set of choices: t2.medium, m5.large, i3.large, or i3.xlarge. t2.medium is the default, and is likely well-suited to non-production stages. m5.large adds compute and memory, and is a good fit for e.g. production web applications. The i3s add a local SSD cache, and are suitable for larger databases.
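The instance choices listed above can be captured in a small lookup table, which is handy for validating a plan before creating any stacks. This is a minimal Python sketch, not part of Datomic's tooling: the `resolve_instance_type` function and the kind names (`solo`, `production`, `query-group`) are hypothetical, while the instance types and defaults are the ones listed above.

```python
# Instance types per kind of compute group, as listed above.
ALLOWED_INSTANCE_TYPES = {
    "solo": ("t2.small",),
    "production": ("i3.large", "i3.xlarge"),
    "query-group": ("t2.medium", "m5.large", "i3.large", "i3.xlarge"),
}

# Default instance type for each kind.
DEFAULT_INSTANCE_TYPE = {
    "solo": "t2.small",
    "production": "i3.large",
    "query-group": "t2.medium",
}


def resolve_instance_type(kind, requested=None):
    """Return the instance type to use, falling back to the default for `kind`."""
    if kind not in ALLOWED_INSTANCE_TYPES:
        raise ValueError(f"unknown kind: {kind!r}")
    if requested is None:
        return DEFAULT_INSTANCE_TYPE[kind]
    if requested not in ALLOWED_INSTANCE_TYPES[kind]:
        allowed = ", ".join(ALLOWED_INSTANCE_TYPES[kind])
        raise ValueError(f"{requested} is not available for {kind}; choose one of: {allowed}")
    return requested
```

For example, `resolve_instance_type("query-group", "m5.large")` accepts the request, while `resolve_instance_type("solo", "m5.large")` raises a `ValueError` because Solo runs only on t2.small.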