Growing Your System
This page covers topics that will help you improve the operation of Datomic as your system grows:
- Instance Sizes
- Primary Compute Group
- Adding and Scaling a Query Group
- Task-specific Query Groups
- Valcache for Large Databases
- Scaling Storage
- Understanding Your AWS Bill
- Reducing AWS Costs
- Development Workflow
There is a lot of information here, and we are glad to help you adapt it to your particular needs. If you are in doubt, please ask before making changes.
Instance Sizes
You can change your Datomic Cloud instance size at any time by updating the CloudFormation template. As the table below shows, moving up the t3 instance size chart provides more memory and more CPUs.
- Additional memory can improve the performance of queries and transactions.
- Increasing the number of CPUs allows Datomic to serve more requests simultaneously, and can also improve the performance of individual queries and transactions.
Approximate monthly instance prices, with columns ordered by instance size from smallest to largest:

|1 instance, on-demand|$0|$34|$74|$127|$216|$395|$216|$395|
|2 instances, on-demand|n/a|$49|$118|$216|$396|$754|$396|$754|
|1 instance, reserved|n/a|$28|$62|$104|$172|$306|$172|$306|
|2 instances, reserved|n/a|$37|$96|$172|$307|$576|$307|$576|
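As a minimal sketch of what the table implies about reserved pricing (the function name and the sample column, $34 on-demand vs $28 reserved, are illustrative):

```python
def reserved_savings(on_demand_monthly, reserved_monthly):
    """Return the fractional saving from reserving an instance
    instead of paying the on-demand rate."""
    return 1 - reserved_monthly / on_demand_monthly

# One column from the table above: $34/month on-demand vs $28/month reserved.
saving = reserved_savings(34, 28)
print(f"{saving:.0%}")  # 18%
```

The same calculation applies to any column; reserved pricing in this table saves roughly 15-25% per instance.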
The CloudFormation template provides settings for AutoScaling and Metrics level based on instance size:

|AutoScaling During Update|1|1|2|2|2|2|2|
Primary Compute Group
The primary compute group of a system performs all transactions, while queries can be performed by any compute group (primary or query). The primary compute group should not autoscale, and should run a fixed number of instances based on your transaction load and database count.
For a system with a single primary database or with low total write volume, you should fix the size of the primary compute group at two instances. When you need to scale queries beyond what these two instances can support, add a query group.
If your system serves a high write volume across more than one database, you may want to run more than two instances in your primary compute group. Please contact Datomic support with your specific needs and we can guide you.
Adding and Scaling a Query Group
When you first launch a Datomic system it has only a single compute group, the primary compute group described above. When you are ready to support higher query volume, you can add a query group. To add your first query group:
- Create a new query group with the same application name used by the original system.
- Direct production traffic to this query group instead of the primary compute group.
Datomic will automatically coordinate between query groups and the primary group, using the primary group for transactions and indexing.
Query Group Instance Size
Query group instances should typically be at least as large as the instances in the primary compute group. There are two reasons for this:
- Most systems spend more resources reading than writing.
- Query group instances "follow" the recent transactions in the primary compute group, which consumes both CPU and memory. If the query group instances are underpowered compared to the primary compute group, this following may consume so many resources that performance suffers.
Query Group Instance Count
You can set the minimum and maximum number of instances that will run in the group via the CloudFormation template. There are two reasons to increase the number of instances in a query group:
- Any number of instances greater than one will automatically enable High Availability (HA).
- Each instance will increase the total amount of CPUs and memory available for Datomic queries. This will enable your system to serve a higher volume of queries.
You can configure autoscaling triggers for a query group. These triggers should be based on a metric that measures when work is (or is close to) backing up. For example:
- If your system serves remote Datomic clients, then you can scale on HttpEndpointThrottled, which counts the number of client requests that were throttled because the system is busy.
- If your system serves web requests, then you can scale on HttpDirectThrottled, which counts the number of web requests that were throttled, or on a domain-specific metric that you create.
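The idea behind such a trigger can be sketched as follows. The function and its thresholds below are hypothetical, not part of Datomic: the point is that a good trigger fires on sustained back-up (several consecutive busy periods), not on a single spike.

```python
def should_scale_out(throttled_counts, threshold=0, periods=3):
    """Hypothetical scaling trigger: scale out when the throttled-request
    count (e.g. a metric like HttpEndpointThrottled) exceeds `threshold`
    for the last `periods` consecutive evaluation periods."""
    recent = throttled_counts[-periods:]
    return len(recent) == periods and all(c > threshold for c in recent)

print(should_scale_out([0, 0, 5, 7, 9]))  # True: three busy periods in a row
print(should_scale_out([0, 9, 0, 7, 0]))  # False: isolated spikes only
```

In practice you would express this as a CloudWatch alarm with a matching evaluation-period count, rather than evaluating it yourself.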
Task-Specific Query Groups
If usage of your system is homogeneous, one query group is all you ever need. You can simply expand that single query group to match increasing query load. But in many cases, a growing system has more than one audience. In those cases, you can create additional query groups for different use cases. Simply start an additional query group for each distinct use case, and direct traffic for that use case to the query group's unique endpoints.
Task-specific query groups provide:
- the ability to scale resources independently for a task, without competing with other tasks
- the ability to turn off resources when not in use, e.g. for intermittent or one-off tasks
- automatic, tuned-to-the-task caching without the need for configuring storage
A typical example of a multi-query group system is the following:
- an "app" query group that supports a production web application
- an "analytics" query group for live analytics, so that expensive analytics operations do not compete with the web application for resources
- a "support" query group for the operations team to query the system, without interfering with other users
Note that each query group can have its own instance size and group size, tuned to its availability, latency, and scalability requirements.
Valcache for Large Databases
Datomic Cloud automatically manages a multilayer cache that improves the performance of queries and transactions. If your system makes intensive use of a large database, you can further improve performance by enabling valcache, an optional additional cache tier backed by SSDs on the primary compute group.
Datomic enables valcache automatically when you choose an instance type in the i3 family. Before switching to i3, you must increase your AWS i3 instance limit.
Verify Your i3 Instance Limit
The default AWS limits on i3 instances will prevent Datomic Cloud from starting. Before you switch to an i3 instance type, check and update your i3 instance limit:
- Click Limits in the EC2 Dashboard menu.
- Scroll down to find the Running On-Demand All... limit, which includes i3 instances in its instance list.
- Open the Limits Calculator.
- Add at least two i3.large instance types, corresponding to how high you plan to scale your system.
- Ensure that your Current limit is greater than the vCPUs needed. Ignore the "New limit" value, as it assumes that you are currently at your vCPU limit.
- If the vCPUs needed exceeds your Current limit, click the Request on-demand limit increase link under "Options".
- Request the number of vCPUs that matches your planned system needs.
- Once complete, submit the form, and you will receive confirmation from AWS once your instance limit has been raised.
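To sanity-check the numbers in the Limits Calculator: an i3.large instance has 2 vCPUs, so the vCPUs needed is simply your planned instance count times two. The helper below is illustrative:

```python
I3_LARGE_VCPUS = 2  # vCPU count of the i3.large instance type

def vcpus_needed(planned_instances):
    """vCPUs required to run `planned_instances` i3.large instances."""
    return planned_instances * I3_LARGE_VCPUS

# Planning to scale a compute group up to 4 i3.large instances:
print(vcpus_needed(4))  # 8
```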
Scaling Storage
Datomic indexes are redundantly stored in S3 and EFS. These high-reliability, high-availability storage services are automatically scaled by AWS, with no user configuration required.
Database logs are stored in a DynamoDB table named "[system]". This table is configured with provisioned throughput and a broad autoscaling range for both reads and writes. For most systems, these settings are a good sweet spot: when the system is under low transaction load, autoscaling will turn throughput down, saving money. When load increases, autoscaling will increase throughput to match.
That said, there are two situations where you may want to increase DynamoDB provisioning for the "[system]" table.
- If you have very bursty transaction load, and your system is intolerant of increased latency before autoscaling kicks in, you can increase the DynamoDB write minimum.
- If your system has very high write volume, and you are seeing regular DynamoDB throttling of writes, you can increase the DynamoDB write maximum.
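Adjusting these bounds goes through the AWS Application Auto Scaling API. The sketch below only builds the request parameters; the table name and capacity numbers are placeholders, and the commented-out boto3 call shows where they would be used:

```python
def write_scaling_target(table_name, min_wcu, max_wcu):
    """Build parameters for an Application Auto Scaling
    register_scalable_target call that adjusts the autoscaling range
    for a DynamoDB table's write capacity units."""
    return {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
        "MinCapacity": min_wcu,
        "MaxCapacity": max_wcu,
    }

# Placeholder system name and capacities:
params = write_scaling_target("my-system", min_wcu=100, max_wcu=3000)
# boto3.client("application-autoscaling").register_scalable_target(**params)
print(params["ResourceId"])  # table/my-system
```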
The Datomic catalog is stored in a DynamoDB table named "[system]-catalog", configured with modest provisioned throughput. Catalog operations are very low volume; you should never need to adjust this table.
Understanding Your AWS Bill
The total cost of running a Datomic system is the sum of the base price and the usage price.
- The base price is the total monthly price of services that are billed per unit time, usually per hour (e.g. EC2 instances) or per month (e.g. KMS keys). A running Datomic system has a well-known base price based on its configuration, regardless of how the system is used.
- The usage price is the total monthly price of services that are billed by some measure of utilization, e.g. data stored, data transferred, or autoscaled DynamoDB provisioning.
All prices below are for us-east-1, and were most recently updated on May 17, 2021.
The base price for a Datomic system combines all the hourly and monthly components of Datomic into a single monthly price. These components are enumerated below:
Each compute group will have a single ALB, with an hourly price of $0.0225, plus an hourly instance price based on the number and type of instances you choose.
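For example, the ALB alone contributes roughly $16 per month per compute group at that hourly rate. The arithmetic below assumes a 730-hour month (24 × 365 ÷ 12) and ignores usage-based LCU charges:

```python
ALB_HOURLY = 0.0225    # ALB hourly price quoted above, in dollars
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12

# Fixed monthly ALB cost per compute group (usage-based LCU charges excluded)
alb_monthly = ALB_HOURLY * HOURS_PER_MONTH
print(f"${alb_monthly:.2f}/month")  # roughly $16 per month
```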
Basic metrics include all the metrics normally used by operators. Detailed metrics include the basic metrics, plus a number of additional metrics that are useful for Datomic support. The metrics dashboard is enabled whenever you select at least basic metrics. The metrics prices are approximate because AWS charges only for metrics actually recorded, and systems do not necessarily report every possible metric.
The instance sizes table calculates a base price from the information above using the following assumptions:
- a Datomic system with a single compute group
- default metrics level
- no unlimited mode costs (see the next section)
Your AWS bill will show line items for each AWS service separately.
Datomic uses several AWS services that have a pricing model based on use:
- indexes are stored in S3 and EFS
- the log is stored in DynamoDB
- internet access is via API Gateway
- t3 family unlimited mode
While the usage price depends on your specific usage pattern, there are a few very useful rules of thumb:
- All of the usage-based services used by Datomic participate in the AWS "Always Free" tier. A system at low utilization (e.g. a system used to interactively develop a prototype) will typically fit entirely in the free tier and have a usage cost of $0.
- The base price will typically be higher than the usage price, until you have a lot of data or high request volume.
- Datomic's t3 instances are configured to run in unlimited mode. If an instance is under continuous high CPU load, there is an additional $0.05 per-instance-hour CPU credit charge. See the AWS unlimited mode documentation for additional details. If your system incurs these CPU credit charges, you should consider switching to the i3 instance family.
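As a worst-case worked example of the unlimited-mode surcharge, using the $0.05 per-instance-hour figure above and assuming continuous high CPU load for a 730-hour month:

```python
CPU_CREDIT_SURCHARGE = 0.05  # dollars per instance-hour under sustained high CPU
HOURS_PER_MONTH = 730        # assumed month length in hours

def unlimited_mode_surcharge(instances, busy_hours=HOURS_PER_MONTH):
    """Worst-case monthly t3 unlimited-mode CPU credit charge."""
    return instances * busy_hours * CPU_CREDIT_SURCHARGE

# A two-instance group pinned at high CPU for the entire month:
print(unlimited_mode_surcharge(2))  # 73.0
```

A sustained surcharge of this size is the signal to consider the i3 family instead.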
Reducing AWS Costs
There are a number of things you can do to reduce the AWS costs for Datomic Cloud:
- Develop and test your code with dev-local, which has no cost.
- Choose the smallest EC2 instance size that can handle your performance requirements.
- If you intend to run a compute group for a long period of time, purchase AWS Reserved Instances.
- If you have a compute group that is used intermittently, turn off the compute instances when they are not in use.
Development Workflow
If you plan to support multiple environments with Datomic systems, keep the following points in mind:
- The Datomic system is the unit of AWS IAM access control. If different environments need different AWS controls, put them in separate Datomic systems.
- Within a Datomic system, the compute group is the unit of ion deployment, so a single system can host many applications simultaneously. These applications could serve entirely separate purposes, or they could be various revisions of the same functionality.
- Transaction functions execute only in the primary compute group, so you must deploy them there (not to a query group).
- The CodeDeploy application name is the unit of ion push. If you give different Datomic systems the same application name, they will all be able to deploy from the same push. This is probably what you want if you e.g. test an application in one system and release it in another.
The picture below shows an example deployment designed to meet the following dev workflow objectives:
- Keep production code and data separate from all development work.
- Support two different user bases, webapp and analytics, that have different performance requirements.
- Demo and test code before deploying to production.
- Allow developers to access test data from their laptops.