Memory and Caching
Datomic provides a number of caches, all of which are transparent to application code.
Datomic caches only immutable data, so all caches are valid forever.
Recent datoms are cached in the memory index on the transactor and on all peers. The memory index size is controlled by the memory-index-threshold and memory-index-max settings on the transactor. The actual size of the memory index will rarely rise much above memory-index-threshold, except during data imports when it may reach memory-index-max.
Setting memory index configuration on a peer is meaningless; peers must be able to handle the memory index given to them by the transactor.
The memory index is managed as follows:
On the Transactor
- The transactor builds the memory index from the log before it begins serving the database.
- The transactor updates the memory index after it writes a transaction to the log.
- When an indexing job completes, the transactor drops the portion of the memory index that is no longer needed, as it is now in the durable index.
On the Peer
- A peer builds the memory index from the log before the call to connect returns.
- A peer updates the memory index when notified by the transactor of a transaction.
- A peer drops the portion of the memory index that is no longer needed when notified by the transactor that an index job has completed.
Data retrieved via get (and touch) on the Entity API is cached for the lifetime of the Entity instance.
Datomic processes maintain an LRU Java object cache that contains segments of index or log, typically a few thousand datoms per segment. When a Datomic process needs data, Datomic looks in the object cache first. If the data is unavailable locally, Datomic will read the data from memcached or from storage, and then store the data in the cache.
The object cache size is limited by the object-cache-max property on the transactor and by the datomic.ObjectCacheMax Java system property on the peer.
Unlike the memory index, the object cache can vary (in both configuration and contents) across the different processes in a Datomic system. Because each process maintains its own cache, it will automatically adjust over time to its workload.
Memcached is an optional addition to a Datomic system, providing another level of cache between the in-process object cache and storage. Adding memcached to a system can reduce the load on storage, and can reduce the latency associated with populating the object cache.
You can set up more than one memcached, and each Datomic process can choose via configuration which one(s) it uses. Datomic uses only UUIDs for memcached keys, and so Datomic can coexist with other uses of a memcached installation. Additionally, if you specify more than one memcached server, Datomic will distribute values across all instances, effectively enabling a single cache the size of the sum of all the instances.
When configured to used memcached, a Datomic process will automatically write into memcached any segment it needs that is not already present in memcached. In addition, the transactor (only) will write index and log segments to memcached as the segments are produced.
The datomic.memcachedServers property specifies a fixed list of memcached servers. To change the list of servers, you must change the value of this property and restart the affected peer and transactor processes. If you are using Amazon ElastiCache, a better approach is to use automatic discovery.
Datomic supports SASL authentication to a memcached cluster. If datomic.memcachedUsername and datomic.memcachedPassword (memcached-username and memcached-password in transactor properties) are set, Datomic will use those credentials to authenticate its connection to memcached.
If you are using Cloudwatch to monitor transactors, Datomic will record the following memcached metrics:
- Memcache is the number of read requests to memcached.
- When reading from storage, ReaderMemcachedPutMusec is the latency for successful writes to memcached and ReaderMemcachedPutFailedMusec is the latency for failed writes.
- When writing to storage, WriterMemcachedPutMusec is the latency for successful writes to memcached and WriterMemcachedPutFailedMuSec is the latency for failed writes.
Automatic node discovery
If you are using Amazon ElastiCache, you can take advantage of automatic discovery of cache nodes. Set the memcached property (in transactors) or datomic.memcachedServers property (in peers) to the memcached cluster's configuration endpoint and enable automatic discovery by setting the datomic.memcachedAutoDiscovery parameter to true.
When automatic discovery is enabled, any changes made to the cluster, such as adding or removing a node, will be noticed by Datomic without the need for configuration changes or any kind of intervention. The system will automatically take advantage of new nodes and stop using removed nodes as soon as the changes are observed.
Automatic discovery of Memcached nodes requires Datomic 1.0.6316 or later.
- All forms of Datomic caching are transparent to application code.
- Datomic caching requires no configuration other than the overall size of the object cache. The Capacity Planning docs provide guidance for setting cache sizes.
- The transactor benefits from both the object cache and memcached in the course of ordinary transaction processing. For example, the transactor uses database indexes for both cardinality and uniqueness checks.
- Because the object cache lives in your application's address space, it competes with your application for available memory. Take this into account when planning your object cache and VM RAM settings.
- The independent configuration of the object cache and memcached supports task-specific cache strategies. For example, a production system could have two groups of peers with different memcached clusters, one for OLTP and one for OLAP loads. Meanwhile, a support engineer's machine could connect to the same system using its own local memcached install.