Clients and Peers
Overview
There are two ways to consume Datomic On-Prem: with an in-process peer library, or with a remote client to a peer server. If you are trying Datomic for the first time, we recommend that you begin with Datomic dev-local, which uses the client library in-process. When you are ready to plan an On-Prem system, you can return to this page for guidance on whether to use peers, clients, or both.
Comparison
Area | Feature | Peer | Peer Server Client |
---|---|---|---|
Ops | compatible with Datomic Cloud | no | yes |
Ops | app processes per instance | few | many |
Ops | library requirements | many | few |
Read | hot object cache on app start | no | yes |
Read | hot memcached on app start | yes | yes |
Read | query latency | best | good |
Read | reads | scale horizontally | scale horizontally |
Read | datalog queries | local, sync, eager | remote, async, chunked |
Read | raw index access | local, sync, lazy | remote, async, chunked |
Read | flow control | up to you | automatic |
Read | cross database joins | yes | no |
Write | ACID transactions | transactor, async, eager | remote, async, eager |
Write | writes | single writer, CAS | single writer, CAS |
Write | flow control | automatic | automatic |
Write | transaction ordering | in order issued | in order acknowledged |
Write | tempids | custom type or string, partitions | optional, strings |
Note that you can use both peers and clients within the same system. However, the peer and client APIs are different for reasons discussed below.
Operations
The peer library includes the query engine and object cache in the same process as your application, with the following operational implications:
- Peer processes should typically be large relative to the size of the (possibly virtual) hardware they run on. This amortizes the cost of running the JVM, and allows for large object caches.
- Peer processes must be JVM-based, and include several Java library dependencies.
Remote clients have a much smaller footprint. Remote clients are a good fit for microservices, as the peer server and its cache can be large and long-lived, while serving application processes that may be small, numerous, and short-lived.
Reads
Both peers and peer servers provide horizontal read scaling, which is fundamental to Datomic's architecture. You can increase the number of peers or peer servers as needed to handle additional read load.
Datomic read operations include Datalog query and raw index access. With the peer library, these operations all happen locally in process memory, and the APIs are synchronous. Because of this locality, long-lived peer applications deliver the best possible latency for read operations.
Peer server clients make remote calls for read operations. This differs from using the Peer API in two ways.
- Client read operations are asynchronous, and do not consume a thread while waiting on another process. (By contrast, peer read operations are synchronous.)
- Large client API results are delivered a chunk-at-a-time.
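Both differences can be seen in a small asyncio model. It does not use any Datomic API: `remote_query` is a hypothetical stand-in for a remote read that yields a large result one chunk at a time, with `asyncio.sleep` simulating the network hop. While a request waits, the event loop can run other work, so no thread is blocked.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

Row = Tuple[int, str]


async def remote_query(rows: List[Row], chunk_size: int = 2,
                       latency: float = 0.01) -> AsyncIterator[List[Row]]:
    """Hypothetical remote read that delivers its result one chunk at a time.

    Each chunk costs a simulated network round trip. Awaiting it suspends
    this coroutine rather than blocking a thread.
    """
    for start in range(0, len(rows), chunk_size):
        await asyncio.sleep(latency)          # simulated network hop
        yield rows[start:start + chunk_size]  # next chunk of the result


async def collect(rows: List[Row], chunk_size: int) -> List[List[Row]]:
    """Consume chunks as they arrive, keeping them in order."""
    chunks: List[List[Row]] = []
    async for chunk in remote_query(rows, chunk_size):
        chunks.append(chunk)
    return chunks


async def main() -> None:
    rows = [(i, f"user-{i}") for i in range(5)]
    # Two reads in flight at once on a single thread.
    a, b = await asyncio.gather(collect(rows, 2), collect(rows, 3))
    print(len(a), len(b))  # prints: 3 2


if __name__ == "__main__":
    asyncio.run(main())
```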
Compared to the Peer API, peer server clients introduce a network hop for read operations, increasing latency. On the other hand, the object cache used for client queries lives in a separate process, so it survives a client application restart. This can reduce latency for read operations in applications that are small and short-lived. With both peer servers and peers, Datomic supports integrated memcached, so read latencies should be acceptable for the vast majority of applications.
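The benefit comes from where the cache lives. The toy Python model below is not Datomic's implementation: `Storage`, `TieredReader` and the dict standing in for memcached are hypothetical. It shows that a process which restarts with an empty local cache can still avoid storage reads when a longer-lived cache outside it is already warm; the same reasoning applies to a client whose queries are served by a long-running peer server.

```python
from typing import Dict, Optional


class Storage:
    """Hypothetical durable storage that counts how often it is read."""

    def __init__(self, segments: Dict[str, str]) -> None:
        self.segments = segments
        self.reads = 0

    def get(self, key: str) -> str:
        self.reads += 1
        return self.segments[key]


class TieredReader:
    """Read path: process-local cache, then a shared cache, then storage."""

    def __init__(self, storage: Storage, shared: Dict[str, str]) -> None:
        self.local: Dict[str, str] = {}  # discarded when this process restarts
        self.shared = shared             # outlives this process (memcached-like)
        self.storage = storage

    def read(self, key: str) -> str:
        if key in self.local:
            return self.local[key]
        value: Optional[str] = self.shared.get(key)
        if value is None:                # miss in both caches: go to storage
            value = self.storage.get(key)
            self.shared[key] = value
        self.local[key] = value
        return value


storage = Storage({"segment-1": "datoms"})
shared: Dict[str, str] = {}
TieredReader(storage, shared).read("segment-1")   # cold: one storage read
restarted = TieredReader(storage, shared)         # new process, empty local cache
restarted.read("segment-1")                       # served by the shared cache
print(storage.reads)  # prints: 1
```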
Peer server clients will receive flow control "server busy" errors if they temporarily exceed the capacity of the system. With peer applications, flow control is up to you. You can (and should) implement application-level flow control for peer applications.
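A common way to handle a "server busy" response is to retry with capped exponential backoff and jitter. The Python sketch below shows that general pattern only: `BusyError` is a hypothetical exception standing in for however your client library reports the condition, and the delay values are illustrative, not Datomic recommendations. In a peer application the same helper could be reused once your own code decides when the system is busy, for example when a bounded work queue is full.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BusyError(Exception):
    """Hypothetical stand-in for a "server busy" flow-control error."""


def call_with_backoff(op: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.05, max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op, retrying on BusyError with capped exponential backoff.

    The wait before retry n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**(n - 1))] ("full jitter"), so clients
    that were rejected together do not all retry at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return op()
        except BusyError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller decide what to do
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


calls = 0


def flaky_read() -> str:
    """Hypothetical operation that is rejected twice before succeeding."""
    global calls
    calls += 1
    if calls <= 2:
        raise BusyError("server busy")
    return "result"


print(call_with_backoff(flaky_read), calls)  # prints: result 3
```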
Peers can connect to multiple databases, and then issue process-local queries that join across those databases. Clients are designed for a distributed environment where they know nothing about process locality. As a result, client queries can refer only to the database associated with their connection.
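When a client application needs data from two databases, one option (an assumption here, not something this page prescribes) is to run one query per connection and join the results in application code. The Python sketch below shows such an in-memory hash join; the result sets and database names are hypothetical.

```python
from typing import Dict, List, Tuple


def hash_join(left: List[Tuple[str, str]],
              right: List[Tuple[str, int]]) -> List[Tuple[str, str, int]]:
    """Join two result sets on their first element (an in-memory hash join)."""
    index: Dict[str, List[int]] = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(key, l_val, r_val)
            for key, l_val in left
            for r_val in index.get(key, [])]


# Hypothetical results of two separate client queries, one per database.
users = [("u1", "Ann"), ("u2", "Bob")]        # from an "accounts" database
orders = [("u1", 30), ("u1", 12), ("u3", 5)]  # from an "orders" database
print(hash_join(users, orders))
# prints: [('u1', 'Ann', 30), ('u1', 'Ann', 12)]
```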
Writes
Datomic ensures that transactions are consistent by utilizing a single writer process at any given time, and by using compare-and-swap (CAS) operations against the underlying storage service.
The write model for peers and peer servers is therefore quite similar. The only semantic difference between peer and peer server transactions is in their knowledge of transaction order. Peers know that they have a dedicated connection to a single transacting process, and that transactions will always be processed in the order they were submitted. This permits a pipelining optimization where peers can submit transactions from a single thread as fast as possible, knowing the transactions will be processed in order. Clients do not presume such a dedicated relationship, and so client pipelining code must wait for each transaction to be acknowledged if order is important.
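The ordering difference can be simulated with Python's asyncio. Nothing here is Datomic API: `Transactor`, the FIFO queue standing in for a peer's dedicated connection, and the per-transaction network delays are all hypothetical. The delays are chosen so that later requests travel faster, which makes concurrent client requests reach the writer in reverse order, while both the pipelined peer model and the ack-by-ack client preserve submission order.

```python
import asyncio
from typing import List


class Transactor:
    """Hypothetical single writer: appends transactions to one log as they arrive."""

    def __init__(self) -> None:
        self.log: List[str] = []

    async def transact(self, tx: str, delay: float) -> int:
        await asyncio.sleep(delay)  # simulated trip to the writer
        self.log.append(tx)
        return len(self.log)        # simulated position in the log


async def peer_pipelined(t: Transactor, txs: List[str], delays: List[float]) -> None:
    """Peer style: submit everything at once on an ordered channel.

    The FIFO queue models a dedicated connection that delivers transactions
    in submission order; the submitting loop never waits for an ack.
    """
    queue: asyncio.Queue = asyncio.Queue()
    for item in zip(txs, delays):
        queue.put_nowait(item)      # submit without waiting

    async def deliver() -> None:    # the connection: one item at a time, in order
        while not queue.empty():
            tx, delay = queue.get_nowait()
            await t.transact(tx, delay)

    worker = asyncio.create_task(deliver())
    # ... the application is free to do other work here ...
    await worker


async def client_unordered(t: Transactor, txs: List[str], delays: List[float]) -> None:
    """Client requests issued concurrently: arrival order follows network delay."""
    await asyncio.gather(*(t.transact(tx, d) for tx, d in zip(txs, delays)))


async def client_ordered(t: Transactor, txs: List[str], delays: List[float]) -> None:
    """Client style when order matters: wait for each ack before the next tx."""
    for tx, d in zip(txs, delays):
        await t.transact(tx, d)


async def main() -> None:
    txs, delays = ["tx1", "tx2", "tx3"], [0.03, 0.02, 0.01]
    for strategy in (peer_pipelined, client_unordered, client_ordered):
        t = Transactor()
        await strategy(t, txs, delays)
        print(strategy.__name__, t.log)


if __name__ == "__main__":
    asyncio.run(main())
```

Running it prints the log for each strategy: tx1, tx2, tx3 for the peer and ordered-client cases, and tx3, tx2, tx1 for the concurrent client.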
Tempids
Datomic peers support a dedicated tempid structure for new entity ids. This allows for fine-grained control of partitions, which can be used to improve data locality for some kinds of read operations.
Our experience has been that very few users take advantage of partitions, and that the custom tempid data structure and reader syntax are rarely worth the added complexity. Client tempids are always strings, and partitioning is automatic.
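String tempids are simple to reason about: within a single transaction, the same string always denotes the same new entity, and the transaction result reports which entity id each string received. The Python sketch below models only that rule. The ":db/id" keys are plain Python strings, `resolve_tempids` is hypothetical, and the sequential ids are illustrative (real entity ids are allocated by Datomic).

```python
from itertools import count
from typing import Any, Dict, Iterator, List, Tuple

TxMap = Dict[str, Any]


def resolve_tempids(tx_data: List[TxMap],
                    new_ids: Iterator[int]) -> Tuple[List[TxMap], Dict[str, int]]:
    """Replace string tempids in ":db/id" with newly allocated entity ids.

    Within one transaction, every occurrence of the same string resolves to the
    same new entity. Non-string ids are treated as existing entities. (Tempids
    used as reference values, and upserts, are not modelled here.)
    """
    tempids: Dict[str, int] = {}
    resolved: List[TxMap] = []
    for entity in tx_data:
        eid = entity.get(":db/id")
        if isinstance(eid, str):
            if eid not in tempids:
                tempids[eid] = next(new_ids)  # allocate once per distinct string
            entity = {**entity, ":db/id": tempids[eid]}
        resolved.append(entity)
    return resolved, tempids


tx = [
    {":db/id": "ann", ":user/name": "Ann"},
    {":db/id": "ann", ":user/email": "ann@example.com"},
    {":db/id": "bob", ":user/name": "Bob"},
]
resolved, ids = resolve_tempids(tx, count(1000))
print(ids)  # prints: {'ann': 1000, 'bob': 1001}
```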
Peer-Only APIs
The following peer functions are not well-suited for a wire protocol, and are not included in the client API:
Peer API | Client Alternative |
---|---|
attribute | pull |
entity, entity-db | pull |
entid & ident | pull |
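The table does not give a reason, but one plausible reading (an assumption, not stated above) is that entity-style access is lazy: each attribute is looked up when code touches it, which is cheap in process memory but would cost one network round trip per access over a wire protocol. Pull names the whole shape up front, so it can be answered in one request. The toy Python model below counts requests to make that contrast visible; `CountingServer` and `LazyEntity` are hypothetical.

```python
from typing import Any, Dict, List


class CountingServer:
    """Hypothetical remote database that counts requests (round trips)."""

    def __init__(self, data: Dict[int, Dict[str, Any]]) -> None:
        self.data = data
        self.requests = 0

    def get_attr(self, eid: int, attr: str) -> Any:
        self.requests += 1
        return self.data[eid].get(attr)

    def pull(self, eid: int, pattern: List[str]) -> Dict[str, Any]:
        self.requests += 1  # the whole pattern travels in one request
        return {a: self.data[eid][a] for a in pattern if a in self.data[eid]}


class LazyEntity:
    """Entity-style access: each attribute is fetched the first time it is read."""

    def __init__(self, server: CountingServer, eid: int) -> None:
        self._server, self._eid = server, eid
        self._seen: Dict[str, Any] = {}

    def __getitem__(self, attr: str) -> Any:
        if attr not in self._seen:
            self._seen[attr] = self._server.get_attr(self._eid, attr)
        return self._seen[attr]


data = {1: {":user/name": "Ann", ":user/email": "ann@example.com"}}

lazy_server = CountingServer(data)
ann = LazyEntity(lazy_server, 1)
print(ann[":user/name"], ann[":user/email"], lazy_server.requests)
# prints: Ann ann@example.com 2

pull_server = CountingServer(data)
print(pull_server.pull(1, [":user/name", ":user/email"]), pull_server.requests)
# prints: {':user/name': 'Ann', ':user/email': 'ann@example.com'} 1
```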
These peer functions are not portable across On-Prem and Cloud, and are not included in the client API:
Peer API | Cloud Alternative |
---|---|
gc-storage | not needed |
squuid | not needed |
tx report | poll |
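Replacing a pushed transaction report with polling usually means remembering the last point you processed and asking for anything newer. In the Python sketch below, `read_since` is a hypothetical callback for whatever call your client uses to fetch transactions after a given t; it is not a Datomic function, and the log contents are made up.

```python
from typing import Callable, List, Tuple

LogEntry = Tuple[int, str]  # (t, description of the transaction)


def poll_once(read_since: Callable[[int], List[LogEntry]], cursor: int,
              handle: Callable[[LogEntry], None]) -> int:
    """Process every entry newer than cursor and return the advanced cursor."""
    for entry in read_since(cursor):
        handle(entry)
        cursor = max(cursor, entry[0])
    return cursor


# Hypothetical log and reader; a real client would ask the database instead.
log: List[LogEntry] = [(1000, "add Ann"), (1001, "add Bob")]


def read_since(t: int) -> List[LogEntry]:
    return [e for e in log if e[0] > t]


seen: List[LogEntry] = []
cursor = poll_once(read_since, 0, seen.append)       # picks up t=1000 and t=1001
log.append((1002, "retract Bob"))
cursor = poll_once(read_since, cursor, seen.append)  # picks up only t=1002
print(cursor, [t for t, _ in seen])  # prints: 1002 [1000, 1001, 1002]
```

In practice `poll_once` would run in a loop with a pause between calls, and persisting the cursor lets a restarted process resume where it left off.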
The client API supports only the find-rel form of a query find-spec.
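Results in the other find-spec shapes (collection, tuple and scalar) can still be derived from a relation on the client side. The helpers below are a hypothetical Python sketch with a relation modelled as a list of tuples. For the tuple and scalar forms, which row is returned is arbitrary when the relation has more than one, so the query should be written to return at most one.

```python
from typing import Any, List, Optional, Tuple

Relation = List[Tuple[Any, ...]]  # find-rel result modelled as a list of rows


def as_coll(rel: Relation) -> List[Any]:
    """Like [:find [?x ...]]: the first value of every row."""
    return [row[0] for row in rel]


def as_tuple(rel: Relation) -> Optional[Tuple[Any, ...]]:
    """Like [:find [?x ?y]]: one row, or None when the result is empty."""
    return rel[0] if rel else None


def as_scalar(rel: Relation) -> Any:
    """Like [:find ?x .]: one value, or None when the result is empty."""
    return rel[0][0] if rel else None


names = [("Ann",), ("Bob",)]
print(as_coll(names))           # prints: ['Ann', 'Bob']
print(as_tuple([("Ann", 41)]))  # prints: ('Ann', 41)
print(as_scalar([(42,)]))       # prints: 42
```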