Clients and Peers

Overview

There are two ways to consume Datomic On-Prem: with an in-process peer library, or with a remote client to a peer server. If you are trying Datomic for the first time, we recommend that you begin with Datomic dev-local, which the client library in-process. When you are ready to plan an On-Prem system, you can return to this page for guidance on whether to use peers, clients, or both.

Comparison

Area   Peer Peer Server Client
Ops compatible with Datomic Cloud no yes
Ops app processes per instance few many
Ops library requirements many few
Read hot object cache on app start no yes
Read hot memcached on app start yes yes
Read query latency best good
Read reads scale horizontally scale horizontally
Read datalog queries local, sync, eager remote, async, chunked
Read raw index access local, sync, lazy remote, async, chunked
Read flow control up to you automatic
Read cross database joins yes no
Write ACID transactions transactor, async, eager remote, async, eager
Write writes single writer, CAS single writer, CAS
Write flow control automatic automatic
Write transaction ordering in order issued in order acknowledged
Write tempids custom type or string, partitions optional, strings

Note that you can use both peers and clients within the same system. However, the peer and client APIs are different for reasons discussed below.

Operations

The peer library includes the query engine and object cache in the same process as your application, with the following operational implications:

  • Peer processes should typically be large relative to the size of the (possibly virtual) hardware they run on. This amortizes the cost of running the JVM, and allows for large object caches.
  • Peer processes must be JVM-based, and include several Java library dependencies.

Remote clients have a much smaller footprint. Remote clients are a good fit for microservices, as the peer server and its cache can be large and long-lived, while serving application processes that may be small, numerous, and short lived.

Reads

Both peers and peer servers provide horizontal read scaling, which is fundamental to Datomic's architecture. You can increase the number of peers or peer servers as needed to handle additional read load.

Datomic read operations include Datalog query and raw index access. With the peer library, these operations all happen locally in process memory, and the APIs are synchronous. Because of this locality, long-lived peer applications deliver the best possible latency for read operations.

Peer server clients make remote calls for read operations. This differs from using the Peer API in two ways.

  • Client read operations are asynchronous, and do not consume a thread while waiting on another process. (By contrast, peer read operations are synchronous.)
  • Large client API results are delivered a chunk-at-a-time.

Compared to the Peer API, peer server clients introduce a network hop for read operations, increasing latency. On the other hand, the object cache used for client queries lives in a separate process, so it survives a client application restart. This can reduce latency for read operations in applications that are small and short-lived. With both peer servers and peers, Datomic supports integrated memcached so read latencies should be acceptable for the vast majority of applications.

Peer server clients will receive flow control "server busy" errors if they temporarily exceed the capacity of the system. With peer applications, flow control is up to you. You can (and should) implement application-level flow control for peer applications.

Peers can connect to multiple databases, and then issue process-local queries that join across those databases. Clients are designed for a distributed environment where they know nothing about process locality. As a result, client queries can refer only to the database associated with their connection.

Writes

Datomic ensures that transactions are consistent by utilizing a single writer process at any given time, and by using compare-and-swap (CAS) operations against the underlying storage service.

The write model for peers and peer servers is therefore quite similar. The only semantic difference between peer and peer server transactions is in their knowledge of transaction order. Peers know that they have a dedicated connection to a single transacting process, and that transactions will always be processed in the order they were submitted. This permits a pipelining optimization where peers can submit transactions from a single thread as fast as possible, knowing the transactions will be processed in order. Clients do not presume such a dedicated relationship, and so client pipelining code must wait for each transaction to be acknowledged if order is important.

  • Tempids

    Datomic peers support a dedicated tempid structure for new entity ids. This allows for fine-grained control of partitions, which can be used to improve data locality for some kinds of read operations.

    Our experience has been that very few users take advantage of partitions, and that the custom tempid data structure and reader syntax are rarely worth the added complexity. Client tempids are always strings, and partitioning is automatic.

Peer-Only APIs

The following peer functions are not well-suited for a wire protocol, and are not included in the client API:

Peer API Client Alternative
attribute pull
entity, entity-db pull
entid & ident pull

These peer functions are not portable across On-Prem and Cloud, and are not included in the client API:

Peer API Cloud Alternative
gc-storage not needed
squuid not needed
tx report poll

The client API supports only the find-rel form of a query find-spec.