Clients and Peers
There are two ways to consume Datomic: with an in-process peer library, or with a client library. If you are trying Datomic for the first time, we recommend that you begin with a client library. When you are ready to plan a system, you can return to this page for guidance on whether to use peers, clients, or both.
Note that the client library is currently in alpha and subject to change.
| Category | Feature | Peer | Client |
|---|---|---|---|
| Ops | app processes per instance | few | many |
| Read | hot object cache on app start | no | yes |
| Read | hot memcached on app start | yes | yes |
| Read | reads | scale horizontally | scale horizontally |
| Read | datalog queries | local, sync, eager | peer server, async, chunked |
| Read | raw index access | local, sync, lazy | peer server, async, chunked |
| Read | flow control | up to you | automatic |
| Read | cross database joins | yes | no |
| Write | ACID transactions | transactor, async, eager | remote, async, eager |
| Write | writes | single writer, CAS | single writer, CAS |
| Write | transaction ordering | in order issued | in order acknowledged |
| Misc | tempids | required, custom type, partitions | optional, strings |
Note that you can use both peers and clients within the same system. However, the peer and client APIs are different for reasons discussed below.
The peer library includes the query engine and object cache in the same process as your application, with the following operational implications:
- Peer processes should typically be large relative to the size of the (possibly virtual) hardware they run on. This amortizes the cost of running the JVM, and allows for large object caches.
- Peer processes must be JVM-based, and include several Java library dependencies. Clients have a much smaller footprint.
Clients are a better operational fit for microservices, where application processes may be small, numerous, and short lived.
Both clients and peers provide horizontal read scaling, which is fundamental to Datomic's architecture. You can increase the number of peers or clients as needed to handle additional read load.
Datomic read operations include Datalog query and raw index access. With the peer library, these operations all happen locally in process memory, and the APIs are synchronous. Because of this locality, long-lived peer applications deliver the best possible latency for read operations.
Clients make remote calls for read operations. This makes the Client API different from the Peer API in two ways:
- Client read operations are asynchronous, and do not consume a thread while waiting on another process. (By contrast, peer read operations are synchronous.)
- Large client API results are delivered one chunk at a time.
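The contrast between the two read styles can be sketched in Python. This is a minimal model, not the Datomic client API: `local_query`, `remote_query`, and `collect_chunks` are hypothetical names, the "database" is a plain list of integers, and `asyncio.sleep(0)` stands in for a network round trip to a peer server.

```python
import asyncio
from typing import AsyncIterator, List


def local_query(rows: List[int]) -> List[int]:
    """Peer style (hypothetical): runs in process, blocks the caller, returns everything at once."""
    return list(rows)


async def remote_query(rows: List[int], chunk_size: int) -> AsyncIterator[List[int]]:
    """Client style (hypothetical): results arrive asynchronously, one chunk at a time."""
    for start in range(0, len(rows), chunk_size):
        await asyncio.sleep(0)  # stand-in for a network hop; the event loop stays free meanwhile
        yield rows[start:start + chunk_size]


async def collect_chunks(rows: List[int], chunk_size: int) -> List[List[int]]:
    chunks = []
    async for chunk in remote_query(rows, chunk_size):
        chunks.append(chunk)  # a real application could process each chunk here instead
    return chunks


rows = list(range(10))
print(local_query(rows))                                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(asyncio.run(collect_chunks(rows, chunk_size=4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A consumer that handles each chunk as it arrives, rather than accumulating them as `collect_chunks` does, only needs one chunk in memory at a time.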
Compared to the Peer API, the Client API introduces a network hop for read operations, increasing latency. On the other hand, the object cache used for client queries lives in a separate process, so it survives a client application restart. This can reduce latency for read operations in applications that are small and short-lived. With both clients and peers, Datomic supports integrated memcached, so read latencies should be acceptable for the vast majority of applications.
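One way to picture this trade-off is as a tiered lookup. The sketch below is a hypothetical model, not Datomic's implementation: `TieredReader`, the `"segment-1"` key, and the use of dictionaries for the shared cache and storage are all assumptions. For a peer, the `local` tier plays the role of the in-process object cache and starts empty after every restart, while the `shared` tier plays the role of memcached, which stays warm. For a client, the object cache lives in the peer server, so from the client's point of view it behaves like a tier that outlives the client process.

```python
from typing import Dict, List, Optional


class TieredReader:
    """Hypothetical read path: in-process cache, then a shared cache, then storage."""

    def __init__(self, shared: Dict[str, str], storage: Dict[str, str]) -> None:
        self.local: Dict[str, str] = {}  # lost when the process restarts
        self.shared = shared             # memcached-like tier; survives app restarts
        self.storage = storage
        self.hits: List[str] = []        # records which tier served each read

    def read(self, key: str) -> Optional[str]:
        if key in self.local:
            self.hits.append("local")
            return self.local[key]
        if key in self.shared:
            self.hits.append("shared")
            value: Optional[str] = self.shared[key]
        else:
            self.hits.append("storage")
            value = self.storage.get(key)
            if value is not None:
                self.shared[key] = value  # warm the shared tier for other processes
        if value is not None:
            self.local[key] = value       # warm this process's own cache
        return value


shared, storage = {}, {"segment-1": "datoms"}

first = TieredReader(shared, storage)
first.read("segment-1")
first.read("segment-1")
print(first.hits)      # ['storage', 'local']

restarted = TieredReader(shared, storage)  # new process: empty local cache
restarted.read("segment-1")
print(restarted.hits)  # ['shared']
```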
Client applications will receive flow control "server busy" errors if they temporarily exceed the capacity of the system. With peer applications, flow control is up to you. You can (and should) implement application-level flow control for peer applications.
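Both sides of flow control can be sketched briefly. This assumes a hypothetical `ServerBusy` exception (not a Datomic class): `with_backoff` is a client-side helper that retries with exponential backoff and jitter, and `InFlightLimiter` is one possible form of application-level flow control for a peer, rejecting work once a chosen number of requests are in flight. The retry count, delays, and both names are illustrative assumptions.

```python
import random
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerBusy(Exception):
    """Hypothetical stand-in for a "server busy" error."""


def with_backoff(op: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.01,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Client side: retry op on ServerBusy, roughly doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return op()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries
    raise ValueError("max_attempts must be at least 1")


class InFlightLimiter:
    """Peer side (hypothetical): reject work beyond max_in_flight concurrent calls."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, op: Callable[[], T]) -> T:
        if not self._slots.acquire(blocking=False):
            raise ServerBusy("too many requests in flight")
        try:
            return op()
        finally:
            self._slots.release()


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ServerBusy()
    return "ok"


print(with_backoff(flaky, sleep=lambda _: None))  # ok
print(calls["n"])                                 # 3
```

Rejecting excess work keeps a peer's latency predictable under overload; blocking on the semaphore instead is an equally reasonable choice when callers can tolerate waiting.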
Peers can connect to multiple databases, and then issue process-local queries that join across those databases. Clients are designed for a distributed environment where they know nothing about process locality. As a result, client queries can refer only to the database associated with their connection.
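Because a peer holds both database values in its own memory, a join across them is just local computation. The sketch below models each database as a list of `(entity, attribute, value)` tuples and performs the join in plain Python; the attribute names and data are hypothetical, and a real peer could express the same thing as a single Datalog query over both databases. A client, limited to one database per query, would instead have to query each connection separately and combine the results in application code.

```python
from typing import Dict, List, Tuple

Datom = Tuple[int, str, object]  # (entity, attribute, value), a simplified model


def attr_values(db: List[Datom], attr: str) -> Dict[int, object]:
    """Map entity id -> value for one attribute (assumes one value per entity)."""
    return {e: v for e, a, v in db if a == attr}


def orders_with_customers(users_db: List[Datom],
                          orders_db: List[Datom]) -> List[Tuple[int, int, object]]:
    """Join orders to users on email; returns sorted (user, order, total) triples."""
    user_by_email = {email: user for user, email in attr_values(users_db, "user/email").items()}
    totals = attr_values(orders_db, "order/total")
    return sorted(
        (user_by_email[email], order, totals[order])
        for order, email in attr_values(orders_db, "order/email").items()
        if email in user_by_email and order in totals
    )


users_db = [(1, "user/email", "ann@example.com"), (2, "user/email", "bo@example.com")]
orders_db = [
    (100, "order/email", "ann@example.com"), (100, "order/total", 30),
    (101, "order/email", "cy@example.com"), (101, "order/total", 12),
]
print(orders_with_customers(users_db, orders_db))  # [(1, 100, 30)]
```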
Datomic ensures that transactions are consistent by utilizing a single writer process at any given time, and by using compare-and-swap (CAS) operations against the underlying storage service.
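The role of compare-and-swap can be shown with a toy store. This is a hypothetical model, not Datomic's storage protocol: `Storage` is a dictionary guarded by a lock, and `"log-head"` is an invented key standing for whatever pointer the writer advances. In the model, two writers read the same head; the first CAS succeeds, and the second fails because its view is stale, so it cannot overwrite the newer value.

```python
import threading
from typing import Dict, Hashable, Optional


class Storage:
    """Toy key-value store with an atomic compare-and-swap (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, object] = {}
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Optional[object]:
        with self._lock:
            return self._data.get(key)

    def cas(self, key: Hashable, expected: Optional[object], new: object) -> bool:
        """Set key to new only if its current value equals expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


storage = Storage()
storage.cas("log-head", None, 0)  # initialize: the key is absent, so expected is None

view_a = storage.get("log-head")  # writer A reads head 0
view_b = storage.get("log-head")  # writer B reads the same head 0
print(storage.cas("log-head", view_a, view_a + 1))  # True: A advances the head to 1
print(storage.cas("log-head", view_b, view_b + 1))  # False: B's view is stale
print(storage.get("log-head"))                      # 1
```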
The write model for peers and clients is therefore quite similar. The transaction API is:
- remote, connecting to a transactor process
- async, not consuming a thread while the write is in progress
- eager, in that the transaction data submitted to and returned by a single transaction must fit entirely in your application process memory
The only semantic difference between peer and client transactions is in their knowledge of transaction order. Peers know that they have a dedicated connection to a single transacting process, and that transactions will always be processed in the order they were submitted. This permits a pipelining optimization where peers can submit transactions from a single thread as fast as possible, knowing the transactions will be processed in order. Clients do not presume such a dedicated relationship, and so client pipelining code must wait for each transaction to be acknowledged if order is important.
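The ordering difference can be modeled with asyncio. Everything here is hypothetical: `Transactor` is a toy single writer that appends to a list, the peer's dedicated connection is modeled as a FIFO queue, and `send_over_network` simulates a variable network delay for clients. The peer submits every transaction without waiting and still sees them applied in order; the client preserves order only by awaiting each acknowledgment before sending the next.

```python
import asyncio
import random
from typing import List, Tuple


class Transactor:
    """Toy single writer: applies transactions in the order they arrive."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def apply(self, tx: str) -> int:
        self.log.append(tx)
        return len(self.log)  # position of this transaction in the log

    async def serve(self, channel: "asyncio.Queue[Tuple[str, asyncio.Future]]") -> None:
        while True:  # drain the dedicated FIFO connection
            tx, ack = await channel.get()
            ack.set_result(self.apply(tx))


async def peer_pipelined(txs: List[str], transactor: Transactor) -> List[int]:
    """Submit every transaction without waiting; the FIFO channel preserves order."""
    channel: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(transactor.serve(channel))
    loop = asyncio.get_running_loop()
    acks = []
    for tx in txs:
        ack = loop.create_future()
        channel.put_nowait((tx, ack))  # returns immediately
        acks.append(ack)
    results = await asyncio.gather(*acks)
    worker.cancel()
    return list(results)


async def send_over_network(transactor: Transactor, tx: str, rng: random.Random) -> int:
    await asyncio.sleep(rng.random() / 1000)  # variable delay: arrival order not guaranteed
    return transactor.apply(tx)


async def client_unordered(txs: List[str], transactor: Transactor, rng: random.Random) -> None:
    """Send everything concurrently: transactions may be applied in any order."""
    await asyncio.gather(*(send_over_network(transactor, tx, rng) for tx in txs))


async def client_ordered(txs: List[str], transactor: Transactor, rng: random.Random) -> List[int]:
    """Await each acknowledgment before sending the next transaction."""
    return [await send_over_network(transactor, tx, rng) for tx in txs]


txs = ["tx1", "tx2", "tx3"]
peer_tx, client_tx = Transactor(), Transactor()
print(asyncio.run(peer_pipelined(txs, peer_tx)))                       # [1, 2, 3]
print(asyncio.run(client_ordered(txs, client_tx, random.Random(0))))  # [1, 2, 3]
print(peer_tx.log == client_tx.log == txs)                             # True
```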
Datomic peers use a dedicated tempid data structure for new entity ids. This allows for fine-grained control of partitions, which can be used to improve data locality for some kinds of read operations.
Our experience has been that very few users take advantage of partitions, and that the custom tempid data structure and reader syntax are rarely worth the added complexity. Clients simplify tempids in several ways:
- client tempids are optional
- client tempids are strings, which can be meaningful to the reader
- clients do not provide any control over partitioning, which is automatic
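The simplified tempid rules can be sketched as a small resolver. This is a hypothetical model, loosely following the client behavior listed above, not Datomic's resolver: entities are dictionaries, attribute names such as `person/name` are invented, `ref_attrs` names the attributes whose string values refer to other entities, and a plain counter stands in for automatic id (and partition) assignment. Entities that share a tempid string resolve to the same id, and an entity with no `db/id` simply gets a fresh one.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple


def resolve_tempids(tx_data: List[dict], ref_attrs: Set[str],
                    ids: Iterator[int]) -> Tuple[List[dict], Dict[str, int]]:
    """Replace string tempids with entity ids; return resolved data and the tempid map."""
    tempids: Dict[str, int] = {}

    def lookup(tempid: str) -> int:
        if tempid not in tempids:
            tempids[tempid] = next(ids)  # first mention allocates the id
        return tempids[tempid]

    resolved = []
    for entity in tx_data:
        out = {}
        for attr, value in entity.items():
            if (attr == "db/id" or attr in ref_attrs) and isinstance(value, str):
                out[attr] = lookup(value)
            else:
                out[attr] = value
        if "db/id" not in out:
            out["db/id"] = next(ids)  # tempids are optional
        resolved.append(out)
    return resolved, tempids


tx = [
    {"db/id": "ann", "person/name": "Ann"},
    {"order/total": 30, "order/customer": "ann"},  # refers to Ann by her tempid string
]
resolved, tempids = resolve_tempids(tx, {"order/customer"}, itertools.count(1000))
print(tempids)   # {'ann': 1000}
print(resolved)  # [{'db/id': 1000, 'person/name': 'Ann'}, {'order/total': 30, 'order/customer': 1000, 'db/id': 1001}]
```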
Peers continue to support tempid structures, and they also support the new client tempid capabilities.
Datomic also includes a Peer REST server. With the introduction of client libraries, the REST server is no longer supported for new development, which means that:
- new projects should use the Client API
- the REST server will continue to ship with every new release of Datomic, in order to support existing projects that use it
- only critical bug fixes will be made to the REST server going forward