Partitions

Every Datomic entity is associated with a particular partition. These partitions act as a coarse-grained grouping mechanism for entities, much as file cabinets act as a coarse-grained grouping mechanism for paper files.

The partition is encoded in the high bits of the entity ID. As a result, entities in the same partition sort together in Datomic's two E-leading indexes, EAVT and AEVT. If partitions align with usage patterns, this locality can improve performance. For example, consider a system with inventory, customer, and order partitions. A query for a particular inventory item causes an "inventory" segment to be cached, and that segment contains a few thousand other inventory-related facts. If inventory queries tend to be followed by other inventory queries, this locality reduces the number of segments that a peer needs to cache.

Conversely, if your application tends to access information about individual customers, it is advantageous to put all of a given customer's data in one partition and spread your customers across thousands of implicit partitions.
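To make the locality argument concrete, the following plain-Clojure model (no Datomic dependency) assumes an entity id layout with the partition index in the bits above bit 42 and a per-partition counter below it. The helpers `make-eid` and `eid->part` are hypothetical and exist only for this illustration; they are not part of the Datomic API. The point is only that ids built this way sort into contiguous per-partition runs regardless of creation order.

```clojure
(ns partition-locality-sketch)

;; ASSUMPTION (illustration only): partition index in the bits above bit 42,
;; per-partition entity counter in the low 42 bits.
(def part-shift 42)

(defn make-eid
  "Hypothetical helper, not a Datomic API: builds an id from a partition
  index and a per-partition counter under the assumed layout."
  [part-idx counter]
  (bit-or (bit-shift-left part-idx part-shift) counter))

(defn eid->part
  "Hypothetical helper: recovers the partition index from an id."
  [eid]
  (bit-shift-right eid part-shift))

;; Entities created in interleaved order across partitions 4 and 5...
(def created-in-order
  [(make-eid 4 1001) (make-eid 5 1002) (make-eid 4 1003) (make-eid 5 1004)])

;; ...still sort into one contiguous run per partition, which is what an
;; E-leading index such as EAVT or AEVT relies on for locality.
(def sorted-eids (sort created-in-order))

(println (map eid->part sorted-eids))
;; prints (4 4 5 5)
```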

Datomic supports two types of partitions: implicit partitions and named partitions. Both can be used in the same database.

Partitions enable two techniques that are handy in larger systems: partition sharding and new entity scans.

Partitions are strictly a locality optimization. You can ignore partitions completely; if you do, all your domain entities will be in the same (default) partition.

Implicit Partitions

Implicit partitions are currently available only in Datomic Pro Edition.

To facilitate entity grouping when using a multitude of partitions, each Datomic database includes 524288 implicit partitions, which can be accessed by calling implicit-part. Unlike named partitions, implicit partitions require no transaction to create. Like named partitions, each implicit partition corresponds to an entity ID.

Applications can employ algorithmic schemes to group related entities to the same partition. For example, an order processing system could assign a customer to an implicit partition (e.g. via hashing), and then put all of a customer's order and line item entities in that partition. This would improve locality of reference when querying for a customer's orders.
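A hashing scheme like the one described above might look like the sketch below. `customer->partition-index` is a hypothetical helper; it assumes customers are identified by a string key and uses `String.hashCode`, whose algorithm is fixed by the Java specification, so a given customer maps to the same index in every process. Turning the index into a partition entity id with `d/implicit-part` requires a Datomic Pro peer and is shown only as a comment.

```clojure
(ns implicit-partition-sketch)

;; Number of implicit partitions per database, as stated in the text.
(def implicit-partition-count 524288)

(defn customer->partition-index
  "Hypothetical helper: maps a customer's string key to an implicit partition
  index in [0, implicit-partition-count). String.hashCode is specified by Java,
  so the same key yields the same index in every process."
  [^String customer-key]
  ;; mod (unlike rem) returns a non-negative result for negative hash codes
  (mod (.hashCode customer-key) implicit-partition-count))

;; Every order and line item for "customer-42" would reuse this index.
;; In a Datomic Pro peer it would become a partition entity id via
;; (d/implicit-part idx) and go into a :db/force-partition map.
(def idx (customer->partition-index "customer-42"))
```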

Named Partitions

Named partitions are useful when your domain has a modest number of named categories for which you would like to have locality. In addition, named partitions are used internally by Datomic. Every Datomic database comes with three named partitions:

Partition       Purpose
:db.part/db     System partition, used for schema
:db.part/tx     Transaction partition
:db.part/user   User partition, for application entities

Schema entities are automatically placed in the :db.part/db partition.

Transaction entities are automatically placed in the :db.part/tx partition.

Default Partition

The :db.part/user partition is the default partition for entities whenever a partition is not otherwise specified. In Datomic Pro, you can override this default by setting default-partition in the transactor properties file:

default-partition=:my.namespace/my-partition

Installing Named Partitions

To install a new named partition, transact an entity with an ident and reference that entity as a :db.install/partition value of the system entity :db.part/db.

For example, the following transaction data installs a partition named :communities:

{:db/ident :communities
 :db.install/_partition :db.part/db}
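When a system needs several named partitions, such as inventory, customer, and order partitions, the install maps can be generated rather than written by hand. This is a minimal sketch; `partition-install-tx` is a hypothetical helper that only produces transaction data of the shape shown above.

```clojure
(ns install-partitions-sketch)

(defn partition-install-tx
  "Hypothetical helper: returns transaction data that installs one named
  partition per keyword, using the same map shape as the :communities example."
  [idents]
  (mapv (fn [ident]
          {:db/ident ident
           :db.install/_partition :db.part/db})
        idents))

(def tx-data (partition-install-tx [:inventory :customers :orders]))

;; With a peer connection, this would be submitted as
;; @(d/transact conn tx-data)
```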

Partition Assignment

Partition assignment is currently available only in Datomic Pro Edition.

To control partition assignment for tempids, transaction data can include a map containing the :db/force-partition or :db/match-partition directives. These directives allow you to request partition assignment for any new entity id created by a transaction.

Because partition directives are separate maps, they encourage keeping partition policy separate from entity data, and make it easier to write supporting code that manages partitions across many transactions.

:db/force-partition

The value of :db/force-partition is a map from tempids to desired partitions. Named partitions are specified by a keyword, such as :orders below:

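[{:db/id "order"
  :order/lineItems ["item1", "item2"]}
 {:db/id "item1"
  ;; chocolate and whisky stand for existing product entity ids
  :lineItem/product chocolate
  :lineItem/quantity 1}
 {:db/id "item2"
  :lineItem/product whisky
  :lineItem/quantity 2}
 ;; partition directive: place all three new entities in :orders
 {:db/force-partition {"order" :orders
                       "item1" :orders
                       "item2" :orders}}]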
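Because the directive is an ordinary map, supporting code can build it separately from the entity maps. The sketch below assumes a hypothetical helper, `force-partition`, that assigns a list of tempids to one partition; the entity maps omit `:lineItem/product` so the example does not depend on existing product entities.

```clojure
(ns force-partition-sketch)

(defn force-partition
  "Hypothetical helper: builds a :db/force-partition directive that assigns
  every tempid in `tempids` to `part` (a named partition keyword such as
  :orders, or an implicit partition entity id)."
  [part tempids]
  {:db/force-partition (zipmap tempids (repeat part))})

;; Entity data with no partition policy in it.
(def entity-data
  [{:db/id "order" :order/lineItems ["item1" "item2"]}
   {:db/id "item1" :lineItem/quantity 1}
   {:db/id "item2" :lineItem/quantity 2}])

;; Partition policy is appended as a separate map.
(def tx-data
  (conj entity-data (force-partition :orders ["order" "item1" "item2"])))
```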

Implicit partitions are specified by entity id, obtained by calling implicit-part (or part, given an entity already in the desired partition). The example below transacts "customer" and "order" into implicit partition 343:

[{:db/id "customer"
  :customer/order "order"}
 {:db/id "order"
  :order/id "55555"}
 {:db/force-partition {"customer" (d/implicit-part 343)
                       "order" (d/implicit-part 343)}}]

:db/match-partition

Sometimes it is more convenient to request a partition via an entity in the same partition, rather than the partition id. The value of :db/match-partition is a map from tempids to entities in desired partitions.

The following example transacts line items into the same partition (implicit part 500) as their containing order:

[{:db/id "order"
  :order/lineItems ["item1", "item2"]}
 {:db/id "item1"
  :lineItem/product chocolate
  :lineItem/quantity 1}
 {:db/id "item2"
  :lineItem/product whisky
  :lineItem/quantity 2}
 {:db/force-partition {"order" (d/implicit-part 500)}
  :db/match-partition {"item1" "order"
                       "item2" "order"}}]
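A common pattern is to force one root entity into a partition and match its children to it. The hypothetical `co-locate` helper below builds both directives in one map; `:orders` stands in for the partition value, which in the example above is `(d/implicit-part 500)` and would require a Datomic Pro peer.

```clojure
(ns match-partition-sketch)

(defn co-locate
  "Hypothetical helper: forces tempid `root` into partition `part` and asks
  for each tempid in `children` to be placed in the same partition as `root`."
  [part root children]
  {:db/force-partition {root part}
   :db/match-partition (zipmap children (repeat root))})

;; :orders is a placeholder partition value for this sketch.
(def directive (co-locate :orders "order" ["item1" "item2"]))
```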

Partition Sharding

Partition-based sharding is a technique for distributing entity-scoped read load across application servers. If you divide entities mechanically into N groups at creation time, grouping together entities that are likely to be accessed together, you can assign each group its own partition in Datomic. Then, at query time, route entity-related queries to different application servers based on the partition of the entities involved. Each application server then caches a disjoint subset of the total index, substantially increasing the fraction of the index that can fit into application server memory.

Partition-based sharding is an optimization only. Entities in any partition can still be reached by any peer.
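The query-time routing step can be as simple as a deterministic function from partition to server. In this sketch the server names and `server-for-partition` are hypothetical, and the partition is assumed to be available as a number (for an existing entity, `d/part` returns its partition entity id, and any deterministic function of that value would serve). Each server ends up responsible for a disjoint set of partitions, which is what lets their caches hold disjoint subsets of the index.

```clojure
(ns partition-routing-sketch)

;; Hypothetical pool of application servers.
(def app-servers ["app-0" "app-1" "app-2"])

(defn server-for-partition
  "Hypothetical router: deterministically maps a numeric partition value to
  one application server, so each server sees only its own partitions."
  [servers part]
  (nth servers (mod part (count servers))))

;; Which partitions each server would serve (and therefore cache).
(def assignment
  (group-by #(server-for-partition app-servers %) (range 9)))
```

A modulo scheme keeps routing stateless, at the cost of reshuffling partitions whenever the number of servers changes.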

New Entity Scans

Because Datomic stores information about time, it is easy to ask questions that apply only to recently created entities. For example, running a batch job over everything that changed today is trivial: read the log with txRange starting at midnight.

Such a scan does, however, have to consider all of today's datoms, filtering out any that are irrelevant to the task at hand.

It is possible to use partitions as a filtering mechanism when designing a system that needs to know about specific new entities (something that wants to respond to new events of a certain type, for example):

  • Assign all the datoms for an event category to the same partition.
  • Instead of using the log, use entidAt to fabricate an entity id in that partition corresponding to the chosen start time t.
  • Pass the fabricated entity id to seekDatoms to walk EAVT from that point. Entities are visited in order of creation, and because the partition is encoded in the high bits of the entity id, the walk is implicitly "filtered" by partition.
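The scan can be modeled in plain Clojure, without Datomic, by letting a sorted map stand in for EAVT. The sketch assumes an illustrative id layout (partition in the high bits, creation order in the low 42 bits); `fabricate-eid` plays the role of entidAt and `subseq` plays the role of seekDatoms. These names and the layout are assumptions for illustration, not Datomic internals.

```clojure
(ns new-entity-scan-sketch)

;; ASSUMED id layout for this model only: partition index above bit 42,
;; creation order (t) in the low 42 bits.
(def part-shift 42)

(defn fabricate-eid
  "Stand-in for entidAt: the smallest id in partition `part` at time `t`."
  [part t]
  (bit-or (bit-shift-left part part-shift) t))

(defn eid->part [eid] (bit-shift-right eid part-shift))

;; Toy EAVT index: entity ids in sorted order. Partition 7 holds events,
;; partition 8 holds orders.
(def eavt
  (into (sorted-map)
        {(fabricate-eid 7 100) {:event/type :signup}
         (fabricate-eid 7 250) {:event/type :signup}
         (fabricate-eid 7 400) {:event/type :refund}
         (fabricate-eid 8 120) {:order/id "55555"}
         (fabricate-eid 8 300) {:order/id "55556"}}))

(defn new-entities
  "Walks the index from the fabricated id, as seekDatoms walks EAVT, and stops
  at the first entity from another partition. Returns entity ids."
  [index part t]
  (->> (subseq index >= (fabricate-eid part t))
       (take-while (fn [[eid _]] (= part (eid->part eid))))
       (map key)))

;; Against a real Datomic Pro database the scan would start from roughly
;; (d/seek-datoms db :eavt (d/entid-at db part t)).
```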

Partitions in Tempids (Obsolete)

Tempid partitions are obsolete but will continue to be supported. For new applications, use :db/force-partition and :db/match-partition to assign partitions.

Datomic supports partition assignment inside tempids. Rather than tempid strings, you can create tempids as a special data structure, with a slot for specifying the desired partition.

To assign a partition to a tempid, create a structural tempid by calling d/tempid, passing the partition:

(d/tempid :preferred-customers)

Each call to tempid produces a unique temporary id. If you want two calls to tempid to produce the same id, pass a negative number in the range -1 .. -100000 as a second argument. All calls to tempid with the same second argument will return the same id.

Datomic also supports a tempid tagged literal:

#db/id[partition-name value*]