Datomic Queries and Rules

Datomic's query and rules system is an extended form of Datalog. Datalog is a deductive query system, typically consisting of:

  • A database of facts
  • A set of rules for deriving new facts from existing facts
  • a query processor that, given some partial specification of a fact or rule:
    • finds all instances of that specification implied by the database and rules
    • i.e. all the matching facts

Typically a Datalog system would have a global fact database and set of rules. Datomic's query engine instead takes databases (and in fact, many other data sources) and rule sets as inputs.

There is a video introducing Datalog queries.


Example Data

The examples in this document use the mbrainz 1968-1973 sample database. Download and untar this file using the steps listed in the repository.


Queries, sample data, and results in this document are written in the Extensible Data Notation (edn), which is programming language neutral. In your own programs, you can create data programmatically out of basic language data types, e.g. Java Strings, Lists, and Maps. Alternatively, you can pass the pattern argument as a serialized edn string.

The ellipsis is used in query results to shows that a large result set has been elided for brevity.

Example Code

You can follow the examples below in Java or Clojure code.

Why Datalog?

Datomic Datalog is simple, declarative, logic-based, and embedded in your application process.


Datalog is simple. The basic component of Datalog is a clause, which is simply a list that either begins with the name of a rule, or is a data pattern. These clauses can contain variables (symbols beginning with a '?'). The query engine simply finds all combinations of values of the variables that satisfy all of the clauses. There is no complex syntax to learn.


Like SQL and other good query languages, Datalog is declarative. That is, you specify what you want to know and not how to find it. This kind of declarative programming is very powerful, and it is a shame that it has been relegated to the database servers and not available to application programmers. Declarative programs are:

  • More evident - it is easier to tell what their purpose is, both for programmers and stakeholders.
  • More readily optimized - the query engine is free to reorder and parallelize operations to a degree not normally taken on by application programs.
  • Simpler - and thus, more robust.


Even SQL, while fundamentally declarative, still includes many operations that go beyond the query itself, like specifying joins explicitly. Because Datalog is based upon logical implication, joins are implicit, and the query engine figures out when they are needed.


Datomic's queries are further simplified by the fact that its query engine, and the data, are made available locally. Query languages like SQL are oriented around a client-server model where, in a single conversation, you are going to have to both:

  • Answer your fundamental question, e.g. who bought socks this month.
  • Recover any additional information required for reporting and processing, e.g. what are their names and email addresses.

The latter is not really a query, it is just a mechanical navigation to related information. With Datomic, you don't have to combine decisions about how to render the answers with finding them, leading to simpler queries. Given an entity id found in a query, you can at any time later quickly navigate to any related information, freeing yourself from the complex queries forced by a client-server model.

The Database of Facts

The first obvious input for a query is a Datomic database. It ends up that the data sources processed and returned by Datalog are in fact relations, i.e. sets of tuples. A Datomic database is as a universal relation of datoms, i.e. 5-tuples of the form:

[entity attribute value transaction added?]

Datomic's Datalog is of course able to process these tuples, but is not limited to processing 5-tuples. Queries and rules output relations with tuples of varying arity, and Datomic's query engine can accept as inputs relation-like data in arbitrary collections.

Query Grammar

Syntax Used in Grammar

'' literal
"" string
[] = list or vector
{} = map {k1 v1 ...}
() grouping
| choice
? zero or one
+ one or more


query                      = [find-spec with-clause? inputs? where-clauses?]
find-spec                  = ':find' (find-rel | find-coll | find-tuple | find-scalar)
find-rel                   = find-elem+
find-coll                  = [find-elem '...']
find-scalar                = find-elem '.'
find-tuple                 = [find-elem+]
find-elem                  = (variable | pull-expr | aggregate)
pull-expr                  = ["pull" variable pattern]
pattern                    = (input-name | pattern-data-literal)
aggregate                  = [aggregate-fn-name fn-arg+]
fn-arg                     = (variable | constant | src-var)
with-clause                = ':with' variable+
where-clauses              = ':where' clause+
inputs                     = ':in' (src-var | variable | pattern-var | rules-var)+
src-var                    = symbol starting with "$"
variable                   = symbol starting with "?"
rules-var                  = the symbol "%"
plain-symbol               = symbol that does not begin with "$" or "?"
pattern-var                = plain-symbol
and-clause                 = [ 'and' clause+ ]
expression-clause          = (data-pattern | pred-expr | fn-expr | rule-expr)
rule-expr                  = [ src-var? rule-name variable+]
not-clause                 = [ src-var? 'not' clause+ ]
not-join-clause            = [ src-var? 'not-join [variable+] clause+ ]
or-clause                  = [ src-var? 'or' (clause | and-clause)+]
or-join-clause             = [ src-var? 'or-join' rule-vars (clause | and-clause)+ ]
rule-vars                  = [variable+ | ([variable+] variable*)]
clause                     = (not-clause | not-join-clause | or-clause | or-join-clause | expression-clause)
data-pattern               = [ src-var? (variable | constant | '_')+ ]
constant                   = any non-variable data literal
pred-expr                  = [ src-var? [pred fn-arg+] ]
fn-expr                    = [ src-var? [fn fn-arg]+ binding]
binding                    = (bind-scalar | bind-tuple | bind-coll | bind-rel)
bind-scalar                = variable
bind-tuple                 = [variable+]
bind-coll                  = [variable '...']
bind-rel                   = [ [variable+] ]

See pattern grammar for the description of the pattern-data-literal rule.


Note that the rule grammar reuses some terms from the query grammar above.

rule                       = [ [rule-head clause+]+ ]
rule-head                  = [rule-name rule-vars]
rule-name                  = plain-symbol



The basic job of query is, given a set of variables and a set of clauses, find (the set of) all of the (tuples of) variables that satisfy the clauses. The shape of the most basic query looks like this:

[:find variables 
 :where clauses]

Given the following data tuples:

[[sally :age 21] 
 [fred :age 42] 
 [ethel :age 42]
 [fred :likes pizza] 
 [sally :likes opera] 
 [ethel :likes sushi]]

We could perform the query:

[:find ?e 
 :where [?e :age 42]]

And get this result:

[[fred], [ethel]]

Invoking a query takes this basic form:

Peer.query(query, inputs...);

The query above has one variable ?e, and will take one input, a collection of tuples with at least three components. This clause [?e :age 42] is called a data clause. A data clause consists of constants and/or variables, and a tuple satisfies a clause if its constants match. Variabls in the data pattern are then bound to the corresponding part of the matching tuple. All of this matching happens by position.


Unification occurs when a variable appears in more than one data pattern. In the following query, ?e appears twice:

;;which 42-year-olds like what?
[:find ?e ?x 
 :where [?e :age 42] [?e :likes ?x]]

Matches for the variable ?e must unify, i.e. represent the same value in every clause in order to satisfy the set of clauses. So a matching ?e must have both :age 42 and :likes for some ?x:

[[fred pizza], [ethel sushi]]


Sometimes we don't care about certain components of the tuples in a query, but must put something in the clause in order to get to the positions we care about. The underscore symbol _ is blank placeholder, matching anything without binding or unifying.

The query below finds anything that is liked, without caring who does the liking:

;; query
[:find ?x 
 :where [_ :likes ?x]]

;; result
[[fred] [ethel [sally]]

Do not use a dummy variable instead of the blank. This will make the query engine do extra work, tracking binding an unification for a variable that you never intent to use. It will also make human readers do extra work, puzzling out that variable is intentionally not used.

Querying a database

To query a database, we must first get a value of the database from the connection:

Database db = conn.db();

A Database is an immutable value, and will never change. If we use db for several queries we will know the answers are based upon exactly the same data from a single point in time. As we said, the database itself acts as a relation of 5-tuples of

[entity attribute value transaction added?]

The following query finds all release names in the database:

;; query
[:find ?release-name
 :where [_ :release/name ?release-name]

;; inputs

;; result 
  ["Hela roept de akela"] 
  ["Ali Baba"] 
  ["The Power of the True Love Knot"]

Implicit Blanks

Notice that while the release names query targets a database, its data pattern contains only three elements, not five. In data patterns, you can always elide any trailing components you don't care about, rather than explicitly padding with blanks:

;; unnecessary trailing blanks
[_ :release/name ?release-name _ _]


By default, queries expect a single input, a database whose name is the dollar sign $. Also by default, data patterns refer to a database named $.

Queries can choose to explicitly name their inputs via an :in clause. A fully explicit form of the release names query would look like:

;; query
[:find ?release-name
 :in $
 :where [$ _ :release/name ?release-name]]

;; inputs & result same as previous

Here you can see that the one and only :in argument names the one and only input db, and that the data pattern uses a leading $ to choose the database it matches against.

Multiple inputs

Most real-world queries are parameterized at runtime with variable bindings. For example, the following query takes two inputs: a database and a scalar variable binding to limit releases to those perfomed by John Lennon.

;; query
[:find ?release-name
 :in $ ?artist-name
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db, "John Lennon"

;; result
#{["Power to the People"] 
  ["Unfinished Music No. 2: Life With the Lions"] 
  ["Live Peace in Toronto 1969"] 
  ["Live Jam"]

Because /?artist-name/ appears second in the /:in/ clause, it is bound
to the second input "John Lennon".


A variable name like ?artist-name is the simplest kind of binding, to a single scalar. Other input shapes can be bound as follows:

Binding FormBinds
[?a ?b]tuple
[?a …]collection
[ [?a ?b ] ]relation

Tuple Binding

A tuple binding binds a set of variables to a single value each, passed in as a collection. The query below binds both artist name and release name to find the entity ids for releases of John Lennon's Mind Games:

;; query
[:find ?release
 :in $ [?artist-name ?release-name]
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db, ["John Lennon" "Mind Games"]

;; result

Collection Binding

A collection binding binds a single variable to multiple values passed in as a collection. This can be used to ask "or" questions, i.e. what releases are associated with either Paul McCartney or George Harrison?

;; query
[:find ?release-name
 :in $ [?artist-name ...]
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db, ["Paul McCartney" "George Harrison"]

;; result
#{["My Sweet Lord"] 
  ["Electronic Sound"]
  ["Give Me Love (Give Me Peace on Earth)"] 
  ["All Things Must Pass"]

Relation Binding

A relation binding is fully general, binding multiple variables positionally to a relation (collection of tuples) passed in. This can be used to ask "or" questions involving multiple variables. For example, what releases are associated with either John Lennon's Mind Games or Paul McCartney's Ram?

;; query
[:find ?release
 :in $ [[?artist-name ?release-name]]
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db,  [["John Lennon" "Mind Games"] 
      ["Paul McCartney" "Ram"]]

;; result

Find Specifications

Where bindings control inputs, find specifications control results.

Find SpecReturnsJava Type Returned
:find ?a ?brelationCollection of Lists
:find [?a …]collectionCollection
:find [?a ?b]single tupleList
:find ?a .single scalarScalar Value

The relation find spec is the most common, and the most general. It will return a tuple for each result, with values in each tuple matching the named variables. All of the examples so far have used the relation find spec. The example below finds a relation spec with two variables and returns a relation of 2-tuples:

;; query
[:find ?artist-name ?release-name
 :where [?release :release/name ?release-name]
        [?release :release/artists ?artist]
        [?artist :artist/name ?artist-name]]

;; inputs

;; result
#{["George Jones" "With Love"] 
  ["Shocking Blue" "Hello Darkness / Pickin' Tomatoes"] 
  ["Junipher Greene" "Friendship"]

The collection find spec is useful when you are only interested in a single variable. The form [?release-name …] below returns values for ?release-name, not wrapped in a one-tuple:

;; query
[:find [?release-name ...]
 :in $ ?artist-name
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs

;; result
["Power to the People" 
 "Unfinished Music No. 2: Life With the Lions" 
 "Live Peace in Toronto 1969" 
 "Live Jam"

The single tuple find spec is useful when you are interested in multiple variables, but expect only a single result. The form [?year ?month ?day] below returns a single triple, not wrapped in a relation.

;; query
[:find [?year ?month ?day]
 :in $ ?name
 :where [?artist :artist/name ?name]
        [?artist :artist/startDay ?day]
        [?artist :artist/startMonth ?month]
        [?artist :artist/startYear ?year]] 

;; inputs

;; result
[1940 10 9]

The scalar find spec is useful when you want to return a single value of a single variable. The form ?year below returns a single scalar value:

;; query
[:find ?year .
 :in $ ?name
 :where [?artist :artist/name ?name]
        [?artist :artist/startYear ?year]]

;; inputs

;; result

Note that the single tuple find spec and the scalar find spec will return only a single value from the query result, even if the result itself has more than one value. These find specs are typically used only when you know in advance that a query will have exactly one result.

Not Clauses

not clauses allow you to express that one or more logic variables inside a query must not satisfy all of a set of predicates. A not clause is written as:

(src-var? 'not' clause+)

and removes already-bound tuples that satisfy the clauses. not clauses target a source named $ unless you specify an explicit src-var.

The following query uses a not clause to find the count of all artists who are not Canadian:

;; query
[:find (count ?eid) .
 :where [?eid :artist/name]
        (not [?eid :artist/country :country/CA])]

;; inputs

;; result

All variables used in a not clause will unify with the surrounding query. This includes both the arguments to nested expression clauses as well as any bindings made by nested function expressions. Datomic will attempt to push the not clause down until all necessary variables are bound, and will throw an exception if that is not possible.

A not-join clause allows you to specify which variables should unify with the surrounding clause; only this list of variables needs binding before the clause can run.

A not-join clause is written as:

(src-var? 'not-join' [var+] clause+)

where var specifies which variables should unify.

In this next query, which returns the number of artists who didn't release an album in 1970, ?release is used only inside the not clause and doesn't need to unify with the outer clause. not-join is used to specify that only ?artist needs unifying.

;; query
[:find (count ?artist) .
       :where [?artist :artist/name]
       (not-join [?artist]
         [?release :release/artists ?artist]
         [?release :release/year 1970])]
;; inputs

;; result

When more than one clause is supplied to not, you should read the clauses as if they are connected by 'and', just as they are in :where.

The following query counts the number of releases named 'Live at Carnegie Hall' that were not by Bill Withers.

;; query
[:find (count ?r) .
 :where [?r :release/name "Live at Carnegie Hall"]
        (not-join [?r]
          [?r :release/artists ?a]
          [?a :artist/name "Bill Withers"])]

;; inputs

;; result

How Not Clauses Work

One can understand not clauses as if they turn into subqueries where all of the variables and sources unified by the negation are propagated to the subquery. The results of the subquery are removed from the enclosing query via set difference. Note that, because they are implemented using set logic, not clauses can be much more efficient than building your own expression predicate that executes a query, as expression predicates are run on each tuple in turn.

Or Clauses

or clauses allow you to express that one or more logic variables inside a query satisfy at least one of a set of predicates. An or clause is written as:

(src-var? 'or' (clause | and-clause)+)

and constrains the result to tuples that satisfy at least one of the clauses in the or clause. or clauses target a source named $ unless you specify an explicit src-var.

The following query uses an or clause to find the count of all vinyl media by listing the complete set of media that make up vinyl in the or clause:

;; query
[:find (count ?medium) .
       :where (or [?medium :medium/format :medium.format/vinyl7]
                  [?medium :medium/format :medium.format/vinyl10]
                  [?medium :medium/format :medium.format/vinyl12]
                  [?medium :medium/format :medium.format/vinyl])]

;; inputs

;; result

Inside the or clause, you may use an and clause to specify conjunction. This clause is not available outside of an or clause, since conjunction is the default in other clauses.

The following query uses an and clause inside the or clause to find the number of artists who are either groups or females:

;; query
[:find (count ?artist) .
 :where (or [?artist :artist/type :artist.type/group]
            (and [?artist :artist/type :artist.type/person]
                 [?artist :artist/gender :artist.gender/female]))]

;; inputs

;; result

All clauses used in an or clause must use the same set of variables, which will unify with the surrounding query. This includes both the arguments to nested expression clauses as well as any bindings made by nested function expressions. Datomic will attempt to push the or clause down until all necessary variables are bound, and will throw an exception if that is not possible.

An or-join clause allows you to specify which variables should unify with the surrounding clause; only this list of variables needs binding before the clause can run.

An or-join clause is written as:

(src-var? 'or-join' [rule-vars] (clause | and-clause)+)

where rules-vars specifies which variables should unify, and uses the required-vars syntax of rule heads.

In this next query, which returns the number of releases that are either by Canadian artists or released in 1970, ?artist is only used inside the or clause and doesn't need to unify with the outer clause. or-join is used to specify that only ?release needs unifying.

;; query
[:find (count ?release) .
      :where [?release :release/name]
      (or-join [?release]
        (and [?release :release/artists ?artist]
             [?artist :artist/country :country/CA])
        [?release :release/year 1970])]
;; inputs

;; result

How Or Clauses Work

One can imagine or clauses turn into an invocation of an anonymous rule whose predicates comprise the or clauses. As with rules, src-vars are not currently supported within the clauses of or, but are supported on the or clause as a whole at top level.

Expression Clauses

Expression clauses allow arbitrary Java or Clojure functions to be used inside of Datalog queries. Any functions or methods you use in expression clauses must be pure, i.e. they must be free of side effects and always return the same thing given the same arguments. Expression clauses have one of two basic shapes:

[(predicate ...)]
[(function ...) bindings]

The first item in an expression clause is a list designating a function or method call.

Predicate Expressions

If no bindings are provided, the function is presumed to be a predicate returning a truth value: null and false are treated as false, anything else is treated as true.

In the example below, the built-in expression predicate < limits the results to artists who started before 1600:

;; query
[:find ?name ?year
 :where [?artist :artist/name ?name]
        [?artist :artist/startYear ?year]
        [(< ?year 1600)]]

;; inputs

;; result
#{["Choir of King's College, Cambridge" 1441] 
  ["Heinrich Schütz" 1585]}

Function Expressions

Functions behave similarly, except that their return values are used not as predicates, but to bind other variables. In the example below, the built-in expression function quot converts track lengths from milliseconds to minutes:

[:find ?track-name ?minutes
 :in $ ?artist-name
 :where [?artist :artist/name ?artist-name]
        [?track :track/artists ?artist]
        [?track :track/duration ?millis]
        [(quot ?millis 60000) ?minutes]
        [?track :track/name ?track-name]]

;; inputs 
db, "John Lennon"

;; result
#{["Crippled Inside" 3] 
  ["Working Class Hero" 3] 
  ["Sisters, O Sisters" 3] 
  ["Only People" 3] 

Expression clauses do not nest:

;; this query will not work!!!
[:find ?celsius .
 :in ?fahrenheit
 :where [(/ (- ?fahrenheit 32) 1.8) ?celsius]]

Instead, multi-step calculations must be performed with separate expressions:

;; query
[:find ?celsius .
 :in ?fahrenheit
 :where [(- ?fahrenheit 32) ?f-32]
        [(/ ?f-32 1.8) ?celsius]]

;; inputs

;; result

Built-in Expression Functions and Predicates

Datomic provides the following built-in expression functions and predicates:

  • Two argument comparison predicates !=, <, <=, >, and >=.
  • Two-argument mathematical operators +, -, *, and /.
  • All of the functions from the clojure.core namespace of Clojure, except eval.
  • A set of functions and predicates that are aware of Datomic data structures, documented below:


[(get-else src-var ent attr default) ?val-or-default]

The get-else function takes a database, an entity identifier, a cardinality-one attribute, and a default value. It returns that entity's value for the attribute, or the default value if entity does not have a value.

The query below reports "N/A" whenever an artist's startYear is not in the database:

;; query
[:find ?artist-name ?year
 :in $ [?artist-name ...]
 :where [?artist :artist/name ?artist-name]
        [(get-else $ ?artist :artist/startYear "N/A") ?year]]

;; inputs
db, ["Crosby, Stills & Nash" "Crosby & Nash"]

;; result
#{["Crosby, Stills & Nash" 1968] 
  ["Crosby & Nash" "N/A"]}


[(get-some src-var ent attr+) [?attr ?val]]

The get-some function takes a database, an entity identifier, and one or more cardinality-one attributes, returning a tuple of the entity id and value for the first attribute possessed by the entity.

The query below tries to find a :country/name for an entity, and then falls back to :artist/name:

;; query
[:find [?e ?attr ?name]
 :in $ ?e
 :where [(get-some $ ?e :country/name :artist/name) [?attr ?name]]]

;; inputs
db, :country/US

;; result
[:country/US 84 "United States"]


[(ground const) binding]

The ground function takes a single argument, which must be a constant, and returns that same argument. Programs that know information at query time should prefer ground over e.g. identity, as the former can be used inside the query engine to enable optimizations.

[(ground [:a :e :i :o :u]) [?vowel ...]]


[(fulltext src-var attr search) [[?ent ?val ?tx ?score]]]

The fulltext function takes a database, an attribute, and a search expression, and returns a relation of four-tuples: entity, value, transaction, and score.

The following query finds all the artists whose name includes "Jane":

;; query
[:find ?entity ?name ?tx ?score
 :in $ ?search
 :where [(fulltext $ :artist/name ?search) [[?entity ?name ?tx ?score]]]]

;; inputs
db, "Jane"

;; result
#{[17592186047274 "Jane Birkin" 2839 0.625] 
  [17592186046687 "Jane" 2267 1.0] 
  [17592186047500 "Mary Jane Hooper" 3073 0.5]}


[(missing? src-var ent attr)]

The missing? predicate takes a database, entity, and attribute, and returns true if the entity has no value for attribute in the database.

The following query finds all artists whose start year is not recorded in the database.

;; query
[:find ?name
 :where [?artist :artist/name ?name]
        [(missing? $ ?artist :artist/startYear)]]

;; inputs

;; result
#{["Sigmund Snopek III"] ["De Labanda's"] ["Baby Whale"] ...}


[(tx-ids ?log ?start ?end) [?tx ...]]

Given a database log, start, and end, tx-ids returns a collection of transaction ids. Start and end can be specified as database t, transaction id, or instant in time, and can be nil.

The following query finds transactions from time t 1000 through 1050:

;; query
[:find [?tx ...]
 :in ?log
 :where [(tx-ids ?log 1000 1050) [?tx ...]]]

;; inputs

;; result
[13194139534340 13194139534312 13194139534313 13194139534314]

tx-ids is often used in conjunction with tx-data, to first locate transactions and then the data within those transactions.


[(tx-data ?log ?tx) [[?e ?a ?v _ ?op]]]

Given a log and a database, tx-data returns a collection of the datoms added by a transaction. You should not bind the transaction position of the result, as the transaction is already bound on input.

The following query finds the entities referenced by transaction id

;; query
[:find [?e ...]
 :in ?log ?tx
 :where [(tx-data ?log ?tx) [[?e]]]]

;; inputs
log, 13194139534312

;; result
[13194139534312 63 0 64 65 66 67 68 69 70 71 ...]

Calling Java Methods

Java methods can be used as query expression functions and predicates, and can be type hinted for performance. Java code used in this way must be on the Java process classpath.

Calling Static Methods

Java static methods can be called with the (ClassName/methodName …) form. For example, the following code calls System.getProperties, binding property names to ?k and property values to ?v.

;; query
[:find ?k ?v
 :where [(System/getProperties) [[?k ?v]]]]

;; no inputs

;; result
#{["java.vendor.url.bug" "http://bugreport.sun.com/bugreport/"] 
  ["sun.cpu.isalist" ""] 
  ["sun.jnu.encoding" "UTF-8"]

Calling Instance Methods

Java instance methods can be called with the (.methodName obj …) form. For example, the following code calls String.endsWith:

[(.endsWith ?k "path")]

and could be used to extend the previous example like this:

;; query
[:find ?k ?v
 :where [(System/getProperties) [[?k ?v]]]
        [(.endsWith ?k "version")]]

;; no inputs

;; result
#{["java.class.version" "52.0"] 
  ["java.runtime.version" "1.8.0_20-b26"] 
  ["java.version" "1.8.0_20"]

Type Hinting for Performance

The current version of Datomic performs reflective lookup for Java interop. You can significantly improve performance by type hinting objects, allowing the query engine to make direct method invocations. Type hints take the form of ^ClassName preceding an argument, so the previous example becomes

[(.endsWith ^String ?k "path")]

Note that type hints outside java.lang will need to be fully qualified, and that complex method signatures may require more than one hint to be unambiguous.

Calling Clojure Functions

Clojure functions can be used as query expression functions and predicates. Clojure code used in this way must be on the Clojure process classpath. The example below uses subs as an expression function to extract prefixes of words

;; query
'[:find [?prefix ...]
  :in [?word ...]
  :where [(subs ?word 0 5) ?prefix]]

;; inputs
["hello" "antidisestablishmentarianism"]

;; result
["hello" "antid"]

Function names outside clojure.core need to be fully qualified, and their namespaces must be loaded before use in query.

The implicit data source - $

Often you will have only a single, or primary, data source (usually a database). In this case you can call that data source $, and elide it in the data clauses:

[:find ?e :in $ ?age :where [?e :age ?age]]
;;same as
[:find ?e :in $data ?age :where [$data ?e :age ?age]]


Datomic datalog allows you to package up sets of :where clauses into named rules. These rules make query logic reusable, and also composable, meaning that you can bind portions of a query's logic at query time.

A rule is a named group of clauses that can be plugged into the :where section of your query. For example, here is a rule from the Seattle example dataset that tests whether a community is a twitter feed:

[(twitter? ?c)
 [?c :community/type :community.type/twitter]]

As with transactions and queries, rules are described using data structures. A rule is a list of lists. The first list in the rule is the head. It names the rule and specifies its parameters. The rest of the lists are clauses that make up the body of the rule. In this rule, the name is "twitter", the variable ?c is an input argument, and the body is single data clause testing whether the :community/type attribute of the entity ?c has the value :community.type/twitter.

This rule has no output argument - it is a predicate rule that will evaluate to true or false, indicating whether ?c matches the specified criteria. However, rules with more than one argument can be used to bind output variables that can be subsequently used elsewhere in the query.

[(community-type ?c ?t)
 [?c :community/type ?t]]

In the rule above, we could bind either ?c or ?t at invocation time, and the other variable would be bound to the output of the rule.

We can require that variables need binding at invocation time by enclosing the required variables in a vector or list as the first argument to the rule. If the required variables are not bound, an exception will be thrown. The next example rewrites the previous rule to require ?c:

[(community-type [?c] ?t)
 [?c :community/type ?t]]

Individual rule definitions are combined into a set of rules. A set of rules is simply another list containing some number of rule definitions:

[[(twitter ?c)
  [?c :community/type :community.type/twitter]]]

You have to do two things to use a rule set in a query. First, you have to pass the rule set as an input source and reference it in the :in section of your query using the '%' symbol. Second, you have to invoke one or more rules from the :where section of your query. You do this by adding a rule invocation clause. Rule invocations have this structure:

(rule-name rule-arg*)

A rule invocation is a list containing a rule-name and one or more arguments, either variables or constants, as defined in the rule head. It's idiomatic to use parenthesis instead of square brackets to represent a rule invocation in literal form, because it makes it easier to differentiate from a data clause. However, this is not a requirement.

As with other where clauses, you may specify a database before the rule-name to scope the rule to that database. Databases cannot be used as arguments in a rule.

(src-var rule-name rule-arg*)

Rules with multiple definitions will evaluate them as different logical paths to the same conclusion (i.e. logical OR). Here's a rule, again from the Seattle example, which identifies communities that are "social-media".

[[(social-media ?c)
  [?c :community/type :community.type/twitter]]
 [(social-media ?c)
  [?c :community/type :community.type/facebook-page]]]

The social-media rule has two definitions, one testing whether a community's type is :community.type/twitter and the other testing whether a community's type is :community.type/facebook-page. When a given community value is tested, the social-media rule will be true if either of the definitions is true. In other words, using rules, we can implement logical OR in queries.

In all the examples above, the body of each rule is made up solely of data clauses. However, rules can contain any type of clause: data, expression, or even other rule invocations.


Datomic's aggregate syntax is incorporated in the :find clause:

[:find ?a (min ?b) (max ?b) ?c (sample 12 ?d)
 :where ...]

The list expressions are aggregate expressions. Query variables not in aggregate expressions will group the results and appear intact in the result. Thus, the above query binds ?a ?b ?c ?d, then groups by ?a and ?c, and produces a result for each aggregate expression for each group, yielding 5-tuples.

Control Grouping via :with

Unless otherwise specified, Datomic's datalog returns sets, and you will not see duplicate values. This is often undesirable when producing aggregates. Consider the following query, which attempts to return the total number of heads possessed by a set of mythological monsters:

;; incorrect query
[:find (sum ?heads) .
 :in [[_ ?heads]]]

;; inputs
[["Cerberus" 3]
 ["Medusa" 1]
 ["Cyclops" 1]
 ["Chimera" 1]]

;; result

The monsters clearly have six total heads, but set logic coalesces Medusa, the Cyclops, and the Chimera together, since each has one head.

The solution to this problem is the :with clause, which considers additional variables when forming the basis set for the query result. The :with variables are then removed, leaving a bag (not a set!) of values available for aggregation.

;; query
[:find (sum ?heads) .
 :with ?monster
 :in [[?monster ?heads]]]

;; inputs
[["Cerberus" 3]
 ["Medusa" 1]
 ["Cyclops" 1]
 ["Chimera" 1]]

;; result

Aggregates Returning a Single Value

(min ?xs)
(max ?xs)
(count ?xs)
(count-distinct ?xs)
(sum ?xs)
(avg ?xs)
(median ?xs)
(variance ?xs)
(stddev ?xs)

The aggregation functions that return a single value are listed above, and all behave as their names suggest.

  • Min and Max

    The following query finds the smallest and largest track lengths:

    ;; query 
    [:find [(min ?dur) (max ?dur)]
     :where [_ :track/duration ?dur]]
    ;; inputs
    [3000 3894000]

    min and max support all database types (via comparators), not just numbers.

  • Sum

    The following query uses sum to find the total number of tracks on all media in the database.

    ;; query
    (d/q '[:find (sum ?count) .
           :with ?medium
           :where [?medium :medium/trackCount ?count]]
    ;; inputs
    ;; result
  • Counts

    More than one artist can have the same name. The following query uses count to report the total number of artist names, and count-distinct to report the total number of unique artist names.

    ;; query
    [:find (count ?name) (count-distinct ?name)
     :with ?artist
     :where [?artist :artist/name ?name]]
    ;; inputs
    ;; result
    [4601 4588]

    Note the use of :with so that equal names do not coalesce.

  • Statistics

    Are musicians becoming more verbose when naming songs? The following query reports the median, avg, and stddev of song title lengths (in characters), and includes year in the find set to break out the results by year.

    ;; query
    [:find ?year (median ?namelen) (avg ?namelen) (stddev ?namelen)
     :with ?track
     :where [?track :track/name ?name]
            [(count ?name) ?namelen]
            [?medium :medium/tracks ?track]
            [?release :release/media ?medium]
            [?release :release/year ?year]]
    ;; inputs 
    ;; result
    [[1968 16 18.92181098534824 12.898760656290333] 
     [1969 16 18.147895557287608 11.263945894977244] 
     [1970 15 18.007481296758105 12.076103750401026] 
     [1971 15 18.203682039283294 13.715552693168124] 
     [1972 15 17.907170949841063 11.712941060399375] 
     [1973 16 18.19300100438759 12.656827911058622]]

Aggregates Returning Collections

(distinct ?xs)
(min n ?xs)
(max n ?xs)
(rand n ?xs)
(sample n ?xs)

Where n is specified, fewer than n items may be returned if not enough items are available.

  • Distinct

    The distinct aggregate returns the set of distinct values in the collection.

    ;; query
    [:find (distinct ?v) .
     :in [?v ...]] 
    ;; inputs
    [1 1 2 2 2 3]
    ;; result
    #{1 3 2}
  • Min N /Max N

    The min n and max n aggregates return up to n least/greatest items. The following query returns the five shortest and five longest track lengths in the database.

    ;; query
    [:find [(min 5 ?millis) (max 5 ?millis)]
     :where [?track :track/duration ?millis]]
    ;; inputs 
    ;; result
    [[3000 4000 5000 6000 7000] 
     [3894000 3407000 2928000 2802000 2775000]]
  • Rand N / Sample N

    The rand n aggregate selects exactly n items with potential for duplicates. and the sample n aggregate returns up to n distinct items.

    The following query returns two random and two sampled artist names.

    ;; query
    [:find [(rand 2 ?name) (sample 2 ?name)]
     :where [_ :artist/name ?name]]
    ;; inputs
    ;; result
    [("Four Tops" "Ethel McCoy") 
     ["Gábor Szabó" "Zapata"]]

Custom Aggregates

You may call an arbitrary Clojure function as an aggregation function as follows:

  • Use the fully qualified name of the function.
  • Load the namespace before using the function.
  • The one and only aggregated variable must be the last argument to the function.
  • Other arguments to the function must be constants in the query.

The aggregated variable will be passed as a partial implementation of java.util.List - only size(), iterator(), and get(i) are implemented.

For example, you might implement your own mode function to calculate the mode as follows:

(defn mode
  (->> (frequencies vals)
       (sort-by (comp - second))

With mode in hand, you can answer the question "What is the most common release medium length, in tracks?"

;; query
[:find (user/mode ?track-count) .
 :with ?media
 :where [?media :medium/trackCount ?track-count]]

;; inputs

;; result

I was initially surprised by this result until I recalled the time period of the sample data included a huge number of vinyl singles, which by definition have two tracks

Pull Expressions

Pull expressions can used in a :find clause. A pull expression takes the form

(pull ?entity-var pattern)

and adds a value to the result set by applying the Pull API to the entities named by ?entity-var.

For example, the following query returns the :release/name for all of Led Zeppelin's releases:

;; query
[:find (pull ?e [:release/name])
 :in $ ?artist
 :where [?e :release/artists ?artist]]

;; args
db, led-zeppelin

;; results
[[{:release/name "Immigrant Song / Hey Hey What Can I Do"}]
 [{:release/name "Heartbreaker / Bring It On Home"}]
 [{:release/name "Led Zeppelin III"}]

The pull expression pattern can also be bound dynamically as an :in parameter to query:

;; query
[:find (pull ?e pattern)
 :in $ ?artist pattern
 :where [?e :release/artists ?artist]]

;; args
db, led-zeppelin, [:release/name]

;; results elided, same as previous example

Building Queries Programmatically

List form vs. Map form

Queries written by humans typically are a list, and the various keyword arguments are inferred by position. For example, in the query

[:find ?e
 :in $ ?fname ?lname
 :where [?e :user/firstName ?fname]
        [?e :user/lastName ?lname]]

there is one :find argument, three :in arguments, and two :where arguments.

While most people find the positional syntax easy to read, it makes extra work for programmatic readers and writers, which have to keep track of what keyword is currently "active" and interpret tokens accordingly. For such cases, queries can be specified more simply as maps. The query above becomes:

{:find [?e]
 :in [$ ?fname ?lname]
 :where [[?e :user/firstName ?fname]
         [?e :user/lastName ?lname]]}

Work with Data Structures, Not Strings

Two features of Datalog queries make them immune to many of the SQL-injection style attacks to which many other DBMSs are vulnerable:

  • Datalog queries are composed of data structures, rather than strings, which obviates the need to do string interpolation, sanitation, escaping, etc.
  • The query API is parameterized with data sources. In many cases, this feature obviates the need to include user-provided data in the query itself. Instead, you can pass user data to a parameterized query as its own data source.

You should avoid building queries by reading in a string that has been built up by concatenation or interpolation. Doing so gives up the security and simplicity of working with native data structures.


Clause order

To minimize the amount work the query engine must do, query authors should put the most restrictive or narrowing :where clauses first, and then proceed on to less restrictive clauses.

As an example, consider the following two queries looking for Paul McCartney's releases. The first :where clause begins with a data pattern ([?release :release/name ?name]) that is not at all selective, forcing the query engine to consider all the releases in the system:

;; query
[:find [?name ...]
 :in $ ?artist
 :where [?release :release/name ?name]
        [?release :release/artists ?artist]]

;; inputs
db, mccartney

;; result
["McCartney" "Another Day / Oh Woman Oh Why" "Ram"]

The following equivalent query reorders the :where clause, leading with a much more selective pattern ([?release :release/artists ?artist]) that is limited in this context to the single ?artist passed in.

;; query
[:find [?name ...]
 :in $ ?artist
 :where [?release :release/artists ?artist]
        [?release :release/name ?name]]

;; inputs and result same as above

The second query runs 50 times faster.

Queries and Peer Memory

Since queries run within the Peer with application-local memory, application designers need to consider the memory requirements for their queries.

Datomic's Datalog is set-oriented and eager, and does not spill large results to disk. Queries are designed to be able to run over datasets much larger than memory. However, each intermediate representation step of a query must fit into local memory. Datomic doesn't spool intermediate representations to disk like some server-based RDBMS's.

If you need to work with result sets larger than can fit in memory, you will need to break up the query into smaller pieces whose results can fit in memory. The datoms and index-range APIs are often useful for this.

Query Caching

Datomic processes maintain an in-memory cache of parsed query representations. Caching is based on equality (in the Java .equals sense) of the first argument to q. To take advantage of caching, programs should

  • Use parameterized queries (i.e. queries with multiple inputs) instead of building dynamic queries.
  • When building dynamic queries, use a canonical approach to naming and ordering such that equivalent queries will be Java .equals.


Resolving Entity Identifiers in V Position

Datomic performs automatic resolution of entity identifiers, so that you can generally use entity ids, idents, and lookup refs interchangeably.

For example, the following family of queries all locate Belgium and return the same results:

;; query
[:find ?artist-name
 :in $ ?country
 :where [?artist :artist/name ?artist-name]
        [?artist :artist/country ?country]]

;; input option 1: lookup ref
db, [:country/name "Belgium"]

;; input option 2: ident
db, :country/BE

;; input option 3: entity id
db, 17592186045516

;; result
#{["Wallace Collection"] 
  ["André Brasseur"] 
  ["Arthur Grumiaux"]

Highly dynamic queries inhibit Datomic's resolution of entity identifiers in value position. The following query makes the reference attribute a dynamic ?reference input to query.

;; query
'[:find [?artist-name ...]
  :in $ ?country [?reference ...]
  :where [?artist :artist/name ?artist-name]
         [?artist ?reference ?country]]

;; inputs 
db, :country/BE, [:artist/country]

;; result

Since the attribute itself is dynamic, Datomic does not know that the the variable ?reference is guaranteed to refer to a reference attribute, and will not perform entity identifier resolution for ?country. Unable to resolve :country/BE, the query returns no results.

There are two options for dealing with dynamic queries such as these.

  • Where possible, make the attribute specification static, as in the first example in this section.
  • Where attributes truly need to be dynamically specified, resolve the entity id yourself. The query below introduces a call to entid to resolve the entity:
;; query
[:find [?artist-name ...]
 :in $ ?country [?reference ...]
 :where [(datomic.api/entid $ ?country) ?country-id]
        [?artist :artist/name ?artist-name]
        [?artist ?reference ?country-id]]

;; inputs
db, :country/BE, [:artist/country]

;; result
#{["Wallace Collection"] 
  ["André Brasseur"] 
  ["Arthur Grumiaux"]

Note that this ambiguity occurs only with the value component of a datom, which might be a reference or a scalar. Entity identifier resolution is always available for entity, attribute, and transaction, since those components are known to always be entities.