
Datomic Queries and Rules

Datomic's query and rules system is an extended form of Datalog. Datalog is a deductive query system, typically consisting of:

  • A database of facts
  • A set of rules for deriving new facts from existing facts
  • A query processor that, given some partial specification of a fact or rule:
    • finds all instances of that specification implied by the database and rules
    • i.e. all the matching facts

Typically a Datalog system would have a global fact database and set of rules. Datomic's query engine instead takes databases (and in fact, many other data sources) and rule sets as inputs.

There is a video introducing Datalog queries.

Examples

Example Data

The examples in this document use the mbrainz 1968-1973 sample database. Download and untar this file using the steps listed in the repository.

edn

Queries, sample data, and results in this document are written in the Extensible Data Notation (edn), which is programming language neutral. In your own programs, you can create data programmatically out of basic language data types, e.g. Java Strings, Lists, and Maps. Alternatively, you can pass the pattern argument as a serialized edn string.

The ellipsis is used in query results to show that a large result set has been elided for brevity.

Example Code

You can follow the examples below in Java or Clojure code.

Why Datalog?

Datomic Datalog is simple, declarative, logic-based, and embedded in your application process.

Simple

Datalog is simple. The basic component of Datalog is a clause, which is simply a list that either begins with the name of a rule, or is a data pattern. These clauses can contain variables (symbols beginning with a '?'). The query engine simply finds all combinations of values of the variables that satisfy all of the clauses. There is no complex syntax to learn.

Declarative

Like SQL and other good query languages, Datalog is declarative. That is, you specify what you want to know and not how to find it. This kind of declarative programming is very powerful, and it is a shame that it has been relegated to database servers and is not available to application programmers. Declarative programs are:

  • More evident - it is easier to tell what their purpose is, both for programmers and stakeholders.
  • More readily optimized - the query engine is free to reorder and parallelize operations to a degree not normally taken on by application programs.
  • Simpler - and thus, more robust.

Logic-based

Even SQL, while fundamentally declarative, still includes many operations that go beyond the query itself, like specifying joins explicitly. Because Datalog is based upon logical implication, joins are implicit, and the query engine figures out when they are needed.

Embedded

Datomic's queries are further simplified by the fact that its query engine, and the data, are made available locally. Query languages like SQL are oriented around a client-server model where, in a single conversation, you are going to have to both:

  • Answer your fundamental question, e.g. who bought socks this month.
  • Recover any additional information required for reporting and processing, e.g. what are their names and email addresses.

The latter is not really a query, it is just a mechanical navigation to related information. With Datomic, you don't have to combine decisions about how to render the answers with finding them, leading to simpler queries. Given an entity id found in a query, you can at any time later quickly navigate to any related information, freeing yourself from the complex queries forced by a client-server model.

The Database of Facts

The first obvious input for a query is a Datomic database. It turns out that the data sources processed and returned by Datalog are in fact relations, i.e. sets of tuples. A Datomic database is a universal relation of datoms, i.e. 5-tuples of the form:

[entity attribute value transaction added?]

Datomic's Datalog is of course able to process these tuples, but is not limited to processing 5-tuples. Queries and rules output relations with tuples of varying arity, and Datomic's query engine can accept as inputs relation-like data in arbitrary collections.

Query Grammar

Syntax Used in Grammar

'' literal
"" string
[] list or vector
{} map {k1 v1 ...}
() grouping
| choice
? zero or one
+ one or more

Query

query                      = [find-spec return-map? with-clause? inputs? where-clauses?]
find-spec                  = ':find' (find-rel | find-coll | find-tuple | find-scalar)
return-map                 = (return-keys | return-syms | return-strs)
find-rel                   = find-elem+
find-coll                  = [find-elem '...']
find-scalar                = find-elem '.'
find-tuple                 = [find-elem+]
find-elem                  = (variable | pull-expr | aggregate)
return-keys                = ':keys' symbol+
return-syms                = ':syms' symbol+
return-strs                = ':strs' symbol+
pull-expr                  = ['pull' variable pattern]
pattern                    = (pattern-name | pattern-data-literal)
aggregate                  = [aggregate-fn-name fn-arg+]
fn-arg                     = (variable | constant | src-var)
with-clause                = ':with' variable+
where-clauses              = ':where' clause+
inputs                     = ':in' (src-var | binding | pattern-name | rules-var)+
src-var                    = symbol starting with "$"
variable                   = symbol starting with "?"
rules-var                  = the symbol "%"
plain-symbol               = symbol that does not begin with "$" or "?"
pattern-name               = plain-symbol
and-clause                 = [ 'and' clause+ ]
expression-clause          = (data-pattern | pred-expr | fn-expr | rule-expr)
rule-expr                  = [ src-var? rule-name (variable | constant | '_')+]
not-clause                 = [ src-var? 'not' clause+ ]
not-join-clause            = [ src-var? 'not-join' [variable+] clause+ ]
or-clause                  = [ src-var? 'or' (clause | and-clause)+]
or-join-clause             = [ src-var? 'or-join' rule-vars (clause | and-clause)+ ]
rule-vars                  = [variable+ | ([variable+] variable*)]
clause                     = (not-clause | not-join-clause | or-clause | or-join-clause | expression-clause)
data-pattern               = [ src-var? (variable | constant | '_')+ ]
constant                   = any non-variable data literal
pred-expr                  = [ [pred fn-arg+] ]
fn-expr                    = [ [fn fn-arg+] binding]
binding                    = (bind-scalar | bind-tuple | bind-coll | bind-rel)
bind-scalar                = variable
bind-tuple                 = [ (variable | '_')+]
bind-coll                  = [variable '...']
bind-rel                   = [ [(variable | '_')+] ]

See pattern grammar for the description of the pattern-data-literal rule.

Rules

Note that the rule grammar reuses some terms from the query grammar above.

rule                       = [ [rule-head clause+]+ ]
rule-head                  = [rule-name rule-vars]
rule-name                  = unqualified plain-symbol

Queries

Basics

The basic job of a query is, given a set of variables and a set of clauses, to find the set of all tuples of variable values that satisfy the clauses. The most basic query has this shape:

[:find variables 
 :where clauses]

Given the following data tuples:

[[sally :age 21] 
 [fred :age 42] 
 [ethel :age 42]
 [fred :likes pizza] 
 [sally :likes opera] 
 [ethel :likes sushi]]

We could perform the query:

[:find ?e 
 :where [?e :age 42]]
=> [[fred], [ethel]]

Invoking a query takes this basic form:

Peer.query(query, inputs...);

The query above has one variable, ?e, and takes one input: a collection of tuples with at least three components. The clause [?e :age 42] is called a data clause, or data pattern. A data clause consists of constants and/or variables, and a tuple satisfies the clause if its constants match. Variables in the data pattern are then bound to the corresponding parts of the matching tuple. All of this matching happens by position.
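
The positional matching described above can be sketched in plain Java. This is only an illustration of the semantics, not Datomic's implementation: the class DataPatternSketch and all of its methods are hypothetical, variables are modeled as strings starting with "?", and entities and keywords are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are part of the Datomic API.
class DataPatternSketch {
    // Build one tuple (or one pattern) from its components.
    static List<Object> tuple(Object... components) {
        return Arrays.asList(components);
    }

    // The example tuples from the text, with entities and keywords as plain strings.
    static final List<List<Object>> FACTS = Arrays.asList(
        tuple("sally", ":age", 21),
        tuple("fred", ":age", 42),
        tuple("ethel", ":age", 42),
        tuple("fred", ":likes", "pizza"),
        tuple("sally", ":likes", "opera"),
        tuple("ethel", ":likes", "sushi"));

    // Assumption of this sketch: any string starting with "?" is a variable.
    static boolean isVariable(Object component) {
        return component instanceof String && ((String) component).startsWith("?");
    }

    // Match one candidate tuple against a pattern, position by position.
    // Returns the variable bindings, or null if the candidate does not match.
    static Map<String, Object> match(List<Object> pattern, List<Object> candidate) {
        if (candidate.size() < pattern.size()) {
            return null; // the candidate has too few components for this pattern
        }
        Map<String, Object> bindings = new LinkedHashMap<>();
        for (int i = 0; i < pattern.size(); i++) {
            Object expected = pattern.get(i);
            Object actual = candidate.get(i);
            if (isVariable(expected)) {
                Object previous = bindings.putIfAbsent((String) expected, actual);
                if (previous != null && !previous.equals(actual)) {
                    return null; // the same variable would need two different values
                }
            } else if (!expected.equals(actual)) {
                return null; // a constant in the pattern does not match
            }
        }
        return bindings;
    }

    // Collect the bindings of every tuple that satisfies the pattern.
    static List<Map<String, Object>> matchAll(List<Object> pattern, List<List<Object>> tuples) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (List<Object> candidate : tuples) {
            Map<String, Object> bindings = match(pattern, candidate);
            if (bindings != null) {
                results.add(bindings);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Corresponds to the clause [?e :age 42]
        System.out.println(matchAll(tuple("?e", ":age", 42), FACTS));
    }
}
```

Run against the sample tuples, the pattern binds ?e to fred and to ethel, mirroring the query result shown above.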

Blanks

Sometimes we don't care about certain components of the tuples in a query, but must put something in the clause in order to get to the positions we care about. The underscore symbol _ is a blank placeholder, matching anything without binding or unifying.

The query below finds anything that is liked, without caring who does the liking:

;; query
[:find ?x 
 :where [_ :likes ?x]]
=> [[opera] [sushi] [pizza]]

Do not use a dummy variable instead of the blank. This will make the query engine do extra work, tracking binding and unification for a variable that you never intend to use. It will also make human readers do extra work, puzzling out that a variable is intentionally not used.

Implicit Blanks

Notice that while a release names query such as [:find ?release-name :where [_ :release/name ?release-name]] targets a database, its data pattern contains only three elements, not five. In data patterns, you can always elide any trailing components you don't care about, rather than explicitly padding with blanks:

;; unnecessary trailing blanks
[_ :release/name ?release-name _ _]

Inputs

By default, queries expect a single input, a database whose name is the dollar sign $. Also by default, data patterns refer to a database named $.

Queries can choose to explicitly name their inputs via an :in clause. A fully explicit form of the release names query would look like:

;; query
[:find ?release-name
 :in $
 :where [$ _ :release/name ?release-name]]

;; inputs & result same as previous

Here you can see that the one and only :in argument names the one and only input db, and that the data pattern uses a leading $ to choose the database it matches against.

Multiple Inputs

Most real-world queries are parameterized at runtime with variable bindings. For example, the following query takes two inputs: a database and a scalar variable binding to limit releases to those performed by John Lennon.

;; query
[:find ?release-name
 :in $ ?artist-name
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db, "John Lennon"
=>
#{["Power to the People"] 
  ["Unfinished Music No. 2: Life With the Lions"] 
  ["Live Peace in Toronto 1969"] 
  ["Live Jam"]
  ...}

Because ?artist-name appears second in the :in clause, it is bound to the second input "John Lennon".

Pattern Inputs

An input can be a pull pattern, which can be named by a symbol in the :in clause, and that name can be used in pull expressions in the :find clause.

The query below binds pattern to a pull pattern that selects the artist's start year and end year.

;; query
'[:find (pull ?e pattern)
  :in $ ?name pattern
  :where [?e :artist/name ?name]]

;; args
[db "The Beatles" [:artist/startYear :artist/endYear]]

;; example in 1-arity form
(d/q {:query '[:find (pull ?e pattern)
               :in $ ?name pattern
               :where [?e :artist/name ?name]]
      :args [db "The Beatles" [:artist/startYear :artist/endYear]]})
=> [[#:artist{:startYear 1957, :endYear 1970}]]

Separation of Concerns

The pull API separates the process of finding entities from the process of acquiring information about those entities. Pull expressions allow you to use queries to find entities and return an explicit map with the desired information about each entity.

This example uses songs-by-artist to find all tracks for an artist, then uses different pull patterns to pull different information about the resulting entities.

(def songs-by-artist
  '[:find (pull ?t pattern)
    :in $ pattern ?artist-name
    :where
    [?a :artist/name ?artist-name]
    [?t :track/artists ?a]])

(def track-releases-and-artists
  [:track/name
   {:medium/_tracks
    [{:release/_media
      [{:release/artists [:artist/name]}
       :release/name]}]}])
;; Pull only the :track/name
(d/q songs-by-artist db [:track/name] "Bob Dylan")
=>
  ([#:track{:name "California"}]
   [#:track{:name "Grasshoppers in My Pillow"}]
   [#:track{:name "Baby Please Don't Go"}]
   [#:track{:name "Man of Constant Sorrow"}]
   [#:track{:name "Only a Hobo"}]
  ...)
;; Use a different pull pattern to get the track name, the release name, and the artists on the release.
(d/q songs-by-artist db track-releases-and-artists "Bob Dylan")
=>
([{:track/name "California",
   :medium/_tracks
   #:release{:_media #:release{:artists [#:artist{:name "Bob Dylan"}], :name "A Rare Batch of Little White Wonder"}}}]
 [{:track/name "Grasshoppers in My Pillow",
   :medium/_tracks
   #:release{:_media #:release{:artists [#:artist{:name "Bob Dylan"}], :name "A Rare Batch of Little White Wonder"}}}]
 [{:track/name "Baby Please Don't Go",
   :medium/_tracks
   #:release{:_media #:release{:artists [#:artist{:name "Bob Dylan"}], :name "A Rare Batch of Little White Wonder"}}}]
 [{:track/name "Man of Constant Sorrow",
   :medium/_tracks
   #:release{:_media #:release{:artists [#:artist{:name "Bob Dylan"}], :name "A Rare Batch of Little White Wonder"}}}]
 [{:track/name "Only a Hobo",
   :medium/_tracks
   #:release{:_media #:release{:artists [#:artist{:name "Bob Dylan"}], :name "A Rare Batch of Little White Wonder"}}}]
 ...)

Bindings

A variable name like ?artist-name is the simplest kind of binding, to a single scalar. Other input shapes can be bound as follows:

Binding Form     Binds
?a               scalar
[?a ?b]          tuple
[?a ...]         collection
[[?a ?b]]        relation

Tuple Binding

A tuple binding binds a set of variables to a single value each, passed in as a collection. This can be used to ask "and" questions, i.e. what releases are associated with the artist named John Lennon and named Mind Games?

;; query
[:find ?release
 :in $ [?artist-name ?release-name]
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db, ["John Lennon" "Mind Games"]
=>
#{[17592186157686] 
  [17592186157672] 
  [17592186157690] 
  [17592186157658]}

Collection Binding

A collection binding binds a single variable to multiple values passed in as a collection. This can be used to ask "or" questions, i.e. what releases are associated with either Paul McCartney or George Harrison?

;; query
[:find ?release-name
 :in $ [?artist-name ...]
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db, ["Paul McCartney" "George Harrison"]
=>
#{["My Sweet Lord"] 
  ["Electronic Sound"]
  ["Give Me Love (Give Me Peace on Earth)"] 
  ["All Things Must Pass"]
  ...}

Relation Binding

A relation binding is fully general, binding multiple variables positionally to a relation (collection of tuples) passed in. This can be used to ask "or" questions involving multiple variables. For example, what releases are associated with either John Lennon's Mind Games or Paul McCartney's Ram?

;; query
[:find ?release
 :in $ [[?artist-name ?release-name]]
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db,  [["John Lennon" "Mind Games"] 
      ["Paul McCartney" "Ram"]]

=>
#{[17592186157686] 
  [17592186157672] 
  [17592186157690] 
  [17592186157658] 
  [17592186063566]}
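
One way to picture the four binding forms is as different ways of turning an input into rows of variable bindings. The sketch below is a plain Java illustration under that assumption; BindingFormsSketch and its methods are hypothetical names and are not part of the Datomic API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Datomic API.
class BindingFormsSketch {
    // [?a ?b] : one row that binds each variable to the value at the same position.
    static List<Map<String, Object>> bindTuple(List<String> vars, List<?> values) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < vars.size(); i++) {
            row.put(vars.get(i), values.get(i));
        }
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        return rows;
    }

    // ?a : one row that binds a single variable to a single value.
    static List<Map<String, Object>> bindScalar(String var, Object value) {
        return bindTuple(Collections.singletonList(var), Collections.singletonList(value));
    }

    // [?a ...] : one row per element, which is why it reads as an "or" question.
    static List<Map<String, Object>> bindColl(String var, List<?> values) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Object value : values) {
            rows.addAll(bindScalar(var, value));
        }
        return rows;
    }

    // [[?a ?b]] : one row per input tuple, each bound positionally.
    static List<Map<String, Object>> bindRel(List<String> vars, List<? extends List<?>> tuples) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (List<?> tuple : tuples) {
            rows.addAll(bindTuple(vars, tuple));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(bindScalar("?artist-name", "John Lennon"));
        System.out.println(bindColl("?artist-name",
            Arrays.asList("Paul McCartney", "George Harrison")));
        System.out.println(bindRel(
            Arrays.asList("?artist-name", "?release-name"),
            Arrays.asList(
                Arrays.asList("John Lennon", "Mind Games"),
                Arrays.asList("Paul McCartney", "Ram"))));
    }
}
```

In this picture the query's :where clauses are evaluated once per row, and the results of all rows are combined.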

Find Specifications

Where bindings control inputs, find specifications control results.

Find Spec          Returns          Java Type Returned
:find ?a ?b        relation         Collection of Lists
:find [?a ...]     collection       Collection
:find [?a ?b]      single tuple     List
:find ?a .         single scalar    Scalar Value

The relation find spec is the most common, and the most general. It will return a tuple for each result, with values in each tuple matching the named variables. All of the examples so far have used the relation find spec. The example below uses a relation find spec with two variables and returns a relation of 2-tuples:

;; query
[:find ?artist-name ?release-name
 :where [?release :release/name ?release-name]
        [?release :release/artists ?artist]
        [?artist :artist/name ?artist-name]]

;; inputs
db
=>
#{["George Jones" "With Love"] 
  ["Shocking Blue" "Hello Darkness / Pickin' Tomatoes"] 
  ["Junipher Greene" "Friendship"]
  ...}

The collection find spec is useful when you are only interested in a single variable. The form [?release-name ...] below returns values for ?release-name, not wrapped in a one-tuple:

;; query
[:find [?release-name ...]
 :in $ ?artist-name
 :where [?artist :artist/name ?artist-name]
        [?release :release/artists ?artist]
        [?release :release/name ?release-name]]

;; inputs
db, "John Lennon"
=>
["Power to the People" 
 "Unfinished Music No. 2: Life With the Lions" 
 "Live Peace in Toronto 1969" 
 "Live Jam"
 ...]

The single tuple find spec is useful when you are interested in multiple variables, but expect only a single result. The form [?year ?month ?day] below returns a single triple, not wrapped in a relation.

;; query
[:find [?year ?month ?day]
 :in $ ?name
 :where [?artist :artist/name ?name]
        [?artist :artist/startDay ?day]
        [?artist :artist/startMonth ?month]
        [?artist :artist/startYear ?year]] 

;; inputs
db, "John Lennon"
=> [1940 10 9]

The scalar find spec is useful when you want to return a single value of a single variable. The form ?year . below returns a single scalar value:

;; query
[:find ?year .
 :in $ ?name
 :where [?artist :artist/name ?name]
        [?artist :artist/startYear ?year]]

;; inputs
db, "John Lennon"
=> 1940

Note that the single tuple find spec and the scalar find spec will return only a single value from the query result, even if the result itself has more than one value. These find specs are typically used only when you know in advance that a query will have exactly one result.
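
The relationship between the four find specs can be sketched as simple reshaping of a relation. The Java below is an illustration only: FindSpecSketch and its methods are hypothetical, the sample rows are made up, and returning null for an empty result is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Datomic API.
class FindSpecSketch {
    // A made-up relation of [?year ?month ?day] tuples with two rows.
    static final List<List<Object>> RELATION = Arrays.asList(
        Arrays.<Object>asList(1940, 10, 9),
        Arrays.<Object>asList(1942, 6, 18));

    // :find [?a ...] keeps the first element of every tuple, unwrapped.
    static List<Object> findColl(List<List<Object>> relation) {
        List<Object> values = new ArrayList<>();
        for (List<Object> tuple : relation) {
            values.add(tuple.get(0));
        }
        return values;
    }

    // :find [?a ?b] keeps a single tuple, even if the relation has more rows.
    static List<Object> findTuple(List<List<Object>> relation) {
        return relation.isEmpty() ? null : relation.get(0);
    }

    // :find ?a . keeps a single value, even if the relation has more rows.
    static Object findScalar(List<List<Object>> relation) {
        return relation.isEmpty() ? null : relation.get(0).get(0);
    }

    public static void main(String[] args) {
        System.out.println(findColl(RELATION));
        System.out.println(findTuple(RELATION));
        System.out.println(findScalar(RELATION));
    }
}
```

Note how findTuple and findScalar silently drop the second row, which is why these forms are best reserved for queries known to have exactly one result.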

Return Maps

Supplying a return-map will cause the query to return maps instead of tuples. Each entry in the :keys, :strs, or :syms clause will become a key mapped to the corresponding item in the :find clause.

keyword    symbols become
:keys      keyword keys
:strs      string keys
:syms      symbol keys

In the example below, the :keys artist and release are used to construct a map for each row returned.

;; query
[:find ?artist-name ?release-name
 :keys artist release
 :where [?release :release/name ?release-name]
        [?release :release/artists ?artist]
        [?artist :artist/name ?artist-name]]

;; inputs
db
=>
[{:artist "George Jones" :release "With Love"}
 {:artist "Shocking Blue" :release "Hello Darkness / Pickin' Tomatoes"}
 {:artist "Junipher Greene" :release "Friendship"}
 ...]

Return maps also preserve the order of the :find clause. In particular, return maps:

  • implement clojure.lang.Indexed
  • support nth
  • support vector style destructuring

For example, the first result from the previous query can be destructured in two ways:

;; positional destructure
(let [[artist release] (first result)]
  ...)

;; key destructure
(let [{:keys [artist release]} (first result)]
  ...)
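
Conceptually, a return map pairs each key with the value at the same position in the result tuple while keeping the :find order. The Java below sketches that pairing with string keys (comparable to :strs). ReturnMapSketch is a hypothetical name, and the size check is an assumption of this sketch rather than documented Datomic behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Datomic API.
class ReturnMapSketch {
    // Pair each key with the tuple element at the same position.
    // LinkedHashMap keeps insertion order, mirroring the order of the :find clause.
    static Map<String, Object> toReturnMap(List<String> keys, List<?> tuple) {
        if (keys.size() != tuple.size()) {
            throw new IllegalArgumentException("this sketch expects one key per tuple element");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            row.put(keys.get(i), tuple.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toReturnMap(
            Arrays.asList("artist", "release"),
            Arrays.asList("George Jones", "With Love")));
    }
}
```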

Not Clauses

not clauses allow you to express that one or more logic variables inside a query must not satisfy all of a set of predicates. A not clause is written as:

(src-var? 'not' clause+)

and removes already-bound tuples that satisfy the clauses. not clauses target a source named $ unless you specify an explicit src-var.

The following query uses a not clause to find the count of all artists who are not Canadian:

;; query
[:find (count ?eid) .
 :where [?eid :artist/name]
        (not [?eid :artist/country :country/CA])]

;; inputs
db
=> 4538

All variables used in a not clause will unify with the surrounding query. This includes both the arguments to nested expression clauses as well as any bindings made by nested function expressions. Datomic will attempt to push the not clause down until all necessary variables are bound, and will throw an exception if that is not possible.

A not-join clause allows you to specify which variables should unify with the surrounding clause; only this list of variables needs binding before the clause can run.

A not-join clause is written as:

(src-var? 'not-join' [var+] clause+)

where var specifies which variables should unify.

In this next query, which returns the number of artists who didn't release an album in 1970, ?release is used only inside the not clause and doesn't need to unify with the outer clause. not-join is used to specify that only ?artist needs unifying.

;; query
[:find (count ?artist) .
 :where [?artist :artist/name]
        (not-join [?artist]
          [?release :release/artists ?artist]
          [?release :release/year 1970])]

;; inputs
db
=> 3263

When more than one clause is supplied to not, you should read the clauses as if they are connected by 'and', just as they are in :where.

The following query counts the number of releases named 'Live at Carnegie Hall' that were not by Bill Withers.

;; query
[:find (count ?r) .
 :where [?r :release/name "Live at Carnegie Hall"]
        (not-join [?r]
          [?r :release/artists ?a]
          [?a :artist/name "Bill Withers"])]

;; inputs
db
=> 2

How Not Clauses Work

One can understand not clauses as if they turn into subqueries where all of the variables and sources unified by the negation are propagated to the subquery. The results of the subquery are removed from the enclosing query via set difference. Note that, because they are implemented using set logic, not clauses can be much more efficient than building your own expression predicate that executes a query, as expression predicates are run on each tuple in turn.
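
The set-difference reading of not clauses can be sketched in plain Java. The example loosely follows the not-join query about artists without a 1970 release, but NotClauseSketch, its method, and the tiny data set are all hypothetical, and nothing here reflects how Datomic actually executes the query.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Datomic API.
class NotClauseSketch {
    // Made-up artists and [release artist year] rows.
    static final Set<String> ARTISTS =
        new LinkedHashSet<>(Arrays.asList("artist-1", "artist-2", "artist-3"));
    static final List<List<Object>> RELEASES = Arrays.asList(
        Arrays.<Object>asList("release-1", "artist-1", 1970),
        Arrays.<Object>asList("release-2", "artist-2", 1971));

    static Set<String> artistsWithoutReleaseIn(int year) {
        // "Subquery": every artist that does have a release in the given year.
        Set<String> excluded = new LinkedHashSet<>();
        for (List<Object> release : RELEASES) {
            if (release.get(2).equals(year)) {
                excluded.add((String) release.get(1));
            }
        }
        // Set difference: remove the subquery results from the enclosing result.
        Set<String> result = new LinkedHashSet<>(ARTISTS);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(artistsWithoutReleaseIn(1970));
    }
}
```

The excluded set is computed once and removed in bulk, in contrast to a predicate that would be evaluated separately for each candidate tuple.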

Or Clauses

or clauses allow you to express that one or more logic variables inside a query satisfy at least one of a set of predicates. An or clause is written as:

(src-var? 'or' (clause | and-clause)+)

and constrains the result to tuples that satisfy at least one of the clauses in the or clause. or clauses target a source named $ unless you specify an explicit src-var.

The following query uses an or clause to find the count of all vinyl media by listing the complete set of media that make up vinyl in the or clause:

;; query
[:find (count ?medium) .
 :where (or [?medium :medium/format :medium.format/vinyl7]
            [?medium :medium/format :medium.format/vinyl10]
            [?medium :medium/format :medium.format/vinyl12]
            [?medium :medium/format :medium.format/vinyl])]

;; inputs
db
=> 9219

Inside the or clause, you may use an and clause to specify conjunction. This clause is not available outside of an or clause, since conjunction is the default in other clauses.

The following query uses an and clause inside the or clause to find the number of artists who are either groups or females:

;; query
[:find (count ?artist) .
 :where (or [?artist :artist/type :artist.type/group]
            (and [?artist :artist/type :artist.type/person]
                 [?artist :artist/gender :artist.gender/female]))]

;; inputs
db
=> 2323

All clauses used in an or clause must use the same set of variables, which will unify with the surrounding query. This includes both the arguments to nested expression clauses as well as any bindings made by nested function expressions. Datomic will attempt to push the or clause down until all necessary variables are bound, and will throw an exception if that is not possible.

Or-join Clause

An or-join is similar to an or clause, but it lets you specify which variables should unify with the surrounding clause; only this list of variables needs binding before the clause can run.

An or-join clause is written as:

(src-var? 'or-join' [variable+] (clause | and-clause)+)

In this query, which returns the number of releases that are either by Canadian artists or released in 1970, ?artist is only used inside the or clause and doesn't need to unify with the outer clause. or-join is used to specify that only ?release needs unifying.

;; query
[:find (count ?release) .
 :where [?release :release/name]
        (or-join [?release]
          (and [?release :release/artists ?artist]
               [?artist :artist/country :country/CA])
          [?release :release/year 1970])]

;; inputs
db
=> 2124

How Or Clauses Work

One can imagine that an or clause turns into an invocation of an anonymous rule whose predicates comprise the clauses of the or. As with rules, src-vars are not currently supported within the clauses of or, but are supported on the or clause as a whole at top level.
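
As a rough mental model, each branch of an or clause is a test, an and clause requires all of its tests to pass, and the or as a whole passes when any branch does. The Java below sketches the "groups or females" query in that style. OrClauseSketch and its small hand-written sample are hypothetical and say nothing about Datomic's internal evaluation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; not part of the Datomic API.
class OrClauseSketch {
    // Build one artist as a map of attribute name to value; gender may be absent.
    static Map<String, String> artist(String name, String type, String gender) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("name", name);
        attributes.put("type", type);
        if (gender != null) {
            attributes.put("gender", gender);
        }
        return attributes;
    }

    // A tiny hand-written sample, not taken from the mbrainz database.
    static final List<Map<String, String>> ARTISTS = Arrays.asList(
        artist("The Beatles", "group", null),
        artist("Janis Joplin", "person", "female"),
        artist("John Lennon", "person", "male"));

    static final Predicate<Map<String, String>> IS_GROUP = a -> "group".equals(a.get("type"));
    static final Predicate<Map<String, String>> IS_PERSON = a -> "person".equals(a.get("type"));
    static final Predicate<Map<String, String>> IS_FEMALE = a -> "female".equals(a.get("gender"));

    // (or [group] (and [person] [female]))
    static final Predicate<Map<String, String>> GROUP_OR_FEMALE =
        IS_GROUP.or(IS_PERSON.and(IS_FEMALE));

    static long countGroupsOrFemales() {
        return ARTISTS.stream().filter(GROUP_OR_FEMALE).count();
    }

    public static void main(String[] args) {
        System.out.println(countGroupsOrFemales());
    }
}
```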

Expression Clauses

Expression clauses allow arbitrary Java or Clojure functions to be used inside of Datalog queries. Any functions or methods you use in expression clauses must be pure, i.e. they must be free of side effects and always return the same thing given the same arguments. Expression clauses have one of two basic shapes:

[(predicate ...)]
[(function ...) bindings]

The first item in an expression clause is a list designating a function or method call.

Predicate Expressions

If no bindings are provided, the function is presumed to be a predicate returning a truth value: null and false are treated as false, anything else is treated as true.

In the example below, the built-in expression predicate < limits the results to artists who started before 1600:

;; query
[:find ?name ?year
 :where [?artist :artist/name ?name]
        [?artist :artist/startYear ?year]
        [(< ?year 1600)]]

;; inputs
db
=>
#{["Choir of King's College, Cambridge" 1441] 
  ["Heinrich Schütz" 1585]}

Function Expressions

Functions behave similarly, except that their return values are used not as predicates, but to bind other variables. In the example below, the built-in expression function quot converts track lengths from milliseconds to minutes:

[:find ?track-name ?minutes
 :in $ ?artist-name
 :where [?artist :artist/name ?artist-name]
        [?track :track/artists ?artist]
        [?track :track/duration ?millis]
        [(quot ?millis 60000) ?minutes]
        [?track :track/name ?track-name]]

;; inputs 
db, "John Lennon"
=>
#{["Crippled Inside" 3] 
  ["Working Class Hero" 3] 
  ["Sisters, O Sisters" 3] 
  ["Only People" 3] 
  ...}

Expression clauses do not nest:

;; this query will not work!!!
[:find ?celsius .
 :in ?fahrenheit
 :where [(/ (- ?fahrenheit 32) 1.8) ?celsius]]

Instead, multi-step calculations must be performed with separate expressions:

;; query
[:find ?celsius .
 :in ?fahrenheit
 :where [(- ?fahrenheit 32) ?f-32]
        [(/ ?f-32 1.8) ?celsius]]

;; inputs
212
=> 100.0
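
The same step-by-step style can be mirrored in Java, where each expression clause corresponds to one statement that binds an intermediate name. This is only an analogy; ExpressionStepsSketch and its methods are hypothetical. Java integer division truncates toward zero, which is how quot behaves as well.

```java
// Hypothetical illustration only; not part of the Datomic API.
class ExpressionStepsSketch {
    static double toCelsius(double fahrenheit) {
        double fMinus32 = fahrenheit - 32;   // [(- ?fahrenheit 32) ?f-32]
        double celsius = fMinus32 / 1.8;     // [(/ ?f-32 1.8) ?celsius]
        return celsius;
    }

    static long wholeMinutes(long millis) {
        return millis / 60000;               // [(quot ?millis 60000) ?minutes]
    }

    public static void main(String[] args) {
        System.out.println(toCelsius(212));
        System.out.println(wholeMinutes(185000L)); // a made-up track duration
    }
}
```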

Built-in Expression Functions and Predicates

Datomic provides the following built-in expression functions and predicates:

  • Two-argument comparison predicates =, !=, <, <=, >, and >=.
  • Two-argument mathematical operators +, -, *, and /.
    • Datomic's / operator works similarly to Clojure's / in terms of promotion and contagion, with one notable exception: Datomic's / does not return a clojure.lang.Ratio to callers. Instead, it returns a quotient as per quot.
  • All of the functions from the clojure.core namespace of Clojure, except eval.
  • A set of functions and predicates that are aware of Datomic data structures, documented below:

get-else

[(get-else src-var ent attr default) ?val-or-default]

The get-else function takes a database, an entity identifier, a cardinality-one attribute, and a default value. It returns that entity's value for the attribute, or the default value if the entity does not have a value.

The query below reports "N/A" whenever an artist's startYear is not in the database:

;; query
[:find ?artist-name ?year
 :in $ [?artist-name ...]
 :where [?artist :artist/name ?artist-name]
        [(get-else $ ?artist :artist/startYear "N/A") ?year]]

;; inputs
db, ["Crosby, Stills & Nash" "Crosby & Nash"]
=>
#{["Crosby, Stills & Nash" 1968] 
  ["Crosby & Nash" "N/A"]}

get-some

[(get-some src-var ent attr+) [?attr ?val]]

The get-some function takes a database, an entity identifier, and one or more cardinality-one attributes, returning a tuple of the attribute's entity id and the value for the first of those attributes possessed by the entity.

The query below tries to find a :country/name for an entity, and then falls back to :artist/name:

;; query
[:find [?e ?attr ?name]
 :in $ ?e
 :where [(get-some $ ?e :country/name :artist/name) [?attr ?name]]]

;; inputs
db, :country/US
=> [:country/US 84 "United States"]

ground

[(ground const) binding]

The ground function takes a single argument, which must be a constant, and returns that same argument. Programs that know information at query time should prefer ground over e.g. identity, as the former can be used inside the query engine to enable optimizations.

[(ground [:a :e :i :o :u]) [?vowel ...]]

fulltext

[(fulltext src-var attr search) [[?ent ?val ?tx ?score]]]

The fulltext function takes a database, an attribute, and a search expression, and returns a relation of four-tuples: entity, value, transaction, and score.

The following query finds all the artists whose name includes "Jane":

;; query
[:find ?entity ?name ?tx ?score
 :in $ ?search
 :where [(fulltext $ :artist/name ?search) [[?entity ?name ?tx ?score]]]]

;; inputs
db, "Jane"
=>
#{[17592186047274 "Jane Birkin" 2839 0.625] 
  [17592186046687 "Jane" 2267 1.0] 
  [17592186047500 "Mary Jane Hooper" 3073 0.5]}

missing?

[(missing? src-var ent attr)]

The missing? predicate takes a database, entity, and attribute, and returns true if the entity has no value for the attribute in the database.

The following query finds all artists whose start year is not recorded in the database.

;; query
[:find ?name
 :where [?artist :artist/name ?name]
        [(missing? $ ?artist :artist/startYear)]]

;; inputs
db
=> #{["Sigmund Snopek III"] ["De Labanda's"] ["Baby Whale"] ...}
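
The behavior of get-else, get-some, and missing? described above can be approximated over a plain map standing in for a single entity. The Java below is a sketch under that assumption: MissingValueSketch and its methods are hypothetical, attributes are modeled as strings, getSome returns the attribute name rather than an attribute id, and returning null when no attribute is present is a choice made for this sketch.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the Datomic API.
class MissingValueSketch {
    // Build an entity from alternating attribute names and values.
    static Map<String, Object> entity(Object... keysAndValues) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            attributes.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return attributes;
    }

    static final Map<String, Object> CSN =
        entity(":artist/name", "Crosby, Stills & Nash", ":artist/startYear", 1968);
    static final Map<String, Object> CROSBY_AND_NASH =
        entity(":artist/name", "Crosby & Nash");

    // get-else: the value for the attribute, or the default if there is none.
    static Object getElse(Map<String, Object> ent, String attr, Object defaultValue) {
        return ent.containsKey(attr) ? ent.get(attr) : defaultValue;
    }

    // get-some: the first listed attribute that the entity has, with its value.
    static Map.Entry<String, Object> getSome(Map<String, Object> ent, String... attrs) {
        for (String attr : attrs) {
            if (ent.containsKey(attr)) {
                return new AbstractMap.SimpleImmutableEntry<>(attr, ent.get(attr));
            }
        }
        return null; // sketch-only convention for "no attribute found"
    }

    // missing?: true when the entity has no value for the attribute.
    static boolean missing(Map<String, Object> ent, String attr) {
        return !ent.containsKey(attr);
    }

    public static void main(String[] args) {
        System.out.println(getElse(CSN, ":artist/startYear", "N/A"));
        System.out.println(getElse(CROSBY_AND_NASH, ":artist/startYear", "N/A"));
        System.out.println(getSome(CSN, ":country/name", ":artist/name"));
        System.out.println(missing(CROSBY_AND_NASH, ":artist/startYear"));
    }
}
```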

tuple

[(tuple ?a ...) ?tup]

Given one or more values, the tuple function returns a tuple containing each value. See also untuple.

;; query
[:find ?tup
 :in ?a ?b
 :where [(tuple ?a ?b) ?tup]]

;; inputs
1 2

;; result
#{[[1 2]]}

tx-ids

[(tx-ids ?log ?start ?end) [?tx ...]]

Given a database log, start, and end, tx-ids returns a collection of transaction ids. Start and end can be specified as database t, transaction id, or instant in time, and can be nil.

The following query finds transactions from time t 1000 through 1050:

;; query
[:find [?tx ...]
 :in ?log
 :where [(tx-ids ?log 1000 1050) [?tx ...]]]

;; inputs
log
=> [13194139534340 13194139534312 13194139534313 13194139534314]

tx-ids is often used in conjunction with tx-data, to first locate transactions and then the data within those transactions.

tx-data

[(tx-data ?log ?tx) [[?e ?a ?v _ ?op]]]

Given a database log and a transaction id, tx-data returns a collection of the datoms added by that transaction. You should not bind the transaction position of the result, as the transaction is already bound on input.

The following query finds the entities referenced by a given transaction id:

;; query
[:find [?e ...]
 :in ?log ?tx
 :where [(tx-data ?log ?tx) [[?e]]]]

;; inputs
log, 13194139534312
=> [13194139534312 63 0 64 65 66 67 68 69 70 71 ...]

untuple

[(untuple ?tup) [?a ?b]]

Given a tuple, the untuple function can be used to name each element of the tuple. See also tuple.

;; query
[:find ?b
 :in ?tup
 :where [(untuple ?tup) [?a ?b]]]

;; inputs
[1 2]
=> #{[2]}

Calling Java Methods

Java methods can be used as query expression functions and predicates, and can be type hinted for performance. Java code used in this way must be on the Java process classpath.

Calling Static Methods

Java static methods can be called with the (ClassName/methodName …) form. For example, the following code calls System.getProperties, binding property names to ?k and property values to ?v.

;; query
[:find ?k ?v
 :where [(System/getProperties) [[?k ?v]]]]

;; no inputs
=>
#{["java.vendor.url.bug" "https://bugreport.sun.com/bugreport/"] 
  ["sun.cpu.isalist" ""] 
  ["sun.jnu.encoding" "UTF-8"]
  ...}

Calling Instance Methods

Java instance methods can be called with the (.methodName obj …) form. For example, the following code calls String.endsWith:

[(.endsWith ?k "path")]

and could be used to extend the previous example like this:

;; query
[:find ?k ?v
 :where [(System/getProperties) [[?k ?v]]]
        [(.endsWith ?k "version")]]

;; no inputs
=>
#{["java.class.version" "52.0"] 
  ["java.runtime.version" "1.8.0_20-b26"] 
  ["java.version" "1.8.0_20"]
  ...}
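
For comparison, the same filtering can be written directly in Java. The sketch below uses only the JDK and does not call Datomic. The class name PropertySuffixFilter and its withSuffix method are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Datomic: mirrors the query above in plain Java.
final class PropertySuffixFilter {

    // Returns the properties whose key ends with the given suffix, sorted by key.
    static Map<String, String> withSuffix(Properties props, String suffix) {
        Map<String, String> matches = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Same test as the predicate clause [(.endsWith ?k "version")].
            if (key.endsWith(suffix)) {
                matches.put(key, props.getProperty(key));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        withSuffix(System.getProperties(), "version")
            .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The query expresses the same logic declaratively: the function clause binds ?k and ?v, and the predicate clause keeps only the bindings for which endsWith returns true.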

Type Hinting for Performance

The current version of Datomic performs reflective lookup for Java interop. You can significantly improve performance by type hinting objects, allowing the query engine to make direct method invocations. Type hints take the form of ^ClassName preceding an argument, so the previous example becomes:

[(.endsWith ^String ?k "path")]

Note that type hints outside java.lang will need to be fully qualified, and that complex method signatures may require more than one hint to be unambiguous.
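
To make the distinction concrete, the following plain-Java sketch contrasts a reflective call with a direct one. It illustrates the general difference only and is not Datomic's implementation. The class and method names are hypothetical.

```java
import java.lang.reflect.Method;

// Illustration only: this is not how Datomic is implemented internally.
final class ReflectiveVsDirect {

    // Reflective call: the method is looked up by name at run time,
    // because the static type of the target is unknown.
    static boolean endsWithReflective(Object target, String suffix) {
        try {
            Method m = target.getClass().getMethod("endsWith", String.class);
            return (Boolean) m.invoke(target, suffix);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "no usable endsWith(String) on " + target.getClass(), e);
        }
    }

    // Direct call: the static type String lets the compiler bind the method up front.
    static boolean endsWithDirect(String target, String suffix) {
        return target.endsWith(suffix);
    }

    public static void main(String[] args) {
        System.out.println(endsWithReflective("java.class.path", "path"));
        System.out.println(endsWithDirect("java.class.path", "path"));
    }
}
```

Both calls return the same answer. The reflective version pays for a method lookup that the direct version avoids, which is the cost a type hint is meant to remove.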

Calling Clojure Functions

Clojure functions can be used as query expression functions and predicates. Clojure code used in this way must be on the Clojure process classpath. The example below uses subs as an expression function to extract prefixes of words:

;; query
[:find [?prefix ...]
 :in [?word ...]
 :where [(subs ?word 0 5) ?prefix]]

;; inputs
["hello" "antidisestablishmentarianism"]
=> ["hello" "antid"]

Function names outside clojure.core need to be fully qualified. Datomic will automatically require the namespace for a query function.

The implicit data source - $

Often you will have only a single, or primary, data source (usually a database). In this case you can call that data source $, and elide it in the data clauses:

[:find ?e :in $ ?age :where [?e :age ?age]]
;; same as
[:find ?e :in $data ?age :where [$data ?e :age ?age]]

Rules

Datomic datalog allows you to package up sets of :where clauses into named rules. These rules make query logic reusable, and also composable, meaning that you can bind portions of a query's logic at query time.

A rule is a named group of clauses that can be plugged into the :where section of your query. For example, here is a rule from the Seattle example dataset that tests whether a community is a twitter feed:

[(twitter ?c)
 [?c :community/type :community.type/twitter]]

As with transactions and queries, rules are described using data structures. A rule is a list of lists. The first list in the rule is the head. It names the rule and specifies its parameters. The rest of the lists are clauses that make up the body of the rule. In this rule, the name is "twitter", the variable ?c is an input argument, and the body is a single data clause testing whether the :community/type attribute of the entity ?c has the value :community.type/twitter.

This rule has no output argument - it is a predicate rule that will evaluate to true or false, indicating whether ?c matches the specified criteria. However, rules with more than one argument can be used to bind output variables that can be subsequently used elsewhere in the query.

[(community-type ?c ?t)
 [?c :community/type ?t]]

In the rule above, we could bind either ?c or ?t at invocation time, and the other variable would be bound to the output of the rule.

We can require that variables need binding at invocation time by enclosing the required variables in a vector or list as the first argument to the rule. If the required variables are not bound, an exception will be thrown. The next example rewrites the previous rule to require ?c:

[(community-type [?c] ?t)
 [?c :community/type ?t]]

Individual rule definitions are combined into a set of rules. A set of rules is simply another list containing some number of rule definitions:

[[(twitter ?c)
  [?c :community/type :community.type/twitter]]]

You have to do two things to use a rule set in a query. First, you have to pass the rule set as an input source and reference it in the :in section of your query using the '%' symbol. Second, you have to invoke one or more rules from the :where section of your query. You do this by adding a rule invocation clause. Rule invocations have this structure:

(rule-name rule-arg*)

A rule invocation is a list containing a rule-name and one or more arguments, either variables or constants, as defined in the rule head. It's idiomatic to use parentheses instead of square brackets to represent a rule invocation in literal form, because it makes it easier to differentiate from a data clause. However, this is not a requirement.

As with other where clauses, you may specify a database before the rule-name to scope the rule to that database. Databases cannot be used as arguments in a rule.

(src-var rule-name rule-arg*)

Rules with multiple definitions will evaluate them as different logical paths to the same conclusion (i.e. logical OR). Here's a rule, again from the Seattle example, which identifies communities that are "social-media".

[[(social-media ?c)
  [?c :community/type :community.type/twitter]]
 [(social-media ?c)
  [?c :community/type :community.type/facebook-page]]]

The social-media rule has two definitions, one testing whether a community's type is :community.type/twitter and the other testing whether a community's type is :community.type/facebook-page. When a given community value is tested, the social-media rule will be true if either of the definitions is true. In other words, using rules, we can implement logical OR in queries.

In all the examples above, the body of each rule is made up solely of data clauses. However, rules can contain any type of clause: data, expression, or even other rule invocations.
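
The OR behavior of a rule with multiple definitions can be sketched in plain Java, with each rule definition modeled as a separate boolean method. This is only an analogy for the predicate use of the social-media rule. The class and method names are hypothetical, and no Datomic API is involved.

```java
// Analogy only: each static method stands in for one definition of the rule.
final class SocialMediaRule {

    // First definition: [?c :community/type :community.type/twitter]
    static boolean isTwitter(String communityType) {
        return ":community.type/twitter".equals(communityType);
    }

    // Second definition: [?c :community/type :community.type/facebook-page]
    static boolean isFacebookPage(String communityType) {
        return ":community.type/facebook-page".equals(communityType);
    }

    // The rule succeeds if any one of its definitions succeeds (logical OR).
    static boolean socialMedia(String communityType) {
        return isTwitter(communityType) || isFacebookPage(communityType);
    }

    public static void main(String[] args) {
        System.out.println(socialMedia(":community.type/twitter"));
        // ":community.type/other" is an illustrative value, not taken from the dataset.
        System.out.println(socialMedia(":community.type/other"));
    }
}
```

Unlike these methods, a Datalog rule can also bind output variables, so the analogy covers only the true-or-false case.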

Aggregates

Datomic's aggregate syntax is incorporated in the :find clause:

[:find ?a (min ?b) (max ?b) ?c (sample 12 ?d)
 :where ...]

The list expressions are aggregate expressions. Query variables not in aggregate expressions will group the results and appear intact in the result. Thus, the above query binds ?a ?b ?c ?d, then groups by ?a and ?c, and produces a result for each aggregate expression for each group, yielding 5-tuples.
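
The grouping behavior can be sketched outside Datomic. The plain-Java example below groups (key, value) pairs by key and computes a min and max per group, loosely mirroring a find clause such as [:find ?a (min ?b) (max ?b)]. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of "group by the non-aggregated variable, aggregate the rest".
final class GroupedMinMax {

    // keys[i] plays the role of ?a and values[i] the role of ?b.
    // Returns key -> {min, max}, sorted by key.
    static Map<String, int[]> minMaxByKey(String[] keys, int[] values) {
        Map<String, int[]> result = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            int v = values[i];
            int[] minMax = result.get(keys[i]);
            if (minMax == null) {
                result.put(keys[i], new int[] {v, v});
            } else {
                minMax[0] = Math.min(minMax[0], v);
                minMax[1] = Math.max(minMax[1], v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] keys = {"x", "y", "x"};
        int[] values = {3, 7, 1};
        minMaxByKey(keys, values).forEach(
            (k, mm) -> System.out.println(k + " " + mm[0] + " " + mm[1]));
    }
}
```

Each map entry corresponds to one result tuple: the grouping value followed by one value per aggregate expression.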

Control Grouping via :with

Unless otherwise specified, Datomic's datalog returns sets, and you will not see duplicate values. This is often undesirable when producing aggregates. Consider the following query, which attempts to return the total number of heads possessed by a set of mythological monsters:

;; incorrect query
[:find (sum ?heads) .
 :in [[_ ?heads]]]

;; inputs
[["Cerberus" 3]
 ["Medusa" 1]
 ["Cyclops" 1]
 ["Chimera" 1]]
=> 4

The monsters clearly have six total heads, but set logic coalesces Medusa, the Cyclops, and the Chimera together, since each has one head.

The solution to this problem is the :with clause, which considers additional variables when forming the basis set for the query result. The :with variables are then removed, leaving a bag (not a set!) of values available for aggregation.

;; query
[:find (sum ?heads) .
 :with ?monster
 :in [[?monster ?heads]]]

;; inputs
[["Cerberus" 3]
 ["Medusa" 1]
 ["Cyclops" 1]
 ["Chimera" 1]]
=> 6
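
The difference between the two queries can be reproduced in plain Java by changing what goes into the set before summing. This is a sketch of the set semantics only, not of Datomic's internals. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of why :with changes the sum: it changes the tuples that form the basis set.
final class MonsterHeads {

    // Without :with, only ?heads is in the basis, so equal values collapse.
    static int sumOverSet(int[] heads) {
        Set<Integer> basis = new HashSet<>();
        for (int h : heads) {
            basis.add(h);
        }
        int sum = 0;
        for (int h : basis) {
            sum += h;
        }
        return sum;
    }

    // With :with ?monster, the basis is the set of (monster, heads) tuples,
    // so equal head counts from different monsters are kept apart.
    static int sumWithMonster(String[] monsters, int[] heads) {
        Set<List<Object>> basis = new HashSet<>();
        for (int i = 0; i < heads.length; i++) {
            basis.add(Arrays.<Object>asList(monsters[i], heads[i]));
        }
        int sum = 0;
        for (List<Object> tuple : basis) {
            sum += (Integer) tuple.get(1);
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] monsters = {"Cerberus", "Medusa", "Cyclops", "Chimera"};
        int[] heads = {3, 1, 1, 1};
        System.out.println(sumOverSet(heads));               // 4
        System.out.println(sumWithMonster(monsters, heads)); // 6
    }
}
```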

Aggregates Returning a Single Value

(min ?xs)
(max ?xs)
(count ?xs)
(count-distinct ?xs)
(sum ?xs)
(avg ?xs)
(median ?xs)
(variance ?xs)
(stddev ?xs)

The aggregation functions that return a single value are listed above, and all behave as their names suggest.

  • Min and Max

    The following query finds the smallest and largest track lengths:

    ;; query 
    [:find [(min ?dur) (max ?dur)]
     :where [_ :track/duration ?dur]]
    
    ;; inputs
    db
    
    => [3000 3894000]
    

    min and max support all database types (via comparators), not just numbers.

  • Sum

    The following query uses sum to find the total number of tracks on all media in the database.

    ;; query
    [:find (sum ?count) .
     :with ?medium
     :where [?medium :medium/trackCount ?count]]
    ;; inputs
    db
    
    => 100759
    
  • Counts

    More than one artist can have the same name. The following query uses count to report the total number of artist names, and count-distinct to report the total number of unique artist names.

    ;; query
    [:find (count ?name) (count-distinct ?name)
     :with ?artist
     :where [?artist :artist/name ?name]]
    
    ;; inputs
    db
    
    => [4601 4588]
    

    Note the use of :with so that equal names do not coalesce.

  • Statistics

    Are musicians becoming more verbose when naming songs? The following query reports the median, avg, and stddev of song title lengths (in characters), and includes year in the find set to break out the results by year.

    ;; query
    [:find ?year (median ?namelen) (avg ?namelen) (stddev ?namelen)
     :with ?track
     :where [?track :track/name ?name]
            [(count ?name) ?namelen]
            [?medium :medium/tracks ?track]
            [?release :release/media ?medium]
            [?release :release/year ?year]]
    
    ;; inputs 
    db
    
    =>
    [[1968 16 18.92181098534824 12.898760656290333]
     [1969 16 18.147895557287608 11.263945894977244]
     [1970 15 18.007481296758105 12.076103750401026]
     [1971 15 18.203682039283294 13.715552693168124]
     [1972 15 17.907170949841063 11.712941060399375]
     [1973 16 18.19300100438759 12.656827911058622]]
    

Aggregates Returning Collections

(distinct ?xs)
(min n ?xs)
(max n ?xs)
(rand n ?xs)
(sample n ?xs)

Where n is specified, fewer than n items may be returned if not enough items are available.

  • Distinct

    The distinct aggregate returns the set of distinct values in the collection.

    ;; query
    [:find (distinct ?v) .
     :in [?v ...]] 
    
    ;; inputs
    [1 1 2 2 2 3]
    
    => #{1 3 2}
    
  • Min N / Max N

    The min n and max n aggregates return up to n least/greatest items. The following query returns the five shortest and five longest track lengths in the database.

    ;; query
    [:find [(min 5 ?millis) (max 5 ?millis)]
     :where [?track :track/duration ?millis]]
    
    ;; inputs 
    db
    
    =>
    [[3000 4000 5000 6000 7000] 
     [3894000 3407000 2928000 2802000 2775000]]
    
  • Rand N / Sample N

    The rand n aggregate selects exactly n items, with potential for duplicates, and the sample n aggregate returns up to n distinct items.

    The following query returns two random and two sampled artist names.

    ;; query
    [:find [(rand 2 ?name) (sample 2 ?name)]
     :where [_ :artist/name ?name]]
    
    ;; inputs
    db
    
    =>
    [("Four Tops" "Ethel McCoy") 
     ["Gábor Szabó" "Zapata"]]
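
The contrast between rand n and sample n can be sketched in plain Java. The example assumes that rand behaves like drawing with replacement and that sample behaves like picking distinct values. It says nothing about the selection algorithm Datomic actually uses, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

// Semantics sketch only: not Datomic's sampling algorithm.
final class RandVsSample {

    // rand n: exactly n draws with replacement, so duplicates are possible.
    static <T> List<T> rand(int n, List<T> items, Random rng) {
        List<T> out = new ArrayList<>();
        if (items.isEmpty()) {
            return out;
        }
        for (int i = 0; i < n; i++) {
            out.add(items.get(rng.nextInt(items.size())));
        }
        return out;
    }

    // sample n: up to n distinct items, fewer if not enough distinct values exist.
    static <T> List<T> sample(int n, List<T> items, Random rng) {
        List<T> distinct = new ArrayList<>(new LinkedHashSet<>(items));
        Collections.shuffle(distinct, rng);
        return new ArrayList<>(distinct.subList(0, Math.min(n, distinct.size())));
    }

    public static void main(String[] args) {
        List<String> names = List.of("Four Tops", "Ethel McCoy", "Zapata");
        Random rng = new Random();
        System.out.println(rand(2, names, rng));
        System.out.println(sample(2, names, rng));
    }
}
```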
    

Custom Aggregates

You may call an arbitrary Clojure function as an aggregation function as follows:

  • Use the fully qualified name of the function.
  • Load the namespace before using the function.
  • The one and only aggregated variable must be the last argument to the function.
  • Other arguments to the function must be constants in the query.

The aggregated variable will be passed as a partial implementation of java.util.List - only size(), iterator(), and get(i) are implemented.

For example, you might implement your own mode function to calculate the mode as follows:

(defn mode
  [vals]
  (->> (frequencies vals)
       (sort-by (comp - second))
       ffirst))

With mode in hand, you can answer the question "What is the most common release medium length, in tracks?"

;; query
[:find (user/mode ?track-count) .
 :with ?media
 :where [?media :medium/trackCount ?track-count]]

;; inputs
db
=> 2

I was initially surprised by this result until I recalled that the time period of the sample data included a huge number of vinyl singles, which by definition have two tracks.
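
Because the aggregated variable supports only size(), iterator(), and get(i), an aggregation function should restrict itself to those operations. The plain-Java sketch below computes a mode using just size() and get(i). It is a hypothetical illustration of that constraint, not a function that this document shows being registered with Datomic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a mode computation that touches only size() and get(i).
final class ModeAggregate {

    // Returns the most frequent value, or null for an empty list.
    // On ties, the value that first reaches the highest count wins, which
    // may differ from the tie-breaking of the Clojure version above.
    static <T> T mode(List<T> vals) {
        Map<T, Integer> counts = new HashMap<>();
        T best = null;
        int bestCount = 0;
        for (int i = 0; i < vals.size(); i++) {
            T v = vals.get(i);
            int c = counts.merge(v, 1, Integer::sum);
            if (c > bestCount) {
                best = v;
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mode(Arrays.asList(2, 2, 10, 12, 2, 10))); // 2
    }
}
```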

Pull Expressions

Pull expressions can be used in a :find clause. A pull expression takes the form

(pull ?entity-var pattern)

and adds a value to the result set by applying the Pull API to the entities named by ?entity-var.

NOTE Each variable (?entity-var) can appear in at most one pull expression.

For example, the following query returns the :release/name for all of Led Zeppelin's releases:

;; query
[:find (pull ?e [:release/name])
 :in $ ?artist
 :where [?e :release/artists ?artist]]

;; args
db, led-zeppelin
=>
[[{:release/name "Immigrant Song / Hey Hey What Can I Do"}]
 [{:release/name "Heartbreaker / Bring It On Home"}]
 [{:release/name "Led Zeppelin III"}]
 ...]

The pull expression pattern can also be bound dynamically as an :in parameter to query:

;; query
[:find (pull ?e pattern)
 :in $ ?artist pattern
 :where [?e :release/artists ?artist]]

;; args
db, led-zeppelin, [:release/name]
=> results elided, same as previous example

A pull expression can only be applied to any specific ?entity-var a single time. The following forms are legal:

;; query with valid pull expression
[:find (pull ?e [:release/name])
 :in $ ?artist-name
 :where [?e :release/artists ?a]
        [?a :artist/name ?artist-name]]

;; query with valid pull expression
[:find (pull ?e [:release/name]) (pull ?a [*])
 :in $ ?artist-name
 :where [?e :release/artists ?a]
        [?a :artist/name ?artist-name]]

;; query with valid pull expression
[:find (pull ?e [:release/name :release/artists])
 :in $ ?artist-name
 :where [?e :release/artists ?a]
        [?a :artist/name ?artist-name]]

;; inputs used in each
db, "Led Zeppelin"

But the following expression would be invalid:

;; invalid pull expression in query
[:find (pull ?e [:release/name]) (pull ?e [:release/artists])
 :in $ ?artist-name
 :where [?e :release/artists ?a]
        [?a :artist/name ?artist-name]]

Timeout

You can configure a query to abort if it takes too long to run using Datomic's timeout functionality. Note: timeout is approximate. It is meant to protect against long-running queries, but is not guaranteed to stop after precisely the duration specified.

In Java, do this by building a QueryRequest object with a timeout and passing it to Peer.query.

QueryRequest qr = QueryRequest.create(query, inputs...).timeout(1000);
Peer.query(qr);

Here, we create a QueryRequest by calling QueryRequest.create with a query and inputs, as described in Peer.query, and then call timeout on the QueryRequest object, passing timeoutMsec, the timeout in milliseconds.

In Clojure, do this by passing a query-map to the new query function.

(d/query {:query query :args args :timeout timeout-in-milliseconds})

Here, we are passing a map where :query is in the same format as in q, :args is in the same format as the inputs to q, and :timeout is an optional timeout in milliseconds.

Limitations

Resolving Entity Identifiers in V Position

Datomic performs automatic resolution of entity identifiers, so that you can generally use entity ids, idents, and lookup refs interchangeably.

For example, the following family of queries all locate Belgium and return the same results:

;; query
[:find ?artist-name
 :in $ ?country
 :where [?artist :artist/name ?artist-name]
        [?artist :artist/country ?country]]

;; input option 1: lookup ref
db, [:country/name "Belgium"]

;; input option 2: ident
db, :country/BE

;; input option 3: entity id
db, 17592186045516
=>
#{["Wallace Collection"] 
  ["André Brasseur"] 
  ["Arthur Grumiaux"]
  ...}

Highly dynamic queries inhibit Datomic's resolution of entity identifiers in value position. The following query makes the reference attribute a dynamic ?reference input to query.

;; query
[:find [?artist-name ...]
 :in $ ?country [?reference ...]
 :where [?artist :artist/name ?artist-name]
        [?artist ?reference ?country]]

;; inputs 
db, :country/BE, [:artist/country]
=> []

Since the attribute itself is dynamic, Datomic does not know that the variable ?reference is guaranteed to refer to a reference attribute, and will not perform entity identifier resolution for ?country. Unable to resolve :country/BE, the query returns no results.

There are two options for dealing with dynamic queries such as these.

  • Where possible, make the attribute specification static, as in the first example in this section.
  • Where attributes truly need to be dynamically specified, resolve the entity id yourself. The query below introduces a call to entid to resolve the entity:

;; query
[:find [?artist-name ...]
 :in $ ?country [?reference ...]
 :where [(datomic.api/entid $ ?country) ?country-id]
        [?artist :artist/name ?artist-name]
        [?artist ?reference ?country-id]]

;; inputs
db, :country/BE, [:artist/country]
=>
#{["Wallace Collection"] 
  ["André Brasseur"] 
  ["Arthur Grumiaux"]
  ...}

Note that this ambiguity occurs only with the value component of a datom, which might be a reference or a scalar. Entity identifier resolution is always available for entity, attribute, and transaction, since those components are known to always be entities.

Bytes Limitations

Attributes of type :db.type/bytes cannot be found by value in queries (see bytes limitations).