
Executing Queries

Day of Datomic Cloud covers query concepts, with examples on GitHub.

Querying a Database

To query, you first need a database value. You get one by calling db with a connection.

(require '[datomic.api :as d])
;; get db value
(def db (d/db conn))

;; query
(d/q '[:find ?release-name
       :where [_ :release/name ?release-name]]
      db)

=>
#{["Osmium"]
  ["Hela roept de akela"]
  ["Ali Baba"]
  ["The Power of the True Love Knot"]
  ...}

The arguments to q are documented in the Query Data Reference.

q

q is the primary entry point for Datomic query.

Peer API | Client API

q performs the query described by query and args, and returns a collection of tuples. It takes two kinds of arguments:

  • The query to perform: a map, list, or string. See the Query Data Reference for a complete description.
    • :find - specifies the tuples to be returned.
    • :with - is optional, and names vars to be kept in the aggregation set but not returned.
    • :in - is optional. Omitting ':in …' is the same as specifying ':in $'.
    • :where - limits the results returned.
  • Data sources for the query, e.g. database values retrieved from a call to db, and/or rules.
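
The three query forms carry the same information. The following is a plain-Clojure sketch that makes no Datomic calls. It assumes that a string query holds the same EDN text as the list form, and shows that reading such a string yields data equal to the quoted list. The names list-form, string-form, and map-form are illustrative only.

```clojure
(require '[clojure.edn :as edn])

;; List form, quoted so that the symbols are not evaluated.
(def list-form
  '[:find ?release-name
    :where [_ :release/name ?release-name]])

;; String form: the same query as text (assumed here to be read as EDN).
(def string-form
  "[:find ?release-name :where [_ :release/name ?release-name]]")

;; Map form: each keyword maps to a vector of its arguments.
(def map-form
  '{:find [?release-name]
    :where [[_ :release/name ?release-name]]})

;; Reading the string produces the same data structure as the list form.
(def string-as-data (edn/read-string string-form))

(= list-form string-as-data) ;; true
```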

qseq

qseq is a variant of q that pulls and xforms lazily as you consume query results.

Peer API | Client API

qseq utilizes the same arguments and grammar as q.

qseq is primarily useful when you know in advance that you do not need or want a realized collection, i.e. when you will make only a single pass (or partial pass) over the result data.

Item transformations such as pull are deferred until the seq is consumed. For queries with pull(s), this results in:

  • Reduced memory use and the ability to execute larger queries.
  • Lower latency before the first results are returned.

The returned seq object efficiently supports count.
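
The effect of deferring item transformations can be illustrated without Datomic. The sketch below is a toy model, not qseq itself: simulated-pull is a hypothetical stand-in for an expensive per-result transformation, and lazy-transform is a hand-written lazy sequence that transforms one item at a time. Consuming two results triggers only two transformations.

```clojure
;; Counts how many times the simulated transformation has run.
(def transform-calls (atom 0))

(defn simulated-pull
  "Hypothetical stand-in for a per-result transformation such as pull."
  [eid]
  (swap! transform-calls inc)
  {:db/id eid})

(defn lazy-transform
  "Applies f to one item at a time, only when that item is consumed."
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) (lazy-transform f (rest s))))))

;; Building the seq does no transformation work.
(def results (lazy-transform simulated-pull (range 1000)))
(def calls-before @transform-calls) ;; 0

;; Consume only the first two results.
(def first-two (doall (take 2 results)))
(def calls-after @transform-calls)  ;; 2
```

A fully realized collection would have paid for all 1000 transformations before returning anything.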

Unification

Unification occurs when a variable appears in more than one data pattern. In the following query, ?e appears twice:

;;which 42-year-olds like what?
[:find ?e ?x
 :where [?e :age 42] [?e :likes ?x]]

Matches for the variable ?e must unify, i.e. represent the same value in every clause in order to satisfy the set of clauses. So a matching ?e must have both :age 42 and :likes for some ?x:

[[fred pizza], [ethel sushi]]
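
Unification can be pictured as a join on the shared variable. The following plain-Clojure sketch is a toy model over a hand-made set of tuples, not the Datomic query engine. The data (including the extra entity :lucy) is made up for illustration.

```clojure
;; Toy [entity attribute value] tuples; :lucy is 30 and must not match.
(def datoms
  #{[:fred :age 42]  [:fred :likes :pizza]
    [:ethel :age 42] [:ethel :likes :sushi]
    [:lucy :age 30]  [:lucy :likes :pizza]})

(defn forty-two-year-olds-like
  "Pairs [e x] where e has :age 42 and e :likes x.
  Requiring e2 to equal e plays the role of unifying ?e."
  [datoms]
  (set
    (for [[e a v] datoms
          :when (and (= a :age) (= v 42))
          [e2 a2 x] datoms
          :when (and (= a2 :likes) (= e2 e))]
      [e x])))

(forty-two-year-olds-like datoms)
;; => #{[:fred :pizza] [:ethel :sushi]}
```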

List Form vs. Map Form

Queries written by humans are typically lists, with the arguments to each keyword inferred by position. For example, this query has one :find argument, three :in arguments, and two :where arguments:

[:find ?e
 :in $ ?fname ?lname
 :where [?e :user/firstName ?fname]
        [?e :user/lastName ?lname]]

While most people find the positional syntax easy to read, it makes extra work for programmatic readers and writers, which have to keep track of what keyword is currently "active" and interpret tokens accordingly. For such cases, queries can be specified more simply as maps. The query above becomes:

{:find [?e]
 :in [$ ?fname ?lname]
 :where [[?e :user/firstName ?fname]
         [?e :user/lastName ?lname]]}
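
The bookkeeping that the list form requires can be sketched in a few lines. The function list->map-query below is hypothetical, not part of any Datomic API. It assumes every top-level keyword in the list is a clause marker such as :find or :where, and it tracks the "active" keyword while walking the tokens.

```clojure
(defn list->map-query
  "Converts a list-form query into map form by tracking the active keyword."
  [query]
  (first
    (reduce (fn [[m active] token]
              (if (keyword? token)
                ;; A keyword starts a new clause and becomes the active key.
                [(assoc m token []) token]
                ;; Any other token is an argument to the active keyword.
                [(update m active conj token) active]))
            [{} nil]
            query)))

(list->map-query
  '[:find ?e
    :in $ ?fname ?lname
    :where [?e :user/firstName ?fname]
           [?e :user/lastName ?lname]])
;; => {:find [?e]
;;     :in [$ ?fname ?lname]
;;     :where [[?e :user/firstName ?fname] [?e :user/lastName ?lname]]}
```

A program that starts from the map form needs none of this state.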

Timeout

Users can protect against long-running queries via Datomic's query timeout functionality. Datomic will abort a query shortly after its elapsed duration has exceeded the provided :timeout threshold.

:timeout can be provided to query in the Peer API and the 1-arity version of q in the Client API.

The example below lists every movie genre in the database, but will likely fail due to the 1 msec timeout.

(d/q {:query '[:find ?movie-genre
               :where [_ :movie/genre ?movie-genre]]
      :timeout 1
      :args [db]})

You will likely see something like ExceptionInfo Datomic Client Timeout clojure.core/ex-info (core.clj:4739).
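
A caller may want to handle this failure rather than let it propagate. The sketch below assumes the timeout surfaces as an ExceptionInfo, as in the message above. The helper try-q and the stub stub-q are hypothetical. The query function is passed in as an argument so the example runs without a database. In real code you would pass d/q.

```clojure
(defn try-q
  "Calls (q-fn arg-map). Returns {:ok result} on success, or
  {:error message :data ex-data} if an ExceptionInfo is thrown."
  [q-fn arg-map]
  (try
    {:ok (q-fn arg-map)}
    (catch clojure.lang.ExceptionInfo e
      {:error (.getMessage e) :data (ex-data e)})))

;; Stub standing in for d/q: fails when the timeout is under 10 msec.
;; The returned genre is made-up placeholder data.
(defn stub-q [{:keys [timeout]}]
  (if (< timeout 10)
    (throw (ex-info "Datomic Client Timeout" {}))
    #{["Drama"]}))

(try-q stub-q {:query '[:find ?movie-genre
                        :where [_ :movie/genre ?movie-genre]]
               :timeout 1
               :args []})
;; => {:error "Datomic Client Timeout", :data {}}
```

Note that this catches every ExceptionInfo, not only timeouts, so production code should inspect the exception data before deciding how to react.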

Clause Order

To minimize the amount of work the query engine must do, query authors should put the most selective (most narrowing) :where clauses first, and then proceed to less selective clauses.

query-stats provides information about clause selectivity that can be used to properly order the :where clauses of a query.

As an example, consider the following two queries looking for Paul McCartney's releases. The first query's :where clause begins with a data pattern ([?release :release/name ?name]) that has very low selectivity: neither ?release nor ?name is bound, so the query engine must consider every release in the database that has some value for :release/name:

;; query
[:find ?name
 :in $ ?artist
 :where [?release :release/name ?name]
        [?release :release/artists ?artist]]

;; inputs
db, mccartney

=> [["McCartney"] ["Another Day / Oh Woman Oh Why"] ["Ram"] ...]

The following equivalent query reorders the :where clauses, leading with a much more selective pattern ([?release :release/artists ?artist]) that is limited in this context to the single ?artist passed in.

;; query
[:find ?name
 :in $ ?artist
 :where [?release :release/artists ?artist]
        [?release :release/name ?name]]
=> ;; inputs and result same as above

The second query runs 50 times faster on the mbrainz dataset.
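
Why the leading clause matters can be seen in a toy model. The plain-Clojure sketch below is not how the Datomic query engine works. It only counts how many tuples in a small hand-made data set match each candidate first clause. The entity ids and the :other-artist value are made up for illustration.

```clojure
;; Toy [entity attribute value] tuples.
(def datoms
  [[1 :release/name "Ram"]       [1 :release/artists :mccartney]
   [2 :release/name "McCartney"] [2 :release/artists :mccartney]
   [3 :release/name "Osmium"]    [3 :release/artists :other-artist]
   [4 :release/name "Ali Baba"]  [4 :release/artists :other-artist]])

(defn matches
  "Tuples with attribute attr and, unless value is ::any, that value."
  [datoms attr value]
  (filter (fn [[_ a v]]
            (and (= a attr)
                 (or (= value ::any) (= v value))))
          datoms))

;; Leading with [?release :release/name ?name]: nothing is bound.
(def name-first-candidates
  (count (matches datoms :release/name ::any)))          ;; 4

;; Leading with [?release :release/artists ?artist]: ?artist is bound.
(def artist-first-candidates
  (count (matches datoms :release/artists :mccartney)))  ;; 2
```

The second clause of each query then has to be checked against every candidate from the first, so a smaller first result means less work overall.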

Query Caching

Datomic processes maintain an in-memory cache of parsed query representations. Caching is based on equality of the query argument to q. To take advantage of caching, programs should:

  • Use parameterized queries (that is, queries with multiple inputs) instead of building dynamic queries.
  • When building dynamic queries, use a canonical approach to naming and ordering such that equivalent queries will be structurally equal.

In the example below, the parameterized query for artists will be cached on first use and can be reused any number of times:

(def query '[:find ?e
             :in $ ?name
             :where [?e :artist/name ?name]])

;; first use compiles and caches the query plan
(d/q query db "The Beatles")

;; subsequent uses find query plan in the cache
(d/q query db "The Who")

A semantically equivalent query with different variable names will be separately compiled and cached:

;; not an identical query, ?artist-name instead of ?name
(def query '[:find ?e
             :in $ ?artist-name
             :where [?e :artist/name ?artist-name]])
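
Equality-based caching can be modeled with memoize. The sketch below is a toy stand-in, not Datomic's cache: cached-parse is a hypothetical function whose counter shows how many times "parsing" actually ran. Two uses of the same query parse once, while the renamed variant parses again.

```clojure
;; Counts how many times the simulated parse step has run.
(def parse-count (atom 0))

(def cached-parse
  "Toy cache keyed on equality of the query data."
  (memoize
    (fn [query]
      (swap! parse-count inc)
      {:parsed query})))

(def query-a '[:find ?e
               :in $ ?name
               :where [?e :artist/name ?name]])

;; Semantically equivalent, but not equal as data.
(def query-b '[:find ?e
               :in $ ?artist-name
               :where [?e :artist/name ?artist-name]])

(cached-parse query-a) ;; parses
(cached-parse query-a) ;; cache hit
(cached-parse query-b) ;; parses again: different variable names

(def total-parses @parse-count) ;; 2
(def same-query? (= query-a query-b)) ;; false
```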