Executing Queries

Querying a Database

To query a database, you must first obtain a connection and a database value. The example below shows a simple query using the Synchronous API.

;; get db value
(def db (d/db conn))

;; query
(d/q {:query [:find ?release-name
              :where [_ :release/name ?release-name]]
      :args [db]})

;; result
#{["Osmium"]
  ["Hela roept de akela"]
  ["Ali Baba"]
  ["The Power of the True Love Knot"]
  ...}

The arguments to d/q are documented in the Query Data Reference.

Unification

Unification occurs when a variable appears in more than one data pattern. In the following query, ?e appears twice:

;; which 42-year-olds like what?
[:find ?e ?x
 :where [?e :age 42] [?e :likes ?x]]

Matches for the variable ?e must unify, i.e., represent the same value in every clause, for the set of clauses to be satisfied. So a matching ?e must have both :age 42 and :likes for some ?x:

[[fred pizza], [ethel sushi]]
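To make the idea concrete, here is a minimal sketch in plain Clojure (no Datomic required) that evaluates the two clauses above as a nested loop over a hypothetical set of [entity attribute value] facts. The data and the function name age-and-likes are illustrative only; the point is the single condition that the ?e bound by the first clause equals the ?e bound by the second.

```clojure
;; Hypothetical facts as [entity attribute value] triples.
(def people-facts
  [[:fred  :age 42] [:fred  :likes "pizza"]
   [:ethel :age 42] [:ethel :likes "sushi"]
   [:lucy  :age 31] [:lucy  :likes "opera"]])

(defn age-and-likes
  "Evaluate [?e :age n] [?e :likes ?x] over facts.
  Each clause binds ?e independently; unification keeps only the
  combinations in which both clauses bound ?e to the same entity."
  [facts n]
  (set (for [[e1 a1 v1] facts
             :when (and (= a1 :age) (= v1 n))      ; clause 1: [?e :age n]
             [e2 a2 x]  facts
             :when (and (= a2 :likes) (= e1 e2))]  ; clause 2, with ?e unified
         [e1 x])))

(age-and-likes people-facts 42)
```

The call returns the set containing [:fred "pizza"] and [:ethel "sushi"]; :lucy is excluded because her ?e fails the first clause.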

List Form vs. Map Form

Queries written by humans are typically lists, and the various keyword arguments are inferred by position. For example, the query

[:find ?e
 :in $ ?fname ?lname
 :where [?e :user/firstName ?fname]
        [?e :user/lastName ?lname]]

has one :find argument, three :in arguments, and two :where arguments.

While most people find the positional syntax easy to read, it makes extra work for programmatic readers and writers, which have to keep track of which keyword is currently "active" and interpret tokens accordingly. For such cases, queries can be specified more simply as maps. The query above becomes:

{:find [?e]
 :in [$ ?fname ?lname]
 :where [[?e :user/firstName ?fname]
         [?e :user/lastName ?lname]]}
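The bookkeeping that a programmatic reader of list form has to do can be sketched as a small conversion function. list->map-query is a hypothetical helper, not part of the Datomic API; it walks the top-level tokens, treats each keyword as the new "active" key, and appends every other token under that key.

```clojure
(defn list->map-query
  "Hypothetical helper: convert a list-form query to map form by tracking
  the currently active keyword and collecting the tokens that follow it."
  [query]
  (loop [tokens (seq query), active nil, acc {}]
    (if-let [[t & more] tokens]
      (if (keyword? t)
        ;; A keyword switches the active key (and ensures it is present).
        (recur more t (update acc t #(or % [])))
        ;; Any other token belongs to the active key.
        (recur more active (update acc active (fnil conj []) t)))
      acc)))

(list->map-query '[:find ?e
                   :in $ ?fname ?lname
                   :where [?e :user/firstName ?fname]
                          [?e :user/lastName ?lname]])
```

Applied to the list-form query above, it yields the same map form shown here.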

Work with Data Structures, Not Strings

Two features of Datalog queries make them immune to many of the SQL-injection-style attacks to which other DBMSs are often vulnerable:

  • Datalog queries are composed of data structures, rather than strings, which obviates the need to do string interpolation, sanitization, escaping, etc.
  • The query API is parameterized with data sources. In many cases, this feature obviates the need to include user-provided data in the query itself. Instead, you can pass user data to a parameterized query as its own data source.

You should avoid building queries by reading in a string that has been built up by concatenation or interpolation. Doing so gives up the security and simplicity of working with native data structures.

The example below shows the contrast between good and bad practice.

;; parameterized query: "The Beatles" is a data source
(def query '[:find ?e
             :in $ ?name
             :where [?e :artist/name ?name]])
(d/q query db "The Beatles")

;; NEVER DO THIS: string interpolation into a hard-coded query
(def query (format "[:find ?e
                     :where [?e :artist/name \"%s\"]]" "The Beatles"))
(d/q query db)
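The danger of the second form can be shown without a database. The sketch below, assuming only clojure.edn from the standard library, parses an interpolated query string the way a string query would be read, and counts its :where clauses. The function names and the hostile input are hypothetical; the parameterized version keeps the query constant and carries the input as a separate argument.

```clojure
(require '[clojure.edn :as edn])

(defn interpolated-query
  "Anti-pattern: splice user text into a query string, then parse it."
  [user-input]
  (edn/read-string
   (format "[:find ?e :where [?e :artist/name \"%s\"]]" user-input)))

(defn parameterized-query
  "Recommended pattern: the query is a constant value; input travels separately."
  [user-input]
  {:query '[:find ?e :in $ ?name :where [?e :artist/name ?name]]
   :args  [user-input]})

(defn clause-count
  "Number of data-pattern clauses (vectors) at the top level of a list-form query."
  [q]
  (count (filter vector? q)))

;; Crafted input that closes the string and the clause, then adds its own clauses.
(def hostile "x\"] [?e :artist/name _] [?z :a \"y")

(clause-count (interpolated-query "The Beatles")) ; one clause, as intended
(clause-count (interpolated-query hostile))       ; three clauses: the input rewrote the query
```

With the parameterized form, the same hostile string is just a value passed alongside an unchanged query, so it can only ever be compared against :artist/name.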

Timeout

You can configure a query to abort if it takes too long to run using Datomic's timeout functionality.

Note: Timeout is approximate. It is meant to protect against long-running queries, but is not guaranteed to stop after precisely the duration specified.

The timeout, in milliseconds, is passed to the q API as the :timeout key of its arg-map, e.g. (d/q {:query query :args [db] :timeout 1000}).

Clause Order

To minimize the amount of work the query engine must do, query authors should put the most restrictive or narrowing :where clauses first, and then proceed to less restrictive clauses.

As an example, consider the following two queries looking for Paul McCartney's releases. The first :where clause begins with a data pattern ([?release :release/name ?name]) that is not at all selective, forcing the query engine to consider all the releases in the database:

;; query
[:find [?name ...]
 :in $ ?artist
 :where [?release :release/name ?name]
        [?release :release/artists ?artist]]

;; inputs
db, mccartney

;; result
["McCartney" "Another Day / Oh Woman Oh Why" "Ram"]

The following equivalent query reorders the :where clauses, leading with a much more selective pattern ([?release :release/artists ?artist]) that is limited in this context to the single ?artist passed in.

;; query
[:find [?name ...]
 :in $ ?artist
 :where [?release :release/artists ?artist]
        [?release :release/name ?name]]

;; inputs and result same as above

The second query runs 50 times faster on the mbrainz dataset.
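Why order matters can be illustrated with a toy evaluator that processes clauses strictly left to right and counts the intermediate bindings it produces. This is a simplified model, not Datomic's query engine (which uses indexes rather than scanning every fact); the release data and all function names are hypothetical.

```clojure
(defn logic-var? [x] (and (symbol? x) (= \? (first (name x)))))

(defn unify-term
  "Bind or check one term; returns updated bindings, or nil on mismatch."
  [bindings pattern value]
  (cond
    (= pattern '_)       bindings
    (logic-var? pattern) (if (contains? bindings pattern)
                           (when (= (bindings pattern) value) bindings)
                           (assoc bindings pattern value))
    :else                (when (= pattern value) bindings)))

(defn match-clause
  "All extensions of bindings that make clause match some fact."
  [facts bindings clause]
  (keep (fn [fact]
          (reduce (fn [b [pattern value]]
                    (if b (unify-term b pattern value) (reduced nil)))
                  bindings
                  (map vector clause fact)))
        facts))

(defn run-clauses
  "Evaluate clauses in order; :work is the total number of intermediate
  bindings produced, a rough proxy for the work done."
  [facts initial clauses]
  (reduce (fn [{:keys [bindings work]} clause]
            (let [out (mapcat #(match-clause facts % clause) bindings)]
              {:bindings out :work (+ work (count out))}))
          {:bindings [initial] :work 0}
          clauses))

;; Hypothetical toy data: five releases, two by :mccartney.
(def release-facts
  [[:r1 :release/name "McCartney"]    [:r1 :release/artists :mccartney]
   [:r2 :release/name "Ram"]          [:r2 :release/artists :mccartney]
   [:r3 :release/name "Abbey Road"]   [:r3 :release/artists :beatles]
   [:r4 :release/name "Tommy"]        [:r4 :release/artists :who]
   [:r5 :release/name "Quadrophenia"] [:r5 :release/artists :who]])

(def slow (run-clauses release-facts {'?artist :mccartney}
                       '[[?release :release/name ?name]
                         [?release :release/artists ?artist]]))
(def fast (run-clauses release-facts {'?artist :mccartney}
                       '[[?release :release/artists ?artist]
                         [?release :release/name ?name]]))
```

Both orders find the same two names, but the slow order first binds every release (5 bindings, then 2), while the fast order binds only the artist's releases (2, then 2). On a real database the gap grows with the number of releases.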

Query Caching

Datomic processes maintain an in-memory cache of parsed query representations. Caching is based on equality of the query argument to q. To take advantage of caching, programs should

  • Use parameterized queries (that is, queries with multiple inputs) instead of building dynamic queries.
  • When building dynamic queries, use a canonical approach to naming and ordering such that equivalent queries will be equal.
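One way to follow both recommendations when queries must be built dynamically is to sort the attributes and name variables by position, while passing the values as inputs. build-query below is a hypothetical helper, sketched under the assumption that criteria arrive as a map of attribute to value; equivalent criteria then always produce an equal query value, regardless of map ordering or the values searched for.

```clojure
(defn build-query
  "Hypothetical helper: build a map-form query from attribute -> value criteria.
  Attributes are sorted and variables are named by position (?v0, ?v1, ...),
  so equivalent criteria always yield an equal query; values become inputs."
  [criteria]
  (let [attrs (sort (keys criteria))
        vars  (mapv #(symbol (str "?v" %)) (range (count attrs)))]
    {:query  {:find  '[?e]
              :in    (into '[$] vars)
              :where (mapv (fn [a v] ['?e a v]) attrs vars)}
     :inputs (mapv criteria attrs)}))

(def ada   (build-query {:user/firstName "Ada" :user/lastName "Lovelace"}))
(def grace (build-query {:user/lastName "Hopper" :user/firstName "Grace"}))

;; Usage sketch (assumes a connection and db as in the examples above):
;; (apply d/q (:query ada) db (:inputs ada))
```

Because (:query ada) and (:query grace) are equal, a cache keyed on query equality can reuse one parsed query for both searches.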

In the example below, the parameterized query for artists will be cached on first use and can be reused any number of times:

(def query '[:find ?e
             :in $ ?name
             :where [?e :artist/name ?name]])

;; first use compiles and caches the query
(d/q query db "The Beatles")

;; subsequent uses find query in the cache
(d/q query db "The Who")

A semantically equivalent query with different names will be separately compiled and cached:

;; not an identical query, ?artist-name instead of ?name
(def query '[:find ?e
             :in $ ?artist-name
             :where [?e :artist/name ?artist-name]])
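The effect of equality-based caching can be modeled with clojure.core/memoize, which also keys its cache on argument equality. This is a toy stand-in for Datomic's internal parsed-query cache, not its implementation; compile-query and compile-count are hypothetical names used to count how often "compilation" actually runs.

```clojure
(def compile-count (atom 0))

(defn compile-query
  "Stand-in for query parsing/compilation; records each invocation."
  [query]
  (swap! compile-count inc)
  {:compiled query})

;; memoize caches results keyed on (equality of) the arguments.
(def cached-compile (memoize compile-query))

(def by-name        '[:find ?e :in $ ?name :where [?e :artist/name ?name]])
(def by-artist-name '[:find ?e :in $ ?artist-name :where [?e :artist/name ?artist-name]])

(cached-compile by-name)        ; compiles
(cached-compile by-name)        ; cache hit: an equal query value
(cached-compile by-artist-name) ; compiles again: not equal to by-name
```

After these three calls compile-count holds 2: renaming a variable produces a query that is semantically the same but not equal, so it gets its own cache entry.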