Executing Queries
Day of Datomic Cloud goes over query concepts, with examples on Github.
Querying a Database
In order to query, you must acquire a database
value. To get a database value, you can call db
, passing in a
connection.
(require '[datomic.api :as d]) ;; get db value (def db (d/db conn)) ;; query (d/q '[:find ?release-name :where [_ :release/name ?release-name]] db)
=> #{["Osmium"] ["Hela roept de akela"] ["Ali Baba"] ["The Power of the True Love Knot"] ...}
The arguments to q
are documented in the Query Data Reference.
q
q
is the primary entry point for Datomic query.
q
Performs the query described by query and args, and returns a collection of tuples.
- The query to perform: a map, list, or string. Complete description.
- Data sources for the query, e.g. database values retrieved from a call to db, and/or rules.
qseq
qseq
is a variant of q
that pulls
and xforms lazily as you consume query results.
qseq
utilizes the same arguments and grammar as q.
qseq
is primarily useful when you know in advance that you do not need/want a realized collection.
i.e. you are only going to make a single pass (or partial pass) over the result data.
Item transformations such as pull
are deferred until the seq is consumed. For queries with
pull(s), this results in:
- Reduced memory use and the ability to execute larger queries.
- Lower latency before the first results are returned.
The returned seq object efficiently supports count
.
Unification
Unification occurs when a variable appears in more than one data pattern. In the following query, ?e appears twice:
;;which 42-year-olds like what? [:find ?e ?x :where [?e :age 42] [?e :likes ?x]]
Matches for the variable ?e must unify, i.e. represent the same value in every clause in order to satisfy the set of clauses. So a matching ?e must have both :age 42 and :likes for some ?x:
[[fred pizza], [ethel sushi]]
List Form vs. Map Form
Queries written by humans typically are a list, and the various keyword
arguments are inferred by position. For example, this query has one
:find:
argument, three :in
arguments, and two :where
arguments:
[:find ?e :in $ ?fname ?lname :where [?e :user/firstName ?fname] [?e :user/lastName ?lname]]
While most people find the positional syntax easy to read, it makes extra work for programmatic readers and writers, which have to keep track of what keyword is currently "active" and interpret tokens accordingly. For such cases, queries can be specified more simply as maps. The query above becomes:
{:find [?e] :in [$ ?fname ?lname] :where [[?e :user/firstName ?fname] [?e :user/lastName ?lname]]}
Timeout
Users can protect against long-running queries via Datomic's query timeout
functionality. Datomic will abort a query shortly after its elapsed duration
has exceeded the provided :timeout
threshold.
:timeout
can be provided to query in the Peer API and the 1-arity
version of q in the Client API.
The example below lists all movies in the database by genre, but will likely fail due to the 1msec timeout.
(d/q {:query '[:find ?movie-genre :where [_ :movie/genre ?movie-genre]] :timeout 1 :args [db]})
You will likely see something like ExceptionInfo Datomic Client Timeout clojure.core/ex-info (core.clj:4739)
.
Clause Order
To minimize the amount work the query engine must do, query authors
should put the most selective or narrowing :where
clauses first, and
then proceed on to less selective clauses.
query-stats provides information
about clause selectivity that can be used to properly order the :where
clauses of a query.
As an example, consider the following two queries looking for Paul
McCartney's releases. The first :where
clause begins with a data pattern
([?release :release/name ?name]
) that has very low selectivity
since ?release
nor ?name
have values bound to them, forcing the query
engine to consider any release with some value for :release/name
in the database:
;; query [:find ?name :in $ ?artist :where [?release :release/name ?name] [?release :release/artists ?artist]] ;; inputs db, mccartney
=> [["McCartney"] ["Another Day / Oh Woman Oh Why"] ["Ram"] ...]
The following equivalent query reorders the :where
clauses, leading
with a much more selective pattern ([?release :release/artists
?artist]
) that is limited in this context to the single ?artist
passed in.
;; query [:find ?name :in $ ?artist :where [?release :release/artists ?artist] [?release :release/name ?name]]
=> ;; inputs and result same as above
The second query runs 50 times faster on the mbrainz dataset.
Query Caching
Datomic processes maintain an in-memory cache of parsed query
representations. Caching is based on equality of the query argument to
q
. To take advantage of caching, programs should
- Use parameterized queries (that is, queries with multiple inputs) instead of building dynamic queries.
- When building dynamic queries, use a canonical approach to naming and ordering such that equivalent queries will be structurally equal.
In the example below, the parameterized query for artists will be cached on first use and can be reused any number of times:
(def query '[:find ?e :in $ ?name :where [?e :artist/name ?name]]) ;; first use compiles and caches the query plan (d/q query db "The Beatles") ;; subsequent uses find query plan in the cache (d/q query db "The Who")
A semantically equivalent query with different variable names will be separately compiled and cached:
;; not an identical query, ?artist-name instead of ?name (def query '[:find ?e :in $ ?artist-name :where [?e :artist/name ?artist-name]])