Executing Queries

Querying a database

To query a database, we must first get a value of the database from the connection:

Database db = conn.db();

A Database is an immutable value, and will never change. If we use db for several queries we will know the answers are based upon exactly the same data from a single point in time. As we said, the database itself acts as a relation of 5-tuples of

[entity attribute value transaction added?]

The following query finds all release names in the database:

;; query
[:find ?release-name
 :where [_ :release/name ?release-name]
;; inputs
db

=>
#{["Osmium"]
 ["Hela roept de akela"] 
 ["Ali Baba"] 
 ["The Power of the True Love Knot"]
 ...}

q

datomic.client.api/q is the primary entry point for Datomic query.

q Performs the query described by query and args, and returns a collection of tuples.

:query - The query to perform: a map, list, or string.
- :find - specifies the tuples to be returned.
- :with - is optional, and names vars to be kept in the aggregation set but not returned
- :in - is optional. Omitting ':in …' is the same as specifying ':in $'
- :where - limits the result returned
:args - Data sources for the query, e.g. database values retrieved from a call to db, and/or rules.

qseq

qseq is a variant of q that pulls and xforms lazily as you consume query results.

datomic.client.ap/qseq utilizes the same arguments and grammar as q.

qseq is primarily useful when you know in advance that you do not need/want a realized collection. i.e. you are only going to make a single pass (or partial pass) over the result data.

Item transformations such as pull are deferred until the seq is consumed. For queries with pull(s), this results in:

Reduced memory use and the ability to execute larger queries.
Lower latency before the first results are returned.

The returned seq object efficiently supports count.

Unification

Unification occurs when a variable appears in more than one data pattern. In the following query, ?e appears twice:

;;which 42-year-olds like what?
[:find ?e ?x 
 :where [?e :age 42] [?e :likes ?x]]

Matches for the variable ?e must unify, i.e. represent the same value in every clause in order to satisfy the set of clauses. So a matching ?e must have both :age 42 and :likes for some ?x:

[[fred pizza], [ethel sushi]]

List form vs. Map form

Queries written by humans typically are a list, and the various keyword arguments are inferred by position. For example, in the query

[:find ?e
 :in $ ?fname ?lname
 :where [?e :user/firstName ?fname]
[?e :user/lastName ?lname]]

there is one :find argument, three :in arguments, and two :where arguments.

While most people find the positional syntax easy to read, it makes extra work for programmatic readers and writers, which have to keep track of what keyword is currently "active" and interpret tokens accordingly. For such cases, queries can be specified more simply as maps. The query above becomes:

{:find [?e]
 :in [$ ?fname ?lname]
 :where [[?e :user/firstName ?fname]
 [?e :user/lastName ?lname]]}

Work with Data Structures, Not Strings

Two features of Datalog queries make them immune to many of the SQL-injection style attacks to which many other DBMSs are vulnerable:

Datalog queries are composed of data structures, rather than strings, which obviates the need to do string interpolation, sanitation, escaping, etc.
The query API is parameterized with data sources. In many cases, this feature obviates the need to include user-provided data in the query itself. Instead, you can pass user data to a parameterized query as its own data source.

You should avoid building queries by reading in a string that has been built up by concatenation or interpolation. Doing so gives up the security and simplicity of working with native data structures.

Clause order

To minimize the amount work the query engine must do, query authors should put the most restrictive or narrowing :where clauses first, and then proceed on to less restrictive clauses.

As an example, consider the following two queries looking for Paul McCartney's releases. The first :where clause begins with a data pattern ([?release :release/name ?name]) that is not at all selective, forcing the query engine to consider all the releases in the system:

;; query
[:find [?name ...]
 :in $ ?artist
 :where [?release :release/name ?name]
        [?release :release/artists ?artist]]
;; inputs
db, mccartney

=> ["McCartney" "Another Day / Oh Woman Oh Why" "Ram"]

The following equivalent query reorders the :where clause, leading with a much more selective pattern ([?release :release/artists ?artist]) that is limited in this context to the single ?artist passed in.

;; query
[:find [?name ...]
 :in $ ?artist
 :where [?release :release/artists ?artist]
        [?release :release/name ?name]]
;; inputs and result same as above

The second query runs 50 times faster.

Queries and Peer Memory

Since queries run within the Peer with application-local memory, application designers need to consider the memory requirements for their queries.

Datomic's Datalog is set-oriented and eager, and does not spill large results to disk. Queries are designed to be able to run over datasets much larger than memory. However, each intermediate representation step of a query must fit into local memory. Datomic doesn't spool intermediate representations to disk like some server-based RDBMS's.

If you need to work with result sets larger than can fit in memory, you will need to break up the query into smaller pieces whose results can fit in memory. The datoms and index-range APIs are often useful for this.

Query Caching

Datomic processes maintain an in-memory cache of parsed query representations. Caching is based on equality (in the Java .equals sense) of the first argument to q. To take advantage of caching, programs should

Use parameterized queries (i.e. queries with multiple inputs) instead of building dynamic queries.
When building dynamic queries, use a canonical approach to naming and ordering such that equivalent queries will be Java .equals.