Datomic Queries and Rules

Datomic's query and rules system is an extended form of Datalog. Datalog is a deductive query system, typically consisting of:

  • A database of facts
  • A set of rules for deriving new facts from existing facts
  • A query processor that, given some partial specification of a fact or rule:
    • finds all instances of that specification implied by the database and rules
    • i.e. all the matching facts

Typically a Datalog system would have a global fact database and set of rules. Datomic's query engine instead takes databases (and in fact, many other data sources) and rule sets as inputs.

There is a video introducing Datalog queries.

Why Datalog?

Simplicity

Datalog is simple. The basic component of Datalog is a clause, which is simply a list that either begins with the name of a rule, or is a data pattern. These clauses can contain variables (symbols beginning with a '?'). The query engine simply finds all combinations of values of the variables that satisfy all of the clauses. There is no complex syntax to learn.

Declarative

Like SQL and other good query languages, Datalog is declarative. That is, you specify what you want to know and not how to find it. This kind of declarative programming is very powerful, and it is a shame that it has been relegated to the database servers and not available to application programmers. Declarative programs are:

  • More evident - it is easier to tell what their purpose is, both for programmers and stakeholders.
  • More readily optimized - the query engine is free to reorder and parallelize operations to a degree not normally taken on by application programs.
  • Simpler - and thus, more robust.

Logic-based

Even SQL, while fundamentally declarative, still includes many operations that go beyond the query itself, like specifying joins explicitly. Because Datalog is based upon logical implication, joins are implicit, and the query engine figures out when they are needed.

Embedded

Datomic's queries are further simplified by the fact that its query engine, and the data, are made available locally. Query languages like SQL are oriented around a client-server model where, in a single conversation, you are going to have to both:

  • Answer your fundamental question, e.g. who bought socks this month.
  • Recover any additional information required for reporting and processing, e.g. what are their names and email addresses.

The latter is not really a query; it is just mechanical navigation to related information. With Datomic, you don't have to combine decisions about how to render the answers with finding them, leading to simpler queries. Given an entity found in a query, you can quickly navigate to any related information at any later time, freeing yourself from the complex queries forced by a client-server model.

The Database of Facts

The first obvious candidate for a query is a Datomic database. It ends up that the data sources processed and returned by Datalog are in fact relations, i.e. sets of tuples. And one way to consider a Datomic database is as a universal relation of tuples of the form:

[entity attribute value transaction]

That is, a Datomic database is a relation where each tuple is a Datom. Datomic's Datalog is of course able to process these tuples, but is not limited to processing 4-tuples. Queries and rules output relations whose tuples can have varying arity, and Datomic's query engine can accept relation-like data in ordinary collections as inputs.

Query Grammar

Syntax Used in Grammar

''  = literal
""  = string
[]  = list or vector
{}  = map {k1 v1 ...}
()  = grouping
|   = choice
?   = zero or one
+   = one or more

Query

query                      = [find-spec with-clause? inputs? where-clauses?]
find-spec                  = ':find' (find-rel | find-coll | find-tuple | find-scalar)
find-rel                   = find-elem+
find-coll                  = [find-elem '...']
find-scalar                = find-elem '.'
find-tuple                 = [find-elem+]
find-elem                  = (variable | pull-expr | aggregate)
pull-expr                  = ["pull" variable pattern]
pattern                    = (input-name | pattern-data-literal)
aggregate                  = [aggregate-fn-name fn-arg+]
fn-arg                     = (variable | constant)
with-clause                = ':with' variable+
where-clauses              = ':where' clause+
inputs                     = ':in' (db-var | variable | pattern-var | rule-var)+
db-var                     = symbol starting with "$"
variable                   = symbol starting with "?"
rule-var                   = the symbol "%"
plain-symbol               = symbol that does not begin with "$" or "?"
pattern-var                = plain-symbol
clause                     = (data-pattern | pred-expr | fn-expr | rule-expr)
data-pattern               = [ db-var? (variable | constant | '_')+ ]
constant                   = any non-variable data literal
pred-expr                  = [ [pred fn-arg+] ]
fn-expr                    = [ [fn fn-arg+] binding]
binding                    = (bind-scalar | bind-tuple | bind-coll | bind-rel)
bind-scalar                = variable
bind-tuple                 = [variable+]
bind-coll                  = [variable '...']
bind-rel                   = [ [variable+] ]

See pattern grammar for the description of the pattern-data-literal rule.

Rules

Note that the rule grammar reuses some terms from the query grammar above.

rule                       = [ [rule-head clause+]+ ]
rule-head                  = [rule-name variable+]
rule-name                  = plain-symbol
rule-expr                  = [ db-var? rule-name (variable | constant)+ ]

Queries

Basics

The basic job of query is, given a set of variables and a set of clauses, find (the set of) all of the (tuples of) variables that satisfy the clauses. The shape of the most basic query looks like this:

[:find variables :where clauses]

If we had some data that was shaped like this:

[[sally :age 21] [fred :age 42] [ethel :age 42]
 [fred :likes pizza] [sally :likes opera] [ethel :likes sushi]]

We could ask a query like this:

[:find ?e :where [?e :age 42]]

And get this result:

[[fred], [ethel]]

Invoking a query takes this basic form:

Peer.q(query, inputs...);

So our query above has one variable ?e and one clause [?e :age 42], and will take one input, expected to be a set of tuples with at least three components. This first kind of clause is called a data clause. We will by convention be showing data clauses in square brackets and other kinds of clauses in parens, but both designate lists. What does it mean to 'satisfy' a data clause? A data clause consists of constants and/or variables, and a tuple satisfies a clause if its constants match. What about the variables? They get 'bound' to the corresponding part of the matching tuple. All of this matching happens by position.
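
For example, using the Clojure API, the query above can be run directly against that literal collection. This is just a sketch, with keywords standing in for the bare symbols (sally, fred, ethel) shown in the data:

(require '[datomic.api :as d])

(def data
  [[:sally :age 21] [:fred :age 42] [:ethel :age 42]
   [:fred :likes :pizza] [:sally :likes :opera] [:ethel :likes :sushi]])

(d/q '[:find ?e :where [?e :age 42]] data)
;;=> #{[:fred] [:ethel]}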

List form vs. Map form

Queries written by humans typically are a list, and the various keyword arguments are inferred by position. For example, in the query

[:find ?e
 :in $ ?fname ?lname
 :where [?e :user/firstName ?fname]
        [?e :user/lastName ?lname]]

there is one :find argument, three :in arguments, and two :where arguments.

While most people find the positional syntax easy to read, it makes extra work for programmatic readers and writers, which have to keep track of what keyword is currently "active" and interpret tokens accordingly. For such cases, queries can be specified more simply as maps. The query above becomes:

{:find [?e]
 :in [$ ?fname ?lname]
 :where [[?e :user/firstName ?fname]
         [?e :user/lastName ?lname]]}
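
For example, a program that assembles queries can build the map form from parts. The helper below is hypothetical, just a sketch of the idea:

;; Hypothetical helper that assembles a map-form query from data.
(defn user-query [extra-clauses]
  {:find  '[?e]
   :in    '[$ ?fname ?lname]
   :where (into '[[?e :user/firstName ?fname]
                  [?e :user/lastName ?lname]]
                extra-clauses)})

;; The assembled map is passed to q exactly like the list form:
;; (d/q (user-query []) db "John" "Doe")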

Unification

We could ask a query like this:

;;which 42-year-olds like what?
[:find ?e ?x :where [?e :age 42] [?e :likes ?x]]

And get this result:

[[fred pizza], [ethel sushi]]

Here we have two data clauses, and both use the variable ?e. This is where logic programming kicks in - when a variable name is used more than once, it must represent the same value in every clause in order to satisfy the set of clauses. Looked at another way, this reuse of ?e causes an implicit self-join on our single data source. All of the values of ?e in a single match are said to unify.

Blanks

Sometimes we don't care about certain components of the tuples in a query, but must put something in the clause in order to get to the positions we care about. You might be tempted to put in some dummy variable in that place, but that will cause the query engine to unify the dummy with any other uses of the same dummy variable. Rather than have to come up with a bunch of unique dummy names, a single placeholder variable '_' can be used. '_' matches anything, but does not unify with itself.

;;what things are liked by anyone?
[:find ?x :where [_ :likes ?x]]

Querying a database

So how do we query the database? First we need to get the value of the database by getting it from the connection:

Database db = conn.db();

This is a true value; it is not going to change on us. If we use db for several queries, we will know the answers are based upon exactly the same data from a single point in time. As we said, the database itself acts as a relation of 4-tuples of [entity attribute value transaction].

;;when given a db source, finds the names of all the attributes
[:find ?name :where [_ :db.install/attribute ?a] [?a :db/ident ?name]]

You'll notice that while this query is intended to be used against a database, its data clauses contain only three elements, not four. In data clauses, you can always elide any trailing components you don't care about. In this case we don't care about the transaction information.
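
In Clojure, running that query looks like this (a sketch, assuming a connection conn):

(def db (d/db conn))   ; a point-in-time database value

(d/q '[:find ?name
       :where [_ :db.install/attribute ?a]
              [?a :db/ident ?name]]
     db)
;;=> a set of one-element tuples containing attribute ident keywords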

Bindings

The simplest kind of binding is a single variable bound to a single scalar result (for example, ?months in the function expression shown under Expression Clauses below). But a function might return a tuple of results, a collection of results, or a full relation (collection of tuples). These can be bound as follows:

Binding Form    Binds
?a              scalar
[?a ?b]         tuple
[?a ...]        collection
[[?a ?b]]       relation
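
The same binding forms apply to query inputs (see Multiple inputs below). For instance, a collection binding lets one input supply several values for a variable; a sketch, reusing the :user/firstName attribute from the earlier example:

(d/q '[:find ?e
       :in $ [?name ...]              ; collection binding: several names at once
       :where [?e :user/firstName ?name]]
     db ["John" "Jane"])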

Find Specifications

Where bindings control inputs, find specifications control results.

Find Spec          Returns          Java Type Returned
:find ?a ?b        relation         Collection of Lists
:find [?a ...]     collection       Collection
:find [?a ?b]      single tuple     List
:find ?a .         single scalar    Scalar Value

The relation find spec is the most common, and the most general. It will return a tuple for each result, with values in each tuple matching the named variables. The following example returns a set of e, v pairs:

;; query
[:find ?e ?v
 :where [?e :db/ident ?v]]

;; result
#{[22 :db.type/long] [38 :db.unique/identity] ...}

The collection spec is useful when you are only interested in a single variable. The form [?v …] below returns all values for ?v:

;; query
[:find [?v ...]
 :where [_ :db/ident ?v]]

;; results
#{:db.type/long :db.unique/identity ...}

The single tuple find spec is useful when you are interested in multiple variables, but expect only a single result. The form [?e ?ident] below returns a single ?e, ?ident pair:

;; query
[:find [?e ?ident]
 :where [?e :db/ident ?ident]]

;; result
[22 :db.type/long]

The scalar find spec is useful when you want to return a single value of a single variable. The form ?v . below returns a single ?v value:

;; query
[:find ?v .
 :where [0 :db/ident ?v]]

;; result
:db.part/db

Note that the single tuple find spec and the scalar find spec will return only a single value from the query result, even if the result itself has more than one value. The find specs are typically used when you know in advance that a query will have only one result.

Expression Clauses

The second type of clause we will discuss is the expression clause. These clauses allow native Java or Clojure functions to be used inside of Datalog queries. Your own or library functions can be used as predicates or as transformation functions. Any functions or methods you use in expression clauses must be pure, i.e. they must be free of side effects and always return the same thing given the same arguments. Expression clauses have one of two basic shapes:

[(predicate ...)]
[(function ...) bindings]

So, the first item in an expression clause is a list designating a function or method call. If no bindings are provided, the function is presumed to be a predicate returning a boolean truth value. In fact, the interpretation of the return is subject to a broader notion of truth, where false or null are false and anything else is true. A predicate can be used to filter out some results:

[:find ?e :where [?e :age ?a] [(< ?a 30)]]
;;returning
[[sally]]

As you can see above, variables can be supplied as arguments to the predicate, and the function will be called on their bound values.

Functions behave similarly, except that their return values can in turn bind other variables:

[:find ?e ?months :where [?e :age ?a] [(* ?a 12) ?months]]
;;returning
[[fred 504], [ethel 504], [sally 252]]

Function calls may not be nested in expression clauses, e.g. the following will not work:

;; won't work!!!
[:find ?e ?months :where [?e :age ?a] [(* ?a (* 3 4)) ?months]]
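
Instead, break the computation into separate expression clauses, binding each intermediate result to its own variable:

[:find ?e ?months
 :where [?e :age ?a]
        [(* 3 4) ?factor]          ; bind the inner computation first
        [(* ?a ?factor) ?months]]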

Built-in Expression Functions and Predicates

Datomic provides the following built-in expression functions and predicates:

  • Two-argument comparison predicates !=, <, <=, >, and >=.
  • Two-argument mathematical operators +, -, *, and /.
  • All of the functions from the clojure.core namespace of Clojure, except eval.
  • A set of functions and predicates that are aware of Datomic data structures, documented below:

get-else

The get-else function takes a database, an entity, a cardinality-one attribute, and a default value. It returns entity's value for attribute, or the default value if entity does not have a value.

[(get-else $ ?person :account/balance 0) ?balance]

get-some

The get-some function takes a database, entity, and one or more cardinality-one attributes, returning a tuple of the entity id and value for the first attribute possessed by the entity.

[(get-some $ ?person :person/customer-id :person/email) ?identifier]

ground

The ground function takes a single argument, which must be a constant, and returns that same argument. Programs that know information at query time should prefer ground over e.g. identity, as the former can be used inside the query engine to enable optimizations.

[(ground [:a :e :i :o :u]) [?vowel ...]]

fulltext

The fulltext function takes a database, an attribute, and a search expression, and returns a collection of four-tuples: entity, value, transaction, and score.

[(fulltext $ :community/name "Wallingford") [[?e ?name]]]

missing?

The missing? predicate takes a database, entity, and attribute, and returns true if the entity has no value for attribute in the database.

[(missing? $ ?customer :cust/orders)]

tx-ids

Given a log, start, and end, tx-ids returns a collection of transaction ids. Start and end can be specified as database t, transaction id, or instant in time, and can be nil.

tx-ids is often used in conjunction with tx-data, to first locate transactions and then the data within those transactions.

[(tx-ids ?log ?t1 ?t2) [?tx ...]]

tx-data

Given a log and a database, tx-data returns a collection of the datoms added by a transaction. You should not bind the transaction position of the result, as the transaction is already bound on input.

[(tx-data ?log ?tx) [[?e ?a ?v _ ?op]]]
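
A sketch combining the two against the log (assumes a connection conn and two points in time t1 and t2):

(d/q '[:find ?tx ?e ?a ?v ?op
       :in ?log ?t1 ?t2
       :where [(tx-ids ?log ?t1 ?t2) [?tx ...]]
              [(tx-data ?log ?tx) [[?e ?a ?v _ ?op]]]]
     (d/log conn) t1 t2)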

Calling Java Methods

Java methods can be used as query expression functions and predicates, and can be type hinted for performance. Java code used in this way must be on the Java process classpath.

Calling Static Methods

Java static methods can be called with the (ClassName/methodName …) form. For example, the following code calls System.getProperties:

[(System/getProperties) [[?k ?v]]]

and could be used as follows:

// Groovy syntax for brevity
import static datomic.Peer.q;

q('''[:find ?k ?v
      :where [(System/getProperties) [[?k ?v]]]]''');

Calling Instance Methods

Java instance methods can be called with the (.methodName obj …) form. For example, the following code calls String.endsWith:

[(.endsWith ?k "path")]

and could be used to extend the previous example like this:

q('''[:find ?k ?v
      :where [(System/getProperties) [[?k ?v]]]
             [(.endsWith ?k "path")]]''');

Type Hinting for Performance

The current version of Datomic performs reflective lookup for Java interop. You can significantly improve performance by type hinting objects, allowing the query engine to make direct method invocations. Type hints take the form of ^ClassName preceding an argument, so the previous example becomes

[(.endsWith ^String ?k "path")]

Note that type hints outside java.lang will need to be fully qualified, and that complex method signatures may require more than one hint to be unambiguous.

Calling Clojure Functions

Clojure functions can be used as query expression functions and predicates. Clojure code used in this way must be on the Clojure process classpath. The example below uses subs as an expression function.

(d/q '[:find ?prefix
       :in [?word ...]
       :where [(subs ?word 0 5) ?prefix]]
     ["hello" "antidisestablishmentarianism"])

Function names outside clojure.core need to be fully qualified.

Multiple inputs

Queries can take multiple inputs by specifying an :in clause to describe and name them:

[:find ?e :in $data ?age :where [$data ?e :age ?age]]
;;this query must be called with two inputs, e.g. data and 42
;;returns
[[fred], [ethel]]

You would call it like this:

Peer.q(query, data, 42);

There are several things to note here. The first is the :in clause - :in $data ?age. This indicates that the query expects two inputs, and it will refer to them as $data and ?age.

Inputs named with leading $ are data sources, and can be matched using data clauses.

Inputs involving variables are binding patterns, and directly bind those variables. All of the binding patterns accepted for function returns listed above are also accepted for inputs. That means you can take scalars, tuples, collections, and relations as inputs and bind their components to variables for use in the query.

The result is a powerful parameterized query capability. In this case we expect a single scalar for ?age and 42 fits the bill. One should definitely prefer creating parameterized queries over making different versions of the query with different constants.

The final thing to note is the data clause, [$data ?e :age ?age] which now begins with the name of the data source. This is necessary since, once you can accept multiple inputs, you might be passed multiple data sources, and the data clause must know to which source it applies.
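
In the Clojure API, the same parameterized call might look like this sketch, with a literal collection standing in for data:

(d/q '[:find ?e
       :in $data ?age
       :where [$data ?e :age ?age]]
     [[:sally :age 21] [:fred :age 42] [:ethel :age 42]]
     42)
;;=> #{[:fred] [:ethel]}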

The implicit data source - $

Often you will have only a single, or primary, data source (usually a database). In this case you can call that data source $, and elide it in the data clauses:

[:find ?e :in $ ?age :where [?e :age ?age]]
;;same as
[:find ?e :in $data ?age :where [$data ?e :age ?age]]

Attributes as Query Inputs

Attribute idents vs. entity ids

When querying against a database, the query engine will automatically resolve attribute ident keywords in data clauses to their corresponding entity ids. When passing in attributes as inputs to a query, no such conversion is guaranteed, and so attribute entity ids should be passed rather than attribute idents. You can use Database.entid to get the entity id corresponding to an ident prior to passing into query:

db.entid(":age")

Using attribute ident constants is recommended whenever the attributes are known at query time, as doing so will perform better than passing attributes as inputs to query.
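
When the attribute is not known until runtime, a sketch of the entity-id approach (Clojure API, reusing the :age attribute from the earlier examples) looks like this:

(d/q '[:find ?e
       :in $ ?attr
       :where [?e ?attr 42]]
     db
     (d/entid db :age))   ; pass the attribute's entity id, not the ident keyword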

Reference attributes

When reference attributes are given as inputs to query, the query engine won't automatically resolve an ident keyword value to its entity id.

[:find ?e :in $ ?a :where [?e ?a :some-ident]] ;; Don't do this: won't auto-resolve value to entity id
[:find ?e :in $ ?a :where [?e ?a 12345]]       ;; Do this instead, pass entity id as value

Querying References

Datomic Datalog provides some conveniences when querying references in the value position. When the attribute position of a where clause is specified, and the attribute's type is :db.type/ref, the value position may be specified as either:

  • an integer, which Datalog interprets as being an entity id
  • a keyword, which Datalog interprets as being an ident, and will resolve to the corresponding entity id

For example:

[:find ?e :in $ :where [?e :some/reference 42]]
;; or
[:find ?e :in $ :where [?e :some/reference :forty-two]]

Of course, the value position could also be a binding variable or blank as usual, i.e. not specified.

[:find ?e :in $ :where [?e :some/reference ?v]]
;; or
[:find ?e :in $ :where [?e :some/reference _]]
;; or
[:find ?e :in $ :where [?e :some/reference]]

If both the entity and attribute positions of a clause are unbound and the value is specified, the value must be an entity id.

[:find ?e :in $ :where [?e ?a 42]]

Rules

Datomic datalog allows you to package up sets of :where clauses into named rules. These rules make query logic reusable, and also composable, meaning that you can bind portions of a query's logic at query time.

A rule is a named group of clauses that can be plugged into the :where section of your query. For example, here is a rule from the Seattle example dataset that tests whether a community is a twitter feed:

[(twitter? ?c)
 [?c :community/type :community.type/twitter]]

As with transactions and queries, rules are described using data structures. A rule is a list of lists. The first list in the rule is the head. It names the rule and specifies its parameters. The rest of the lists are clauses that make up the body of the rule. In this rule, the name is "twitter?", the variable ?c is an input argument, and the body is a single data clause testing whether the :community/type attribute of the entity ?c has the value :community.type/twitter.

This rule has no output argument - it is a predicate rule that will evaluate to true or false, indicating whether ?c matches the specified criteria. However, rules with more than one argument can be used to bind output variables that can be subsequently used elsewhere in the query.

[(community-type ?c ?t)
 [?c :community/type ?t]]

In the rule above, we could bind either ?c or ?t at invocation time, and the other variable would be bound to the output of the rule.

Individual rule definitions are combined into a set of rules. A set of rules is simply another list containing some number of rule definitions:

[[(twitter? ?c)
  [?c :community/type :community.type/twitter]]]

You have to do two things to use a rule set in a query. First, you have to pass the rule set as an input source and reference it in the :in section of your query using the '%' symbol. Second, you have to invoke one or more rules from the :where section of your query. You do this by adding a rule invocation clause. Rule invocations have this structure:

(rule-name rule-arg*)

A rule invocation is a list containing a rule-name and one or more arguments, either variables or constants, as defined in the rule head. It's idiomatic to use parentheses instead of square brackets to represent a rule invocation in literal form, because it makes it easier to differentiate from a data clause. However, this is not a requirement.

As with other where clauses, you may specify a database before the rule-name to scope the rule to that database. Databases cannot be used as arguments in a rule.

($db rule-name rule-arg*)

Rules also make it possible to define different logical paths to the same conclusion (i.e. logical OR). Here's a rule, again from the Seattle example, which identifies communities that are "social-media".

[[(social-media ?c)
  [?c :community/type :community.type/twitter]]
 [(social-media ?c)
  [?c :community/type :community.type/facebook-page]]]

The social-media rule has two definitions, one testing whether a community's type is :community.type/twitter and the other testing whether a community's type is :community.type/facebook-page. When a given community value is tested, the social-media rule will be true if either of the definitions is true. In other words, using rules, we can implement logical OR in queries.

In all the examples above, the body of each rule is made up solely of data clauses. However, rules can contain any type of clause: data, expression, or even other rule invocations.
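
Putting this together, here is a sketch of a query that passes the social-media rule set above as an input and invokes it from :where (against the Seattle sample data):

(def rules
  '[[(social-media ?c)
     [?c :community/type :community.type/twitter]]
    [(social-media ?c)
     [?c :community/type :community.type/facebook-page]]])

(d/q '[:find ?n
       :in $ %
       :where [?c :community/name ?n]
              (social-media ?c)]
     db rules)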

Aggregates

Datomic's aggregate syntax is incorporated in the :find clause:

[:find ?a (min ?b) (max ?b) ?c (sample 12 ?d)
 :where ...]

The list expressions are aggregate expressions. Query variables not in aggregate expressions will group the results and appear intact in the result. Thus, the above query binds ?a ?b ?c ?d, then groups by ?a and ?c, and produces a result for each aggregate expression for each group, yielding 5-tuples.

Aggregates Returning a Single Value

The aggregation functions min, max, count, count-distinct, sum, avg, median, variance, and stddev all behave as their names suggest. For example, the following query finds the highest value of :object/meanRadius in a data set about the solar system.

[:find  (max ?radius)
 :where [_ :object/meanRadius ?radius]]

You can use a scalar find specification to pull only this single value.

[:find  (max ?radius) .
 :where [_ :object/meanRadius ?radius]]

If the largest meanRadius value were 3, the first form above would return:

[[3]]

While the second would return:

3

min and max support all database types (via comparators), not just numbers.

The rand aggregator selects a random element from the collection being aggregated:

[:find  (rand ?name) .
 :where [?e :object/name ?name]]

Aggregates Returning Collections

(distinct ?xs)
(min n ?xs)
(max n ?xs)
(rand n ?xs)
(sample n ?xs)

distinct returns the set of distinct values in the collection. Given n, min and max return the n (if available) least/greatest items, rand n selects n items with potential for duplicates, and sample n attempts to return n distinct elements, treating the collection as a population. In all cases where n is provided, fewer than n items may be returned if that is all that is available.

The following query returns five names from a population of solar system objects:

[:find (sample 5 ?name)
 :with ?e
 :where [?e :object/name ?name]]

Control Grouping via :with

Unless otherwise specified, Datomic's datalog returns sets, and you will not see duplicate values. This is often undesirable when producing aggregates. Consider the following data set describing mythological monsters:

[["Cerberus" 3]
 ["Medusa" 1]
 ["Cyclops" 1]
 ["Chimera" 1]]

and this (incorrect!) head-counting query:

[:find (sum ?heads) .
 :in [[_ ?heads]]]
;;=> 4

The solution to this problem is the :with clause, which considers additional variables when forming the basis set for the query result. The :with variables are then removed, leaving a bag (not a set!) of values available for aggregation.

[:find (sum ?heads) .
 :with ?monster
 :in [[?monster ?heads]]]
;;=> 6

Custom Aggregates

You may call an arbitrary Clojure function as an aggregation function as follows:

  • Use the fully qualified name of the function.
  • The one and only aggregated variable must be the last argument to the function.
  • Other arguments to the function must be constants in the query.
  • Your function will be called with a partial implementation of java.util.List - only size(), iterator(), and get(i) are supported.

For example, the following query might come in handy when analyzing naming conventions in a database. It returns the modes of schema name size, using a custom modes aggregator.

[:find (my.aggregates/modes ?length)
 :with ?e
 :where
 [?e :db/ident ?ident]
 [(name ?ident) ?name]
 [(count ?name) ?length]]
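
The my.aggregates/modes function above is not part of Datomic; one possible sketch of such a function, under the assumptions stated above (it receives a partial java.util.List), might be:

(ns my.aggregates)

(defn modes
  "Returns the set of most frequently occurring values in coll."
  [coll]
  (let [freqs (frequencies coll)        ; iterates coll once
        top   (apply max (vals freqs))]
    (->> freqs
         (filter #(= top (val %)))
         (map key)
         set)))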

Pull Expressions

Pull expressions can be used in a :find clause. A pull expression takes the form

(pull ?entity-var pattern)

and adds a value to the result set by applying the Pull API to the entities named by ?entity-var.

For example, the following query returns the :release/name for all of Led Zeppelin's releases:

;; query
[:find (pull ?e [:release/name])
 :in $ ?artist
 :where [?e :release/artists ?artist]]

;; args
db, led-zeppelin

;; results
[[{:release/name "Immigrant Song / Hey Hey What Can I Do"}]
 [{:release/name "Heartbreaker / Bring It On Home"}]
 [{:release/name "Led Zeppelin III"}]
 ...]

The pull expression pattern can also be bound dynamically as an :in parameter to query:

;; query
[:find (pull ?e pattern)
 :in $ ?artist pattern
 :where [?e :release/artists ?artist]]

;; args
db, led-zeppelin, [:release/name]

;; results elided, same as previous example

Filtering Databases

Datomic's filter allows you to limit your database to datoms that match an arbitrary predicate, prior to passing that database to a query or entity call.

For example, imagine that you want to exclude all values of an attribute entirely from consideration. The following code creates a filter that rejects all datoms about the attribute :user/passwordHash.

(def password-hash-id (d/entid plain-db :user/passwordHash))
(def password-hash-filter (fn [_ ^Datom datom] (not= password-hash-id (.a datom))))
(def filtered-db (d/filter (d/db conn) password-hash-filter))

Joining Multiple Versions of the Same Database

It is idiomatic to join filtered and unfiltered versions of the same database. One motivating case is performance: Since filtering is not free, it makes sense to filter only the parts of the query that need filtering. For example, the query below uses an unfiltered database to find entities, and then a filtered version of the same database to limit the attributes visible through the entity() call:

[:find ?ent
 :in $plain $filtered ?email
 :where
 [$plain ?e :user/email ?email]
 [(datomic.api/entity $filtered ?e) ?ent]]

Filtering on Transaction Attributes

Filters can do more than look at single datoms. They can also consider datoms in the context of the entire database. For example, imagine that you mark the transactions in your system with a :source/confidence field that indicates your confidence in the source, on a scale from 0 to 100. You could then filter the database as follows:

public static Database filterByConfidence(Database db, final long conf) {
    return db.filter(new Database.Predicate<datomic.Datom>() {
        public boolean apply(Database db, Datom datom) {
            Long confidence = (Long) db.entity(datom.tx()).get(":source/confidence");
            return (confidence != null) && (confidence > conf);
        }
    });
}

Queries using this filter can then focus on finding data of interest, without worrying about the cross-cutting concern of how trusted the data is. The query below finds only the stories whose titles were added by sources with a trust score higher than 90:

q("[:find ?title" +
  " :where [_ :story/title ?title]]", filterByConfidence(db, 90L));

A Warning

Two features of Datalog queries make them immune to many of the SQL-injection style attacks to which many other DBMSs are vulnerable:

  • Datalog queries are composed of data structures, rather than strings, which obviates the need to do string interpolation, sanitation, escaping, etc.
  • The query API is parameterized with data sources. In many cases, this feature obviates the need to include user-provided data in the query itself, instead preferring to pass user-provided data to a parameterized query as a data source.

You should never build queries by reading in a string that has been built up by concatenation or interpolation. Doing so gives up the security and simplicity of working with native data structures.

Performance

Clause order

To minimize the amount of work the query engine must do, query authors should put the most restrictive or narrowing clauses first, and then proceed to less and less restrictive clauses.
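
For example, if only a handful of entities match a given email address but many entities have a status, putting the email clause first keeps the intermediate result small (the attribute names here are illustrative):

[:find ?e ?status
 :where [?e :user/email "jdoe@example.com"]   ; most restrictive clause first
        [?e :user/status ?status]]            ; then widen over the small set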

Queries and Peer Memory

Since queries run within the Peer with application-local memory, application designers need to consider the memory requirements for their queries. Queries are designed to be able to run over datasets much larger than memory. However, each intermediate representation step of a query must fit into local memory. Datomic doesn't spool intermediate representations to disk the way some server-based RDBMSs do.

Query Caching

Datomic processes maintain an in-memory cache of parsed query representations. Caching is based on equality (in the Java .equals sense) of the first argument to q. To take advantage of caching, programs should

  • Use parameterized queries (i.e. queries with multiple inputs) instead of building dynamic queries (see the sketch below).
  • When building dynamic queries, use a canonical approach to naming and ordering such that equivalent queries will be Java .equals.
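
For example, a sketch of the parameterized approach, reusing one query value so that repeated calls hit the cache (assumes a db value):

(def users-by-email-q                  ; one canonical query value
  '[:find ?e
    :in $ ?email
    :where [?e :user/email ?email]])

(d/q users-by-email-q db "jdoe@example.com")
(d/q users-by-email-q db "asmith@example.com")   ; same first argument: the parsed query is reused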