Database Filters
Filters take a database value and return a new database value that exposes only datoms that satisfy a predicate. This makes it possible to have a single set of queries and index traversals that can be used without change against different filtered views of your data.
Datomic databases can be filtered with the time-based predicates as-of
and since
. In addition, you can get an unfiltered view of all history via history
.
Each of these APIs is described below.
Example Database
All the examples below use an example inventory database, looking specifically at an entity that tracks the quantity of dilithium crystals on hand at a given point in time. The subset of the data relevant to the examples are:
e | a | v | tx | added |
---|---|---|---|---|
0x00000c00000003e9 | :db/txInstant | Mon Dec 31 19:00:00 | 0xc00000003e9 | true |
0x00001000000003ea | :item/id | DLC-042 | 0xc00000003e9 | true |
0x00001000000003ea | :item/description | Dilitihium Crystals | 0xc00000003e9 | true |
0x00001000000003ea | :item/count | 100 | 0xc00000003e9 | true |
0x00000c00000003eb | :db/txInstant | Thu Jan 31 19:00:00 | 0xc00000003eb | true |
0x00001000000003ea | :item/count | 100 | 0xc00000003eb | false |
0x00001000000003ea | :item/count | 250 | 0xc00000003eb | true |
0x00000c00000003ec | :db/txInstant | Thu Feb 27 19:00:00 | 0xc00000003ec | true |
0x00001000000003ea | :item/count | 250 | 0xc00000003ec | false |
0x00001000000003ea | :item/count | 50 | 0xc00000003ec | true |
0x00000c00000003ed | :db/txInstant | Mon Mar 31 20:00:00 | 0xc00000003ed | true |
0x00000c00000003ed | :tx/error | true | 0xc00000003ed | true |
0x00001000000003ea | :item/count | 50 | 0xc00000003ed | false |
0x00001000000003ea | :item/count | 9999 | 0xc00000003ed | true |
0x00000c00000003ee | :db/txInstant | Wed May 14 20:00:00 | 0xc00000003ee | true |
0x00001000000003ea | :item/count | 9999 | 0xc00000003ee | false |
0x00001000000003ea | :item/count | 100 | 0xc00000003ee | true |
as-of
An as-of
filter returns a database "as of" at a particular point in time,
ignoring any transactions after that point. The time specification
can be any time point, i.e.
- A Datomic transaction id. Use a transaction id when you want a database as of a specific transaction.
- A Datomic point in time. Use with, e.g., the basis-t value of the
:db-after
returned bytransact
- An instant in time (a
java.util.Date
). Use an instant when you have a wall clock time for the database you want, but do not have a transaction id or basis-t value, as an instant is not as precise as Datomic's t or tx values.
Because the item/count
of dilithium crystals is changing over time,
different as-of
views will show different counts:
(def db (d/db conn)) (d/pull db '[*] [:item/id "DLC-042"])
=> {:db/id 55301036830621764, :item/id "DLC-042", :item/description "Dilitihium Crystals", :item/count 100}
(def as-of-eoy-2013 (d/as-of db #inst "2014-01-01")) (d/pull as-of-eoy-2013 '[*] [:item/id "DLC-042"])
=> {:db/id 55301036830621764, :item/id "DLC-042", :item/description "Dilitihium Crystals", :item/count 250}
since
A since
filter is the opposite of as-of
. Taking the same
point-in-time arguments as as-of
, since
returns a value of the
database that includes only datoms added by transactions after that
point in time.
There is an important subtlety to consider when using since
. Typically,
the identifying information used to lookup an entity is established in
the first transaction about that entity, so a since
filter may not be
able to see the information needed for lookup. Our example has
exactly this problem trying to find the entity in question by its
:item/id
:
(def since-2014 (d/since db #inst "2014-01-01")) (d/pull since-2014 '[*] [:item/id "DLC-042"])
=> #:db{:id nil} ;; !!!
Most callers of since
will refer to the database twice: one reference to
the default "now" db to find entities, and a second reference to the
since
db to shave off the past. With entities, this looks like
(d/pull since-2014 '[*] (:db/id (d/pull db [:db/id] [:item/id "DLC-042"])))
=> {:db/id 55301036830621764, :item/count 100}
Notice that the "now" db
is used to resolve the lookup ref, and the
since-2014
db is then used to pull the associated attributes.
In the query, this can take the form of passing two different filters of the same db as separate inputs:
(d/q ('[:find ?count :in $ $since ?id :where [$ ?e :item/id ?id] [$since ?e :item/count ?count]]) db since-2014 "DLC-042")
Here, the db
argument is named $
inside the query and used to
resolve the :item/id
. Then the since-2014
is named $since
inside
the query, and used to find the :item/count
.
history
The history view of the database includes the present and the unfiltered past, including retractions. This makes the history view ideal for querying the complete history of an entity, or group of entities.
Below is a query for all assertions about dilithium crystals:
(def history (d/history db)) (->> (d/q '[:find ?aname ?v ?inst :in $ ?e :where [?e ?a ?v ?tx true] [?tx :db/txInstant ?inst] [?a :db/ident ?aname]] history [:item/id "DLC-042"]) (sort-by #(nth % 2)))
Note that the query joins through :db/txInstant
to return wall clock
time, and joins through :db/ident
to return human-readable attribute
names.
Simplified views of data that do not account for time are incompatible
with history
, since a history database can see multiple different
values for the "same" fact at different times. In particular, an
entity is a point-in-time view and cannot be created from a history
database.
filter
Filtering Errors
A filter view allows you to produce values of the database filtered by your own domain-specific predicates. A filter can be used to e.g. remove datoms that are incorrect, not applicable at a point in time, or not available for security reasons.
For example, suppose that the 9999 value for :item/count in the example database above turns out to be a mistake. The transaction that made this assertion could be identified, and used to filter out the incorrect value:
(def error-txes "Known bad transactions" #{13194139534317}) (defn correct? [_ datom] (not (contains? error-txes (:tx datom)))) (def corrected (d/filter history correct?))
Now, the corrected database value can be passed to all our existing queries, which will never see or consider the incorrect datoms.
Filtering For Security
Imagine that you want to exclude all values of an attribute entirely from consideration. The following code creates a filter that rejects all datoms about the attribute :user/passwordHash.
(def password-hash-id (d/entid plain-db :user/passwordHash)) (def password-hash-filter (fn [_ ^Datom datom] (not= password-hash-id (.a datom)))) (def filtered-db (d/filter (d/db conn) password-hash-filter))
Joining Different Filters of the Same Database
It is idiomatic to join different filters of the same database. One motivating case is performance: Since filtering is not free, it makes sense to filter only the parts of the query that need filtering. For example, the query below uses an unfiltered database to find entities, and then a filtered version of the same database to limit the attributes visible through the d/entity call:
[:find ?ent :in $plain $filtered ?email :where [$plain ?e :user/email ?email] [(datomic.api/entity $filtered ?e) ?ent]]
Filtering on Transaction Attributes
Filters can do more than look at single datoms. They can also consider datoms in the context of the entire database. For example, imagine that you mark the transactions in your system with a :source/confidence field that indicates your confidence in the source, on a scale from 0 to 100. You could then filter the database as follows:
(defn filter-by-confidence [db conf] (d/filter db (fn [db ^Datom datom] (when-let [confidence (->> (.tx datom) (d/entity db) :source/confidence)] (> confidence conf)))))
Queries using this filter can focus on finding data of interest, without worrying about the cross-cutting concern of how trusted the data is. The query below finds only the stories whose titles were added by sources with a trust score higher than 90:
(d/q '[:find ?title :where [_ :story/title ?title]] (filter-by-confidence db 90))
Usage Considerations
as-of Is Not a Branch
Filters are applied to an unfiltered database value obtained from db
or with
. In particular, the combination of with
and as-of
means
"with
followed by as-of
", regardless of which API call you make
first. with
plus as-of
lets you see a speculative db with recent
datoms filtered out, but it does not let you branch the past.
The Present Pays No Penalty
The original database value returned from connection does no filtering. It sees only the present, without having to filter the past. So queries about "now" are as efficient as possible–they do not consider history and pay no penalty for history, no matter how much history is stored in the system.
Filter or Log?
Filters are applied to database indexes, filtering out data that does not match a predicate. Filters cannot do anything that could not be done directly with indexes, but they do allow you to separate some aspect of a query, reusing query logic across different filtered views.
Consider the following two queries:
- "What happened between 8:00 and 9:00 this morning?"
- "What did entity 42 look like at 8:00 this morning?"
The first query is limited only by time. Such a query can be
answered directly from the log, which is a time index. Answering the
same question via as-of
and since
would require a filtering scan of
the entire database.
The second query is limited by both entity and by time. Thus it makes
sense to augment indexes with a filter, calling a query on an as-of
view of the database. The query will use the EAVT index to
jump straight to the requested datoms, which are then filtered to
exclude the most recent datoms. Because all the time-related work is done by as-of
, the query can be time-agnostic and can work
with any point-in-time. You can use the same approach in your own
queries, writing queries that are agnostic about time (or about some
domain value), and using filters to allow reuse of those queries across
different domain values.