Database Filters
Filters take a database value and return a new database value that exposes only datoms that satisfy a predicate. This makes it possible to have a single set of queries and index traversals that can be used without change against different filtered views of your data.
Datomic databases can be filtered with the time-based predicates asOf and since. Or, you can bring your own predicate with filter. In addition, you can get an unfiltered view of all history via history.
Each of these APIs is described below.
Example Database
All of the examples below form an example inventory database, looking specifically at an entity that tracks the quantity of dilithium crystals on hand at a given point in time. The subset of the data relevant to the examples is
e | a | v | tx | added |
---|---|---|---|---|
0x00000c00000003e9 | :db/txInstant | Mon Dec 31 19:00:00 | 0xc00000003e9 | true |
0x00001000000003ea | :item/id | DLC-042 | 0xc00000003e9 | true |
0x00001000000003ea | :item/description | Dilitihium Crystals | 0xc00000003e9 | true |
0x00001000000003ea | :item/count | 100 | 0xc00000003e9 | true |
0x00000c00000003eb | :db/txInstant | Thu Jan 31 19:00:00 | 0xc00000003eb | true |
0x00001000000003ea | :item/count | 100 | 0xc00000003eb | false |
0x00001000000003ea | :item/count | 250 | 0xc00000003eb | true |
0x00000c00000003ec | :db/txInstant | Thu Feb 27 19:00:00 | 0xc00000003ec | true |
0x00001000000003ea | :item/count | 250 | 0xc00000003ec | false |
0x00001000000003ea | :item/count | 50 | 0xc00000003ec | true |
0x00000c00000003ed | :db/txInstant | Mon Mar 31 20:00:00 | 0xc00000003ed | true |
0x00000c00000003ed | :tx/error | true | 0xc00000003ed | true |
0x00001000000003ea | :item/count | 50 | 0xc00000003ed | false |
0x00001000000003ea | :item/count | 9999 | 0xc00000003ed | true |
0x00000c00000003ee | :db/txInstant | Wed May 14 20:00:00 | 0xc00000003ee | true |
0x00001000000003ea | :item/count | 9999 | 0xc00000003ee | false |
0x00001000000003ea | :item/count | 100 | 0xc00000003ee | true |
asOf
An asOf filter returns a database "as of" a particular point in time, ignoring any transactions after that point. The time specification can be any time-point, i.e.
- a Datomic transaction id. Use a transaction id when you want a database as of a specific transaction.
- a Datomic point in time. Use with, e.g., the basis-t value of the :db-after returned by derefing the future returned by transact.
- an instant in time (a java.util.Date). An instant should be used when you have a wall clock time for the database you want but do not have a transaction id or basis-t value, as an instant is not as precise as Datomic's t or tx values.
Because the item/count of dilithium crystals is changing over time, different asOf views will show different counts:
(def db (d/db conn)) (d/entity db [:item/id "DLC-042"])
=> {:item/id "DLC-042", :item/description "Dilitihium Crystals", :item/count 100, :db/id 17592186045418}
(def as-of-eoy-2013 (d/as-of db #inst "2014-01-01")) (d/entity as-of-eoy-2013 [:item/id "DLC-042"])
=> {:item/id "DLC-042", :item/description "Dilitihium Crystals", :item/count 250, :db/id 17592186045418}
since
A since filter is the opposite of asOf. Taking the same point-in-time arguments as asOf, since returns a value of the database that includes only datoms added by transactions after that point in time.
There is an important subtlety to consider when using since. Typically, the identifying information used to lookup an entity is established in the first transaction about that entity, so a since view may not be able to see the information needed for lookup. Our example has exactly this problem trying to find the entity in question by its :item/id
(def since-2014 (d/since db #inst "2014-01-01")) (d/entity since-2014 [:item/id "DLC-042"])
=> nil ;; !!!
Most callers of since will refer to the database twice: one reference to the default "now" db to find entities, and a second reference to the since db to shave off the past. With entities, this looks like
(d/entity since-2014 (d/entid db [:item/id "DLC-042"]))
=> {:item/count 100, :db/id 17592186045418}
Notice that the "now" db is used to resolve the lookup ref, and the since-2014 db is then used to build the entity.
In query, this can take the form of passing two different filters of the same db as separate inputs:
(d/q ('[:find ?count :in $ $since ?id :where [$ ?e :item/id ?id] [$since ?e :item/count ?count]]) db since-2014 "DLC-042")
Here, the db argument is named $ inside the query, and used to resolve the :item/id. Then the since-2014 is named $since inside the query, and used to find the :item/count.
history
The history view of the database includes the present and all of the past, unfiltered. This makes the history view ideal for querying the complete history of an entity, or group of entities.
Here is a query for all assertions about dilithium crystals:
(def history (d/history db)) (->> (d/q '[:find ?aname ?v ?inst :in $ ?e :where [?e ?a ?v ?tx true] [?tx :db/txInstant ?inst] [?a :db/ident ?aname]] history [:item/id "DLC-042"]) (sort-by #(nth % 2)))
Note that the query joins through :db/txInstant to return wall clock time, and joins through :db/ident to return human-readable attribute names.
Simplified views of data that do not account for time are incompatible with history, since a history database can see multiple different values for the "same" fact at different times. In particular, an entity is a point-in-time view and cannot be created from a history database.
filter
Filtering Errors
A filter view allows you to produce values of the database filtered by your own domain-specific predicates. A filter can be used to e.g. remove datoms that are incorrect, not applicable at a point in time, or not available for security reasons.
For example, suppose that the 9999 value for :item/count in the example database above turns out to be a mistake. The transaction that made this assertion could be identified, and used to filter out the incorrect value:
(def error-txes "Known bad transactions" #{13194139534317}) (defn correct? [_ datom] (not (contains? error-txes (:tx datom)))) (def corrected (d/filter history correct?))
Now, the corrected database value can be passed to all our existing queries, which will never see or consider the incorrect datoms.
Filtering For Security
Imagine that you want to exclude all values of an attribute entirely from consideration. The following code creates a filter that rejects all datoms about the attribute :user/passwordHash.
(def password-hash-id (d/entid plain-db :user/passwordHash)) (def password-hash-filter (fn [_ ^Datom datom] (not= password-hash-id (.a datom)))) (def filtered-db (d/filter (d/db conn) password-hash-filter))
Joining Different Filters of the Same Database
It is idiomatic to join different filters of the same database. One motivating case is performance: Since filtering is not free, it makes sense to filter only the parts of the query that need filtering. For example, the query below uses an unfiltered database to find entities, and then a filtered version of the same database to limit the attributes visible through the entity() call:
[:find ?ent :in $plain $filtered ?email :where [$plain ?e :user/email ?email] [(datomic.api/entity $filtered ?e) ?ent]]
Filtering on Transaction Attributes
Filters can do more than look at single datoms. They can also consider datoms in the context of the entire database. For example, imagine that you mark the transactions in your system with a :source/confidence field that indicates your confidence in the source, on a scale from 0 to 100. You could then filter the database as follows:
public static Database filterByConfidence(Database db, final long conf) { return db.filter(new Database.Predicate<datomic.Datom>() { public boolean apply(Database db, Datom datom) { Long confidence = (Long) db.entity(datom.tx()).get(":source/confidence"); return (confidence != null) && (confidence > conf); } }); }
Queries using this filter can focus on finding data of interest, without worrying about the cross-cutting concern of how trusted the data is. The query below finds only the stories whose titles were added by sources with a trust score higher than 90:
q("[:find ?title" + " :where [_ :story/title ?title]]", filterByConfidence(db, 90L));
Usage Considerations
As-of Is Not a Branch
Filters are applied to an unfiltered database value obtained from db
or with
. In particular, the combination of with
and as-of
means
"with
followed by as-of
", regardless of which API call you make
first. with
plus as-of
lets you see a speculative db with recent
datoms filtered out, but it does not let you branch the past.
The Present Pays No Penalty
The original database value returned from connection does no filtering. It sees only the present, without having to filter the past. So queries about "now" are as efficient as possible–they do not consider history and pay no penalty for history, no matter how much history is stored in the system.
Filter or Log?
Filters are applied to database indexes, filtering out data that does not match a predicate. Filters cannot do anything that could not be done directly with indexes, but they do allow you to separate some aspect of a query, reusing query or entity logic across different filtered views.
Consider the following two queries:
- "What happened between 8:00 and 9:00 this morning?"
- "What did entity 42 look like at 8:00 this morning?"
The first query is limited only by time. Such a query can be answered directly from the log, which is a time index. Answering the same question via asOf and since would require a filtering scan of the entire database.
The second query is limited by both entity and by time. Thus it makes sense to augment indexes with a filter, calling entity on an asOf view of the database. The entity call will use the EAVT index to jump straight to the requested datoms, which are then filtered to exclude the most recent datoms. Because all the time-related work is done by asOf, the call to entity can be time-agnostic and can work with any point-in-time. You can use the same approach in your own queries, writing queries that are agnostic about time (or about some domain value), and using filters to allow reuse of those queries across different domain values.