Tutorial

Introduction

This tutorial walks you through creating a database, loading a schema and initial data, and querying and manipulating data.

There is a video introducing many of these topics.

All of the examples in this tutorial are based on the sample data set included in the Datomic Peer .zip. The data set is an inventory of online communities in the Seattle area (the source data is maintained at http://data.seattle.gov/Community/Seattle-Communities-Online-Inventory/5ytf-wban).

Programming with data structures

Datomic is designed to be directly programmable using data. Transactions, queries and query results are all represented as simple list and map data structures. We'll be using these data structures as examples throughout this tutorial. When we do, we'll present them as literals, like this:

[:find ?c :where [?c :community/name "belltown"]]

[:find ?c :in ?x :where [(.getClass ?x) ?c]]

[{:db/id 17592186045790 :db/doc "This is an example"}]

The brackets [] and parentheses () represent a list. The braces {} represent a map. You can put commas between entries in lists or maps if you want to, but they're optional. Lists and maps can contain nested lists and maps, or simple values.

There are two simple values that deserve special attention: keywords and symbols. Keywords begin with : and include an optional namespace and a name. In the examples above :find, :in, :where, :community/name, :db/id and :db/doc are keywords. Symbols do not begin with :. In the examples above ?c, ?x, and .getClass are symbols. Keywords and symbols are different from strings, which are surrounded by "". In the examples above, "belltown" and "This is an example" are strings.

When you program with Datomic, your application will build data structures like these. There are two approaches to creating them: parsing them from strings or files (or resources) or constructing them programmatically. When you are constructing data structures programmatically to pass as input to Datomic APIs, you can pass strings in place of keyword or symbol objects. Datomic will convert them to the appropriate type. When you read keyword data from a database, however, a keyword object will be returned.

The datomic.Util class provides functions to help build lists and maps. Whether you use these APIs depends on the approach you take to build transactions and queries, and on your programming language, e.g., Java, JRuby, Clojure. Whatever path you take, the functions are there to help if you need them.

Following along with code

This tutorial introduces the Datomic API. Some of the examples show only data structures, but some also include code. For those examples, the code is shown in Groovy, whose syntax is close to Java's. The examples present code fragments, not complete programs. There are also complete executable examples in Java (samples/seattle/GettingStarted.java), Groovy (samples/seattle/GettingStarted.groovy), and Clojure (samples/seattle/getting-started.clj). To get the most out of this tutorial, follow along in one of those files and execute actual code as we go.

To run the Java code you can compile and step through it in a debugger (the header comment in the source includes compilation instructions). Alternatively, Java developers can follow the Groovy code, which is very similar.

To run the Groovy code, run bin/groovysh to get a Groovy shell configured with the appropriate classpath. You can copy lines from the Groovy sample file into the shell.

To use the Clojure code, run bin/repl to get a REPL configured with the appropriate libraries.

Which Datomic storage option?

Datomic provides five options for storage, listed below.

Storage   Where                                    Purpose
mem       Your process, no durability              Experimenting, testing
dev       Process on local machine, with storage   Development
sql       SQL database                             Production
inf       Infinispan memory cluster                Production
ddb       Amazon DynamoDB table                    Production

We're going to use the memory environment, so we can start exploring right away. Everything we do will work with the other environments too, but more setup is required, as described here.

You bootstrap communication to any Datomic system using a URI. The URI format for an in-memory system is:

datomic:mem://[db-name]

where [db-name] is the name of the database you want to work with.

Since our sample data set describes online communities around Seattle, we'll use "seattle" as our database name:

datomic:mem://seattle

Making a database

With this URI in hand, we're ready to start working with Datomic. The first thing we have to do is make a database to work with. This section explains how to create a database and connect to it, then load our sample schema and seed data.

Creating a database and connection

The main entry-point for the Datomic API is the datomic.Peer class. It exposes static methods that allow you to create, connect to, and delete databases.

To access the Datomic Peer class, first we need to import it. We'll also import the Connection and Util classes for use in this tutorial.

import datomic.Peer
import datomic.Connection
import datomic.Util

The first thing to do is create a database. To do this, call Peer.createDatabase, passing the database URI:

uri = "datomic:mem://seattle"
Peer.createDatabase(uri)

Once your database is created, you can connect to it by calling Peer.connect, again, passing the URI:

conn = Peer.connect(uri)

Peer.connect returns a reference to a datomic.Connection. You can use this object to submit transactions to the database, and to retrieve a value of the database to query.

Adding a schema

Once you are connected to your database, you can add a schema to it. A schema defines the set of attributes you can assign to entities. At a minimum, a schema specifies each attribute's name (technically, its :db/ident), type, and cardinality (whether it can have one or many values for a given entity). Schemas may define additional characteristics of attributes, like whether their values must be unique across entities, or whether they are indexed for fulltext search.

The Seattle sample provides data files for the schema definition, seed data and a number of queries. The schema data file is samples/seattle/seattle-schema.edn. It contains the literal representation of a transaction that creates the desired schema.

We're not going to dig into the schema in detail now (but we will later). At this point, it's only important to know that it defines attributes for modeling three main entity types: communities, neighborhoods and districts. By convention, the attributes used to describe a particular entity are named with a common prefix:

Attribute                 Type        Cardinality
:community/name           String      One
:community/url            String      One
:community/neighborhood   Reference   One
:community/category       String      Many
:community/orgtype        Reference   One
:community/type           Reference   One
:neighborhood/name        String      One
:neighborhood/district    Reference   One
:district/name            String      One
:district/region          Reference   One

The :community/neighborhood and :neighborhood/district attributes are of reference type, meaning that their values refer to other entities. They are used to associate a community with a neighborhood and a neighborhood with a district. The community, neighborhood and district entities will all be created when data is loaded.

The :community/orgtype, :community/type, and :district/region attributes are also of reference type. Their values refer to entities representing enumerated values. These enumerated value entities are named with keyword identifiers. The reference attributes and their associated enumerated values are listed in the table below.

Reference attribute    Enumerated value identifiers
:community/orgtype     :community.orgtype/community
                       :community.orgtype/commercial
                       :community.orgtype/nonprofit
                       :community.orgtype/personal
:community/type        :community.type/email-list
                       :community.type/twitter
                       :community.type/facebook-page
                       :community.type/blog
                       :community.type/website
                       :community.type/wiki
                       :community.type/myspace
                       :community.type/ning
:district/region       :region/nw
                       :region/n
                       :region/ne
                       :region/e
                       :region/se
                       :region/s
                       :region/sw
                       :region/w

In order to load the sample schema into your database, you have to read the .edn file and build an in-memory data structure. You can do this by creating a FileReader for the file, and passing it to Util.readAll:

schema_rdr = new FileReader("samples/seattle/seattle-schema.edn")
schema_tx = Util.readAll(schema_rdr).get(0)
schema_rdr.close()

The Util.readAll method returns a list, in this case containing one thing, the schema transaction, which is retrieved by calling get. Once the schema is read, you can manipulate it as data. More importantly, you can submit the schema definition data to the database as a transaction by passing it to your connection object's transact method.

txResult = conn.transact(schema_tx).get()

Transact returns a Future<Map> object. Its get method returns a map containing information about the transaction if the transaction commits and throws an exception if the transaction aborts.

The transact method is synchronous: it waits until the transaction completes before returning. There is a second method, transactAsync, that submits a transaction and returns immediately. It also returns a Future<Map> object; its get method blocks until the transaction completes, returning the same map of transaction information if the transaction commits and throwing an exception if the transaction aborts.
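The difference between the two methods can be pictured with plain java.util.concurrent futures. The sketch below is a toy model under stated assumptions: FakeConnection and its methods are hypothetical stand-ins, not Datomic classes, and a "transaction" is just a string. In both styles the caller gets a Future whose get yields a result map on commit and throws an ExecutionException on abort; the only difference is whether the work has already finished when the method returns.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Toy model only: FakeConnection is hypothetical and does not talk to Datomic.
class FakeConnection {
    // Stand-in for the transactor: "commits" non-empty data, "aborts" otherwise.
    static Map<String, Object> process(String txData) {
        if (txData.isEmpty()) {
            throw new IllegalArgumentException("empty transaction");
        }
        return Map.of("tx-data", txData, "committed", true);
    }

    // transact-style: finish the work first, then return an already-completed future.
    static Future<Map<String, Object>> transact(String txData) {
        CompletableFuture<Map<String, Object>> f = new CompletableFuture<>();
        try {
            f.complete(process(txData));
        } catch (RuntimeException e) {
            f.completeExceptionally(e);   // get() will throw ExecutionException
        }
        return f;
    }

    // transactAsync-style: return at once; the work runs on a pool thread.
    static Future<Map<String, Object>> transactAsync(String txData) {
        return CompletableFuture.supplyAsync(() -> process(txData));
    }
}
```

Calling FakeConnection.transactAsync("...").get() blocks until the background work finishes, just as get does on a real transactAsync future.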

Adding seed data

Once a schema is loaded, you can start adding data to your database. The "samples/seattle/seattle-data0.edn" file defines seed data based on the sample schema. Like the schema file, the seed data file contains the literal representation of a transaction that adds the data.

We can read the seed data transaction file the same way we read the schema file: wrap it in a reader and pass it to Util.readAll.

data_rdr = new FileReader("samples/seattle/seattle-data0.edn")
data_tx = Util.readAll(data_rdr).get(0)
data_rdr.close()

Once it's read, we can submit the seed data transaction to the database.

txResult = conn.transact(data_tx).get()

The database is now seeded with data we can work with! In the following sections, we'll walk through querying and modifying the data. If at some point you want to return to the initial seed state, the simplest way to clean out the existing data is to delete the database by calling Peer.deleteDatabase, passing in your connection URI. Then you can create the database again, reload the schema and reload the data.

Getting a database value

In order to access the data that we've just transacted, we need to get a database value. The simplest way of getting a database value is to call Connection.db:

db = conn.db()

Calls to Connection.db are fully local, i.e., they neither block nor communicate with the transactor, but rather immediately return the most recent database value available to the Peer.
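One way to picture why db can return without blocking is to treat each database value as an immutable snapshot that the connection swaps atomically. The following is a toy model, not Datomic's implementation; ToyConn and its string "facts" are hypothetical. A value obtained from db() keeps describing the same state even after later transactions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only (ToyConn is hypothetical): every transaction produces a new
// immutable "database value", and db() just reads the latest one locally.
class ToyConn {
    private final AtomicReference<List<String>> current =
            new AtomicReference<>(Collections.emptyList());

    // Local read: no locking and no waiting on a writer.
    List<String> db() {
        return current.get();
    }

    // Builds a new value from the old one; earlier values are never modified.
    void transact(String fact) {
        current.updateAndGet(old -> {
            List<String> next = new ArrayList<>(old);
            next.add(fact);
            return Collections.unmodifiableList(next);
        });
    }
}
```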

Querying the database

With the seed data loaded, we can query the database. This section introduces queries, explaining how to construct basic questions, ask them and process the answers. The Datomic query language is based on Datalog, a formal language for finding data by specifying a set of logical rules and conditions.

A first query

As with transactions, Datomic queries are expressed as data structures. You can represent them as literal strings, load them from files, or build them. We'll start out using strings and consider the alternatives later.

Queries consist of three sections: :find, :in and :where. The :find section specifies what the query should return. The :in section specifies data sources. It is not necessary when querying a single data source; we'll talk about it later. The :where section specifies one or more data, expression or rule clauses.

We'll start by using data clauses only, and explore the other types later on. The basic structure for a data clause is:

[entity attribute value?]

A data clause always has either a variable or a constant for entity and attribute. It may also specify a variable or a constant for value, but that's optional. When you query with a data clause, the query engine will find entity, attribute, value tuples that match the constraints expressed by the clause.
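To make the matching rule concrete, here is a toy model of evaluating a single data clause against an in-memory list of (entity, attribute, value) facts. It is an illustration under simplifying assumptions, not Datomic's query engine: ClauseMatcher is hypothetical, strings beginning with ? are treated as variables, and everything else is a constant that must match exactly. Each matching fact yields one set of variable bindings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of matching one data clause [e a v?] against (e, a, v) facts.
class ClauseMatcher {
    // A term is a variable if it is a String starting with '?'.
    static boolean isVar(Object term) {
        return term instanceof String && ((String) term).startsWith("?");
    }

    // Returns one binding map per fact that satisfies the clause.
    static List<Map<String, Object>> match(List<Object[]> facts, Object[] clause) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Object[] fact : facts) {
            Map<String, Object> binding = new HashMap<>();
            boolean ok = true;
            // The value position is optional, so only compare the terms the clause has.
            for (int i = 0; i < clause.length && ok; i++) {
                Object term = clause[i];
                if (isVar(term)) {
                    Object bound = binding.get(term);
                    if (bound == null) {
                        binding.put((String) term, fact[i]);
                    } else {
                        ok = bound.equals(fact[i]);   // a repeated variable must agree
                    }
                } else {
                    ok = term.equals(fact[i]);        // a constant must match exactly
                }
            }
            if (ok) {
                results.add(binding);
            }
        }
        return results;
    }
}
```

Matching new Object[]{"?c", ":community/name"} returns one binding for every entity that has a name, mirroring the first query below.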

Here's a simple query that uses a single data clause to find communities.

[:find ?c :where [?c :community/name]]

The :find section specifies that we want to retrieve all values of the variable ?c (for community). The :where section contains a single data clause matching any entity ?c with a value for the :community/name attribute. Every entity that matches this clause is included in the query results.

Queries are executed against a value of your database. You can get the current database value by calling your connection object's db method. It returns a reference to a datomic.Database, which represents the state of the database at the moment you retrieved it. You can query the database value by calling Peer.q, passing the query (as either a string or a data structure) as the first argument and the database value as the second argument:

results = Peer.q("[:find ?c :where [?c :community/name]]",
  conn.db())

The query method returns a Collection<List<Object>>. Each list in the collection contains a valid set of values for the variables you were looking for. In this case, we're only finding one variable, ?c, so each list contains a single value, the id of an entity with a value for the :community/name attribute. We can print the output to see it:

for (result in results) { println result }

[17592186045520]
[17592186045518]
[17592186045516]
...

Because Datomic query results are represented as regular data structures, we can bring all of the power of the general data APIs to bear, even in the interactive shell. For example, to find out how many communities there are, we can simply call the result collection's size method.

println results.size()

150

Getting an entity's attribute values

Getting an entity id is useful, but in most cases what we really want is one or more of the entity's attribute values. There are two ways to get them. The first is to convert the entity id to an entity by calling Database.entity, passing in the id.

id = results.iterator().next().get(0)
entity = conn.db().entity(id)

The entity method returns an object that implements datomic.Entity, a Map-like interface.

Once you have an entity, you can read its attribute values by calling get with the attribute's keyword. As with maps, you can get the set of keys the entity defines by calling the keySet method (this is very useful in the interactive shell).

name = entity.get(":community/name")

If you want to print all the community names, you can loop through the results, get each entity, and extract the value for :community/name.

db = conn.db()

for (result in results) {
  entity = db.entity(result.get(0))
  println entity.get(":community/name")
}

// results
Central District News
Central Ballard Community Council
Central Area Community Festival Association
Nature Consortium
...

If the attribute you access via get is of reference type, you won't get back a raw entity id. Instead, get returns another entity that you can use to access the referenced entity's attributes. This makes it very easy to write code that traverses entity relationships. Here is an example that uses this technique to print the names of the neighborhoods that communities are in.

db = conn.db();

for (result in results) {
  entity = db.entity(result.get(0))
  neighborhood = entity.get(":community/neighborhood")
  println neighborhood.get(":neighborhood/name")
}

// results
Central District
Ballard
...

All reference relationships in Datomic are bi-directional. In other words, a community's :community/neighborhood attribute establishes a relationship between a community entity and a neighborhood entity that can be traversed in either direction, even though a neighborhood entity has no :neighborhood/community attribute. You can navigate relationships in the reverse direction by prepending a '_' to the name portion of a reference attribute keyword. For example, you can navigate from a neighborhood entity to all the community entities that reference it by getting the :community/_neighborhood attribute of the neighborhood entity.

community = conn.db().entity(results.iterator().next().get(0))

neighborhood = community.get(":community/neighborhood")

communities = neighborhood.get(":community/_neighborhood")

for (comm in communities) {
  println comm.get(":community/name") 
}

// results
Central Area Community Festival Association
Central District News
Friends of Frink Park
KOMO Communities - Central District

Querying for an attribute's value

Entity maps provide one way to get an entity's attribute values, but often it's simpler to start with a more specific query. Here's our original query again:

[:find ?c :where [?c :community/name]]

If what we really want is to find the communities' names, we can modify the data clause in the :where section to include a variable for the name. Note that we have to add the variable to the :find section too if we want the data included in the query results.

[:find ?c ?n :where [?c :community/name ?n]]

This query says find each unique pair of values ?c and ?n where ?c is an entity with a :community/name attribute whose value is ?n. This query returns a collection of lists with two values each, the unique combinations that match ?c and ?n. The code below prints the second part of each pair, the community name.

results = Peer.q("[:find ?c ?n :where [?c :community/name ?n]]",
  conn.db())

for (result in results) {
  println result.get(1)
}

// results
Ballard Gossip Girl
CHS Capitol Hill Seattle Blog
Delridge Grassroots Leadership
...

Querying for multiple attributes' values

If you want to get additional attribute values from your query, you can add additional variables and conditions. Multiple clauses in the :where section produce a join, so this query finds pairs of values ?n and ?u where there is an entity ?c whose :community/name attribute has the value ?n AND whose :community/url attribute has the value ?u.

[:find ?n ?u
 :where
 [?c :community/name ?n]
 [?c :community/url ?u]]

It produces these results.

[["Chinatown/International District" "http://www.cidbia.org/"]
 ["All About Belltown" "http://www.belltown.org/"]
 ["Friends of Discovery Park" "http://www.friendsdiscoverypark.org/"]

 ...]

Note that while the ?c variable is critical to the logic of this query, the value of ?c is not returned in the results because it is not in the :find section.
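The join can be sketched by hand to show what "the same ?c in both clauses" means. This is a toy model with hypothetical sample facts, not how Datomic plans or executes queries: TwoClauseJoin indexes the URL facts by entity, then pairs each name with the URL(s) of the same entity. A community with a name but no URL produces no result, which is the AND semantics described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the join [?c :community/name ?n] [?c :community/url ?u]
// over (entity, attribute, value) facts. ?c links the clauses but is not returned.
class TwoClauseJoin {
    static Set<List<Object>> namesAndUrls(List<Object[]> facts) {
        // Index :community/url values by entity so each name can find its url(s).
        Map<Object, List<Object>> urlsByEntity = new HashMap<>();
        for (Object[] f : facts) {
            if (":community/url".equals(f[1])) {
                urlsByEntity.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f[2]);
            }
        }
        // Results form a set, so duplicate [?n ?u] pairs collapse.
        Set<List<Object>> results = new LinkedHashSet<>();
        for (Object[] f : facts) {
            if (":community/name".equals(f[1])) {
                for (Object url : urlsByEntity.getOrDefault(f[0], List.of())) {
                    results.add(List.of(f[2], url));
                }
            }
        }
        return results;
    }
}
```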

Attributes defined with :db.cardinality/many may associate multiple values with an entity. When you query for the value of a multi-occurrence attribute, you get a different result tuple for each value of the attribute.

The only multi-occurrence attribute in the sample database is :community/category, of type string. The "belltown" community has two categories. If you query for them:

[:find ?e ?c
 :where
 [?e :community/name "belltown"]
 [?e :community/category ?c]]

you get two results:

[[17592186045499 "news"]
 [17592186045499 "events"]]

The value for ?e is the same in both results: it's the entity id for the community named "belltown". There are two values for :community/category, "news" and "events".
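A simple way to think about a cardinality-many attribute is as several facts that share the same entity and attribute. The toy sketch below (ManyValued is hypothetical, and the facts reuse the entity id and categories shown above) evaluates the two clauses of the belltown query and gets one result tuple per category value.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model: a cardinality-many attribute is several facts with the same
// entity and attribute, so the query yields one [?e ?c] tuple per value.
class ManyValued {
    static Set<List<Object>> categoriesOf(List<Object[]> facts, String communityName) {
        Set<List<Object>> results = new LinkedHashSet<>();
        for (Object[] nameFact : facts) {
            // Clause 1: [?e :community/name communityName]
            if (!":community/name".equals(nameFact[1]) || !communityName.equals(nameFact[2])) {
                continue;
            }
            for (Object[] catFact : facts) {
                // Clause 2: [?e :community/category ?c], joined on the same ?e
                if (nameFact[0].equals(catFact[0]) && ":community/category".equals(catFact[1])) {
                    results.add(List.of(catFact[0], catFact[2]));
                }
            }
        }
        return results;
    }
}
```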

Querying by attribute values

The previous example showed how to query for the values of attributes. You can also query by the values of attributes. All you need to do is replace the value variable with a constant. For example, you could specify that :community/type must be :community.type/twitter.

[:find ?n
 :where
 [?c :community/name ?n]
 [?c :community/type :community.type/twitter]]

More precisely, this query finds values for the variable ?n where there is an entity ?c whose :community/name attribute has the value ?n AND whose :community/type attribute refers to an entity whose :db/ident attribute has the value :community.type/twitter. It produces these results.

[["Discover SLU"]
 ["Fremont Universe"]
 ["Columbia Citizens"]
 ["Magnolia Voice"]
 ["Maple Leaf Life"]
 ["MyWallingford"]]

Querying across references

The sample data model includes three main entity types - communities, neighborhoods and districts - whose instances refer to each other. You can write queries that traverse references by introducing a variable for each entity you visit.

For instance, imagine you wanted to find all communities in a given region. A community entity does not have a :community/region attribute. Instead, it has a :community/neighborhood attribute, which refers to a neighborhood entity. A neighborhood entity has a :neighborhood/district attribute which refers to a district entity. And a district entity has a :district/region attribute, which refers to a region. To find all communities in a given region a query has to visit the intervening neighborhoods and districts, as shown below.

[:find ?c_name
 :where
 [?c :community/name ?c_name]
 [?c :community/neighborhood ?n]
 [?n :neighborhood/district ?d]
 [?d :district/region :region/ne]]

This query finds values for the variable ?c_name where there is an entity ?c whose :community/name attribute has a value ?c_name and whose :community/neighborhood attribute is an entity ?n whose :neighborhood/district attribute is an entity ?d whose :district/region attribute refers to an entity whose :db/ident attribute has the value :region/ne. Here are the results.

[["KOMO Communities - U-District"]
 ["Maple Leaf Community Council"]
 ["KOMO Communities - View Ridge"]
 ["Hawthorne Hills Community Website"]
 ["Aurora Seattle"]
 ["Magnuson Community Garden"]
 ["Laurelhurst Community Club"]
 ["Magnuson Environmental Stewardship Alliance"]
 ["Maple Leaf Life"]]

The last data clause in the :where section of the previous query deserves special attention. Remember that we're using entities to represent enum values. You can compare the enum value associated with a given attribute, e.g., :district/region, using a keyword, e.g., :region/ne, even though the attribute is of reference type. This is because the entities used as values for the enum have :db/ident values, one of which is :region/ne.
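The keyword comparison works because the keyword is first resolved to the id of the enum entity that carries it as :db/ident. The toy sketch below models that resolution step with plain maps; IdentResolution, the district names, and the numeric ids are all hypothetical and do not reflect Datomic's internals. This toy simply returns no matches for an unknown keyword.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model: enum values are entities named by a :db/ident keyword. The
// reference attribute stores the enum entity's id, and a keyword constant in
// a clause is first resolved to that id. All ids here are made up.
class IdentResolution {
    static Set<String> districtsInRegion(Map<String, Long> idents,
                                         Map<String, Long> districtToRegionId,
                                         String regionKeyword) {
        Set<String> results = new LinkedHashSet<>();
        Long regionId = idents.get(regionKeyword);   // e.g. ":region/ne" -> entity id
        if (regionId == null) {
            return results;                          // unknown keyword: no matches here
        }
        for (Map.Entry<String, Long> e : districtToRegionId.entrySet()) {
            if (regionId.equals(e.getValue())) {
                results.add(e.getKey());
            }
        }
        return results;
    }
}
```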

If you wanted to query for the names of communities and the regions they're in, you'd need to traverse the enum value entity itself, like this:

[:find ?c_name ?r_name
 :where
 [?c :community/name ?c_name]
 [?c :community/neighborhood ?n]
 [?n :neighborhood/district ?d]
 [?d :district/region ?r]
 [?r :db/ident ?r_name]]

The results are community names and regions in pairs.

[["Highland Park Action Committee" :region/sw]
 ["Broadview Community Council" :region/sw]
 ["Central Ballard Community Council" :region/nw]
 ["Laurelhurst Community Club" :region/ne]

 ...]

Advanced queries

The previous section explained how to build and execute queries. This section expands on that, explaining how to parameterize queries, invoke code from within a query, do a fulltext search and define and use query rules.

Parameterizing queries

In the examples above that query by attribute value, we specified the values to look for directly as literals in the query itself. If we wanted to query for a different constant value - looking for :community.type/facebook-page instead of :community.type/twitter for example - we'd have to write another query.

If you're doing anything more than experimenting in the shell, this gets tedious. If you're representing queries as strings (as opposed to data structures), it's also less efficient. When you pass a string literal to the query method, it parses the string into a data structure and caches it. If you keep varying the string content, the cache is not effective.
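The caching argument can be illustrated with a toy cache keyed by the exact query text. QueryCache is hypothetical, and its "parse" step just splits on whitespace as a stand-in; the point is only that two literal queries differing in one constant are two cache entries, while a parameterized query string is parsed once and reused.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (QueryCache is hypothetical): parsed queries are cached by their
// exact text, so every distinct string costs one parse.
class QueryCache {
    private final Map<String, List<String>> cache = new HashMap<>();
    int parses = 0;

    List<String> parse(String query) {
        return cache.computeIfAbsent(query, q -> {
            parses++;                                // runs only on a cache miss
            return Arrays.asList(q.split("\\s+"));   // stand-in for real parsing
        });
    }
}
```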

The solution is to write your query in terms of a parameter whose value you specify when you call query, but separately from the query string or data structure itself. To do this, we'll need to add an :in section to our queries.

We haven't needed :in up to now because all our queries have used a single input source, the database value we pass as an argument when we call Peer.q. Now, however, we want an additional input source, a parameter to use within the query.

Here's an example. This is the query we used before to find the names of communities whose :community/type attributes have the value :community.type/twitter:

[:find ?n
 :where
 [?c :community/name ?n]
 [?c :community/type :community.type/twitter]]

We can parameterize it by adding an :in section, like this:

[:find ?n
 :in $ ?t
 :where
 [?c :community/name ?n]
 [?c :community/type ?t]]

The :in section always comes between the :find and :where sections. It specifies an ordered list of input sources. Database sources are named starting with a '$'. If you have only one database as an input source, you can name it simply $. For other input sources, you specify a binding pattern. In this case we're passing a single scalar value as an argument, so our binding pattern is the ?t variable. We can refer to ?t in the :where clauses just as we would refer to any other variable.

When we execute this query, we simply pass an additional input source, the value to use for ?t, as an additional argument. Note that although :community/type is a reference attribute, we can pass the keyword that identifies the enumerated value entity (its :db/ident), and we can pass that keyword as a string if we want to. Also note that the input sources must be passed in the same order they are declared in the :in section, that is, first the database, then the parameter value.

results = Peer.q(
  "[:find ?n :in \$ ?t :where [?c :community/name ?n][?c :community/type ?t]]",
  conn.db(),
  ":community.type/twitter")

The results are the same as the previous version of this query, where the value :community.type/twitter was a constant.

[["Discover SLU"]
 ["Fremont Universe"]
 ["Columbia Citizens"]
 ["Magnolia Voice"]
 ["Maple Leaf Life"]
 ["MyWallingford"]]

Now that the query is parameterized, we can change the community type we're querying for simply by passing a different keyword for the value of ?t, like :community.type/facebook-page.

results = Peer.q(
  "[:find ?n :in \$ ?t :where [?c :community/name ?n][?c :community/type ?t]]",
  conn.db(),
  ":community.type/facebook-page")

The results are different (some community names appear on both lists, but they are from different communities).

[["Discover SLU"]
 ["Blogging Georgetown"]
 ["Fremont Universe"]
 ["Columbia Citizens"]
 ["Magnolia Voice"]
 ["Maple Leaf Life"]
 ["MyWallingford"]
 ["Eastlake Community Council"]
 ["Fauntleroy Community Association"]]

The previous example used a single value for the parameter ?t, but you can also use a list of individual values. Here is a query that finds communities of type :community.type/twitter or :community.type/facebook-page.

[:find ?n ?t
 :in $ [?t ...]
 :where [?c :community/name ?n][?c :community/type ?t]]

In this case, the input source passed as the last argument to query is a list of keywords, like this:

[:community.type/facebook-page :community.type/twitter]

The results include both the community name and type:

[["Discover SLU" :community.type/twitter]
 ["Magnolia Voice" :community.type/facebook-page]
 ["Blogging Georgetown" :community.type/facebook-page]
 ["Maple Leaf Life" :community.type/twitter]
 ["Fremont Universe" :community.type/twitter]
 ["Magnolia Voice" :community.type/twitter]
 ["Columbia Citizens" :community.type/facebook-page]
 ["Fauntleroy Community Association" :community.type/facebook-page]
 ["Columbia Citizens" :community.type/twitter]
 ["Eastlake Community Council" :community.type/facebook-page]
 ["Fremont Universe" :community.type/facebook-page]
 ["Maple Leaf Life" :community.type/facebook-page]
 ["Discover SLU" :community.type/facebook-page]
 ["MyWallingford" :community.type/twitter]
 ["MyWallingford" :community.type/facebook-page]]

The :in section uses the collection binding expression [?t ...], which indicates that the variable will be bound to each value in the corresponding input source. The input source is passed as the third argument, after the database input source. It is a list constructed by calling Util.list with the desired keywords as arguments. Note that the query returns the value of the parameter ?t as part of the results specified by :find, making it easy to see which communities are of which type.
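Semantically, a collection binding behaves as if the clauses were evaluated once per element of the input list, with the result sets unioned. The toy sketch below models that with plain collections; CollectionBinding is hypothetical, each community is reduced to a {name, type} pair, and the sample rows in the usage are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the collection binding [?t ...]: evaluate the clauses once per
// element of the input list and union the results. Each community is {name, type}.
class CollectionBinding {
    static Set<List<String>> namesByTypes(List<String[]> communities, List<String> types) {
        Set<List<String>> results = new LinkedHashSet<>();
        for (String t : types) {                      // bind ?t to each element in turn
            for (String[] c : communities) {
                if (c[1].equals(t)) {
                    results.add(List.of(c[0], t));    // ?t is in :find, so it is returned
                }
            }
        }
        return results;
    }
}
```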

In addition to passing multiple values for a single parameter, you can pass tuples of values for multiple parameters. This query finds communities that are both :community.type/email-list and :community.orgtype/community, as well as communities that are both :community.type/website and :community.orgtype/commercial.

[:find ?n ?t ?ot
 :in $ [[?t ?ot]]
 :where
 [?c :community/name ?n]
 [?c :community/type ?t]
 [?c :community/orgtype ?ot]]

The actual input arguments are passed in as a list of lists.

[[:community.type/email-list :community.orgtype/community]
 [:community.type/website :community.orgtype/commercial]]

Here are the results, containing communities that are either community email lists or commercial websites.

[["Ballard Neighbor Connection"
  :community.type/email-list
  :community.orgtype/community]
 ["Leschi Community Council"
  :community.type/email-list
  :community.orgtype/community]
 ["15th Ave Community"
  :community.type/email-list
  :community.orgtype/community]
 ["Beacon Hill Burglaries"
  :community.type/email-list
  :community.orgtype/community]
 ["InBallard" :community.type/website :community.orgtype/commercial]
 ["Greenwood Community Council Discussion"
  :community.type/email-list
  :community.orgtype/community]

 ...]
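The key property of the [[?t ?ot]] binding is that each input row binds both variables together, so only the listed combinations match, not every type paired with every orgtype. The toy sketch below shows this with plain collections; TupleBinding is hypothetical, and the "Hypothetical List" row in the usage exists only to show a combination that is correctly excluded.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the binding [[?t ?ot]]: each input row binds ?t and ?ot
// together, so only the listed combinations match.
// Each community is {name, type, orgtype}; each row is {type, orgtype}.
class TupleBinding {
    static Set<List<String>> matchRows(List<String[]> communities, List<String[]> rows) {
        Set<List<String>> results = new LinkedHashSet<>();
        for (String[] row : rows) {
            for (String[] c : communities) {
                if (c[1].equals(row[0]) && c[2].equals(row[1])) {
                    results.add(List.of(c[0], c[1], c[2]));
                }
            }
        }
        return results;
    }
}
```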

Invoking functions in queries

All the queries we've looked at so far have used only data clauses in the :where section. There are also expression clauses. Expression clauses allow you to invoke functions as part of a query. The structure for an expression clause is:

[(function input-arg*) output-binding]

The code expression is contained inside a list nested in a list. It's idiomatic to use parentheses inside square brackets for the inner list when representing this structure in literal form.

The function is a comparison operator (i.e., =, !=, >, <, >=, <=), a static method, or a member method. You can pass 0 or more query variables or constants as input. You can bind output variables using the same expressions we used for parameterizing queries. Predicate functions - comparisons or other test methods - that return boolean values do not bind output. Instead, their output is used to filter results.

Here's a simple example: a query that finds communities whose names come before "C" in alphabetical order.

[:find ?n
 :where
 [?c :community/name ?n]
 [(.compareTo ?n "C") ?res]
 [(< ?res 0)]]

The second clause in the :where section uses the compareTo method on the string value in the variable ?n to test whether the community name comes before the constant "C" alphabetically. It captures the output of compareTo in the variable ?res. A negative value for ?res indicates a name that comes before "C", so the last clause tests ?res to see if it is less than 0.

[["All About South Park"]
 ["Ballard Neighbor Connection"]
 ["Ballard Blog"]
 ["At Large in Ballard"]
 ["Ballard Chamber of Commerce"]
 ["Beacon Hill Burglaries"]
 ["Alki News"]
 ["Beacon Hill Alliance of Neighbors"]
 ["Beach Drive Blog"]
 ["Ballard Avenue"]
 ["Aurora Seattle"]
 ["15th Ave Community"]
 ["Ballard Moms"]
 ["All About Belltown"]
 ["Admiral Neighborhood Association"]
 ["Ballard District Council"]
 ["Beacon Hill Community Site"]
 ["ArtsWest"]
 ["Alki News/Alki Community Council"]
 ["\"Columbia City, Seattle\""]
 ["Bike Works!"]
 ["Beacon Hill Blog"]
 ["Ballard Gossip Girl"]
 ["Blogging Georgetown"]
 ["Broadview Community Council"]
 ["Ballard Historical Society"]]
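The two expression clauses behave like ordinary Java code applied to each candidate name. The sketch below mirrors them on a plain list; BeforeC is a hypothetical helper, not part of Datomic. Because String.compareTo compares UTF-16 code units, the ordering follows character codes rather than a dictionary's alphabetical order: digits and the double-quote character sort before C, which is why "15th Ave Community" and "\"Columbia City, Seattle\"" appear above, and every lowercase letter sorts after every uppercase one, which is why the community named "belltown" is absent.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors [(.compareTo ?n "C") ?res] [(< ?res 0)] on a plain list of names.
class BeforeC {
    static List<String> namesBeforeC(List<String> names) {
        List<String> results = new ArrayList<>();
        for (String n : names) {
            int res = n.compareTo("C");   // like binding ?res
            if (res < 0) {                // predicate clause: filters, binds nothing
                results.add(n);
            }
        }
        return results;
    }
}
```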

Note that the query engine knows about types and functions in the java.lang package and the clojure.core namespace, so you can reference them without namespace qualification.

Querying with fulltext search

Datomic supports fulltext searching. When you define an attribute of string type, you can indicate whether it should be indexed for fulltext search. There are two fulltext indexed attributes in the sample schema: :community/name and :community/category.

You can query against a fulltext index by invoking the system-defined function fulltext. The fulltext function takes three input arguments: a database value, the attribute whose values you want to search (it must be fulltext indexed) and the string you are looking for. The fulltext function returns a collection of tuples of entities and values where the entity has a value for the specified attribute that contains the specified search string.

This query uses fulltext to find communities whose name includes the string "Wallingford". The $ is used to indicate that the single database input source should be used for the query. This works even without a :in section because there is only one input source for the query.

[:find ?n
 :where
 [(fulltext $ :community/name "Wallingford") [[?e ?n]]]]

The results include all the communities with "Wallingford" in the name; there is only one.

[["KOMO Communities - Wallingford"]]
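Conceptually, the fulltext call yields a collection of [entity value] tuples, and the binding form [[?e ?n]] destructures each tuple into ?e and ?n. The plain-Java sketch below models only that shape. It rests on an assumption: a match is approximated as a case-insensitive whole-word match, which is not how Datomic's fulltext index actually tokenizes or searches text, and the entity ids are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of (fulltext $ :community/name "Wallingford").
// Matching is approximated as a case-insensitive whole-word match; this is
// NOT how Datomic's fulltext index tokenizes or scores text.
class FulltextSketch {

    // One [entity value] tuple, as destructured by [[?e ?n]].
    static final class Tuple {
        final long e;
        final String v;
        Tuple(long e, String v) { this.e = e; this.v = v; }
    }

    static List<Tuple> fulltext(Map<Long, String> attrValues, String search) {
        String needle = search.toLowerCase(Locale.ROOT);
        List<Tuple> out = new ArrayList<>();
        for (Map.Entry<Long, String> entry : attrValues.entrySet()) {
            String[] words = entry.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            for (String word : words) {
                if (word.equals(needle)) {
                    out.add(new Tuple(entry.getKey(), entry.getValue()));
                    break; // at most one tuple per entity value
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> names = new LinkedHashMap<>(); // made-up entity ids
        names.put(1L, "KOMO Communities - Wallingford");
        names.put(2L, "MyWallingford"); // one word, "mywallingford", so no match here
        names.put(3L, "Fremont Universe");
        for (Tuple t : fulltext(names, "Wallingford")) {
            System.out.println(t.e + " " + t.v); // prints: 1 KOMO Communities - Wallingford
        }
    }
}
```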

You can parameterize fulltext search. You can also join across fulltext results and other data. This query searches for communities of the specified :community/type and with a :community/category value containing the specified word.

[:find ?name ?cat
 :in $ ?type ?search
 :where
 [?c :community/name ?name]
 [?c :community/type ?type]
 [(fulltext $ :community/category ?search) [[?c ?cat]]]]

The actual arguments are passed to query as input sources (in order, as described above).

results = Peer.q(
  ..., // query
  conn.db(),
  ":community.type/website",
  "food");

The query results show the names of the communities and the categories that contain the word "food".

[["InBallard" "food"]
 ["Community Harvest of Southwest Seattle" "sustainable food"]]
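The join works because ?c appears both in the data clauses and in the fulltext binding, so only entities present in every set of results survive. The following plain-Java sketch shows that join over hypothetical maps keyed by entity id. It uses no Datomic API; a whole-word check stands in for the fulltext index, and "Example Blog" is an invented entity used only to show a row being excluded by type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of joining fulltext results with ordinary data on ?c.
// A whole-word check stands in for the fulltext index; ids and values are made up.
class FulltextJoinSketch {

    // Returns [name category] rows, like :find ?name ?cat.
    static List<List<String>> query(Map<Long, String> nameOf,              // :community/name
                                    Map<Long, String> typeOf,              // :community/type
                                    Map<Long, List<String>> categoriesOf,  // :community/category (many)
                                    String type, String search) {
        List<List<String>> rows = new ArrayList<>();
        for (Map.Entry<Long, List<String>> entry : categoriesOf.entrySet()) {
            Long c = entry.getKey();
            for (String cat : entry.getValue()) {
                // (fulltext $ :community/category ?search) binding [[?c ?cat]]
                boolean textMatch = Arrays.asList(cat.split(" ")).contains(search);
                // [?c :community/type ?type] and [?c :community/name ?name] share ?c
                if (textMatch && type.equals(typeOf.get(c)) && nameOf.get(c) != null) {
                    rows.add(List.of(nameOf.get(c), cat));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Long, String> names = Map.of(
            1L, "InBallard",
            2L, "Community Harvest of Southwest Seattle",
            3L, "Example Blog");
        Map<Long, String> types = Map.of(
            1L, ":community.type/website",
            2L, ":community.type/website",
            3L, ":community.type/blog");
        Map<Long, List<String>> categories = Map.of(
            1L, List.of("food"),
            2L, List.of("sustainable food"),
            3L, List.of("food"));
        // Row order follows Map.of iteration order, which is unspecified.
        System.out.println(query(names, types, categories, ":community.type/website", "food"));
    }
}
```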

Fulltext indexes are updated in the background, and are not guaranteed to be aware of the most recent transactions. (In practice, fulltext indexes will typically be complete up to within a few moments of the most recent transaction.)

Querying with rules

As your queries get more complex, you'll get tired of repeating the same sets of :where clauses over and over again. You can package up reusable sets of clauses into rules. Rules make :where clauses not only reusable but also composable, meaning that you can bind portions of a query's logic at query time.

A rule is a named group of clauses that can be plugged into the :where section of your query. For example, here is a rule that tests whether a community is a twitter feed:

[[twitter ?c]
 [?c :community/type :community.type/twitter]]

As with transactions and queries, rules are described using data structures. A rule is a list of lists. The first list in the rule is the head. It names the rule and specifies its arguments. The rest of the lists are clauses that make up the body of the rule. In this rule, the name is "twitter", the variable ?c is an input argument, and the body is a single data clause testing whether the :community/type attribute of the entity ?c has the value :community.type/twitter. This rule has no output argument - it is a predicate rule that will evaluate to true or false, indicating whether ?c matches the specified criteria.

Individual rule definitions are combined into a set of rules. A set of rules is simply another list containing some number of rule definitions:

[[[twitter ?c]
  [?c :community/type :community.type/twitter]]]

You have to do two things to use a rule set in a query. First, you have to pass the rule set as an input source and reference it in the :in section of your query using the '%' symbol. Second, you have to invoke one or more rules from the :where section of your query. You do this by adding a rule invocation clause. Rule invocations have this structure:

(rule-name rule-arg*)

A rule invocation is a list containing a rule-name and one or more arguments, either variables or constants. It's idiomatic to use parentheses instead of square brackets to represent a rule invocation in literal form, because it makes it easier to differentiate from a data clause. However, this is not a requirement.

Here's an example that uses the rule above to find communities that are twitter feeds.

rules = "[[[twitter ?c] [?c :community/type :community.type/twitter]]]"

results = Peer.q(
  "[:find ?n :in \$ % :where [?c :community/name ?n](twitter ?c)]",
  conn.db(),
  rules)

It produces the same results as our original query for twitter feeds.

[["Discover SLU"]
 ["Fremont Universe"]
 ["Columbia Citizens"]
 ["MyWallingford"]
 ["Maple Leaf Life"]
 ["Magnolia Voice"]]

That's a pretty simple example, and it doesn't really save much typing. Here's a more complex example that saves a lot of typing. Given a community, it traverses the community's neighborhood and district entities and returns the community's region.

[[[region ?c ?r]
  [?c :community/neighborhood ?n]
  [?n :neighborhood/district ?d]
  [?d :district/region ?re]
  [?re :db/ident ?r]]]
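Each clause in the region rule follows one reference: community to neighborhood, neighborhood to district, district to the region enum entity, and finally that entity to its :db/ident keyword. The sketch below walks the same chain with plain Java maps. The maps and ids are hypothetical stand-ins for attribute values, not a Datomic API; a missing link yields an empty result, roughly as the rule would simply not match.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the region rule: each map plays the role of one
// reference attribute, keyed by entity id.
class RegionRuleSketch {

    static Optional<String> region(long c,
                                   Map<Long, Long> communityNeighborhood, // :community/neighborhood
                                   Map<Long, Long> neighborhoodDistrict,  // :neighborhood/district
                                   Map<Long, Long> districtRegion,        // :district/region
                                   Map<Long, String> ident) {             // :db/ident
        return Optional.ofNullable(communityNeighborhood.get(c)) // [?c :community/neighborhood ?n]
                .map(neighborhoodDistrict::get)                  // [?n :neighborhood/district ?d]
                .map(districtRegion::get)                        // [?d :district/region ?re]
                .map(ident::get);                                // [?re :db/ident ?r]
    }

    public static void main(String[] args) {
        Map<Long, Long> cn = Map.of(100L, 10L);           // community 100 -> neighborhood 10
        Map<Long, Long> nd = Map.of(10L, 20L);            // neighborhood 10 -> district 20
        Map<Long, Long> dr = Map.of(20L, 30L);            // district 20 -> region entity 30
        Map<Long, String> id = Map.of(30L, ":region/e");  // region entity 30 -> keyword
        System.out.println(region(100L, cn, nd, dr, id)); // prints: Optional[:region/e]
        System.out.println(region(999L, cn, nd, dr, id)); // prints: Optional.empty
    }
}
```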

This rule makes it easy to query for communities in different regions. These queries find communities in the NE and SW regions.

[:find ?n
 :in $ %
 :where
 [?c :community/name ?n]
 (region ?c :region/ne)]

[:find ?n
 :in $ %
 :where
 [?c :community/name ?n]
 (region ?c :region/sw)]

They produce these results, respectively.

[["KOMO Communities - U-District"]
 ["Hawthorne Hills Community Website"]
 ["Maple Leaf Community Council"]

 ...]

[["Greenwood Community Council Announcements"]
 ["Genesee-Schmitz Neighborhood Council"]
 ["Nature Consortium"]

 ...]

Rules make it possible to reuse groups of :where clauses. They also make it possible to define different logical paths to the same conclusion. Here's a rule that identifies communities that are "social-media".

[[[social-media ?c]
  [?c :community/type :community.type/twitter]]
 [[social-media ?c]
  [?c :community/type :community.type/facebook-page]]]

The social-media rule has two definitions, one testing whether a community's type is :community.type/twitter and the other testing whether a community's type is :community.type/facebook-page. When a given community value is tested, the social-media rule will be true if either of the definitions is true. In other words, using rules, we can implement logical or in queries.
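One way to picture a rule with several definitions is as a disjunction: the rule holds for an entity if any one of its bodies holds. The following plain-Java sketch models each definition as a predicate and the rule as their logical or. It is only a mental model, not how Datomic's query engine evaluates rules; the type keywords from the sample schema are written as Java strings, and the entity ids are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Mental model of a multi-definition rule: the rule is satisfied when any
// definition's body is satisfied. Entity ids map to :community/type values.
class RuleOrSketch {

    // Body of one definition: [?c :community/type <t>]
    static Predicate<Long> typeIs(Map<Long, String> communityType, String t) {
        return c -> t.equals(communityType.get(c));
    }

    // [[social-media ?c] [?c :community/type :community.type/twitter]]
    // [[social-media ?c] [?c :community/type :community.type/facebook-page]]
    static Predicate<Long> socialMedia(Map<Long, String> communityType) {
        List<Predicate<Long>> definitions = List.of(
            typeIs(communityType, ":community.type/twitter"),
            typeIs(communityType, ":community.type/facebook-page"));
        return c -> definitions.stream().anyMatch(d -> d.test(c));
    }

    public static void main(String[] args) {
        Map<Long, String> types = Map.of(
            1L, ":community.type/twitter",
            2L, ":community.type/facebook-page",
            3L, ":community.type/blog");
        Predicate<Long> rule = socialMedia(types);
        // prints: true true false
        System.out.println(rule.test(1L) + " " + rule.test(2L) + " " + rule.test(3L));
    }
}
```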

In all the examples above, the body of each rule is made up solely of data clauses. However, rules can contain any type of clause: data, expression or even rule invocation. This allows rules to be reused by other rules. Here's a rule set that contains the region and social-media rules described above and adds two new rules, northern and southern. Each of the new rules has three definitions. They reuse the region rule to test whether a given community ?c is in one of the northern or southern regions, respectively.

[[[region ?c ?r]
  [?c :community/neighborhood ?n]
  [?n :neighborhood/district ?d]
  [?d :district/region ?re]
  [?re :db/ident ?r]]
 [[social-media ?c]
  [?c :community/type :community.type/twitter]]
 [[social-media ?c]
  [?c :community/type :community.type/facebook-page]]
 [[northern ?c] (region ?c :region/ne)]
 [[northern ?c] (region ?c :region/n)]
 [[northern ?c] (region ?c :region/nw)]
 [[southern ?c] (region ?c :region/sw)]
 [[southern ?c] (region ?c :region/s)]
 [[southern ?c] (region ?c :region/se)]]

With this rule set defined, we can write a query for social media communities in the southern regions.

[:find ?n
 :in $ %
 :where
 [?c :community/name ?n]
 (southern ?c)
 (social-media ?c)]

It produces these results.

[["Blogging Georgetown"]
 ["Columbia Citizens"]
 ["MyWallingford"]
 ["Fauntleroy Community Association"]]
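The query combines both kinds of composition: southern is an or over three region definitions, and the :where clauses (southern ?c) and (social-media ?c) must both hold, which acts as an and. The plain-Java sketch below models that combination over hypothetical maps of region and type keywords; "Community A" and the other names are placeholders, not sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of rule composition: southern stands for three alternative region
// definitions (logical or), and the query's :where clauses must all hold
// together (logical and). Inputs are hypothetical maps keyed by entity id.
class RuleCompositionSketch {

    static Predicate<Long> southern(Map<Long, String> regionOf) {
        Set<String> south = Set.of(":region/sw", ":region/s", ":region/se");
        return c -> {
            String r = regionOf.get(c);
            return r != null && south.contains(r); // Set.of rejects contains(null)
        };
    }

    static Predicate<Long> socialMedia(Map<Long, String> typeOf) {
        Set<String> social = Set.of(":community.type/twitter", ":community.type/facebook-page");
        return c -> {
            String t = typeOf.get(c);
            return t != null && social.contains(t);
        };
    }

    static List<String> southernSocialMedia(Map<Long, String> nameOf,
                                            Map<Long, String> regionOf,
                                            Map<Long, String> typeOf) {
        Predicate<Long> both = southern(regionOf).and(socialMedia(typeOf));
        List<String> names = new ArrayList<>();
        for (Map.Entry<Long, String> e : nameOf.entrySet()) {
            if (both.test(e.getKey())) {
                names.add(e.getValue()); // [?c :community/name ?n]
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Long, String> names = Map.of(1L, "Community A", 2L, "Community B", 3L, "Community C");
        Map<Long, String> regions = Map.of(1L, ":region/sw", 2L, ":region/ne", 3L, ":region/se");
        Map<Long, String> types = Map.of(
            1L, ":community.type/twitter",
            2L, ":community.type/twitter",
            3L, ":community.type/blog");
        System.out.println(southernSocialMedia(names, regions, types)); // prints: [Community A]
    }
}
```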

Working with time

All of the query results shown in the previous two sections were based on the initial seed data we loaded into our database. The data hasn't changed since then. In this section we'll load some more data, and explain how to work with database values from different moments in time.

Time is built in

One of the key concepts in Datomic is that new facts don't replace old facts. Instead, by default, the system keeps track of all the facts, forever. This makes it possible to look at the database as it was at a certain point in time, or at the changes since a certain point in time.

When you submit a transaction to a database, Datomic keeps track of the entities, attributes and values you add or retract. It also keeps track of the transaction itself. Transactions are entities in their own right, and you can write queries to find them. The system associates one attribute with each transaction entity, :db/txInstant, which records the time the transaction was processed.

Here's a query that retrieves the times when transactions were processed by the database, represented as java.util.Date instances.

[:find ?when
 :where
 [?tx :db/txInstant ?when]]

We've executed only two transactions, but the system executed a few more earlier as part of its bootstrapping process. We know, though, that our two are the most recent. The code below runs the query to retrieve transaction times, sorts them into reverse chronological order, and stores the most recent two as data_tx_date and schema_tx_date: the times when we added our data and our schema, respectively.

results = Peer.q(
  "[:find ?when :where [?tx :db/txInstant ?when]]",
  conn.db())

tx_dates = new ArrayList()
for (result in results) tx_dates.add(result.get(0))
Collections.sort(tx_dates)
Collections.reverse(tx_dates)

data_tx_date = tx_dates.get(0)
schema_tx_date = tx_dates.get(1)

Revisiting the past

Once we have the relevant transaction times, we can look at the database as of that point in time. To do this, we retrieve the current database value by calling our connection object's db method, then call the database object's asOf method, passing in the Date we're interested in. The asOf method returns another database value that is "rewound" back to the requested date.

An example will help make this clear. The code below gets the value of the database as of our schema transaction. Then it runs our very first query, which retrieves entities representing communities, and prints the size of the results. Because we're using a database value from before we ran the transaction to load seed data, the size is 0.

query = "[:find ?c :where [?c :community/name]]"

db_asOf_schema = conn.db().asOf(schema_tx_date)

println Peer.q(query, db_asOf_schema).size() // 0

If we do the same thing using the date of our seed data transaction, the query returns 150 results, because as of that moment, the seed data is there.

db_asOf_data = conn.db().asOf(data_tx_date)

println Peer.q(query, db_asOf_data).size() // 150

The asOf method allows us to look at a database value containing data changes up to a specific point in time. Another method, since, allows us to look at a database value containing only the data changes made after a specific point in time.

The code below gets the value of the database since our schema transaction and counts the number of communities. Because we're using a database value containing changes made since we ran the transaction to load our schema - including the changes made when we loaded our seed data - the size is 150.

db_since_schema = conn.db().since(schema_tx_date)

println Peer.q(query, db_since_schema).size() // 150

If we do the same thing using the date of our seed data transaction, the query returns 0 results, because we haven't added any communities since that time.

db_since_data = conn.db().since(data_tx_date)

println Peer.q(query, db_since_data).size() // 0
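One way to think about asOf and since is as filters over the facts in a database value, keyed by the time of the transaction that asserted each fact. The sketch below models only that idea in plain Java: each hypothetical Fact records an entity and its transaction time (a java.util.Date, like :db/txInstant), asOf keeps facts from transactions at or before the given date, and since keeps facts from transactions strictly after it. Those boundaries are chosen to agree with the counts shown above; the sketch is not Datomic's implementation and ignores retractions and history.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Simplified model of asOf/since as filters on transaction time.
// Retractions, history and real index structures are ignored.
class TimeFilterSketch {

    static final class Fact {
        final long entity;
        final Date txInstant; // when the asserting transaction was processed
        Fact(long entity, Date txInstant) { this.entity = entity; this.txInstant = txInstant; }
    }

    // Keep facts from transactions at or before t.
    static List<Fact> asOf(List<Fact> facts, Date t) {
        List<Fact> out = new ArrayList<>();
        for (Fact f : facts) {
            if (!f.txInstant.after(t)) out.add(f);
        }
        return out;
    }

    // Keep facts from transactions strictly after t.
    static List<Fact> since(List<Fact> facts, Date t) {
        List<Fact> out = new ArrayList<>();
        for (Fact f : facts) {
            if (f.txInstant.after(t)) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        Date schemaTx = new Date(1_000L); // stand-in for schema_tx_date
        Date dataTx = new Date(2_000L);   // stand-in for data_tx_date
        List<Fact> communities = new ArrayList<>();
        for (long e = 1; e <= 3; e++) {
            communities.add(new Fact(e, dataTx)); // three communities from the data transaction
        }
        System.out.println(asOf(communities, schemaTx).size());  // 0
        System.out.println(asOf(communities, dataTx).size());    // 3
        System.out.println(since(communities, schemaTx).size()); // 3
        System.out.println(since(communities, dataTx).size());   // 0
    }
}
```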

While we passed specific transaction dates to asOf and since, you can pass any date. The system finds the closest relevant transaction and uses that as the basis for filtering.

Keeping track of data over time is a very powerful feature. However, there may be some data you don't want to keep old versions of. You can control whether old versions are kept on a per-attribute basis by adding :db/noHistory true to your attribute definition when you create your schema. If you choose not to keep history for a given attribute and you look at a database as of a time before the most recent change to a given entity's value for that attribute, you will not find any value for it.

Imagining the future

Revisiting the past is a very powerful feature. It's also possible to imagine the future. The asOf and since methods work by removing data from the current database value that we retrieved using Connection.db. You can also add data to a database value, using the with method. The result is a database value that's been modified without submitting a transaction and changing the data stored in the system. The modified database value can be used to execute queries, allowing you to perform "what if" calculations before committing to data changes.

We can explore this feature using a second seed data file provided with the sample application, "samples/seattle/seattle-data1.edn". The code below reads it into a list.

data_rdr = new FileReader("samples/seattle/seattle-data1.edn")

new_data_tx = Util.readAll(data_rdr).get(0)

Once we have this new data transaction, we can build a database value that includes it. To do that, we simply get the current database value (or one as of or since a point in time) and call with, passing in the transaction data. The with method returns a map that includes Connection.DB_AFTER, which is the new value of the database after the new data is added. If we execute our community counting query against it, we get 258 results.

result = conn.db().with(new_data_tx)

db_if_new_data = result.get(Connection.DB_AFTER)

println Peer.q(query, db_if_new_data).size() // 258

The actual data hasn't changed yet, so if we query the current database value, we still get 150 results. We won't see a change in the current database value until we submit the new transaction. After that, querying the current database value returns 258 results. Finally, if we get another database value containing data since our first seed data transaction ran and query it for communities, we get 108 results: the number added by the new data transaction.

println Peer.q(query, conn.db()).size() // 150

txResult = conn.transact(new_data_tx).get()

println Peer.q(query, conn.db()).size() // 258

db_since_data = conn.db().since(data_tx_date)

println Peer.q(query, db_since_data).size() // 108
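The key property shown above is that with returns a new database value and leaves both the original value and the stored data untouched, while transact changes what the connection hands out next. The sketch below models a database value as an immutable list of facts and uses a hypothetical Conn class; the names echo the tutorial's API for readability but are invented here and are not Datomic's classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model: a "database value" is an immutable list of facts.
class WithSketch {

    // with: build a new value containing the extra facts; nothing is stored.
    static List<String> with(List<String> db, List<String> txData) {
        List<String> next = new ArrayList<>(db);
        next.addAll(txData);
        return Collections.unmodifiableList(next);
    }

    // A connection holds the current value and replaces it only on transact.
    static final class Conn {
        private List<String> current = List.of();
        List<String> db() { return current; }
        void transact(List<String> txData) { current = with(current, txData); }
    }

    public static void main(String[] args) {
        Conn conn = new Conn();
        conn.transact(List.of("c1", "c2"));                   // seed data
        List<String> whatIf = with(conn.db(), List.of("c3")); // speculative value
        System.out.println(whatIf.size());    // 3
        System.out.println(conn.db().size()); // 2: stored data unchanged
        conn.transact(List.of("c3"));
        System.out.println(conn.db().size()); // 3
    }
}
```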

Manipulating data

Earlier in this tutorial, we walked through the process of loading a pre-existing schema definition and data. In that section, we skimmed over the details of adding the data we needed to exercise queries. This section covers adding, updating and deleting data in more detail. It starts with a review of transaction data structures.

Transactions

We introduced the transaction data structure earlier, but it deserves more attention. A transaction is simply a list containing lists and/or maps.

Each list a transaction contains represents either the addition or retraction of a specific entity, attribute, value tuple, or the invocation of a data function, as shown here.

[:db/add entity-id attribute value]
[:db/retract entity-id attribute value]
[data-fn args*]

There is a built-in data function called retractEntity, which takes an entity id as an argument. It retracts all the attribute values where the given id is either the entity or value, effectively retracting the entity's own data and any references to the entity as well.

[:db.fn/retractEntity entity-id]

Each map a transaction contains is equivalent to a set of one or more add operations. The map must include a :db/id key identifying the entity data is being added to. It may include any number of attribute, value pairs.

{ :db/id entity-id
  attribute value
  attribute value
  ... }

Internally, the map structure gets transformed to the list structure. Each attribute, value pair becomes a :db/add list using the entity-id value associated with the :db/id key.

The map structure is supported as a convenience when adding data. As a further convenience, the attribute keys in the map may be either keywords or strings.
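The map-to-list expansion described above is mechanical enough to sketch. The helper below is hypothetical plain Java with keywords represented as strings: it turns a map containing :db/id into one [:db/add id attribute value] list per remaining entry. Datomic performs this step internally; its handling of collection values for cardinality-many attributes is not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of expanding a transaction map into :db/add lists.
// Keywords are plain strings; collection values are not expanded.
class MapExpansionSketch {

    static List<List<Object>> expand(Map<String, Object> txMap) {
        Object id = txMap.get(":db/id");
        if (id == null) {
            throw new IllegalArgumentException("transaction map needs a :db/id");
        }
        List<List<Object>> ops = new ArrayList<>();
        for (Map.Entry<String, Object> e : txMap.entrySet()) {
            if (e.getKey().equals(":db/id")) {
                continue; // the id identifies the entity; it is not itself asserted
            }
            ops.add(List.of(":db/add", id, e.getKey(), e.getValue()));
        }
        return ops;
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(":db/id", 17592186045790L);
        m.put(":db/doc", "This is an example");
        // prints: [[:db/add, 17592186045790, :db/doc, This is an example]]
        System.out.println(expand(m));
    }
}
```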

Temporary ids

All the statements in a transaction require an entity id. For new entities, you must use a temporary id. There are two ways to get one. One way is to use an id literal like this:

#db/id[partition-name value*]

where partition-name is the name of a partition in the system and the value is a negative number. Value is optional.

The other way to get a temporary id is to call Peer.tempid and pass in a partition name and, optionally, a value.

The partition name identifies the partition the entity being created will live in. Partitions group data together, providing locality of reference when executing queries across a collection of entities. In general, you want to group entities based on how you'll use them. Entities you'll often query across - like the community-related entities in our sample data - should be in the same partition to increase query performance. Different logical groups of entities should be in different partitions.

There are three partitions built into Datomic.

Partition       Purpose
:db.part/db     System partition, used only for attribute (and internal) entities
:db.part/tx     Transaction partition, used only for transaction entities
:db.part/user   User partition, for application entities

You should use :db.part/user for your own entities, or you should create one or more partitions of your own. Like everything else in Datomic, partitions are just entities. You can make your own using a transaction like this:

[{:db/id #db/id[:db.part/db],
  :db/ident :communities,
  :db.install/_partition :db.part/db}]

This transaction makes a partition entity called :communities.

When a transaction is processed, all the temporary ids are translated to actual entity ids. Temporary ids with the same partition and negative number value are mapped to the same entity id. Temporary ids are mapped to new entity ids unless you use a temporary id with an attribute defined as :db/unique :db.unique/identity, in which case the system will map your temporary id to an existing entity if one exists with the same attribute and value (update) or will make a new entity if one does not exist (insert). All further adds in the transaction that apply to that same temporary id are applied to the "upserted" entity.
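The resolution rules in the previous paragraph can be modeled in a few lines. In the sketch below, a hypothetical TempId pairs a partition name with a negative number; equal pairs within one transaction resolve to the same entity, and a tempid asserted together with a value for a unique-identity attribute resolves to the existing entity carrying that value when there is one. The counter-based id allocator and the in-memory unique index are simplifications, not how Datomic allocates ids or stores indexes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of temporary id resolution with upsert.
class TempIdSketch {

    static final class TempId {
        final String partition;
        final long value; // negative number, as in #db/id[:db.part/user -1]
        TempId(String partition, long value) { this.partition = partition; this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof TempId
                    && ((TempId) o).partition.equals(partition)
                    && ((TempId) o).value == value;
        }
        @Override public int hashCode() { return Objects.hash(partition, value); }
    }

    private final Map<String, Long> uniqueIndex = new HashMap<>(); // "attr value" -> entity id
    private long nextId = 1000;                                     // toy allocator

    // Resolve one tempid within a transaction. uniqueAttr/uniqueValue name a
    // :db.unique/identity attribute asserted for this tempid, or are null.
    long resolve(Map<TempId, Long> txResolved, TempId tid, String uniqueAttr, String uniqueValue) {
        Long known = txResolved.get(tid);
        if (known != null) {
            return known; // same partition and number in this transaction -> same entity
        }
        String key = uniqueAttr == null ? null : uniqueAttr + " " + uniqueValue;
        Long existing = key == null ? null : uniqueIndex.get(key);
        long id = existing != null ? existing : nextId++; // upsert: reuse existing, else insert
        if (key != null) {
            uniqueIndex.put(key, id);
        }
        txResolved.put(tid, id);
        return id;
    }

    public static void main(String[] args) {
        TempIdSketch db = new TempIdSketch();
        Map<TempId, Long> tx1 = new HashMap<>();
        long a = db.resolve(tx1, new TempId(":db.part/user", -1), ":neighborhood/name", "Capitol Hill");
        long b = db.resolve(tx1, new TempId(":db.part/user", -1), null, null);
        Map<TempId, Long> tx2 = new HashMap<>();
        long c = db.resolve(tx2, new TempId(":db.part/user", -5), ":neighborhood/name", "Capitol Hill");
        System.out.println(a == b); // true: same tempid within one transaction
        System.out.println(a == c); // true: upserted onto the existing entity
    }
}
```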

Adding, updating, and retracting data

Once we have a temporary id, we can add data to a database. We've looked at several transactions that do that. There's only one additional point to make: there is no requirement that we add an entire set of attributes to an entity. For instance, we could add a community simply by asserting :community/name.

[{:db/id #db/id[:db.part/user] :community/name "Easton"}]

The lack of a rigid definition of the set of attributes an entity must provide values for is a key feature of Datomic. It allows a great deal of flexibility in dealing with changing requirements as applications evolve over time.

Updating data works the same way; we just need to use a real entity id instead of a temporary id. If we don't already know the entity id we want, we can query for it using an attribute value.

[:find ?belltown_id :where [?belltown_id :community/name "belltown"]]

With the entity id, we can build a transaction that updates the specified entity.

[{:db/id belltown_id :community/category "free stuff"}]

To retract data, you need to know the specific entity id, attribute and value you want to retract. For example, we could retract the category value we just associated with the "belltown" community.

[[:db/retract belltown_id :community/category "free stuff"]]

There are a couple of important things to know about retracting data. The first is that we must specify the value of the attribute being retracted. If the value is not accurate when our transaction is processed, the transaction will abort. If that happened, we'd need to query for the new value and resubmit an updated transaction. This is not a concern if we use the :db.fn/retractEntity function to retract all of an entity's attribute values.

The other thing to know is that, because we can access database values as they existed at specific points in time, we can retrieve retracted data by looking in the past. In other words, the data isn't really gone. If we want data to really be gone after we retract it, we have to disable history for the specific attribute, as described in the previous section.

Schema and seed data, revisited

Now that we've looked at transactions in more detail, we can dig into the schema and seed data we loaded at the beginning of the tutorial and understand what they do.

Here is the literal representation of the schema transaction that the samples/seattle/seattle-schema.edn file contains.

[
 ;; community
 {:db/id #db/id[:db.part/db]
  :db/ident :community/name
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/fulltext true
  :db/doc "A community's name"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :community/url
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/doc "A community's url"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :community/neighborhood
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/doc "A community's neighborhood"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :community/category
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/many
  :db/fulltext true
  :db/doc "All community categories"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :community/orgtype
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/doc "A community orgtype enum value"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :community/type
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/doc "A community type enum value"
  :db.install/_attribute :db.part/db}

 ;; community/org-type enum values
 [:db/add #db/id[:db.part/user] :db/ident :community.orgtype/community]
 [:db/add #db/id[:db.part/user] :db/ident :community.orgtype/commercial]
 [:db/add #db/id[:db.part/user] :db/ident :community.orgtype/nonprofit]
 [:db/add #db/id[:db.part/user] :db/ident :community.orgtype/personal]

 ;; community/type enum values
 [:db/add #db/id[:db.part/user] :db/ident :community.type/email-list]
 [:db/add #db/id[:db.part/user] :db/ident :community.type/twitter]
 [:db/add #db/id[:db.part/user] :db/ident :community.type/facebook-page]
 [:db/add #db/id[:db.part/user] :db/ident :community.type/blog]
 [:db/add #db/id[:db.part/user] :db/ident :community.type/website]
 [:db/add #db/id[:db.part/user] :db/ident :community.type/wiki]
 [:db/add #db/id[:db.part/user] :db/ident :community.type/myspace]
 [:db/add #db/id[:db.part/user] :db/ident :community.type/ning]

 ;; neighborhood
 {:db/id #db/id[:db.part/db]
  :db/ident :neighborhood/name
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity
  :db/doc "A unique neighborhood name (upsertable)"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :neighborhood/district
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/doc "A neighborhood's district"
  :db.install/_attribute :db.part/db}

 ;; district
 {:db/id #db/id[:db.part/db]
  :db/ident :district/name
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity
  :db/doc "A unique district name (upsertable)"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :district/region
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/doc "A district region enum value"
  :db.install/_attribute :db.part/db}

 ;; district/region enum values
 [:db/add #db/id[:db.part/user] :db/ident :region/n]
 [:db/add #db/id[:db.part/user] :db/ident :region/ne]
 [:db/add #db/id[:db.part/user] :db/ident :region/e]
 [:db/add #db/id[:db.part/user] :db/ident :region/se]
 [:db/add #db/id[:db.part/user] :db/ident :region/s]
 [:db/add #db/id[:db.part/user] :db/ident :region/sw]
 [:db/add #db/id[:db.part/user] :db/ident :region/w]
 [:db/add #db/id[:db.part/user] :db/ident :region/nw]
 ]

Schema attributes are themselves entities, with associated attributes and values that define their characteristics. The schema transaction uses a map structure to add each attribute definition, as shown below.

{:db/id #db/id[:db.part/db]
 :db/ident :community/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/fulltext true
 :db/doc "A community's name"
 :db.install/_attribute :db.part/db}

The :db/id key specifies the entity the other attributes and values in the map should be applied to. Since the schema is creating a new entity, it uses a temporary id. You can specify a temporary id literally using #db/id[:db.part/db], which will generate a unique temporary id each time it's parsed. The argument inside the brackets specifies the partition where the new entity will be defined. Schema attributes get added to the :db.part/db partition. (Note that the literal #db/id mechanism is only appropriate for transactions you are parsing from strings or files. When you build transactions programmatically, you should use datomic.Peer.tempid instead.)

The map specifies that the new entity's identity (:db/ident) is :community/name, that it's of type string, that it stores one value per entity it's applied to, that its values are indexed for fulltext search, and that it represents "a community's name".

The final key in the map, :db.install/_attribute, describes a "back reference" from the built-in entity :db.part/db to the new entity being created. The end result is that the built-in :db.part/db entity's :db.install/attribute attribute will include this new entity in its set of values. This is required for this new entity to function as a schema attribute.

The map structure is simply shorthand for the full list structure for adding data to a database. In the full list structure, each piece of data being added is represented as a list whose first item is the keyword :db/add, followed by an entity id, an attribute name, and a value. Here's an equivalent to the previous map example, using the full list structure:

[:db/add #db/id[:db.part/db -1] :db/ident :community/name]
[:db/add #db/id[:db.part/db -1] :db/valueType :db.type/string]
[:db/add #db/id[:db.part/db -1] :db/cardinality :db.cardinality/one]
[:db/add #db/id[:db.part/db -1] :db/fulltext true]
[:db/add #db/id[:db.part/db -1] :db/doc "A community's name"]
[:db/add :db.part/db :db.install/attribute #db/id[:db.part/db -1]]

The :community/orgtype, :community/type, and :district/region attributes are intended to refer to entities representing enumerated values. These entities are created by the schema, using the :db/add operation. Here are the enumerated values that :community/orgtype refers to:

;; community/org-type enum values
[:db/add #db/id[:db.part/user -10] :db/ident :community.orgtype/community]
[:db/add #db/id[:db.part/user -11] :db/ident :community.orgtype/commercial]
[:db/add #db/id[:db.part/user -12] :db/ident :community.orgtype/nonprofit]
[:db/add #db/id[:db.part/user -13] :db/ident :community.orgtype/personal]

Each operation creates a new entity by associating the system-defined :db/ident attribute with a temporary entity id and assigning it a value. The :db/ident attribute makes it possible to refer to these entities using the specified keywords.

After we loaded the sample schema, we seeded the database with data we could use for queries. The samples/seattle/seattle-data0.edn and samples/seattle/seattle-data1.edn files each contain a transaction that loads data.

For each community in the sample data set, there are three statements in the transaction, one each for its district, its neighborhood, and the community itself. Here are the first three from samples/seattle/seattle-data0.edn:

{:district/region :region/e,
 :db/id #db/id[:db.part/user -1000001],
 :district/name "East"}

{:db/id #db/id[:db.part/user -1000002],
 :neighborhood/name "Capitol Hill",
 :neighborhood/district #db/id[:db.part/user -1000001]}

{:community/category ["15th avenue residents"],
 :community/orgtype :community.orgtype/community,
 :community/type :community.type/email-list,
 :db/id #db/id[:db.part/user -1000003],
 :community/name "15th Ave Community",
 :community/url "http://groups.yahoo.com/group/15thAve_Community/",
 :community/neighborhood #db/id[:db.part/user -1000002]}

All three statements use the map structure for adding multiple attributes and values to a given entity. The map includes a :db/id key whose value is a temporary id for the new entity. Note that, in this case, the temporary id expression includes a specific negative number. Using specific numbers allows you to refer to temporary ids elsewhere in the transaction, as explained below. (The sample schema did not use specific negative numbers for temporary ids, because it never needed to refer to them from within the schema transaction.)

We want each community to be represented by its own entity, but multiple communities may be in the same neighborhood, and multiple neighborhoods may be in the same district. We could make unique neighborhood and district entities for each new community, but that would add a lot of duplicate data and make it harder to query for "all communities in a district", for example.

We avoid this problem by leveraging Datomic's "upsert" support, described earlier. The definitions of the :neighborhood/name and :district/name attributes specify that their values identify unique identities:

{:db/id #db/id[:db.part/db]
 :db/ident :neighborhood/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/unique :db.unique/identity
 :db/doc "A unique neighborhood name (upsertable)"
 :db.install/_attribute :db.part/db}

{:db/id #db/id[:db.part/db]
 :db/ident :district/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/unique :db.unique/identity
 :db/doc "A unique district name (upsertable)"
 :db.install/_attribute :db.part/db}

This makes adding bulk data easy. We can add a neighborhood and district for each community, but the database guarantees when we're done that there will only be one neighborhood entity and one district entity with a particular :neighborhood/name and :district/name, respectively.

In addition to adding three entities, each transaction needs to "wire them up". Specifically, the community's :community/neighborhood attribute is a reference to the neighborhood, and the neighborhood's :neighborhood/district is a reference to the district. This is done simply by using the temporary id of the target entity as the value of the reference attribute. This is the motivation for using specific temporary ids in the seed data transaction: the temporary ids can be used as the values for attributes of reference type, within the same transaction (before the entities actually exist in the database).

:neighborhood/district #db/id[:db.part/user -1000001]

:community/neighborhood #db/id[:db.part/user -1000002]

In addition to these two, there are three other attributes of reference type defined by the sample schema: :community/type, :community/orgtype and :district/region. The entities these attributes refer to are predefined by the schema. Each pre-defined entity represents a specific enumerated value.

Each of the enum entities the schema defines for this purpose has a single system-defined attribute, :db/ident, whose value is a keyword, e.g., :community.type/blog. The :db/ident attribute tells Datomic that the specified keyword can be used to refer to the entity. This enables transactions to specify an existing entity as the value of a reference attribute without knowing its actual numeric id in advance.

:district/region :region/e

:community/orgtype :community.orgtype/community

:community/type :community.type/email-list

This mechanism is the preferred way to model an enumerated type because the values are defined by the schema, not the data (as would be the case, for instance, with an attribute of type keyword).
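The :db/ident mechanism amounts to a lookup from keyword to entity id when a reference value is resolved. Here is a plain-Java sketch of that lookup with made-up ids; in Datomic the mapping lives in the database itself, and this sketch simply rejects a keyword it cannot resolve, which is an assumption rather than a description of Datomic's error handling.

```java
import java.util.Map;

// Hypothetical model of resolving a keyword used as a reference value
// through :db/ident. Ids are made up; Datomic keeps this mapping in the database.
class IdentSketch {

    static long resolveRef(Map<String, Long> idents, Object value) {
        if (value instanceof Long) {
            return (Long) value; // already an entity id
        }
        if (value instanceof String) {
            Long id = idents.get(value); // e.g. ":region/e" -> entity id
            if (id != null) {
                return id;
            }
        }
        throw new IllegalArgumentException("cannot resolve reference: " + value);
    }

    public static void main(String[] args) {
        Map<String, Long> idents = Map.of(
            ":region/e", 17L,
            ":community.orgtype/community", 18L,
            ":community.type/email-list", 19L);
        System.out.println(resolveRef(idents, ":region/e"));                  // 17
        System.out.println(resolveRef(idents, ":community.type/email-list")); // 19
    }
}
```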

Conclusion

This tutorial walked you through the process of creating a database, loading a schema and initial data, querying and manipulating data. You now know enough about Datomic to explore on your own, and to start building applications free from the bounds of traditional databases.