Query the Data

Previously: you learned how to transact domain data into the database. This tutorial expects that you have a running REPL where you have already connected to the database "hello", transacted the movie schema, and added three movies.

Previously, you transacted three sets of attributes into the database, creating three movies. What you actually created were three entities, each of which has a unique id and a collection of associated attributes. The unique ids were assigned for you when you transacted the data (for more on entity ids, see Identity and Uniqueness). When you query a Datomic database, you may be looking to retrieve attribute values, collections of attribute values, entities, or any combination thereof. The Client library offers two mechanisms for retrieving data from your database:

  • query which uses Datalog, a declarative query language
  • pull which is a declarative way to make hierarchical (and possibly nested) selections of information about entities

We'll focus on query in this tutorial. For more on Pull, see the Datomic Pull Documentation.

First, to issue a query against a Datomic database, you must retrieve the current database value. A database value is the state of the database at a given point in time: you can issue as many queries against that database value as you want, and they will always return the same results. Retrieve the current database value and store it in a var:

user=> (def db (client/db conn))  
#'user/db
user=>

If you inspect the resulting var, you will see that it is a map containing information describing the value of the database to use:

user=> db
{:database-id "58a4af76-1184-408d-b41b-0dfdba64f983", :t 1001, :next-t 1005}
user=>

(You will learn more about t in the next section, Seeing the History of the Data.)

Once you have the database value, you can issue queries against it. The query API takes two parameters:

  • an active connection
  • a map of arguments

The argument map can have a variety of optional components, but two are required for any query to execute:

  • :query which takes a map or list containing at least a :find clause and a :where clause
  • :args which takes one or more data sources to query
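The argument map is ordinary Clojure data. As a minimal sketch, the helper below assembles it and checks that a list-form query mentions both :find and :where. Note that query-arg-map is hypothetical and not part of the Datomic client API, and it does not handle the map form of :query.

```clojure
;; query-arg-map is a hypothetical convenience function, not a Datomic API.
;; It builds the argument map passed to client/q: the query under :query
;; and the data sources (for example the db value) under :args.
(defn query-arg-map
  [query & inputs]
  ;; Only the list (vector) form of a query is checked here.
  (assert (some #{:find} query) "query must contain a :find clause")
  (assert (some #{:where} query) "query must contain a :where clause")
  {:query query
   :args  (vec inputs)})

;; A keyword stands in for the database value so this runs without Datomic.
(query-arg-map '[:find ?e :where [?e :movie/title]] :placeholder-db)
;; => {:query [:find ?e :where [?e :movie/title]], :args [:placeholder-db]}
```

In the REPL, once all-movies-q is defined, you could write (<!! (client/q conn (query-arg-map all-movies-q db))) instead of spelling out the map by hand.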

When you craft your Datalog query, you must provide enough of a :where clause to limit the results. In our schema, a movie is anything that has an associated :movie/title attribute (or :movie/release-year or :movie/genre, but we'll just use :movie/title for now).

To construct your query, create a vector containing the two required clauses listed above, :find and :where. Let's look at a minimal query:

user=> (def all-movies-q '[:find ?e 
                           :where [?e :movie/title]])

Here we define a var, all-movies-q, that holds our query definition, which we will later pass to query. Look at the two clauses:

  • :find – specifies what you want returned from the query. In this case, ?e is a logic variable that will be bound within the :where clause.
  • :where – consists of one or more clauses, each written as a vector. In our case, we pass only one clause, [?e :movie/title], which translates to "bind the id of each entity that has an attribute called :movie/title to the logic variable named ?e".

So, the whole query reads as "find me the ids of all entities which have an attribute called :movie/title".
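One way to build intuition for [?e :movie/title] is to picture the database as a collection of datoms, simplified here to [entity attribute value] triples, and the clause as a filter over them. The sketch below is plain Clojure over made-up data: the entity ids 1 to 3, the facts, and the function all-movie-ids are hypothetical. It illustrates what the query means, not how Datomic evaluates it.

```clojure
;; Made-up datoms, simplified to [entity-id attribute value] triples.
;; The ids are hypothetical; your database holds different ids.
(def datoms
  [[1 :movie/title "The Goonies"]
   [1 :movie/release-year 1985]
   [2 :movie/title "Commando"]
   [2 :movie/release-year 1985]
   [3 :movie/title "Repo Man"]
   [3 :movie/release-year 1984]])

;; Models [:find ?e :where [?e :movie/title]]: keep each datom whose
;; attribute is :movie/title and return its entity id as a one-element tuple.
;; distinct mirrors the fact that query results contain no duplicate tuples.
(defn all-movie-ids [datoms]
  (vec (distinct (for [[e a _] datoms
                       :when (= a :movie/title)]
                   [e]))))

(all-movie-ids datoms)
;; => [[1] [2] [3]]
```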

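To actually issue the query, call client/q with the connection and an argument map holding the query and the database value you captured at the start of this tutorial. client/q returns a core.async channel, so the call is wrapped in <!!, which blocks and takes the result from that channel. You will see something like the following: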

user=> (<!! (client/q conn {:query all-movies-q :args [db]}))
[[17592186045418] [17592186045419] [17592186045420]]
user=>

You can see that there are three entities in the database with a :movie/title, which correspond to the three movies you added in the last tutorial. What you received back is a collection of tuples, each of which in this case contains just an entity id.
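Because each result is a tuple, even when it holds a single value, you often unwrap it before further use. A minimal sketch in plain Clojure, using the ids printed above as a literal so it runs without a database; in the REPL you could apply mapv first directly to the value returned by <!!.

```clojure
;; The query result printed above, copied in as a literal.
(def result [[17592186045418] [17592186045419] [17592186045420]])

;; Each element is a one-element tuple; mapv first keeps just the entity id.
(def movie-ids (mapv first result))

movie-ids
;; => [17592186045418 17592186045419 17592186045420]
```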

Instead of finding the entity ids, perhaps you just want to find the titles of all the movies in the database. In that case, your query would specify a :find clause with a logic variable (?movie-title) that you bind to the value of an attribute (:movie/title). That would look like this:

user=> (def all-titles-q '[:find ?movie-title 
                           :where [_ :movie/title ?movie-title]])

Notice the underscore at the beginning of the clause where ?e used to be. That indicates that we are not interested in the entity id itself, just in the existence (and value) of the :movie/title attribute. The query reads as "find all movie titles from any entity that has an attribute :movie/title and assign the title to a logic variable called ?movie-title".
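The underscore plays the same role as the conventional _ in Clojure destructuring: it marks a position you do not care about. In the same plain-Clojure model (hypothetical ids and data, and a hypothetical all-titles function), only the value position is kept:

```clojure
;; Made-up datoms, simplified to [entity-id attribute value] triples.
(def datoms
  [[1 :movie/title "The Goonies"]
   [1 :movie/release-year 1985]
   [2 :movie/title "Commando"]
   [2 :movie/release-year 1985]
   [3 :movie/title "Repo Man"]
   [3 :movie/release-year 1984]])

;; Models [:find ?movie-title :where [_ :movie/title ?movie-title]]:
;; the entity position is destructured to _ and ignored; only the value is kept.
(defn all-titles [datoms]
  (vec (distinct (for [[_ a v] datoms
                       :when (= a :movie/title)]
                   [v]))))

(all-titles datoms)
;; => [["The Goonies"] ["Commando"] ["Repo Man"]]
```

Because the entity is ignored and query results contain no duplicate tuples, two movies that happened to share a title would produce a single tuple for that title; the distinct call mimics this.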

To execute the query, pass it and the db value in as before:

user=> (<!! (client/q conn {:query all-titles-q :args [db]}))
[["Commando"] ["The Goonies"] ["Repo Man"]] 
user=> 

Great! You were able to retrieve the titles of all the movies in the database. Again, you received a collection of tuples, each of which this time contains the string title of a movie.

But what if you only want the titles of movies released in 1985? To answer that, your :where clause needs two clauses: one to bind the :movie/title attribute (as before), and one to filter by :movie/release-year. Those two clauses have to be joined.

Remember that in the last example, our :where clause contained an _ where we first saw ?e, because we were not interested in the entity itself, only in the value of the :movie/title attribute. Now, we will re-insert ?e into both clauses, and it will serve as the join point between them. In Datalog, joins are created implicitly by the presence of the same logic variable in multiple clauses.

The new query looks like this:

user=> (def titles-from-1985 '[:find ?title 
                               :where [?e :movie/title ?title] 
                                      [?e :movie/release-year 1985]])

This query reads: "find the title of any entity that has a :movie/title attribute and whose :movie/release-year is 1985". When you issue the new query:

user=> (<!! (client/q conn {:query titles-from-1985 :args [db]}))
[["Commando"] ["The Goonies"]]
user=>
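The implicit join can be pictured as a nested loop: for every :movie/title datom, look for a :movie/release-year datom with the same entity id. The sketch below does exactly that over hypothetical in-memory datoms (the ids, data, and the titles-from-year function are made up for illustration). It shows what the shared ?e means, not how Datomic actually executes joins. Query results have no guaranteed order, so the order here may differ from the REPL output above.

```clojure
;; Made-up datoms, simplified to [entity-id attribute value] triples.
(def datoms
  [[1 :movie/title "The Goonies"]
   [1 :movie/release-year 1985]
   [2 :movie/title "Commando"]
   [2 :movie/release-year 1985]
   [3 :movie/title "Repo Man"]
   [3 :movie/release-year 1984]])

;; Models [:find ?title :where [?e :movie/title ?title]
;;                             [?e :movie/release-year <year>]]
;; as a naive nested-loop join on the entity id.
(defn titles-from-year [datoms year]
  (vec (distinct
        (for [[e1 a1 title] datoms
              :when (= a1 :movie/title)
              [e2 a2 y]     datoms
              :when (and (= e2 e1)                 ; same ?e in both clauses: the join
                         (= a2 :movie/release-year)
                         (= y year))]              ; constant in the clause: the filter
          [title]))))

(titles-from-year datoms 1985)
;; => [["The Goonies"] ["Commando"]]
```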

Finally, what if you want to return all the attributes for each movie released in 1985? You need a clause for each attribute you want to return and for the filter on release year, all of which should be joined through ?e. That query looks like:

user=> (def all-data-from-1985 '[:find ?title ?year ?genre 
                                 :where [?e :movie/title ?title] 
                                        [?e :movie/release-year ?year] 
                                        [?e :movie/genre ?genre] 
                                        [?e :movie/release-year 1985]])

And when you run it:

user=> (<!! (client/q conn {:query all-data-from-1985 :args [db]}))
[["The Goonies" 1985 "action/adventure"] ["Commando" 1985 "action/adventure"]]
user=> 

Notice that this time, each tuple in your return value holds three values, one for each logic variable in the :find clause, and there is one tuple per movie found. The shape of the relation returned is defined by your :find specification. More information can be found in the query reference documentation.
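To see how shared logic variables join clauses and how the :find vector shapes each tuple, here is a toy evaluator in plain Clojure. It is a sketch under stated assumptions: the datoms, entity ids, and the functions match-clause, parse-query, and run-query are hypothetical; it understands only the list form with :find and :where; and it performs a naive nested-loop join rather than anything resembling Datomic's real query engine.

```clojure
;; Made-up datoms, simplified to [entity-id attribute value] triples.
(def datoms
  [[1 :movie/title "The Goonies"]
   [1 :movie/genre "action/adventure"]
   [1 :movie/release-year 1985]
   [2 :movie/title "Commando"]
   [2 :movie/genre "action/adventure"]
   [2 :movie/release-year 1985]
   [3 :movie/title "Repo Man"]
   [3 :movie/genre "punk dystopia"]
   [3 :movie/release-year 1984]])

(defn logic-var? [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn match-clause
  "Unifies one pattern such as [?e :movie/title ?title] with one datom,
  extending the binding map. Returns nil if they are incompatible.
  A shorter pattern like [?e :movie/title] leaves the value unconstrained."
  [bindings pattern datom]
  (reduce (fn [b [p d]]
            (cond
              (= p '_)       b                          ; wildcard: ignore
              (logic-var? p) (cond
                               (not (contains? b p)) (assoc b p d)
                               (= (get b p) d)       b  ; same var, same value: join holds
                               :else                 (reduced nil))
              :else          (if (= p d) b (reduced nil)))) ; constant must match
          bindings
          (map vector pattern datom)))

(defn parse-query
  "Splits the list form [:find ?a ... :where [...] ...] into its parts."
  [q]
  {:find  (vec (take-while #(not= % :where) (rest q)))
   :where (vec (rest (drop-while #(not= % :where) q)))})

(defn run-query [q datoms]
  (let [{find-vars :find clauses :where} (parse-query q)
        ;; Each clause keeps only the binding maps it can extend consistently.
        bindings (reduce (fn [bs clause]
                           (for [b bs
                                 d datoms
                                 :let [b2 (match-clause b clause d)]
                                 :when b2]
                             b2))
                         [{}]
                         clauses)]
    ;; The :find vector decides the shape of each result tuple.
    (vec (distinct (map (fn [b] (mapv b find-vars)) bindings)))))

(def all-data-from-1985
  '[:find ?title ?year ?genre
    :where [?e :movie/title ?title]
           [?e :movie/release-year ?year]
           [?e :movie/genre ?genre]
           [?e :movie/release-year 1985]])

(run-query all-data-from-1985 datoms)
;; => [["The Goonies" 1985 "action/adventure"] ["Commando" 1985 "action/adventure"]]
```

Changing only the :find vector, for example to [:find ?title ?genre ...], changes the width of every tuple without touching the :where clauses.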

There is much more to learn about Datalog, from nested queries to secondary inputs and more. You can find out about it in the Query reference.

In the final step of this tutorial, you will learn about the history of values in the database.