Datomic analytics is implemented via a Datomic connector to Trino. To begin using analytics, you will need to perform the following steps:
|Prerequisites||all Datomic Analytics Use|
|install Datomic analytics||one-time setup|
|configure Trino||one-time setup|
|configure a Datomic connection||per Datomic system/db|
|configure a Datomic metaschema||per Datomic metaschema|
|start and stop analytics||as desired|
The Datomic analytics distribution includes sample configuration and templates so that you can complete these steps in minutes. Each of these steps is explained below, along with pointers for deploying analytics for production use.
- An accessible Datomic Cloud System
- Java 11
Install Datomic Analytics
Download (1.17GB) and unzip the latest version (0.9.96) of the Datomic Presto Server. This zip file includes the PrestoSQL distribution (now named Trino, and referred to Trino from here on), the Datomic connector, and example configuration files for getting started.
Configure a Trino Cluster
Trino configuration lives under a directory that Trino commands call
etc-dir is specified when launching trino.
etc-samples directory of Datomic analytics is part of the
etc-dir configured to run a minimal Trino cluster of a
single node on your local machine. You can leave these configuration
files unchanged while exploring Trino on your local machine, and
consult the Trino docs when you are ready to plan a production
For each Datomic system, you will create a Trino catalog property
file in the
catalog subdirectory of the Trino etc-dir. A catalog file has a name of the form
<catalog> is the catalog name.
/etc-samples/catalog in the provided zip has an example catalog properties.
A Datomic catalog property file has the following entries:
|connector.name||yes||"datomic"||Do not change.|
|datomic.client.config||yes||a Datomic client connect map||Must be on one line.|
|datomic.databases||no||a vector of Datomic database names|
datomic.databases property lists the databases that will be
available via Trino. Supplying one or more databases enables JDBC Metadata.
If this property is omitted, all databases will be available and not automatically queried.
Any time you change a catalog file, you will need to stop and then start the Trino cluster for your changes to take effect.
etc-samples directory includes a catalog named
sample.properties. Before you start Trino, edit this file and set
datomic.client.config to have a valid client connect map for your system.
You can also rename the file from
sample.properties to something more descriptive of your system.
Enabling JDBC Metadata
Many analytics tools provide the ability to automatically explore all of the tables in your system. Such exploration involves issuing one (or many!) JDBC metadata queries, forcing each database in your system to be loaded.
Because these automatic queries can be so expensive, they are disabled
by default. To enable JDBC metadata queries, you must explicitly
enumerate the Datomic databases you want to query with the
datomic.databases property described above.
Metaschema files control the mapping between Datomic attributes and
SQL tables and columns. Metaschema files are
.edn files in the
datomic subdirectory of Trino's
etc-dir. Metaschema files can
have any name you find convenient, and Datomic analytics will
automatically associate metaschemas with any database that has
etc-samples directory includes a metaschema file matching the mbrainz database.
To facilitate interactive development, Datomic analytics automatically discovers changes to metaschema files within a minute of changes. You do not need to restart your Trino cluster to pick up changes to metaschema files. However, if you use an analytics tool such as metabase that scans Trino to discover schema, you may need to re-run the scan manually to pick up metaschema changes.
Start and Stop Trino
Datomic analytics can be launched with Trino's
First make scripts executable:
chmod +x bin/launcher*
To run analytics in the foreground of a shell window, navigate to the root directory of the Datomic analytics bundle and enter:
bin/launcher run --etc-dir=etc-samples
To verify that your Trino cluster is running, go to localhost:8989, enter use the username
admin and leave the password field blank. A Cluster Overview should be displayed with metrics about your cluster.
To stop Trino, you can simply interrupt or kill the process, e.g. with Ctrl-C.
Trino in Production
The sample configuration that ships with Datomic anlaytics will have you up and quering in minutes, but it only scratches the surface of things you may want to do. Trino also allows you to join across disparate data sources, and cluster for horizontal scale.
When you are ready to run Trino in production, you will want to
- tune your JVM config to take full advantage of available memory
- run Trino as a daemon or in a container
- configure authn/authz
All of these options and more are covered in the Trino deployment docs.
Now you can use the SQL CLIent.