
Setting up Storage Services

This document walks through the process of provisioning a storage service for use with Datomic Pro.

Storage Services

Storage service options are listed here:

  • Dev mode (local development)
  • SQL database (including Heroku PostgreSQL)
  • DynamoDB
  • Cassandra

The steps required to provision each of them, with detailed instructions, are included below.

Note that you can move an application from one storage service to another simply by switching the connection string used by peers and the properties file used to start the transactor. All are fully API-compatible.
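For illustration, here is a minimal peer sketch connecting to dev storage; the commented URIs show how the same code would target other storage services (hostnames, table, keyspace, and database names below are placeholders):

import datomic.Connection;
import datomic.Peer;

public class ConnectExample {
    public static void main(String[] args) {
        // Dev storage, served by a locally running transactor
        Connection conn = Peer.connect("datomic:dev://localhost:4334/my-db");

        // Moving to another storage service changes only the URI, e.g.:
        //   SQL:       datomic:sql://my-db?jdbc:postgresql://db-host:5432/datomic?user=datomic&password=datomic
        //   DynamoDB:  datomic:ddb://us-east-1/my-table/my-db
        //   Cassandra: datomic:cass://cass-host:9042/datomic.datomic/my-db

        System.out.println("Connected; basis-t: " + conn.db().basisT());
    }
}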

All of the script commands described in this document must be executed from the root directory of the Datomic distribution.

Storage Client Dependencies

Datomic depends on various storage client libraries. The recommended versions of all storage client libraries are in the provided scope of the Datomic pom.xml file:

mvn dependency:list -DincludeScope=provided

These jars are included in the Datomic distribution's lib directory, and are used by the transactor to access the storage systems. Peers will need the jars corresponding to their storage system on the classpath as well.

Dev Mode

The dev storage protocol is intended for development. It runs an embedded JDBC server inside the transactor, and uses local disk files for storage.


bin/transactor config/samples/dev-transactor-template.properties

By default, the embedded storage runs with default passwords and is accessible only by other processes on the same machine. This configuration is intended for interactive development where application peers and the transactor are colocated.

Securing Remote Access

To allow remote peers access to embedded storage you must do three things:

  • Choose two passwords for the embedded storage
  • Set the storage-access property
  • Add a password to the connection URI used by peers

Choose Passwords

Datomic's embedded JDBC storage has two passwords: an 'admin' password used by the transactor and a 'datomic' password used by the peers. Once you set these passwords, you are responsible for remembering them. If you lose a password, you will not be able to access your data and will need to recover from a Datomic backup.

Set the passwords by setting the following transactor properties:

storage-admin-password=
storage-datomic-password=

Set storage-access Property

To enable remote access to Datomic's embedded storage, set the following transactor property:

storage-access=remote

New transactor properties will take effect on the next transactor start.

Add Password To Connection URI

After you set storage-datomic-password, peers must include the 'datomic' password in the connection URI, e.g.

datomic:dev://my-host:4334/my-db?password=my-secret

Rotating Passwords

Once you have set the storage-datomic-password, you can rotate it as follows:

  • Set old-storage-datomic-password to the current password
  • Set storage-datomic-password to a new password
  • Restart the transactor
  • Change peer Connection URIs to use the new password

The storage-admin-password can be rotated similarly. Be careful not to lose track of passwords while performing a password rotation.
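For example, mid-rotation the transactor properties file would briefly carry both entries (the values below are placeholders):

old-storage-datomic-password=my-old-secret
storage-datomic-password=my-new-secret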

SQL Database

The steps to provision a SQL database as your storage service are:

  • Setup a SQL database, or use an existing one
  • Create SQL table (datomic_kvs)
  • Create SQL user(s), or use an existing one
  • Get JDBC connection string
  • Add JDBC driver dependencies to your project

There are scripts for doing each of the first three steps with PostgreSQL, MySQL, and Oracle in the Datomic distribution's bin/sql directory. You can run them using their respective command line or GUI admin tools.

For example, for Postgres at the command line:


psql -f bin/sql/postgres-db.sql -U postgres

psql -f bin/sql/postgres-table.sql -U postgres -d datomic

psql -f bin/sql/postgres-user.sql -U postgres -d datomic

The last script creates a user named 'datomic' with the password 'datomic'. You can use an existing user instead, or modify the script to create a user with a different username or password, if desired.

If you want to use a different SQL server, mimic the table and schema from one of the included databases.
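Once the table and user exist, peers connect with a URI that embeds the JDBC connection string. Here is a sketch for the PostgreSQL setup above (host and credentials are placeholders):

datomic:sql://my-db?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=datomic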

JDBC Drivers

Only the PostgreSQL driver is included with the transactor distribution. For other SQL databases, follow the steps below:

  • Make the driver available on the classpath of the transactor by placing it in <datomic-install>/lib.
  • In your peer project, add a dependency for your specific JDBC driver. The examples below show how that looks for PostgreSQL.
  • In a Maven-based build, add the following snippet to the dependencies section of your pom.xml:


<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>42.5.1</version>
</dependency>
  • In a Leiningen project, add the following to the collection under the :dependencies key in your project.clj:
[org.postgresql/postgresql "42.5.1"]

Validation Query

Datomic uses a query to validate JDBC connections. By default, this query is:

select 1

For Oracle JDBC URIs, the query is:

select 1 from dual

This validation query can be overridden via the datomic.sqlValidationQuery system property.
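For example, assuming your launch script passes JVM options through, the override can be supplied as a standard -D flag:

-Ddatomic.sqlValidationQuery="select 1"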

Heroku PostgreSQL Database

You can provision a Heroku-hosted PostgreSQL database for use as your storage service using the following steps:

  • Sign up for Heroku PostgreSQL.
  • Start a database and retrieve the host, port, dbname, username, and password from the web console.
  • Use PGAdmin to install the Datomic table schema by:
    • Adding a server using the connection information retrieved in the previous step, specifying the dbname for the MaintenanceDB (in place of 'postgres') as well as for the actual database.
    • Opening a SQL window on that server and pasting in the contents of the bin/sql/postgres-table.sql script.
    • Editing the owner and grant to be the user Heroku provides, and removing the public grant.
    • Running the script to create the datomic_kvs table.
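A peer connection URI for the Heroku-hosted database then follows the usual SQL pattern; here is a sketch with placeholder values (the ssl parameters shown are standard PostgreSQL JDBC options, which Heroku connections typically require):

datomic:sql://my-db?jdbc:postgresql://<host>:<port>/<dbname>?user=<username>&password=<password>&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory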

DynamoDB

Datomic is not compatible with DynamoDB's Global Tables feature.

The steps to provision DynamoDB as your storage service are:

  • Set up an AWS account, or use an existing one.
  • Create a DynamoDB table, IAM roles, S3 bucket, permissions, etc. Note that this process can be automated using Datomic's ensure-transactor command.
  • Add the AWS DynamoDB dependency to your project.

Datomic supports IAM roles. If you have a transactor that uses IAM user access keys, see Migrating to IAM Roles for information on how to switch to using roles. Check AWS Access Control for information about Datomic's use of IAM roles and IAM user access credentials in general.

If you don't already have an AWS account, sign up using the Amazon Web Services Console.

DynamoDB AWS Peer Dependency

There is an example in the provided scope of the pom.xml in the Datomic directory for including the DynamoDB portion of the AWS Java SDK in a Maven project.

To configure deps.edn with this dependency, include the following in the map under your :deps key:


com.amazonaws/aws-java-sdk-dynamodb {:mvn/version "1.12.564"}

DDB Automated Setup

There is an automated process to create the necessary DynamoDB table, IAM roles, S3 bucket, permissions etc. To invoke it:

  • Copy the config/samples/ddb-transactor-template.properties file to another location and give it a new name, for instance, my-transactor.properties. The template will work with the default settings, but you can edit your copy if you want to specify something explicitly.
  • Export your AWS account's credentials as environment variables so that they're accessible to scripts:


export AWS_ACCESS_KEY_ID=<aws-access-key-id>

export AWS_SECRET_KEY=<aws-secret-key>
  • Run the ensure-transactor command, specifying your copy of the properties file as the input and another filename as the output (the names can be the same):


bin/datomic ensure-transactor my-transactor.properties my-transactor.properties

The ensure-transactor command will create the necessary AWS constructs, and emit an updated properties file containing information about what it created.

  • If errors are reported, fix them, then repeat the process.
  • Check the final output file into source control.
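With the table and roles in place, peers connect using a DynamoDB URI of the following general shape (region, table, and database name are placeholders):

datomic:ddb://us-east-1/<TABLE-NAME>/<DB-NAME>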

DDB Manual Setup

If you are unable to run the ensure-transactor command, or wish to customize your AWS configuration, manually configure your AWS setup to match the settings that the automated process would generate:

Table Configuration

The DynamoDB table schema requires an "id" attribute of type string as the primary key, with "KeyType" "HASH":

{
  "Table":
    {
      "AttributeDefinitions": [
          {"AttributeName": "id",
            "AttributeType": "S"}
      ],

    ...

     "KeySchema": [
       {"KeyType": "HASH",
        "AttributeName": "id"}
      ]

    ...
    }
}

No secondary indexes should be enabled.
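As a sketch, an equivalent table could be created with the AWS CLI as follows (the table name and throughput values are placeholders; see the throughput discussion below):

aws dynamodb create-table \
    --table-name <TABLE-NAME> \
    --attribute-definitions AttributeName=id,AttributeType=S \
    --key-schema AttributeName=id,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=5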

IAM Role Configuration

Configure the IAM roles for the transactor and peers.

Transactor Role

You must apply a policy that grants the transactor role access to the DynamoDB table you've created. Optionally, apply two other policies to the transactor role to enable log rotation to S3 and metric reporting to CloudWatch.

DynamoDB Table Access (Storage Access)

This policy is required for the transactor to run.

Replace <AWS-ACCOUNT-ID> and <TABLE-NAME>:


{"Statement":
 [{"Effect": "Allow",
   "Action": ["dynamodb:*"],
   "Resource": "arn:aws:dynamodb:*:<AWS-ACCOUNT-ID>:table/<TABLE-NAME>"}]}

DDB S3 Bucket Access (Log Rotation)

This policy is optional - it enables S3 log rotation. The policy below includes access to subfolders within the S3 bucket. Replace <S3-LOG-BUCKET> with the name of your S3 log bucket:


{"Statement":
 [{"Effect": "Allow",
   "Action": ["s3:PutObject"],
   "Resource": ["arn:aws:s3:::<S3-LOG-BUCKET>", "arn:aws:s3:::<S3-LOG-BUCKET>/*"]}]}

CloudWatch Metrics

This policy is optional - it enables the transactor to report metrics to CloudWatch.


{"Statement":
 [{"Effect":"Allow",
   "Resource":"*",
   "Condition":{"Bool":{"aws:SecureTransport":"true"}},
   "Action": ["cloudwatch:PutMetricData", "cloudwatch:PutMetricDataBatch"]}]}

DDB Peer Role

Use the following policy to create an IAM role that grants the peer read-only access to the DynamoDB table. Replace <AWS-ACCOUNT-ID> and <TABLE-NAME>:

{"Statement":
 [{"Effect":"Allow",
   "Action": ["dynamodb:GetItem", "dynamodb:BatchGetItem", "dynamodb:Scan", "dynamodb:Query"],
   "Resource":
   "arn:aws:dynamodb:*:<AWS-ACCOUNT-ID>:table/<TABLE-NAME>"}]}

IAM User Keys

If you're using IAM user keys for the peer or transactor instead of roles, apply the above policies to the user.

DynamoDB Transactor Properties

  • Configure the DynamoDB transactor properties file, entering the hostname manually.
  • Provide the transactor and peer roles, as well as the name of the DynamoDB table. Start from the config/samples/ddb-transactor-template.properties file in the Datomic distribution and insert values for the following from above, plus whatever other settings are necessary for your configuration:


host=<FINAL-HOST-NAME>
# e.g., ec2-ip-with-hyphens.compute-1.amazonaws.com
...
aws-dynamodb-table=<TABLE-NAME>
...
aws-transactor-role=<TRANSACTOR-IAM-ROLE-NAME>
...
aws-peer-role=<PEER-IAM-ROLE-NAME>

Other DDB Configuration

If your peers and transactors run on AWS, configure security groups as well to allow communication between them.

Refer to the AWS docs for all other AWS configuration that is not directly related to using DynamoDB as a storage service.

There is a datomic.ddbRequestTimeout system property with a default of 1000 milliseconds. In a high-volume, low-latency system, you can turn this down to recover more quickly when DynamoDB is behaving poorly.
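For example, assuming JVM options are passed through to the transactor, the timeout could be lowered with a standard -D flag (the value shown is illustrative):

-Ddatomic.ddbRequestTimeout=250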

Check also Dynamo Client Configuration for more details.

Provisioning DynamoDB Throughput

By default, the ensure process creates a DynamoDB table with a modest provisioning level suitable for experimentation under low load. When you are ready to plan production use, review the capacity planning docs and use the DynamoDB console to provision more throughput.

Every application will be different, but these three rules of thumb will get you started provisioning DynamoDB for Datomic:

  • Budget at least as much for DynamoDB throughput as you do for transactor EC2 instances. Split this budget evenly between read and write capacity. This will result in higher read capacity, as read capacity is less expensive.
  • Measure your actual use. DynamoDB has great CloudWatch metrics that will tell you how much capacity you are actually using, and you can configure warnings when your application is throttled to its provisioned limit.
  • For bulk import jobs, turn write throughput up as high as necessary (based on monitoring), and then turn it back down when you are done.

Once you have made an initial selection for DynamoDB throughput, you are ready to move on to transactor setup.

Cassandra

Using Cassandra as a storage service requires a cluster of at least 3 nodes running an up-to-date release of Cassandra. Datomic supports 2.0.X, 2.1.X, and 3.0.X versions of Cassandra. Refer to the version of the Cassandra driver in the provided scope in the pom.xml in the Datomic directory and the Datastax Java Driver Compatibility matrix to verify Datastax/Cassandra compatibility for a specific release of Datomic. The cluster needs to be configured to support the CQL native transport (this is on by default in new releases of Cassandra). If you have an existing cluster that meets these requirements, you can use it. Otherwise, you need to configure one following the instructions on the Cassandra site.

Note that Datomic does not support running on a Cassandra cluster that spans datacenters.

While Cassandra replication settings are entirely up to you, it is highly recommended to run a cluster of at least three nodes, use a replication factor of 3, and use the LOCAL_QUORUM consistency level for all reads and writes.

Once a cluster is configured, you must provision a keyspace and table (column family) for Datomic to use. You can do this using the CQL scripts provided in the distribution's bin/cql directory. You can execute the scripts using the cqlsh tool provided by Cassandra (use an appropriate superuser username and password, if required):


cqlsh -f bin/cql/cassandra-keyspace.cql -u cassandra -p cassandra


cqlsh -f bin/cql/cassandra-table.cql -u cassandra -p cassandra

Datomic provides optional support for Cassandra's internal username/password mechanism for authentication and authorization. If your cluster is configured to require authentication and authorization, you must also create a user for Datomic:


cqlsh -f bin/cql/cassandra-user.cql -u cassandra -p cassandra

This script creates the user and grants it access to the Datomic keyspace.

Note that if you are using Cassandra 3.X, the cqlsh tool may require you to explicitly specify the CQL version with the command line flag:


--cqlversion=3.4.0

Next, set up your transactor properties file by following the steps below:

  • Copy the config/samples/cassandra-transactor-template.properties file to another location and give it a new name, for instance, cassandra-transactor.properties.
  • Set the entry for cassandra-host to refer to a member of the cluster.
  • If your cluster uses a non-standard port for the CQL native transport, set cassandra-port.
  • If your cluster is configured to require authentication and authorization, set cassandra-user and cassandra-password.
  • If your Cassandra cluster is configured with the necessary certificates to support SSL, and you have verified that SSL works with a simple client (e.g., cqlsh), you can configure the transactor to use SSL by setting cassandra-ssl to true. See the Cassandra docs for more information.
  • You can optionally provide cassandra-cluster-callback, whose value is a name identifying either a Java static method (e.g. my.app.Cassandra.buildCluster) or a Clojure function (e.g. my.app/build-cluster). This method/function takes a single argument, a map containing the above parameters, and returns an instance of a Cassandra Cluster. Note that the argument type for this method is Object, e.g.
import com.datastax.driver.core.Cluster;
import datomic.Util;
import java.util.Map;

public class DatomicCassandra {
  public static Cluster build(Object args) {
    Map<Object, Object> params = (Map<Object, Object>) args;
    Cluster.Builder cluster = Cluster.builder();
    cluster.addContactPoint((String)params.get(Util.read(":host")));
    cluster.withPort((Integer)(params.get(Util.read(":port"))));
    cluster.withCredentials((String)params.get(Util.read(":user")),
                            (String)params.get(Util.read(":password")));

    // Set other cluster options here

    return cluster.build();
  }
}
  • In your peer project, add a dependency for the DataStax Java Driver for Apache Cassandra.

  • In a deps project, add the following to the map under the :deps key in your deps.edn:


;; in map under :deps key
com.datastax.cassandra/cassandra-driver-core {:mvn/version "3.1.0"}
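Once your transactor is running, peers connect with a Cassandra URI of this general shape (host, port, keyspace, table, and database name below are placeholders; the user, password, and ssl parameters are needed only if your cluster requires them):

datomic:cass://<host>:<port>/<keyspace>.<table>/<db-name>?user=<user>&password=<password>&ssl=true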

Limitations

  • Datomic does not support Cassandra's multi-datacenter replication. See the HA section for more information on Datomic's storage level consistent copy requirements.
  • Datomic does not support quorum operations in multi-datacenter environments due to the negative impact on availability and write performance.

Cassandra2 (Alpha)

The Cassandra2 storage service provides improved support for large datasets. Cassandra2 breaks large segments into chunks, avoiding performance issues and operational problems that can occur when index segments run up against Cassandra's limits.

If you are happily using Cassandra with your existing dataset, you need not consider Cassandra2.

Cassandra2 support is alpha; please contact support if you are planning to try Cassandra2.

Cassandra2 works exactly like Cassandra, except:

  • The Cassandra schema is different. Follow the same setup instructions as for Cassandra, but use the cql files with a '2' in the name.
  • The protocol URI is different. Follow the same connection instructions as for Cassandra, but use cass2: instead of cass: in your connection URI.

Note that Cassandra and Cassandra2 are different storage services! You can use backup and restore to migrate databases from Cassandra to Cassandra2 (or between any storage services).
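For example, a migration might look like the following sketch, using backup-db and restore-db with placeholder URIs and paths:

bin/datomic backup-db "datomic:cass://cass-host:9042/datomic.datomic/my-db" "file:/path/to/backup"

bin/datomic restore-db "file:/path/to/backup" "datomic:cass2://cass-host:9042/datomic.datomic/my-db"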

Running on AWS

Once you know everything is working, you may want to run your Transactor on the Amazon cloud. See Running on AWS for instructions.