Monitoring
AWS CloudWatch provides a powerful set of tools for monitoring a software system running on AWS:
- You can collect and track CloudWatch Metrics – variables that measure the behavior of your system.
- You can configure CloudWatch Alarms to notify operations or take other automated steps when potential problems arise.
- You can monitor, store, and search CloudWatch Logs across all your AWS resources.
- You can create CloudWatch Dashboards that provide a single overview for monitoring your systems.
Datomic is fully integrated with all of these AWS monitoring tools. On the producing side, Datomic creates metrics and logs; on the consuming side, Datomic organizes metrics in custom dashboards. This document shows how to manage and monitor your Datomic system.
Finding Datomic Resources by Tag
Datomic tags resources it creates with the following tag:
Tag Name | Tag Value | Resources Tagged |
---|---|---|
datomic:system | System Name | all taggable resources |
You can use this tag to locate all the resources associated with a Datomic system. From the Tag Editor:
- Under Regions, enter one or more AWS regions
- Under Resource types, choose "All resource types"
- Under Tags, choose the name "datomic:system" for tag key
- Under the same row in Tags, enter a specific System Name for tag value, or leave the tag value blank to find resources for all Datomic Systems.
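The same lookup can be scripted rather than done in the Tag Editor. The following is a minimal sketch, assuming the third-party boto3 library is installed and AWS credentials are configured; the function names are illustrative and not part of Datomic. It uses the AWS Resource Groups Tagging API, where omitting the tag value matches every value of the key.

```python
def build_tag_filters(system_name=None):
    """Build the TagFilters argument for the Resource Groups Tagging API.

    With no system name, the filter matches every resource carrying the
    datomic:system tag key, whatever its value.
    """
    tag_filter = {"Key": "datomic:system"}
    if system_name:
        tag_filter["Values"] = [system_name]
    return [tag_filter]


def find_datomic_resources(region, system_name=None):
    """Return the ARNs of tagged resources in one region (calls AWS)."""
    import boto3  # third-party; imported lazily so the helper above works without it

    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    paginator = client.get_paginator("get_resources")
    arns = []
    for page in paginator.paginate(TagFilters=build_tag_filters(system_name)):
        arns.extend(m["ResourceARN"] for m in page["ResourceTagMappingList"])
    return arns
```

For example, `find_datomic_resources("us-east-1", "my-system")` would list resources for a system named my-system; both arguments here are placeholders. The Tagging API is regional, so repeat the call for each region of interest.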
Searching CloudWatch Logs
CLI Tool
The CLI Tools allow you to list log messages and show detail for specific messages without necessitating a trip to the AWS console.
AWS Console
If you need to view the CloudWatch Logs for a Datomic system:
- Open the CloudWatch Logs in the AWS Console
- Click on the Log Group named "datomic-{System}", where System is your system name
Each EC2 instance will create a separate log stream. Usually you will want to search across all log streams:
- Click the "Search Log Group" button
- Click to the right of the filter box to choose an appropriate time window
- Enter a metric filter pattern to scope your search
Finding Alerts
The search pattern "Alert - Alerts" will find the text of every alert produced by Datomic. The "Alert" matches each alert, and the "- Alerts" filters redundant summary messages.
The example below demonstrates reviewing a week's worth of alerts. The single event that matches is a transient DynamoDB request failure, so not a problem.
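A week-long alert search like this one can also be run from a script. The sketch below assumes boto3 is installed and that the log group is named "datomic-{System}" as described above; the filter pattern is copied verbatim from this section, and the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def alert_search_kwargs(system_name, days_back=7, now=None):
    """Build arguments for the CloudWatch Logs FilterLogEvents call.

    Times are expressed in milliseconds since the Unix epoch, as the API expects.
    """
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days_back)
    return {
        "logGroupName": f"datomic-{system_name}",
        "filterPattern": "Alert - Alerts",  # pattern quoted from the text above
        "startTime": int(start.timestamp() * 1000),
        "endTime": int(now.timestamp() * 1000),
    }


def find_alerts(system_name, region):
    """Return matching alert messages across all log streams (calls AWS)."""
    import boto3  # third-party; imported lazily

    logs = boto3.client("logs", region_name=region)
    paginator = logs.get_paginator("filter_log_events")
    messages = []
    for page in paginator.paginate(**alert_search_kwargs(system_name)):
        messages.extend(event["message"] for event in page["events"])
    return messages
```

FilterLogEvents searches every log stream in the group unless stream names are given, which matches the "Search Log Group" behavior in the console.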
Finding Logs By Message
CLI Tool
The CLI Tools allow you to list logs in the window that ends at a given time, --tod, and reaches back a given number of minutes, --minutes-back.
The output of the datomic log list command can be piped to grep with the -B 1 option to find specific logs for a given time period:

    datomic log list my-datomic-system \
      -f all --tod 2019-06-24T03:41:23.491 --minutes-back 150 \
      | grep -B 1 CloudWatchMetrics

The preceding command finds all logs labeled "CloudWatchMetrics" from June 24th, 2019, 1:11:23.491am until June 24th, 2019, 3:41:23.491am.
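To double-check which period a given pair of --tod and --minutes-back values covers, the arithmetic can be sketched as follows. This assumes the window ends at --tod and extends --minutes-back minutes into the past; the helper name is hypothetical.

```python
from datetime import datetime, timedelta


def log_window(tod, minutes_back):
    """Return (start, end) for a window ending at `tod` (ISO 8601 string)."""
    end = datetime.fromisoformat(tod)
    start = end - timedelta(minutes=minutes_back)
    return start, end


start, end = log_window("2019-06-24T03:41:23.491", 150)
print(start.isoformat(timespec="milliseconds"))  # 2019-06-24T01:11:23.491
print(end.isoformat(timespec="milliseconds"))    # 2019-06-24T03:41:23.491
```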
AWS Console
The image below shows using the CloudWatch Filter and Pattern Syntax to find all logs whose Msg is "CodeDeployEvent".
- The brackets {} identify the query as a JSON metric filter
- The $ is a placeholder for the log entry as a whole
- The dot syntax scopes the search to the individual key Msg
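As a rough illustration of how such a JSON filter pattern is put together, the sketch below builds the pattern string for an equality test on one key and mimics the match locally against a single log line. Both helper names are hypothetical, and the local matcher is only an approximation of what CloudWatch does for this simple case.

```python
import json


def json_equals_pattern(key, value):
    """Build a CloudWatch JSON filter pattern testing one key for equality."""
    return f'{{ $.{key} = "{value}" }}'


def matches_locally(log_line, key, value):
    """Approximate the filter on one log line: parse JSON, compare one key."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # non-JSON lines never match a JSON filter
    return isinstance(entry, dict) and entry.get(key) == value


print(json_equals_pattern("Msg", "CodeDeployEvent"))
# { $.Msg = "CodeDeployEvent" }
```

The resulting string can be pasted into the console filter box or passed as the filter pattern of a scripted search.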
Metrics Produced by Datomic Cloud
Datomic compute nodes publish CloudWatch Metrics as follows:
- namespace is DatomicCloud
- dimensions are your system name and your compute stack name
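To see exactly which metrics and dimension names a running system publishes, you can list the namespace instead of guessing. The following is a minimal sketch, assuming boto3 is installed; the function names are illustrative, and the dimension names are deliberately discovered from the API response rather than hard-coded.

```python
from collections import defaultdict

NAMESPACE = "DatomicCloud"


def dimension_names_by_metric(metrics):
    """Map each metric name to the sorted dimension names seen for it.

    `metrics` has the shape of the "Metrics" list returned by the
    CloudWatch ListMetrics call.
    """
    summary = defaultdict(set)
    for metric in metrics:
        dims = summary[metric["MetricName"]]
        for dim in metric.get("Dimensions", []):
            dims.add(dim["Name"])
    return {name: sorted(dims) for name, dims in summary.items()}


def list_datomic_metrics(region):
    """Fetch every metric in the DatomicCloud namespace (calls AWS)."""
    import boto3  # third-party; imported lazily

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    metrics = []
    for page in cloudwatch.get_paginator("list_metrics").paginate(Namespace=NAMESPACE):
        metrics.extend(page["Metrics"])
    return metrics
```

Calling `dimension_names_by_metric(list_datomic_metrics("us-east-1"))` (the region is a placeholder) gives a quick inventory to use when building alarms or custom dashboard widgets.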
Datomic reports a large number of distinct metrics, many of which are useful only for Datomic support. The most useful metrics for operators are listed below; see troubleshooting nodes for some common scenarios.
In the table below
- {DdbOp} can be Read or Write
- {Outcome} can be Succeeded or Failed
Metric | Units | Description |
---|---|---|
Alerts | count | number of alerts written to CloudWatch logs |
Datoms | count | number of datoms in a database, reported when indexer checks for work |
Ddb{DdbOp}{Outcome}Msec | msec | latency for a ddb operation |
HttpEndpointOpsPending | count | count is total number of requests started |
HttpEndpointThrottled | count | requests rejected because server too busy |
IndexMemMb | mb | total size of the in-memory indexes on a node |
JvmFreeMb | mb | free JVM memory |
NodeDbCount | count | number of databases being served by node |
TxBatchBytes | bytes | number of bytes batched into a storage write |
TxDatoms | count | number of datoms in a transaction |
Note: To reduce cost, the Solo Topology reports only a small subset of the metrics listed above: Alerts, Datoms, HttpEndpointOpsPending, JvmFreeMb, and HttpEndpointThrottled.
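The Ddb{DdbOp}{Outcome}Msec row in the table stands for four concrete metric names. A small sketch that expands the template using the placeholder values listed before the table (the function name is illustrative):

```python
from itertools import product

DDB_OPS = ("Read", "Write")
OUTCOMES = ("Succeeded", "Failed")


def expand_ddb_metric_names():
    """Expand Ddb{DdbOp}{Outcome}Msec into its concrete metric names."""
    return [f"Ddb{op}{outcome}Msec" for op, outcome in product(DDB_OPS, OUTCOMES)]


print(expand_ddb_metric_names())
# ['DdbReadSucceededMsec', 'DdbReadFailedMsec', 'DdbWriteSucceededMsec', 'DdbWriteFailedMsec']
```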
Logs Produced by Datomic Cloud
Datomic writes CloudWatch Logs as follows:
- The "Log Group" is named "datomic-{System}", where System is the name of the Datomic system.
- Each EC2 instance will create a log stream named by the convention {system}-{group}-{instance-id}-{timestamp}.
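When scripting against log streams, it can help to pull the instance ID out of a stream name. The sketch below is an assumption-laden illustration: it relies on EC2 instance IDs having the form "i-" followed by 8 or 17 hexadecimal characters, it does not try to split {system} from {group} because either may contain hyphens, and the timestamp format in the sample name is made up for the example.

```python
import re

# prefix = "{system}-{group}" left unsplit; the timestamp is kept as an opaque string.
STREAM_NAME = re.compile(
    r"^(?P<prefix>.+)-(?P<instance_id>i-[0-9a-f]{8,17})-(?P<timestamp>.+)$"
)


def parse_stream_name(name):
    """Return the parts of a log stream name, or None if it does not fit."""
    match = STREAM_NAME.match(name)
    return match.groupdict() if match else None


# Hypothetical stream name, for illustration only.
parts = parse_stream_name("my-system-my-group-i-0123456789abcdef0-2019-06-24-01-11-23")
print(parts["instance_id"])  # i-0123456789abcdef0
```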
Datomic Cloud Dashboards
When you launch a Datomic Cloud system, it automatically creates a dashboard named datomic-(system)-(region) that you can view in the AWS Console.
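The dashboard can also be fetched programmatically, for example to list its widgets. This is a minimal sketch, assuming boto3 is installed and that (system) and (region) in the dashboard name are replaced by the system name and AWS region; the function names are illustrative.

```python
import json


def dashboard_name(system, region):
    """Build the dashboard name from the naming convention described above."""
    return f"datomic-{system}-{region}"


def widget_titles(dashboard_body):
    """Extract widget titles from a CloudWatch dashboard body (a JSON string).

    Widgets without a title yield None.
    """
    body = json.loads(dashboard_body)
    return [w.get("properties", {}).get("title") for w in body.get("widgets", [])]


def fetch_dashboard_titles(system, region):
    """Fetch the dashboard and return its widget titles (calls AWS)."""
    import boto3  # third-party; imported lazily

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_dashboard(DashboardName=dashboard_name(system, region))
    return widget_titles(response["DashboardBody"])
```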
The Solo Topology creates a basic dashboard:
The Production Topology creates a much larger dashboard, suitable for viewing on a large ops monitor screen:
Each dashboard widget tells a story. For example, the DynamoDB Usage widget tracks DynamoDB AutoScaling of reads (left axis) and writes (right axis). In the image below you can see DynamoDB write provisioning (the green line) scaling up for a four-hour period during a batch load, and then scaling back down to almost nothing.