Monitoring

Finding Datomic Resources by Tag

Datomic tags resources it creates with the following tag:

Tag Name        Tag Value    Resources Tagged
datomic:system  System Name  all taggable resources

You can use this tag to locate all the resources associated with a Datomic system. From the Tag Editor:

  1. Under Regions, enter one or more AWS regions
  2. Under Resource types, choose "All resource types"
  3. Under Tags, choose "datomic:system" as the tag key
  4. In the same row under Tags, enter a specific System Name as the tag value, or leave the tag value blank to find resources for all Datomic systems

[Image: find-resources-to-tag.png]
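
The same search can be scripted. The following is a minimal sketch, assuming the AWS Resource Groups Tagging API is called through boto3; the helper names, the system name "my-system", and the region are placeholders and are not part of Datomic. Unlike the Tag Editor, one API client covers a single region, so the call would need to be repeated per region.

```python
"""Sketch: find AWS resources that carry the datomic:system tag."""

TAG_KEY = "datomic:system"


def build_tag_filters(system_name=None):
    """Return TagFilters for the Resource Groups Tagging API.

    With no system name, the filter names only the tag key, which matches
    any tag value (the equivalent of a blank tag value in the Tag Editor).
    """
    tag_filter = {"Key": TAG_KEY}
    if system_name:
        tag_filter["Values"] = [system_name]
    return [tag_filter]


def find_datomic_resources(client, system_name=None):
    """Return the ARNs of all resources matching the tag filter.

    `client` is expected to be a boto3 "resourcegroupstaggingapi" client
    (or any object exposing the same paginator interface).
    """
    paginator = client.get_paginator("get_resources")
    arns = []
    for page in paginator.paginate(TagFilters=build_tag_filters(system_name)):
        for mapping in page["ResourceTagMappingList"]:
            arns.append(mapping["ResourceARN"])
    return arns


# Usage (requires boto3 and AWS credentials; names below are placeholders):
#   import boto3
#   client = boto3.client("resourcegroupstaggingapi", region_name="us-east-1")
#   for arn in find_datomic_resources(client, "my-system"):
#       print(arn)
```

Passing the client in as an argument keeps the helpers free of AWS dependencies, so they can be exercised with a stub before being pointed at a real account.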

CloudWatch Metrics

Datomic cluster nodes write CloudWatch metrics as follows:

  • The namespace is DatomicCloud
  • The dimensions include system, cluster, and group
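
To see which metrics a system is actually reporting, the namespace can be listed through the CloudWatch API. This is a minimal sketch, assuming boto3's CloudWatch client; the helper names are illustrative, and the exact spelling and casing of the dimension names is an assumption that should be checked against the output of an unfiltered listing.

```python
"""Sketch: list the metric names published in the DatomicCloud namespace."""

NAMESPACE = "DatomicCloud"


def dimension_filters(dimensions):
    """Convert a {name: value} dict into CloudWatch dimension filters."""
    return [{"Name": name, "Value": value} for name, value in sorted(dimensions.items())]


def list_metric_names(client, dimensions=None):
    """Return the sorted, de-duplicated metric names in the namespace.

    `client` is expected to be a boto3 "cloudwatch" client. `dimensions`
    optionally narrows the listing, e.g. {"system": "my-system"}; the
    dimension name casing used here is an assumption.
    """
    params = {"Namespace": NAMESPACE}
    if dimensions:
        params["Dimensions"] = dimension_filters(dimensions)
    names = set()
    for page in client.get_paginator("list_metrics").paginate(**params):
        for metric in page["Metrics"]:
            names.add(metric["MetricName"])
    return sorted(names)


# Usage (requires boto3 and AWS credentials; "my-system" is a placeholder):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
#   print(list_metric_names(cloudwatch, {"system": "my-system"}))
```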

In the table below:

  • {CacheOp} can be Get or Put
  • {DdbOp} can be Read or Write
  • {Outcome} can be Succeeded or Failed
  • {RefOp} can be Get, Scan, or Update
  • {StoreOp} can be Create, Get, Exists, or Delete

Metric                          Subsystem  Units  Description
Alerts                          Log        count  number of alerts written to CloudWatch Logs
CacheKmsEkMap                   Cache      count  0 = miss, 1 = hit, average = hit %, KMS Encrypted Key Map
CacheKmsKey                     Cache      count  0 = miss, 1 = hit, average = hit %, KMS Secret Key
Datoms                          Db         count  number of datoms in a database, reported when indexer checks for work
Ddb{DdbOp}{Outcome}Msec         Storage    msec   latency for a ddb operation
Ddb{DdbOp}RetryCount            Storage    count  backoff count for a ddb op
Ddb{DdbOp}RetryDelay            Storage    msec   backoff latency for a ddb op
Efs{CacheOp}{Outcome}Msec       Cache      msec   latency for an EFS cache operation
EfsHits                         Cache      count  0 = miss, 1 = hit, average = hit %
EfsRepair                       Cache      count  count of repaired EFS cache hits
Events                          Log        count  number of events written to CloudWatch Logs
HttpEndpointAsyncTimeout        Web        count  count of requests failed by http async timeout
HttpEndpointPings               Web        count  count of health check pings received, e.g. from a load balancer
HttpEndpointOpsPending          Web        count  total number of requests started
HttpEndpointThrottled           Web        count  requests rejected because the server was too busy
ObjectCacheHits                 Cache      count  0 = miss, 1 = hit, average = hit %
IndexClaimRootConflict          Index      count  conflict attempting to claim index root, will retry
IndexJobMsec                    Index      msec   duration of index job
IndexJobDatoms                  Index      count  number of datoms added by an index job
IndexJobStart                   Index      count  start of an index job
IndexJobWrites                  Index      count  number of segments written by an index job
IndexMemMb                      Index      mb     total size of the in-memory indexes on a node
IndexWriteQueueCount            Index      count  count of index write queue
JvmFreeMb                       -          mb     free JVM memory
KmsGenerateDataKey              Log        count  count of data keys generated
LogCatchupBatches               Log        count  number of tx batches read to catch up the log
LogCatchupBytes                 Log        bytes  number of bytes read to catch up the log
LogCatchupMsec                  Log        msec   time taken to catch up the log
NanoImplFailedAuthns            Auth       count  number of failed authentication and authorization attempts
NextForwardCount                Web        count  number of paginated requests forwarded from one node to another
NodeDbCount                     -          count  number of databases being served by the node
Ref{RefOp}Msec                  Ref        msec   latency for a ref op
S3{StoreOp}{Outcome}Msec        Storage    msec   latency for a store op
S3{StoreOp}{Outcome}            Storage    count  number of times current S3 op has failed, including retries
S3{StoreOp}RetrySucceedded      Storage    count  S3 op succeeded on a retry
S3StoreCreateBytes              Storage    bytes  number of bytes written to S3
TxBatchCount                    Tx         count  number of txes batched into a storage write
TxBatchBytes                    Tx         bytes  number of bytes batched into a storage write
TxBatchStoreVal                 Tx         count  number of large tx batches written as vals
Tx{Outcome}Msec                 Tx         msec   time from transaction dequeued to notifications enqueued
TxBatchProcessedMsec            Tx         msec   time from first tx dequeued to all notifications enqueued
TxCacheGetMsec                  Tx         msec   time to retrieve recent tx from the cache
TxCacheHits                     Tx         count  0 = miss, 1 = hit, average = hit %
TxForwardCount                  Web        count  number of tx requests forwarded from one node to another
Valcache{CacheOp}{Outcome}Msec  Cache      msec   latency for a memcached cache operation
ValcacheHits                    Cache      count  0 = miss, 1 = hit, average = hit %
ValcacheRepair                  Cache      count  count of repaired memcached cache hits
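
As an illustration of how the templated names and the hit metrics read, the sketch below expands a template into concrete metric names using the placeholder values listed above, and shows why the average of a 0/1 hit metric is the hit percentage. The function names are illustrative only, and the table does not say whether every combination is actually emitted.

```python
"""Sketch: expand templated metric names and compute a hit percentage."""
import re
from itertools import product

# Placeholder values as listed in the legend above the metrics table.
PLACEHOLDERS = {
    "CacheOp": ["Get", "Put"],
    "DdbOp": ["Read", "Write"],
    "Outcome": ["Succeeded", "Failed"],
    "RefOp": ["Get", "Scan", "Update"],
    "StoreOp": ["Create", "Get", "Exists", "Delete"],
}

_PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def expand_metric_name(template):
    """Return every concrete name a template such as Ref{RefOp}Msec can take."""
    keys = _PLACEHOLDER_RE.findall(template)
    names = []
    for combo in product(*(PLACEHOLDERS[key] for key in keys)):
        values = iter(combo)
        # Substitute placeholders left to right with this combination.
        names.append(_PLACEHOLDER_RE.sub(lambda _match: next(values), template))
    return names


def hit_percentage(samples):
    """Average a series of 0 (miss) / 1 (hit) samples into a percentage."""
    return 100.0 * sum(samples) / len(samples)


print(expand_metric_name("Ddb{DdbOp}{Outcome}Msec"))
print(hit_percentage([1, 1, 0, 1]))  # three hits out of four samples
```

In CloudWatch terms, this means the Average statistic is the one to graph for the hit metrics, while Sum and SampleCount give the number of hits and the number of lookups.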

Deprecated Metrics

These metrics are no longer used, but may appear in the UI until they age out.

Metric         Subsystem  Units  Description
IndexMemBytes  Index      bytes  size of memory portion of the index
IndexWrites    Index      count  number of segments written by an index job

CloudWatch Logs

Datomic Compute Resources write CloudWatch Logs as follows:

  • The "Log Group" name is the name of the Compute Resources
  • Each EC2 instance will create a log stream named by the convention {system}-{group}-{instance-id}-{timestamp}

Subsystem  Msg Key                    Details
Cache      "Create object cache"      size of object cache
Index      "Starting indexer"         indexer settings
Index      "Root map loaded"          contents of an index root
Node       "Active database ids"      db ids being served by this node
Node       "Cluster node starting"    includes all configuration for the cluster
{various}  "Daemon thread started"    thread name identifies subsystem and db
{various}  "Daemon thread completed"  thread name identifies subsystem and db
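
The stream naming convention and the message keys can be combined to search the logs programmatically. This is a minimal sketch, assuming boto3's CloudWatch Logs client; it also assumes, based only on the "Msg Key" column above, that log events are JSON objects with a "Msg" field, which should be verified against a real log event. The helper names and the log group, system, and group values are placeholders.

```python
"""Sketch: search Datomic log streams for a given message key."""


def stream_prefix(system, group):
    """Prefix shared by all streams following {system}-{group}-{instance-id}-{timestamp}."""
    return f"{system}-{group}-"


def msg_filter_pattern(msg):
    """CloudWatch Logs JSON filter pattern matching one message key.

    Assumes each event is a JSON object with a "Msg" field (an assumption,
    not confirmed by the documentation above).
    """
    return f'{{ $.Msg = "{msg}" }}'


def find_messages(client, log_group, system, group, msg):
    """Return the raw text of matching events across the group's streams.

    `client` is expected to be a boto3 "logs" client.
    """
    pages = client.get_paginator("filter_log_events").paginate(
        logGroupName=log_group,
        logStreamNamePrefix=stream_prefix(system, group),
        filterPattern=msg_filter_pattern(msg),
    )
    return [event["message"] for page in pages for event in page["events"]]


# Usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   logs = boto3.client("logs", region_name="us-east-1")
#   for line in find_messages(logs, "my-log-group", "my-system", "my-group",
#                             "Starting indexer"):
#       print(line)
```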