# SDK Metrics System
## Concepts
### Metric
* A measure of some aspect of the SDK. Examples include request latency, number
  of pooled connections and retries executed.

* A metric is associated with a category. Some of the metric categories are
  `Default`, `HttpClient` and `Streaming`. This lets customers enable
  metrics only for categories they are interested in.

Refer to the [Metrics List](./MetricsList.md) document for a complete list of
standard metrics collected by the SDK.

### Metric Collector

* `MetricCollector` is a typesafe aggregator of metrics. This is the primary
  interface through which other SDK components report metrics they emit, using
  the `reportMetric(SdkMetric,Object)` method.

* `MetricCollector` objects allow for nesting. This enables metrics to be
  collected in the context of other metric events. For example, for a single
  API call, there may be multiple request attempts if there are retries. Each
  attempt's associated metric events can be stored in their own
  `MetricCollector`, all of which are children of another collector that
  represents metrics for the entire API call.

  A child of a collector is created by calling its `childCollector(String)`
  method.

* The `collect()` method returns a `MetricCollection`. This class is essentially
  an immutable version of the tree formed by the collector and its
  children, which are also represented by `MetricCollection` objects.

  Note that calling `collect()` implies that child collectors are also
  collected.

* Each collector has a name. Often this will be used to describe the class of
  metrics that it collects; e.g. `"ApiCall"` and `"ApiCallAttempt"`.

* [Interface prototype](prototype/MetricCollector.java)

### MetricPublisher

* A `MetricPublisher` publishes collected metrics to one or more systems outside of the
  SDK.
  It takes a `MetricCollection` object, potentially transforms the data
  into richer metrics, and converts it into a format the receiver expects.

* By default, the SDK will provide implementations to publish metrics to [Amazon
  CloudWatch](https://aws.amazon.com/cloudwatch/) and [Client Side
  Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html)
  (also known as AWS SDK Metrics for Enterprise Support).

* Metric publishers are pluggable within the SDK, allowing customers to
  provide their own custom implementations.

* Metric publishers can differ in the list of metrics they
  publish, their publishing frequency, the configuration they need, etc.

* [Interface prototype](prototype/MetricPublisher.java)

## Enabling Metrics

The metrics feature is disabled by default. Metrics can be enabled and configured in the following ways:

### Option 1: Configuring MetricPublishers on a request

A publisher can be configured directly on the `RequestOverrideConfiguration`:

```java
MetricPublisher metricPublisher = CloudWatchMetricPublisher.create();
DynamoDbClient dynamoDb = DynamoDbClient.create();
dynamoDb.listTables(ListTablesRequest.builder()
        .overrideConfiguration(c -> c.addMetricPublisher(metricPublisher))
        .build());
```

The methods exposed for setting metric publishers follow the pattern established by `ExecutionInterceptor`s:

```java
class RequestOverrideConfiguration {
    // ...
    class Builder {
        // ...
        Builder metricPublishers(List<MetricPublisher> metricsPublishers);
        Builder addMetricPublisher(MetricPublisher metricsPublisher);
    }
}
```

### Option 2: Configuring MetricPublishers on a client

A publisher can be configured directly on the `ClientOverrideConfiguration`. A publisher specified in this way is used
with lower priority than **Option 1** above.

```java
MetricPublisher metricPublisher = CloudWatchMetricPublisher.create();
DynamoDbClient dynamoDb = DynamoDbClient.builder()
        .overrideConfiguration(c -> c.addMetricPublisher(metricPublisher))
        .build();
```

The methods exposed for setting metric publishers follow the pattern established by `ExecutionInterceptor`s:

```java
class ClientOverrideConfiguration {
    // ...
    class Builder {
        // ...
        Builder metricPublishers(List<MetricPublisher> metricsPublishers);
        Builder addMetricPublisher(MetricPublisher metricsPublisher);
    }
}
```

**Note:** As with the `httpClient` setting, calling `close()` on the `DynamoDbClient` *will not* close the configured
`metricPublishers`. You must close the `metricPublishers` yourself when you're done using them.

### Option 3: Configuring MetricPublishers using System Properties or Environment Variables

This option allows the customer to enable metric publishing by default, without needing to enable it via **Option 1**
or **Option 2** above. This means that a customer can enable metrics without needing to make a change to their runtime
code.

This option is enabled using an environment variable or system property. If both are specified, the system property
will be used. If metrics are enabled at the client level using **Option 2** above, this option is ignored. Overriding
the metric publisher at request time using **Option 1** overrides any publishers that have been enabled globally.

**System Property:** `aws.metricPublishingEnabled=true`

**Environment Variable:** `AWS_METRIC_PUBLISHING_ENABLED=true`

The value specified must be one of `"true"` or `"false"`. Specifying any other string value will result in
a value of `"false"` being used, and a warning being logged each time an SDK client is created.

When the value is `"false"`, no metrics will be published by a client.
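
The precedence and validation rules above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `MetricPublishingFlag`, `resolve`, and the `warningLogger` callback are hypothetical names, and the exact, case-sensitive comparison against `"true"` and `"false"` is an assumption.

```java
import java.util.function.Consumer;

/** Hypothetical helper; the class and method names are illustrative and not part of the SDK. */
final class MetricPublishingFlag {
    static final String SYSTEM_PROPERTY = "aws.metricPublishingEnabled";
    static final String ENVIRONMENT_VARIABLE = "AWS_METRIC_PUBLISHING_ENABLED";

    private MetricPublishingFlag() {
    }

    /** Reads the live JVM settings and applies the rules in resolve(). */
    static boolean isEnabled(Consumer<String> warningLogger) {
        return resolve(System.getProperty(SYSTEM_PROPERTY),
                       System.getenv(ENVIRONMENT_VARIABLE),
                       warningLogger);
    }

    /** Pure resolution logic, kept separate so it can be exercised without touching the JVM. */
    static boolean resolve(String systemProperty, String environmentVariable, Consumer<String> warningLogger) {
        // The system property is used when both settings are present.
        String raw = systemProperty != null ? systemProperty : environmentVariable;
        if (raw == null) {
            return false; // The metrics feature is disabled by default.
        }
        if ("true".equals(raw)) {
            return true;
        }
        if ("false".equals(raw)) {
            return false;
        }
        // Assumption: any other value is reported once per call and treated as "false".
        warningLogger.accept("Unsupported metric publishing value '" + raw + "'; using \"false\".");
        return false;
    }
}
```

A client builder could call `MetricPublishingFlag.isEnabled(...)` once per client creation, which matches the "warning logged each time an SDK client is created" behavior described above.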

When the value is `"true"`, metrics will be published by every client to a set of "global metric publishers". The set
of global metric publishers is loaded automatically using the same mechanism currently used to discover HTTP
clients. This means that including the `cloudwatch-metric-publisher` module and enabling the system property or
environment variable above is sufficient to enable metric publishing to CloudWatch on all AWS clients.

The set of "Global Metric Publishers" is static and is used for *all* AWS SDK clients instantiated by the application
(while **Option 3** remains enabled). A JVM shutdown hook will be registered to invoke `MetricPublisher.close()` on
every publisher (in case the publishers use non-daemon threads that would otherwise block JVM shutdown).

#### Updating a MetricPublisher to work as a global metric publisher

**Option 3** above references the concept of "Global Metric Publishers", which are a set of publishers that are
discovered automatically by the SDK. This section outlines how global metric publishers are discovered and created.

Each `MetricPublisher` that supports loading when **Option 3** is enabled must:
1. Provide an `SdkMetricPublisherService` implementation. An `SdkMetricPublisherService` implementation is a class with
a zero-arg constructor, used to instantiate a specific type of `MetricPublisher` (e.g. a
`CloudWatchMetricPublisherService` that is a factory for `CloudWatchMetricPublisher`s).
2. Provide a resource file: `META-INF/services/software.amazon.awssdk.metrics.SdkMetricPublisherService`. This file
contains the list of fully-qualified `SdkMetricPublisherService` implementation class names.
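
A zero-arg service class plus a `META-INF/services` entry is the contract consumed by `java.util.ServiceLoader`, so the consuming side can be sketched roughly as below. This is an illustration under assumptions, not SDK code: `GlobalMetricPublishers` is a hypothetical name, the two interfaces are simplified stand-ins declared only so the sketch compiles on its own, and the classloader selection is reduced to the loader of the sketch's own class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

// Simplified stand-ins for the SDK types, declared here only to keep the sketch self-contained.
interface MetricPublisher extends AutoCloseable {
    @Override
    void close();
}

interface SdkMetricPublisherService {
    MetricPublisher createMetricPublisher();
}

/** Hypothetical holder for the static set of global publishers; not an SDK class. */
final class GlobalMetricPublishers {
    private GlobalMetricPublishers() {
    }

    /** Initialization-on-demand holder: the JVM runs this initializer at most once. */
    private static final class Holder {
        static final List<MetricPublisher> PUBLISHERS = createAll(
                ServiceLoader.load(SdkMetricPublisherService.class,
                                   GlobalMetricPublishers.class.getClassLoader()));
    }

    /** Returns the same list for every client created by the application. */
    static List<MetricPublisher> get() {
        return Holder.PUBLISHERS;
    }

    /** Asks each discovered service for one publisher and freezes the result. */
    static List<MetricPublisher> createAll(Iterable<SdkMetricPublisherService> services) {
        List<MetricPublisher> publishers = new ArrayList<>();
        for (SdkMetricPublisherService service : services) {
            publishers.add(service.createMetricPublisher());
        }
        return Collections.unmodifiableList(publishers);
    }
}
```

With no provider file on the classpath, `ServiceLoader` finds nothing and `get()` returns an empty list. Registering the JVM shutdown hook is left out of the sketch.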

The `software.amazon.awssdk.metrics.SdkMetricPublisherService` interface that must be implemented by all global metric
publisher candidates is defined as:

```java
public interface SdkMetricPublisherService {
    MetricPublisher createMetricPublisher();
}
```

**`SdkMetricPublisherService` Example**

Enabling the `CloudWatchMetricPublisher` as a global metric publisher can be done by implementing the
`SdkMetricPublisherService` interface:

```java
package software.amazon.awssdk.metrics.publishers.cloudwatch;

import software.amazon.awssdk.metrics.MetricPublisher;
import software.amazon.awssdk.metrics.SdkMetricPublisherService;

// Factory discovered via META-INF/services; must keep its implicit zero-arg constructor.
public final class CloudWatchSdkMetricPublisherService implements SdkMetricPublisherService {
    @Override
    public MetricPublisher createMetricPublisher() {
        return CloudWatchMetricPublisher.create();
    }
}
```

And creating a `META-INF/services/software.amazon.awssdk.metrics.SdkMetricPublisherService` resource file in the
`cloudwatch-metric-publisher` module with the following contents:

```
software.amazon.awssdk.metrics.publishers.cloudwatch.CloudWatchSdkMetricPublisherService
```

#### Option 3 Implementation Details and Edge Cases

**How the SDK loads `MetricPublisher`s when Option 3 is enabled**

When a client is created with **Option 3** enabled (and **Option 2** "not specified"), the client retrieves the list of
global metric publishers to use via a static "global metric publisher list" singleton. This singleton is initialized
exactly once using the following process:
1. The singleton uses `java.util.ServiceLoader` to locate all `SdkMetricPublisherService` implementations configured
as described above. The classloader used with the service loader is chosen in the same manner as the one chosen for the
HTTP client service loader (`software.amazon.awssdk.core.internal.http.loader.SdkServiceLoader`).
That is, the first classloader present in the following list: (1) the classloader that loaded the SDK,
(2) the current thread's classloader, then (3) the system classloader.
2. The singleton creates an instance of every `SdkMetricPublisherService` located in this manner.
3. The singleton uses each `SdkMetricPublisherService` instance to create a `MetricPublisher`.

**How Option 3 and Option 1 behave when Option 2 is "not specified"**

The SDK treats **Option 3** as the default set of client-level metric publishers to be
used when **Option 2** is "not specified". This means that if a customer: (1) enables global metric publishing using
**Option 3**, (2) does not specify client-level publishers using **Option 2**, and (3) specifies metric publishers at
the request level with **Option 1**, then the global metric publishers are still *instantiated* but will not be used.
This nuance prevents the SDK from needing to consult the global metric configuration with every request.

**How Option 2 is considered "not specified" for the purposes of considering Option 3**

Global metric publishers (**Option 3**) are only considered for use when **Option 2** is "not specified".

"Not specified" is defined to be when the customer either: (1) does not invoke
`ClientOverrideConfiguration.Builder.addMetricPublisher()` / `ClientOverrideConfiguration.Builder.metricPublishers()`,
or (2) invokes `ClientOverrideConfiguration.Builder.metricPublishers(null)` as the last `metricPublisher`-mutating
action on the client override configuration builder.

This definition purposefully excludes `ClientOverrideConfiguration.Builder.metricPublishers(emptyList())`. Setting
the `metricPublishers` to an empty list is equivalent to setting the `metricPublishers` to the `NoOpMetricPublisher`.
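
The interaction between the three options can be sketched as a two-step resolution. This is a hedged illustration, not SDK code: `PublisherResolution`, `forClient`, and `forRequest` are hypothetical names, `null` is used to model "not specified", and treating a `null` request-level list as "no request override" is an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical resolver that only illustrates precedence; not an SDK class. */
final class PublisherResolution {
    private PublisherResolution() {
    }

    /**
     * Client-level resolution, performed once when the client is created.
     * A null list means Option 2 was "not specified"; an empty list is an explicit "publish nothing".
     */
    static <P> List<P> forClient(List<P> clientPublishers,
                                 boolean globalPublishingEnabled,
                                 Supplier<List<P>> globalPublishers) {
        if (clientPublishers != null) {
            return clientPublishers; // Option 2 wins over Option 3, even when empty.
        }
        // Option 3 acts as the client-level default; the supplier is only consulted here.
        return globalPublishingEnabled ? globalPublishers.get() : Collections.<P>emptyList();
    }

    /** Request-level resolution: Option 1, when present, replaces whatever the client resolved. */
    static <P> List<P> forRequest(List<P> requestPublishers, List<P> resolvedClientPublishers) {
        return requestPublishers != null ? requestPublishers : resolvedClientPublishers;
    }
}
```

Because `forClient` runs once per client, a request never needs to look at the global configuration; it only chooses between its own publishers and the list the client already resolved.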

**Implementing an SdkMetricPublisherService that depends on AWS clients**

Any `MetricPublisher` that supports creation via an `SdkMetricPublisherService` and depends on an AWS service client
**must** disable metric publishing on that AWS service client using **Option 2** when it is created via the
`SdkMetricPublisherService`. This is to prevent a scenario where the global metric publisher singleton's initialization
process depends on the global metric publisher singleton already being initialized.

## Modules
New modules are created to support the metrics feature.

### metrics-spi
* Contains the metrics interfaces and default implementations that don't require other dependencies
* This is a sub module under `core`
* `sdk-core` has a dependency on `metrics-spi`, so customers will automatically get a dependency on this module.

### metrics-publishers
* This is a new module that contains implementations of all SDK supported publishers
* Under this module, a new sub-module is created for each publisher (`cloudwatch-publisher`, `csm-publisher`)
* Customers have to **explicitly add a dependency** on these modules to use the SDK-provided publishers

## Performance
One of the main tenets for metrics is "Enabling default metrics should have
minimal impact on the application performance". The following design choices are
made to ensure enabling metrics does not affect performance significantly.

* When collecting metrics, a No-op metric collector is used if metrics are
  disabled. All methods in this collector are no-op and return immediately.

* Metric publisher implementations can involve network calls and impact latency
  if done in a blocking way. Therefore, all SDK publisher implementations will
  process the metrics asynchronously so that they do not block the request thread.

* Performance tests will be written and run with each release to ensure that the
  SDK performs well even when metrics are enabled and being collected and
  published.
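
The asynchronous hand-off described in this section can be sketched as follows. This is one possible shape under stated assumptions, not the SDK's publisher implementation: `AsyncPublisher` is a hypothetical wrapper, the `Consumer` delegate stands in for whatever performs the network call, and the single daemon thread and five-second drain timeout are illustrative choices.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical wrapper showing the hand-off to a background thread; not an SDK class. */
final class AsyncPublisher<T> implements AutoCloseable {
    private final Consumer<T> delegate;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "metric-publisher");
        thread.setDaemon(true); // Assumption: publishing should never keep the JVM alive.
        return thread;
    });

    AsyncPublisher(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    /** Queues the metrics and returns immediately; the delegate runs on the background thread. */
    void publish(T metrics) {
        executor.execute(() -> delegate.accept(metrics));
    }

    /** Stops accepting work and waits briefly for already queued metrics to be published. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

The request thread only pays for enqueueing a task, so a slow or failing destination affects the background thread rather than API call latency. A single worker thread also preserves the order in which metrics were submitted.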