# heapprofd: Sampling for Memory Profiles

_tomlewin, fmayer **·** 2021-04-14_

## Background

A heap profiler associates memory allocations with the callstacks on which they
happen ([example visualization]). It is prohibitively expensive to handle every
allocation done by a program, so the [Android Heap Profiler] employs a sampling
approach that handles a statistically meaningful subset of allocations. This
document explores the statistical properties thereof.

## Conceptual motivation

Conceptually, the sampling is implemented such that each byte has some
probability p of being sampled. In theory, we can think of each byte as
undergoing a Bernoulli trial. We use a random sampling approach, as opposed to
taking every nth byte, because regular allocation patterns in the code might be
missed by a correspondingly regular sampling.

To scale the sampled bytes back up to the correct total, we multiply by 1 / p,
i.e. if we sample a byte with probability 10%, then each sampled byte
represents 10 bytes allocated.
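
To make the scaling concrete, here is a minimal sketch of the per-byte
Bernoulli model (illustrative only: the real profiler never iterates over
individual bytes):

```python
import random

def sampled_bytes(size, p, rng):
    """Run a Bernoulli(p) trial for each byte of an allocation and
    count how many bytes end up sampled."""
    return sum(1 for _ in range(size) if rng.random() < p)

rng = random.Random(1)
p = 0.1              # sample each byte with 10% probability
size = 100_000       # allocation size in bytes

hits = sampled_bytes(size, p, rng)
estimate = hits / p  # scale by 1 / p to estimate the true size
```

Scaled this way, the estimate is unbiased: over many runs its expected value is
exactly the allocation size.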

## Implementation

In practice, the [algorithm] works as follows:

1. We look at an allocation.

2. If the size of the allocation is large enough that there’s a greater than 99%
   chance of it being sampled at least once, we return the size of the
   allocation directly. This is a performance optimization.

3. If the size of the allocation is smaller, we compute the number of times we
   would draw a sample if we sampled each byte with the given sampling
   probability:

   * In practice we do this by keeping track of the arrival time of the next
     sample. When an allocation happens, we subtract its size from the arrival
     time of the next sample, and check whether we have brought it below zero.
     We then repeatedly draw from the exponential distribution (which is the
     interarrival time of a Poisson process) until the arrival time is brought
     back above zero. The number of times we had to draw from the exponential
     distribution is the number of samples the allocation should count as.

   * We multiply the number of samples we drew within the allocation by the
     sampling rate to get an estimate of the size of the allocation.

We could instead draw directly from the Poisson/binomial distribution to see how
many samples we get for a given allocation size, but this is not as
computationally efficient: because most allocations are small and our sampling
rate is low, most allocations are not sampled at all, and with the
exponential-draw approach a non-sample only requires decrementing a counter.
Sampling from the Poisson for every allocation (rather than drawing from the
exponential for every sample) is more expensive.
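
The steps above can be sketched in Python (an illustrative model only: the
production sampler is the C++ [algorithm], and all names here are invented):

```python
import math
import random

class Sampler:
    """Sketch of the sampling scheme: samples arrive as a Poisson
    process with a mean spacing of `sampling_rate` bytes."""

    def __init__(self, sampling_rate, rng=None):
        self.rate = sampling_rate
        self.rng = rng or random.Random()
        # Step 2's fast path: above this size an allocation is sampled
        # at least once with > 99% probability.
        self.cutoff = -sampling_rate * math.log(0.01)
        # Bytes until the next sample arrives.
        self.next_sample = self._draw()

    def _draw(self):
        # Exponential interarrival time with mean `self.rate` bytes.
        return self.rng.expovariate(1.0 / self.rate)

    def sample(self, alloc_size):
        """Return the number of bytes this allocation counts as."""
        if alloc_size >= self.cutoff:
            return alloc_size
        self.next_sample -= alloc_size
        num_samples = 0
        while self.next_sample <= 0:  # draw until back above zero
            num_samples += 1
            self.next_sample += self._draw()
        return num_samples * self.rate
```

For a non-sample this only decrements `next_sample`, which is why the common
case is cheap.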

## Theoretical performance

If we sample each byte with probability p (i.e. at a sampling rate of 1 / p
bytes), then to set p reasonably we need to know our probability of sampling an
allocation of any given size, as well as our expected error when we sample it.
If we set p = 1 we sample everything and have no error. Reducing p costs us
coverage and accuracy.

### Sampling probabilities

We sample an allocation with the probability that a draw from
Exponential(sampling rate) is smaller than the allocation size, i.e.
1 - exp(-size / rate). This is the probability that we do not fail to sample
every byte within the allocation when sampling bytes at our sampling rate.

Because the exponential distribution is memoryless, for the purposes of this
probability we can add together allocations of the same type even if they occur
apart. This means that if we have an allocation of 1MB and then another of 1MB,
the probability of taking at least one sample is the same as the probability of
taking at least one sample of a contiguous 2MB allocation.

We can see from the chart below that if we 16x our sampling rate from 32KiB to
512KiB, we still have a 95% chance of sampling anything above 1.5MiB. If we 64x
it to 2048KiB, we still have an 80% chance to sample anything above 3.5MiB.

![](/docs/images/heapprofd-sampling/one-sample.png)
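
The figures quoted above follow from the closed form for the probability of at
least one sample; a quick check (the function name is illustrative):

```python
import math

def p_sampled(alloc_size, sampling_rate):
    """P(at least one sample) = 1 - exp(-size / rate)."""
    return 1.0 - math.exp(-alloc_size / sampling_rate)

# Sizes in KiB, matching the chart:
p_512k = p_sampled(1.5 * 1024, 512)     # 1.5MiB at a 512KiB rate, ≈ 0.95
p_2048k = p_sampled(3.5 * 1024, 2048)   # 3.5MiB at a 2048KiB rate, ≈ 0.83
```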

### Expected errors

We can also consider the expected error we’ll make at given allocation sizes and
sampling rates. As before, it doesn’t matter where the allocation happens: if we
have two allocations of the same type occurring at different times, they have
the same statistical properties as a single allocation with size equal to their
sum. This is because the exponential distribution we use is memoryless.

For expected error we report something like the [mean absolute percentage
error]. This just means we see how wrong we might be in percent relative to the
true allocation size, and then scale these results according to their
probability of occurrence. This estimator has some issues (e.g. it’s biased such
that underestimates are penalised more heavily), but it’s easy to reason about
and it’s useful as a gauge of how wrong, on average, we might be with our
estimates. We recommend reading it as analogous to a standard deviation.
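
The error measure can be approximated with a small Monte-Carlo simulation (a
rough sketch of the idea, not the computation used to produce the charts):

```python
import random

def mean_abs_pct_error(alloc_size, sampling_rate, trials=50_000, seed=3):
    """Estimate the mean absolute percentage error of the reported size
    of a single allocation, by simulating the exponential-draw sampler."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Count Poisson sample arrivals that land inside the allocation.
        pos = rng.expovariate(1.0 / sampling_rate)
        samples = 0
        while pos < alloc_size:
            samples += 1
            pos += rng.expovariate(1.0 / sampling_rate)
        estimate = samples * sampling_rate
        total += abs(estimate - alloc_size) / alloc_size
    return total / trials
```

For an allocation the size of the sampling rate the error is large (roughly
70-75%), and it shrinks as the allocation grows relative to the sampling rate.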

The charts below show both the expected error and the conditional expected
error: the expected error assuming we take at least one sample within the
allocation. Both show periodicity in line with the sampling rate used. The
takeaway is that, while the estimates are unbiased, such that their expected
value is equal to the true value, substantial errors are still possible.

![](/docs/images/heapprofd-sampling/expected-error.png)

Assuming that we take at least one sample of an allocation, what error might we
expect? We can answer that using the following chart, which shows the expected
error conditional on at least one sample being taken. This is the error we can
expect for the allocations that end up in our heap profile. It’s important to
emphasise that this expected-error methodology is not exact, and also that the
underlying sampling remains unbiased: the expected value is the true value. This
should be read as a measure akin to standard deviation.

![](/docs/images/heapprofd-sampling/conditional-expected-error.png)

## Performance Considerations

### STL Distributions

Benchmarking of the STL distributions on a Pixel 4 reinforces our approach of
approximating the geometric distribution with an exponential distribution, as
its performance is ~8x better ([full results]).
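
The exponential distribution is a faithful stand-in here because, at our low
per-byte probability p, the geometric distribution (bytes until the first
sampled byte) is closely approximated by Exponential(p). A quick sketch of the
equivalence (illustrative only, not the benchmark code):

```python
import math
import random

rng = random.Random(11)
p = 1.0 / 32000   # per-byte sampling probability
n = 50_000        # number of draws to compare

# Geometric draws via inverse-CDF sampling: trials until first success.
geo_mean = sum(
    int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1
    for _ in range(n)
) / n

# Exponential draws with the same mean of 1 / p bytes.
exp_mean = sum(rng.expovariate(p) for _ in range(n)) / n
```

Both sample means come out close to 1 / p = 32000 bytes.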

Time to draw one sample with p = 1 / 32000 (all standard deviations are
negligibly small, below one CPU cycle):

| Distribution | ARM64                | ARM32                |
|--------------|----------------------|----------------------|
| Exponential  | 21.3ns (sd 0.066ns)  | 33.2ns (sd 0.011ns)  |
| Geometric    | 169ns (sd 0.023ns), 7.93x | 222ns (sd 0.118ns), 6.69x |

## Appendix

### Improvements made

Before Android 12, we returned the size of the allocation accurately and
immediately if the allocation size was greater than our sampling rate. This had
several impacts.

The most obvious impact is that with the old approach we would expect to sample
an allocation equal in size to our sampling rate only ~63% of the time, rather
than 100%. This meant there was a discontinuity in coverage between an
allocation one byte smaller than our sampling rate and one a byte larger. This
is still unbiased from a statistical perspective, but it’s important to note.

Another unusual impact is that the sampling behaviour depends not only on the
size of the memory being used, but also on how it’s allocated. If our sampling
rate were 10KB and we have an allocation that’s 10KB, we sample it with
certainty. If instead that 10KB is split across two 5KB allocations, we sample
it with probability 63%. Again, this doesn’t bias our results, but it means that
our measurements of memory where there are many small allocations might be
noisier than ones where there are a few large allocations, even if the total
memory used is the same. If we didn’t always return allocations greater than the
sampling rate, then the memorylessness property of the exponential distribution
would make our method insensitive to how the memory is split amongst
allocations, which seems a desirable property.
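
Concretely, under the old scheme (using the hypothetical 10KB sampling rate from
the example above):

```python
import math

rate = 10_000  # hypothetical sampling rate in bytes

# A single 10KB allocation hit the size >= rate fast path: always recorded.
p_single = 1.0

# Two 5KB allocations were each sampled stochastically; the probability
# that at least one of them is sampled is 1 - exp(-5KB / rate)^2.
p_split = 1.0 - math.exp(-5_000 / rate) ** 2   # = 1 - e^-1 ≈ 0.63
```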

We therefore altered the cutoff at which we simply return the allocation size.
Previously, as discussed, the cutoff was at the sampling rate, which led to a
discontinuity. Now the cutoff is the size at which we have a >99% chance of
sampling an allocation at least once, given the sampling rate we’re using. This
resolves the above issues.
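
The new cutoff has a simple closed form (a sketch; the function name is
illustrative, not the production code):

```python
import math

def fast_path_cutoff(sampling_rate, coverage=0.99):
    """Smallest allocation size that is sampled at least once with
    probability `coverage`: solve 1 - exp(-size / rate) = coverage."""
    return -sampling_rate * math.log(1.0 - coverage)

cutoff = fast_path_cutoff(32 * 1024)   # ln(100) ≈ 4.6 times the rate
```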

[algorithm]: https://cs.android.com/android/platform/superproject/+/master:external/perfetto/src/profiling/memory/sampler.h
[example visualization]: /docs/images/native-heap-prof.png
[Android Heap Profiler]: /docs/design-docs/heapprofd-design
[mean absolute percentage error]: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
[full results]: https://gist.github.com/fmayer/3aafcaf58f8db09714ba09aa4ac2b1ac