# heapprofd: Sampling for Memory Profiles

_tomlewin, fmayer **·** 2021-04-14_

## Background

A heap profiler associates memory allocations with the callstacks on which they
happen ([example visualization]). It is prohibitively expensive to handle every
allocation done by a program, so the [Android Heap Profiler] employs a sampling
approach that handles a statistically meaningful subset of allocations. This
document explores the statistical properties thereof.

## Conceptual motivation

Conceptually the sampling is implemented such that each byte has some
probability p of being sampled. In theory we can think of each byte undergoing a
Bernoulli trial. The reason we use a random sampling approach, as opposed to
taking every nth byte, is that there may be regular allocation patterns in the
code that would be missed by a correspondingly regular sampling.

To scale the sampled bytes back up to the real allocation size, we multiply by
1 / p, i.e. if we sample a byte with probability 10%, then each sampled byte
represents 10 bytes allocated.

## Implementation

In practice, the [algorithm] works as follows:

1. We look at an allocation.

2. If the size of the allocation is large enough that there is a greater than
   99% chance of it being sampled at least once, we return the size of the
   allocation directly. This is a performance optimization.

3. If the size of the allocation is smaller, we compute the number of times we
   would draw a sample if we sampled each byte with the given sampling rate:

   * In practice we do this by keeping track of the arrival time of the next
     sample. When an allocation happens, we subtract its size from the arrival
     time of the next sample, and check whether we have brought it below zero.
     We then repeatedly draw from the exponential distribution (which is the
     interarrival time of a Poisson process) until the arrival time is brought
     back above zero. The number of times we had to draw from the exponential
     distribution is the number of samples the allocation should count as.

   * We multiply the number of samples we drew within the allocation by the
     sampling rate to get an estimate of the size of the allocation.

We could instead draw directly from the Poisson/binomial distribution to see
how many samples we get for a given allocation size, but this is not as
computationally efficient. Because of their small size and our low sampling
rate, most allocations are not sampled at all, and with the exponential draw
approach a non-sample only costs us a counter decrement. Sampling from the
Poisson for every allocation (rather than drawing from the exponential for
every sample) is more expensive.

## Theoretical performance

If we sample at some rate 1 / p, then to set p reasonably we need to know both
the probability of sampling an allocation of a given size and the expected
error we make when we sample it. If we set p = 1 we sample everything and have
no error. Reducing p (i.e. increasing the sampling rate) costs us coverage and
accuracy.

### Sampling probabilities

We sample an allocation if a draw from Exponential(sampling rate) is smaller
than the allocation size, so the probability of sampling it is
P(Exponential(sampling rate) < allocation size). This is equivalent to the
probability of sampling at least one byte within the allocation, when each byte
is sampled at our sampling rate.
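
In formula form, for an allocation of s bytes and a sampling rate of r bytes,
this probability is 1 - (1 - 1/r)^s ≈ 1 - e^(-s/r). The sketch below (the
function names are hypothetical, not Perfetto's API) computes this probability,
and derives the ">99% chance" cutoff from step 2 of the implementation by
solving 1 - e^(-s/r) > 0.99 for s, which under this approximation gives
s > r · ln(100) ≈ 4.6r:

```c++
#include <cmath>
#include <cstdio>

// Probability of taking at least one sample from an allocation of `size`
// bytes at a sampling rate of `rate` bytes per sample, using the
// exponential approximation 1 - e^(-size/rate). Hypothetical helper, not
// part of the actual sampler.
double ProbabilityOfAtLeastOneSample(double size, double rate) {
  return 1.0 - std::exp(-size / rate);
}

// Smallest allocation size that is sampled at least once with probability
// greater than `confidence`; solves 1 - e^(-s/rate) > confidence for s.
double PassthroughThreshold(double rate, double confidence) {
  return -rate * std::log(1.0 - confidence);
}

int main() {
  constexpr double kRate = 32 * 1024;  // 32KiB sampling rate.
  std::printf("P(sample | 8KiB)  = %.3f\n",
              ProbabilityOfAtLeastOneSample(8 * 1024, kRate));   // ~0.221
  std::printf("P(sample | 32KiB) = %.3f\n",
              ProbabilityOfAtLeastOneSample(32 * 1024, kRate));  // ~0.632
  // ~4.6x the sampling rate; larger allocations are returned directly.
  std::printf("99%% cutoff       = %.0f bytes\n",
              PassthroughThreshold(kRate, 0.99));
  return 0;
}
```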

Because the exponential distribution is memoryless, for the purposes of
computing sampling probabilities we can add allocations together even if they
do not happen contiguously. This means that if we have an allocation of 1MiB
and then another of 1MiB, the probability of taking at least one sample across
the two is the same as the probability of taking at least one sample of a
contiguous 2MiB allocation.

We can see from the chart below that if we increase our sampling rate 16x from
32KiB to 512KiB, we still have a 95% chance of sampling anything above 1.5MiB.
If we increase it 64x to 2048KiB, we still have an 80% chance of sampling
anything above 3.5MiB.

![](/docs/images/heapprofd-sampling/one-sample.png)

### Expected errors

We can also consider the expected error we'll make at given allocation sizes
and sampling rates. As before, it doesn't matter where the allocation happens:
if we have two allocations of the same type occurring at different times, they
have the same statistical properties as a single allocation with size equal to
their sum, because the exponential distribution we use is memoryless.

For expected error we report something like the [mean absolute percentage
error]: we see how wrong we might be in percent relative to the true allocation
size, and then weight these results by their probability of occurrence. This
estimator has some issues (for instance, it is biased such that underestimates
are penalised more heavily), but it is easy to reason about and useful as a
gauge of how wrong, on average, our estimates might be. We recommend reading it
as analogous to a standard deviation.

The charts below show both the expected error and the conditional expected
error: the expected error assuming we take at least one sample within the
allocation. Both show a periodicity in line with the sampling rate used. The
thing to take away is that, while the estimates are unbiased (their expected
value is equal to the real value), substantial errors are still possible.

![](/docs/images/heapprofd-sampling/expected-error.png)

Assuming that we take at least one sample of an allocation, what error might we
expect? We can answer that using the chart below, which shows the expected
error conditional on at least one sample being taken: the expected error for
the allocations that actually end up in our heap profile. It's important to
emphasise that this expected-error methodology is not exact, and that the
underlying sampling remains unbiased: the expected value is the true value.
This should be read as a measure akin to standard deviation.

![](/docs/images/heapprofd-sampling/conditional-expected-error.png)

## Performance considerations

### STL distributions

Benchmarking of the STL distributions on a Pixel 4 reinforces our approach of
approximating the geometric distribution with an exponential distribution, as
the exponential draw is roughly 7-8x faster ([full results]).
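
To make the approach being benchmarked concrete, here is a simplified sketch of
the exponential-draw sampler described in the Implementation section. It is a
hypothetical illustration, not the actual Perfetto code (the real
implementation, linked as [algorithm], also short-circuits allocations above
the >99% cutoff, which this sketch omits):

```c++
#include <cstdint>
#include <random>

// Simplified sampler sketch: interarrival times between samples are drawn
// from an exponential distribution with mean `rate` bytes, approximating
// per-byte Bernoulli(1/rate) sampling.
class Sampler {
 public:
  explicit Sampler(double rate)
      : rate_(rate),
        interval_dist_(1.0 / rate),
        rng_(std::random_device{}()),
        bytes_until_sample_(NextInterval()) {}

  // Returns the number of bytes this allocation should be recorded as, or 0
  // if it is not sampled. Multiplying the number of samples by the rate
  // scales the result back to an unbiased estimate of bytes allocated.
  uint64_t SampleSize(uint64_t alloc_size) {
    bytes_until_sample_ -= static_cast<int64_t>(alloc_size);
    uint64_t num_samples = 0;
    // Most allocations never enter this loop: a non-sample only costs the
    // decrement above. Each iteration is one sample within this allocation.
    while (bytes_until_sample_ < 0) {
      bytes_until_sample_ += NextInterval();
      ++num_samples;
    }
    return num_samples * static_cast<uint64_t>(rate_);
  }

 private:
  int64_t NextInterval() {
    return static_cast<int64_t>(interval_dist_(rng_)) + 1;
  }

  double rate_;
  std::exponential_distribution<double> interval_dist_;
  std::default_random_engine rng_;
  int64_t bytes_until_sample_;
};
```

The alternative measured below would replace the exponential draw with
`std::geometric_distribution<int64_t>(1.0 / rate)`, which models the per-byte
Bernoulli trials exactly but, as the benchmark shows, is several times slower
per draw.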

Time to draw one sample from each distribution with p = 1 / 32000:

| Distribution | ARM64                 | ARM32                 |
|--------------|-----------------------|-----------------------|
| Exponential  | 21.3ns (sd 0.066ns)   | 33.2ns (sd 0.011ns)   |
| Geometric    | 169ns (sd 0.023ns), 7.93x slower | 222ns (sd 0.118ns), 6.69x slower |

The standard deviations of all measurements are negligibly small: smaller than
a single CPU cycle.

## Appendix

### Improvements made

We previously (before Android 12) returned the size of the allocation
accurately and immediately if the allocation size was greater than our sampling
rate. This had several impacts.

The most obvious impact is that with the old approach we would expect to sample
an allocation equal in size to our sampling rate only ~63% of the time, rather
than 100%. This means there was a discontinuity in coverage between an
allocation a byte smaller than our sampling rate and one a byte larger. This is
still unbiased from a statistical perspective, but it's important to note.

Another unusual impact is that the sampling probability depended not only on
the size of the memory being used, but also on how it was allocated. If our
sampling rate were 10KB and we had a single 10KB allocation, we would sample it
with certainty. If instead that 10KB were split across two 5KB allocations, we
would sample it with probability ~63%. Again, this doesn't bias our results,
but it means that our measurements of memory where there are many small
allocations might be noisier than those where there are a few large
allocations, even if the total memory used is the same. If we didn't return
allocations greater than the sampling rate every time, the memorylessness of
the exponential distribution would make our method insensitive to how the
memory is split among allocations, which is a desirable property.

We therefore altered the cutoff at which we simply return the allocation size.
Previously, as discussed, the cutoff was at the sampling rate, which led to a
discontinuity. Now the cutoff is the size at which we have a >99% chance of
sampling an allocation at least once, given the sampling rate we're using. This
resolves the above issues.

[algorithm]: https://cs.android.com/android/platform/superproject/+/master:external/perfetto/src/profiling/memory/sampler.h
[example visualization]: /docs/images/native-heap-prof.png
[Android Heap Profiler]: /docs/design-docs/heapprofd-design
[mean absolute percentage error]: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
[full results]: https://gist.github.com/fmayer/3aafcaf58f8db09714ba09aa4ac2b1ac