# View the profile

[TOC]

## Introduction

After using `simpleperf record` or `app_profiler.py`, we get a profile data file. The file contains
a list of samples. Each sample has a timestamp, a thread id, a callstack, the events (like
cpu-cycles or cpu-clock) used in this sample, etc. We have many choices for viewing the profile. We
can show samples in chronological order, or show aggregated flamegraphs. We can show reports in
text format, or in interactive UIs.

Below are some recommended UIs for viewing the profile. Google developers can find more examples in
[go/gmm-profiling](go/gmm-profiling?polyglot=linux-workstation#viewing-the-profile).

## Continuous PProf UI (great flamegraph UI, but only available internally)

[PProf](https://github.com/google/pprof) is a mature profiling technology used extensively on
Google servers, with a powerful flamegraph UI that supports drilldown, search, pivot, profile diff,
and graph visualisation.

We can use `pprof_proto_generator.py` to convert profiles into pprof.profile protobufs for use in
pprof.

```
# Output all threads, broken down by threadpool.
./pprof_proto_generator.py

# Use proguard mapping.
./pprof_proto_generator.py --proguard-mapping-file proguard.map

# Just the main (UI) thread (query by thread name):
./pprof_proto_generator.py --comm com.example.android.displayingbitmaps
```

This will print some debug logs about "Failed to read symbols"; this is usually OK, unless those
symbols are hotspots.

Upload pprof.profile to the http://pprof/ UI:

```
# Upload all threads in profile, grouped by threadpool.
# This is usually a good default, combining threads with similar names.
pprof --flame --tagroot threadpool pprof.profile

# Upload all threads in profile, grouped by individual thread name.
pprof --flame --tagroot thread pprof.profile

# Upload all threads in profile, without grouping by thread.
pprof --flame pprof.profile
```

This will output a URL, for example:
https://pprof.corp.google.com/?id=589a60852306144c880e36429e10b166

## Firefox Profiler (great chronological UI)

We can view Android profiles using Firefox Profiler: https://profiler.firefox.com/. This does not
require a Firefox installation -- Firefox Profiler is just a website; you can open it in any
browser.

Firefox Profiler has a great chronological view, as it doesn't pre-aggregate similar stack traces
like pprof does.

We can use `gecko_profile_generator.py` to convert raw perf.data files into a Firefox Profile, with
Proguard deobfuscation.

```
# Create Gecko Profile
./gecko_profile_generator.py | gzip > gecko_profile.json.gz

# Create Gecko Profile using Proguard map
./gecko_profile_generator.py --proguard-mapping-file proguard.map | gzip > gecko_profile.json.gz
```

Then drag-and-drop gecko_profile.json.gz into https://profiler.firefox.com/.

Firefox Profiler supports:

1. Aggregated flamegraphs
2. Chronological stackcharts

And allows filtering by:

1. Individual threads
2. Multiple threads (Ctrl+Click thread names to select many)
3. Timeline period
4. Stack frame text search

## FlameScope (great jank-finding UI)

[Netflix's FlameScope](https://github.com/Netflix/flamescope) is a rough, proof-of-concept UI that
lets you spot repeating patterns of work by laying out the profile as a subsecond heatmap.

Below, each vertical stripe is one second, and each cell is 10ms. Redder cells have more samples.
See https://www.brendangregg.com/blog/2018-11-08/flamescope-pattern-recognition.html for how to
spot patterns.

This is an example of a 60s DisplayBitmaps app startup profile.

You can see:

1. The thick red vertical line on the left is startup.
2. The long white vertical sections on the left show the app is mostly idle, waiting for commands
   from instrumented tests.
3. Then we see periodic red blocks, which show the app is periodically busy handling commands
   from instrumented tests.

Click the start and end cells of a duration:

To see a flamegraph for that duration:

Install and run FlameScope:

```
git clone https://github.com/Netflix/flamescope ~/flamescope
cd ~/flamescope
pip install -r requirements.txt
npm install
npm run webpack
python3 run.py
```

Then open FlameScope in-browser: http://localhost:5000/.

FlameScope can read gzipped perf script format profiles. Convert simpleperf perf.data to this
format with `report_sample.py`, and place it in FlameScope's examples directory:

```
# Create `Linux perf script` format profile.
report_sample.py | gzip > ~/flamescope/examples/my_simpleperf_profile.gz

# Create `Linux perf script` format profile using Proguard map.
report_sample.py \
    --proguard-mapping-file proguard.map \
    | gzip > ~/flamescope/examples/my_simpleperf_profile.gz
```

Open the profile "as Linux Perf", and click start and end sections to get a flamegraph of that
timespan.

To investigate UI thread jank, filter to UI thread samples only:

```
# Filter to the UI thread by name.
report_sample.py \
    --comm com.example.android.displayingbitmaps \
    | gzip > ~/flamescope/examples/uithread.gz
```

Once you've identified the timespan of interest, consider also zooming into that section with
Firefox Profiler, which has a more powerful flamegraph viewer.

## Differential FlameGraph

See Brendan Gregg's [Differential Flame Graphs](https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html) blog.

Use simpleperf's `stackcollapse.py` to convert perf.data to folded stacks format for the FlameGraph
toolkit.

Consider diffing both directions: After minus Before, and Before minus After.
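For reference, the folded stacks format is plain text: one line per unique stack, with
semicolon-separated frames followed by a space and a sample count. If you want to post-process
folded stacks yourself before diffing, a minimal Python sketch (the `parse_folded` helper is our
own illustration, not part of simpleperf or FlameGraph):

```python
from collections import Counter

def parse_folded(lines):
    """Sum sample counts per unique stack from folded-stacks lines."""
    stacks = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Each line looks like: "main;render;decode 30"
        frames, count = line.rsplit(" ", 1)
        stacks[frames] += int(count)
    return stacks

# Hypothetical example input; real data comes from stackcollapse.py.
folded = [
    "main;render;decode 30",
    "main;render;drawBitmap 12",
    "main;render;decode 5",
]
print(parse_folded(folded)["main;render;decode"])  # 35
```

Duplicate stacks are merged by summing their counts, which is the same aggregation
`difffolded.pl` expects of its inputs.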
If you've recorded before and after your optimisation as perf_before.data and perf_after.data, and
you're only interested in the UI thread:

```
# Generate before and after folded stacks from perf.data files.
./stackcollapse.py --kernel --jit -i perf_before.data \
    --proguard-mapping-file proguard_before.map \
    --comm com.example.android.displayingbitmaps \
    > perf_before.folded
./stackcollapse.py --kernel --jit -i perf_after.data \
    --proguard-mapping-file proguard_after.map \
    --comm com.example.android.displayingbitmaps \
    > perf_after.folded

# Generate diff reports in both directions.
FlameGraph/difffolded.pl -n perf_before.folded perf_after.folded \
    | FlameGraph/flamegraph.pl > diff1.svg
FlameGraph/difffolded.pl -n --negate perf_after.folded perf_before.folded \
    | FlameGraph/flamegraph.pl > diff2.svg
```

## Android Studio Profiler

Android Studio Profiler supports recording and reporting profiles of app processes. It supports
several recording methods, including one that uses simpleperf as its backend. You can use Android
Studio Profiler for both recording and reporting.

In Android Studio:

1. Open View -> Tool Windows -> Profiler
2. Click + -> Your Device -> Profileable Processes -> Your App

Click into the "CPU" chart.

Choose Callstack Sample Recording. Even if you're using Java, this provides better observability
into ART, malloc, and the kernel.

Click Record, run your test on the device, then Stop when you're done.

Click on a thread track, and "Flame Chart" to see a chronological chart on the left, and an
aggregated flamechart on the right:

If you want more flexibility in recording options, or want to apply a proguard mapping file, you
can record using simpleperf and report using Android Studio Profiler.

We can use `simpleperf report-sample` to convert perf.data to trace files for Android Studio
Profiler.
```
# Convert perf.data to perf.trace for Android Studio Profiler.
# If on Mac/Windows, use the simpleperf host executable for those platforms instead.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace

# Convert perf.data to perf.trace using a proguard mapping file.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace \
    --proguard-mapping-file proguard.map
```

In Android Studio: Open File -> Open -> Select perf.trace

## Simpleperf HTML Report

Simpleperf can generate its own HTML profile, which can show Android-specific information and
separate flamegraphs for all threads, though with a much rougher flamegraph UI.

This UI is fairly rough; we recommend using the Continuous PProf UI or Firefox Profiler instead.
But it's useful for a quick look at your data.

Each of the following commands takes ./perf.data as input and outputs ./report.html.

```
# Make an HTML report.
./report_html.py

# Make an HTML report with Proguard mapping.
./report_html.py --proguard-mapping-file proguard.map
```

This will print some debug logs about "Failed to read symbols"; this is usually OK, unless those
symbols are hotspots.

See also [report_html.py's README](scripts_reference.md#report_htmlpy) and `report_html.py -h`.

## PProf Interactive Command Line

Unlike the Continuous PProf UI, the [PProf](https://github.com/google/pprof) command line is
publicly available, and allows drilldown, pivoting and filtering.

The session below demonstrates filtering to stack frames containing processBitmap.
```
$ pprof pprof.profile
(pprof) show=processBitmap
(pprof) top
Active filters:
   show=processBitmap
Showing nodes accounting for 2.45s, 11.44% of 21.46s total
      flat  flat%   sum%        cum   cum%
     2.45s 11.44% 11.44%      2.45s 11.44%  com.example.android.displayingbitmaps.util.ImageFetcher.processBitmap
```

And then showing the tags of those frames, to tell which threads they are running on:

```
(pprof) tags
 pid: Total 2.5s
      2.5s (  100%): 31112

 thread: Total 2.5s
         1.4s (57.21%): AsyncTask #3
         1.1s (42.79%): AsyncTask #4

 threadpool: Total 2.5s
             2.5s (  100%): AsyncTask #%d

 tid: Total 2.5s
      1.4s (57.21%): 31174
      1.1s (42.79%): 31175
```

Contrast with another method:

```
(pprof) show=addBitmapToCache
(pprof) top
Active filters:
   show=addBitmapToCache
Showing nodes accounting for 1.05s, 4.88% of 21.46s total
      flat  flat%   sum%        cum   cum%
     1.05s  4.88%  4.88%      1.05s  4.88%  com.example.android.displayingbitmaps.util.ImageCache.addBitmapToCache
```

For more information, see the
[pprof README](https://github.com/google/pprof/blob/master/doc/README.md#interactive-terminal-use).

## Simpleperf Report Command Line

The `simpleperf report` command reports profiles in text format.

You can call `simpleperf report` directly or call it via `report.py`.

```
# Report symbols in table format.
$ ./report.py --children

# Report call graph.
$ bin/linux/x86_64/simpleperf report -g -i perf.data
```

See also the [report command's README](executable_commands_reference.md#The-report-command) and
`report.py -h`.

## Custom Report Interface

If the above UIs can't fulfill your needs, you can use `simpleperf_report_lib.py` to parse
perf.data, extract sample information, and feed it to any view you like.
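As a sketch of the kind of view you might build, the helper below aggregates event counts per
thread from (thread name, period) pairs. `count_by_thread` is our own illustration; the
commented-out wiring assumes the `ReportLib` API (`SetRecordFile`, `GetNextSample`, and per-sample
fields like `thread_comm` and `period`) as described in simpleperf_report_lib.py's README, and
needs a perf.data file plus the simpleperf scripts directory on PYTHONPATH to actually run:

```python
from collections import Counter

def count_by_thread(samples):
    """Sum event counts (sample periods) per thread name."""
    totals = Counter()
    for thread_name, period in samples:
        totals[thread_name] += period
    return totals

# Hedged sketch of feeding in real samples via simpleperf_report_lib
# (assumed API; requires perf.data and the simpleperf scripts on PYTHONPATH):
#
# from simpleperf_report_lib import ReportLib
# lib = ReportLib()
# lib.SetRecordFile("perf.data")
# pairs = []
# while True:
#     sample = lib.GetNextSample()
#     if sample is None:
#         break
#     pairs.append((sample.thread_comm, sample.period))
# lib.Close()
# print(count_by_thread(pairs).most_common(5))

# Hypothetical sample data, standing in for the ReportLib output above.
print(count_by_thread([("UiThread", 3), ("AsyncTask #3", 2), ("UiThread", 1)]))
```

The same loop can feed any other aggregation: per-symbol totals, per-callchain counts, or a custom
chronological view.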
See [simpleperf_report_lib.py's README](scripts_reference.md#simpleperf_report_libpy) for more
details.