1# Simpleperf Introduction 2## What is simpleperf 3Simpleperf is a native profiling tool for Android. Its command-line interface 4supports broadly the same options as the linux-tools perf, but also supports 5various Android-specific improvements. 6 7Simpleperf is part of the Android Open Source Project. The source code is at 8https://android.googlesource.com/platform/system/extras/+/master/simpleperf/. 9The latest document is at 10https://android.googlesource.com/platform/system/extras/+show/master/simpleperf/README.md. 11Bugs and feature requests can be submitted at 12http://github.com/android-ndk/ndk/issues. 13 14## How simpleperf works 15Modern CPUs have a hardware component called the performance monitoring unit 16(PMU). The PMU has several hardware counters, counting events like how many cpu 17cycles have happened, how many instructions have executed, or how many cache 18misses have happened. 19 20The Linux kernel wraps these hardware counters into hardware perf events. In 21addition, the Linux kernel also provides hardware independent software events 22and tracepoint events. The Linux kernel exposes all this to userspace via the 23perf_event_open system call, which simpleperf uses. 24 25Simpleperf has three main functions: stat, record and report. 26 27The stat command gives a summary of how many events have happened in the 28profiled processes in a time period. Here’s how it works: 291. Given user options, simpleperf enables profiling by making a system call to 30linux kernel. 312. Linux kernel enables counters while scheduling on the profiled processes. 323. After profiling, simpleperf reads counters from linux kernel, and reports a 33counter summary. 34 35The record command records samples of the profiled process in a time period. 36Here’s how it works: 371. Given user options, simpleperf enables profiling by making a system call to 38linux kernel. 392. Simpleperf creates mapped buffers between simpleperf and linux kernel. 403. Linux kernel enable counters while scheduling on the profiled processes. 414. Each time a given number of events happen, linux kernel dumps a sample to a 42mapped buffer. 435. Simpleperf reads samples from the mapped buffers and generates perf.data. 44 45The report command reads a "perf.data" file and any shared libraries used by 46the profiled processes, and outputs a report showing where the time was spent. 47 48## Main simpleperf commands 49Simpleperf supports several subcommands, including list, stat, record, report. 50Each subcommand supports different options. This section only covers the most 51important subcommands and options. To see all subcommands and options, 52use --help. 53 54 # List all subcommands. 55 $simpleperf --help 56 57 # Print help message for record subcommand. 58 $simpleperf record --help 59 60### simpleperf list 61simpleperf list is used to list all events available on the device. Different 62devices may support different events because of differences in hardware and 63kernel. 64 65 $simpleperf list 66 List of hw-cache events: 67 branch-loads 68 ... 69 List of hardware events: 70 cpu-cycles 71 instructions 72 ... 73 List of software events: 74 cpu-clock 75 task-clock 76 ... 77 78### simpleperf stat 79simpleperf stat is used to get a raw event counter information of the profiled program 80or system-wide. By passing options, we can select which events to use, which 81processes/threads to monitor, how long to monitor and the print interval. 82Below is an example. 83 84 # Stat using default events (cpu-cycles,instructions,...), and monitor 85 # process 7394 for 10 seconds. 86 $simpleperf stat -p 7394 --duration 10 87 Performance counter statistics: 88 89 1,320,496,145 cpu-cycles # 0.131736 GHz (100%) 90 510,426,028 instructions # 2.587047 cycles per instruction (100%) 91 4,692,338 branch-misses # 468.118 K/sec (100%) 92 886.008130(ms) task-clock # 0.088390 cpus used (100%) 93 753 context-switches # 75.121 /sec (100%) 94 870 page-faults # 86.793 /sec (100%) 95 96 Total test time: 10.023829 seconds. 97 98#### Select events 99We can select which events to use via -e option. Below are examples: 100 101 # Stat event cpu-cycles. 102 $simpleperf stat -e cpu-cycles -p 11904 --duration 10 103 104 # Stat event cache-references and cache-misses. 105 $simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10 106 107When running the stat command, if the number of hardware events is larger than 108the number of hardware counters available in the PMU, the kernel shares hardware 109counters between events, so each event is only monitored for part of the total 110time. In the example below, there is a percentage at the end of each row, 111showing the percentage of the total time that each event was actually monitored. 112 113 # Stat using event cache-references, cache-references:u,.... 114 $simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1 115 Performance counter statistics: 116 117 4,331,018 cache-references # 4.861 M/sec (87%) 118 3,064,089 cache-references:u # 3.439 M/sec (87%) 119 1,364,959 cache-references:k # 1.532 M/sec (87%) 120 91,721 cache-misses # 102.918 K/sec (87%) 121 45,735 cache-misses:u # 51.327 K/sec (87%) 122 38,447 cache-misses:k # 43.131 K/sec (87%) 123 9,688,515 instructions # 10.561 M/sec (89%) 124 125 Total test time: 1.026802 seconds. 126 127In the example above, each event is monitored about 87% of the total time. But 128there is no guarantee that any pair of events are always monitored at the same 129time. If we want to have some events monitored at the same time, we can use 130--group option. Below is an example. 131 132 # Stat using event cache-references, cache-references:u,.... 133 $simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1 134 Performance counter statistics: 135 136 3,638,900 cache-references # 4.786 M/sec (74%) 137 65,171 cache-misses # 1.790953% miss rate (74%) 138 2,390,433 cache-references:u # 3.153 M/sec (74%) 139 32,280 cache-misses:u # 1.350383% miss rate (74%) 140 879,035 cache-references:k # 1.251 M/sec (68%) 141 30,303 cache-misses:k # 3.447303% miss rate (68%) 142 8,921,161 instructions # 10.070 M/sec (86%) 143 144 Total test time: 1.029843 seconds. 145 146#### Select target to monitor 147We can select which processes or threads to monitor via -p option or -t option. 148Monitoring a process is the same as monitoring all threads in the process. 149Simpleperf can also fork a child process to run the new command and then monitor 150the child process. Below are examples. 151 152 # Stat process 11904 and 11905. 153 $simpleperf stat -p 11904,11905 --duration 10 154 155 # Stat thread 11904 and 11905. 156 $simpleperf stat -t 11904,11905 --duration 10 157 158 # Start a child process running `ls`, and stat it. 159 $simpleperf stat ls 160 161#### Decide how long to monitor 162When monitoring existing threads, we can use --duration option to decide how long 163to monitor. When monitoring a child process running a new command, simpleperf 164monitors until the child process ends. In this case, we can use Ctrl-C to stop monitoring 165at any time. Below are examples. 166 167 # Stat process 11904 for 10 seconds. 168 $simpleperf stat -p 11904 --duration 10 169 170 # Stat until the child process running `ls` finishes. 171 $simpleperf stat ls 172 173 # Stop monitoring using Ctrl-C. 174 $simpleperf stat -p 11904 --duration 10 175 ^C 176 177#### Decide the print interval 178When monitoring perf counters, we can also use --interval option to decide the print 179interval. Below are examples. 180 181 # Print stat for process 11904 every 300ms. 182 $simpleperf stat -p 11904 --duration 10 --interval 300 183 184 # Print system wide stat at interval of 300ms for 10 seconds (rooted device only). 185 # system wide profiling needs root privilege 186 $su 0 simpleperf stat -a --duration 10 --interval 300 187 188#### Display counters in systrace 189simpleperf can also work with systrace to dump counters in the collected trace. 190Below is an example to do a system wide stat 191 192 # capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15 seconds 193 $su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15 194 # on host launch systrace to collect trace for 10 seconds 195 (HOST)$external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view 196 # open the collected new.html in browser and perf counters will be shown up 197 198### simpleperf record 199simpleperf record is used to dump records of the profiled program. By passing 200options, we can select which events to use, which processes/threads to monitor, 201what frequency to dump records, how long to monitor, and where to store records. 202 203 # Record on process 7394 for 10 seconds, using default event (cpu-cycles), 204 # using default sample frequency (4000 samples per second), writing records 205 # to perf.data. 206 $simpleperf record -p 7394 --duration 10 207 simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0. 208 209#### Select events 210In most cases, the cpu-cycles event is used to evaluate consumed cpu time. 211As a hardware event, it is both accurate and efficient. We can also use other 212events via -e option. Below is an example. 213 214 # Record using event instructions. 215 $simpleperf record -e instructions -p 11904 --duration 10 216 217#### Select target to monitor 218The way to select target in record command is similar to that in stat command. 219Below are examples. 220 221 # Record process 11904 and 11905. 222 $simpleperf record -p 11904,11905 --duration 10 223 224 # Record thread 11904 and 11905. 225 $simpleperf record -t 11904,11905 --duration 10 226 227 # Record a child process running `ls`. 228 $simpleperf record ls 229 230#### Set the frequency to record 231We can set the frequency to dump records via the -f or -c options. For example, 232-f 4000 means dumping approximately 4000 records every second when the monitored 233thread runs. If a monitored thread runs 0.2s in one second (it can be preempted 234or blocked in other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800 235records every second. Another way is using -c option. For example, -c 10000 236means dumping one record whenever 10000 events happen. Below are examples. 237 238 # Record with sample frequency 1000: sample 1000 times every second running. 239 $simpleperf record -f 1000 -p 11904,11905 --duration 10 240 241 # Record with sample period 100000: sample 1 time every 100000 events. 242 $simpleperf record -c 100000 -t 11904,11905 --duration 10 243 244#### Decide how long to monitor 245The way to decide how long to monitor in record command is similar to that in 246stat command. Below are examples. 247 248 # Record process 11904 for 10 seconds. 249 $simpleperf record -p 11904 --duration 10 250 251 # Record until the child process running `ls` finishes. 252 $simpleperf record ls 253 254 # Stop monitoring using Ctrl-C. 255 $simpleperf record -p 11904 --duration 10 256 ^C 257 258#### Set the path to store records 259By default, simpleperf stores records in perf.data in current directory. We can 260use -o option to set the path to store records. Below is an example. 261 262 # Write records to data/perf2.data. 263 $simpleperf record -p 11904 -o data/perf2.data --duration 10 264 265### simpleperf report 266simpleperf report is used to report based on perf.data generated by simpleperf 267record command. Report command groups records into different sample entries, 268sorts sample entries based on how many events each sample entry contains, and 269prints out each sample entry. By passing options, we can select where to find 270perf.data and executable binaries used by the monitored program, filter out 271uninteresting records, and decide how to group records. 272 273Below is an example. Records are grouped into 4 sample entries, each entry is 274a row. There are several columns, each column shows piece of information 275belonging to a sample entry. The first column is Overhead, which shows the 276percentage of events inside current sample entry in total events. As the 277perf event is cpu-cycles, the overhead can be seen as the percentage of cpu 278time used in each function. 279 280 # Reports perf.data, using only records sampled in libsudo-game-jni.so, 281 # grouping records using thread name(comm), process id(pid), thread id(tid), 282 # function name(symbol), and showing sample count for each row. 283 $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n 284 Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10 285 Arch: arm64 286 Event: cpu-cycles (type 0, config 0) 287 Samples: 28235 288 Event count: 546356211 289 290 Overhead Sample Command Pid Tid Symbol 291 59.25% 16680 sudogame 7394 7394 checkValid(Board const&, int, int) 292 20.42% 5620 sudogame 7394 7394 canFindSolution_r(Board&, int, int) 293 13.82% 4088 sudogame 7394 7394 randomBlock_r(Board&, int, int, int, int, int) 294 6.24% 1756 sudogame 7394 7394 @plt 295 296#### Set the path to read records 297By default, simpleperf reads perf.data in current directory. We can use -i 298option to select another file to read records. 299 300 $simpleperf report -i data/perf2.data 301 302#### Set the path to find executable binaries 303If reporting function symbols, simpleperf needs to read executable binaries 304used by the monitored processes to get symbol table and debug information. By 305default, the paths are the executable binaries used by monitored processes while 306recording. However, these binaries may not exist when reporting or not contain 307symbol table and debug information. So we can use --symfs to redirect the paths. 308Below is an example. 309 310 $simpleperf report 311 # In this case, when simpleperf wants to read executable binary /A/b, 312 # it reads file in /A/b. 313 314 $simpleperf report --symfs /debug_dir 315 # In this case, when simpleperf wants to read executable binary /A/b, 316 # it prefers file in /debug_dir/A/b to file in /A/b. 317 318#### Filter records 319When reporting, it happens that not all records are of interest. Simpleperf 320supports five filters to select records of interest. Below are examples. 321 322 # Report records in threads having name sudogame. 323 $simpleperf report --comms sudogame 324 325 # Report records in process 7394 or 7395 326 $simpleperf report --pids 7394,7395 327 328 # Report records in thread 7394 or 7395. 329 $simpleperf report --tids 7394,7395 330 331 # Report records in libsudo-game-jni.so. 332 $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so 333 334 # Report records in function checkValid or canFindSolution_r. 335 $simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)" 336 337#### Decide how to group records into sample entries 338Simpleperf uses --sort option to decide how to group sample entries. Below are 339examples. 340 341 # Group records based on their process id: records having the same process 342 # id are in the same sample entry. 343 $simpleperf report --sort pid 344 345 # Group records based on their thread id and thread comm: records having 346 # the same thread id and thread name are in the same sample entry. 347 $simpleperf report --sort tid,comm 348 349 # Group records based on their binary and function: records in the same 350 # binary and function are in the same sample entry. 351 $simpleperf report --sort dso,symbol 352 353 # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same 354 # thread, and belong to the same function in the same binary. 355 $simpleperf report 356 357## Features of simpleperf 358Simpleperf works similar to linux-tools-perf, but it has following improvements: 3591. Aware of Android environment. Simpleperf handles some Android specific 360situations when profiling. For example, it can profile embedded shared libraries 361in apk, read symbol table and debug information from .gnu_debugdata section. If 362possible, it gives suggestions when facing errors, like how to disable 363perf_harden to enable profiling. 3642. Support unwinding while recording. If we want to use -g option to record and 365report call-graph of a program, we need to dump user stack and register set in 366each record, and then unwind the stack to find the call chain. Simpleperf 367supports unwinding while recording, so it doesn’t need to store user stack in 368perf.data. So we can profile for a longer time with limited space on device.' 3693. Support scripts to make profiling on Android more convenient. 3704. Build in static binaries. Simpleperf is a static binary, so it doesn’t need 371supporting shared libraries to run. It means there is no limitation of Android 372version that simpleperf can run on, although some devices don’t support 373profiling. 374 375# Simpleperf tools in ndk 376Simpleperf tools in ndk contain three parts: simpleperf executable running on 377Android device, simpleperf executable running on host, and python scripts. 378 379## Simpleperf on device 380Simpleperf running on device is located at bin/android directory. It contains 381static binaries running on Android on different architectures. They can be used 382to profile processes running device, and generate perf.data. 383 384## Simpleperf on host 385Simpleperfs running on host are located at bin/darwin, bin/linux and 386bin/windows.They can be used to parse perf.data on host. 387 388## Scripts 389Scripts are used to make it convenient to profile and parse profiling results. 390app_profiler.py is used to profile an android application. It prepares 391profiling environments, downloads simpleperf on device, generates and pulls 392perf.data on host. It is configured by app_profiler.config. 393binary_cache_builder.py is used to pull native binaries from device to host. 394It is used by app_profiler.py. 395annotate.py is used to annotate source files using perf.data. It is configured 396by annotate.config. 397report.py reports perf.data in a GUI window. 398simpleperf_report_lib.py is used to enumerate samples in perf.data. Internally 399it uses libsimpleperf_report.so to parse perf.data. It can be used to translate 400samples in perf.data to other forms. One example using simpleperf_report_lib.py 401is report_sample.py. 402 403# Examples of using simpleperf tools 404This section shows how to use simpleperf tools to profile an Android 405application. 406 407## Prepare a debuggable Application 408The package name of the application is com.example.sudogame. It has both java 409code and c++ code. We need to run a copy of the app with 410android:debuggable=”true” in its AndroidManifest.xml <application> element, 411because we can’t use run-as for non-debuggable apps. The application should 412has been installed on device, and we can connect device via adb. 413 414## Profile using command line 415To record profiling data, we need to download simpleperf and native libraries 416with debug information on device, run simpleperf to generate profiling data 417file: perf.data, and run simpleperf to report perf.data. Below are the steps. 418 419### 1. Enable profiling 420 421 $adb shell setprop security.perf_harden 0 422 423### 2. Find the process running the app 424Run `ps` in the app’s context. On >=O devices, run `ps -e` instead. 425 426 $adb shell 427 angler:/ $ run-as com.example.sudogame 428 angler:/data/data/com.example.sudogame $ ps 429 u0_a93 10324 570 1030480 58104 SyS_epoll_ 00f41b7528 S com.example.sudogame 430 u0_a93 10447 10441 7716 1588 sigsuspend 753c515d34 S sh 431 u0_a93 10453 10447 9112 1644 0 7ba07ff664 R ps 432 433So process 10324 runs the app. 434 435### 3. Download simpleperf to the app’s data directory 436First we need to find out which architecture the app is using. There are many 437ways, here we just check the map of the process. 438 439 angler:/data/data/com.example.sudogame $cat /proc/10324/maps | grep boot.art 440 70f34000-7144e000 r--p 00000000 fd:00 1082 /system/framework/arm/boot.oat 441 442The file path shows it is arm. So we download simpleperf in arm directory on 443device. 444 445 $adb push bin/android/arm/simpleperf /data/local/tmp 446 $adb shell 447 angler:/ $ run-as com.example.sudogame 448 angler:/data/data/com.example.sudogame $ cp /data/local/tmp/simpleperf . 449 450### 4. Record perf.data 451 452 angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --duration 30 453 simpleperf I 01-01 09:26:39 10598 10598 cmd_record.cpp:341] Samples recorded: 49471. Samples lost: 0. 454 angler:/data/data/com.example.sudogame $ls -lh perf.data 455 -rw-rw-rw- 1 u0_a93 u0_a93 2.6M 2017-01-01 09:26 perf.data 456 457Don’t forget to run the app while recording. Otherwise, we may get no samples 458because the process is always sleeping. 459 460### 5. Report perf.data 461There are different ways to report perf.data. Below shows some examples. 462 463Report samples in different threads. 464 465 angler:/data/data/com.example.sudogame $./simpleperf report --sort pid,tid,comm 466 Cmdline: /data/data/com.example.sudogame/simpleperf record -p 10324 --duration 30 467 Arch: arm64 468 Event: cpu-cycles (type 0, config 0) 469 Samples: 49471 470 Event count: 16700769019 471 472 Overhead Pid Tid Command 473 66.31% 10324 10324 xample.sudogame 474 30.97% 10324 10340 RenderThread 475 ... 476 477Report samples in different binaries in the main thread. 478 479 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --sort dso -n 480 ... 481 Overhead Sample Shared Object 482 37.71% 9970 /system/lib/libc.so 483 35.45% 9786 [kernel.kallsyms] 484 8.71% 3305 /system/lib/libart.so 485 6.44% 2405 /system/framework/arm/boot-framework.oat 486 5.64% 1480 /system/lib/libcutils.so 487 1.55% 426 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so 488 ... 489 490Report samples in different functions in libsudo-game-jni.so in the main thread. 491 492 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n 493 ... 494 Overhead Sample Symbol 495 8.94% 35 libsudo-game-jni.so[+1d54] 496 5.71% 25 libsudo-game-jni.so[+1dae] 497 5.70% 23 @plt 498 5.09% 22 libsudo-game-jni.so[+1d88] 499 4.54% 19 libsudo-game-jni.so[+1d82] 500 3.61% 14 libsudo-game-jni.so[+1f3c] 501 ... 502 503In the above result, most symbols are binary name[+virual_addr]. It is because 504libsudo-game-jni.so used on device has stripped .symbol section. We can 505download libsudo-game-jni.so having debug information on device. In android 506studio project, it locates at 507app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so. 508We have to download libsudo-game-jni.so to the same relative path as recorded 509in perf.data (otherwise, simpleperf can’t find it). In this case, it is 510/data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so. 511 512Report symbols using libraries with debug information. 513 514 $adb push app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so /data/local/tmp 515 $adb shell 516 angler:/ $ run-as com.example.sudogame 517 angler:/data/data/com.example.sudogame $ mkdir -p data/app/com.example.sudogame-1/lib/arm 518 angler:/data/data/com.example.sudogame $cp /data/local/tmp/libsudo-game-jni.so data/app/com.example.sudogame-1/lib/arm 519 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n --symfs . 520 ... 521 Overhead Sample Symbol 522 75.18% 317 checkValid(Board const&, int, int) 523 14.43% 60 canFindSolution_r(Board&, int, int) 524 5.70% 23 @plt 525 3.51% 20 randomBlock_r(Board&, int, int, int, int, int) 526 ... 527 528Report samples in one function 529 530 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --symbols “checkValid(Board const&, int, int)” --sort vaddr_in_file -n --symfs . 531 ... 532 Overhead Sample VaddrInFile 533 11.89% 35 0x1d54 534 7.59% 25 0x1dae 535 6.77% 22 0x1d88 536 6.03% 19 0x1d82 537 ... 538 539### 6. Record and report call graph 540A call graph is a tree showing function call relations. Below is an example. 541 542 main() { 543 FunctionOne(); 544 FunctionTwo(); 545 } 546 FunctionOne() { 547 FunctionTwo(); 548 FunctionThree(); 549 } 550 callgraph: 551 main-> FunctionOne 552 | | 553 | |-> FunctionTwo 554 | |-> FunctionThree 555 | 556 |-> FunctionTwo 557 558#### Record dwarf based call graph 559To generate call graph, simpleperf needs to generate call chain for each record. 560Simpleperf requests kernel to dump user stack and user register set for each 561record, then it backtraces the user stack to find the function call chain. To 562parse the call chain, it needs support of dwarf call frame information, which 563usually resides in .eh_frame or .debug_frame section of the binary. So we need 564to use --symfs to point out where is libsudo-game-jni.so with debug information. 565 566 angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 -g --symfs . --duration 30 567 simpleperf I 01-01 09:59:42 11021 11021 cmd_record.cpp:341] Samples recorded: 60700. Samples lost: 1240. 568 569Note that kernel can’t dump user stack >= 64K, so the dwarf based call graph 570doesn’t contain call chains consuming >= 64K stack. What’s more, because we 571need to dump stack in each record, it is likely to lost records. Usually, it 572doesn’t matter to lost some records. 573 574#### Record stack frame based call graph 575Another way to generate call graph is to rely on the kernel parsing the call 576chain for each record. To make it possible, kernel has to be able to identify 577the stack frame of each function call. This is not always possible, because 578compilers can optimize away stack frames, or use a stack frame style not 579recognized by the kernel. So how well it works depends (It works well on arm64, 580but not well on arm). 581 582 angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --call-graph fp --symfs . --duration 30 583 simpleperf I 01-01 10:03:58 11267 11267 cmd_record.cpp:341] Samples recorded: 56736. Samples lost: 0. 584 585#### Report call graph 586Report accumulated period. In the table below, the first column is “Children”, 587it is the cpu cycle percentage of a function and functions called by that 588function. The second column is “Self”, it is the cpu cycle percentage of just a 589function. For example, checkValid() itself takes 1.28% cpus, but it takes 59029.43% by running itself and calling other functions. 591 592 angler:/data/data/com.example.sudogame $./simpleperf report --children --symfs . 593 ... 594 Children Self Command Pid Tid Shared Object Symbol 595 31.94% 0.00% xample.sudogame 10324 10324 [kernel.kallsyms] [kernel.kallsyms][+ffffffc000204268] 596 31.10% 0.92% xample.sudogame 10324 10324 /system/lib/libc.so writev 597 29.43% 1.28% xample.sudogame 10324 10324 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so checkValid(Board const&, int, int) 598 28.43% 0.34% xample.sudogame 10324 10324 /system/lib/liblog.so __android_log_print 599 28.24% 0.00% xample.sudogame 10324 10324 /system/lib/libcutils.so libcutils.so[+107b7] 600 28.10% 0.27% xample.sudogame 10324 10324 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so canFindSolution_r(Board&, int, int) 601 ... 602 603Report call graph. 604 605 angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report 606 angler:/data/data/com.example.sudogame $exit 607 angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp 608 angler:/ $exit 609 $adb pull /data/local/tmp/report . 610 $cat report 611 ... 612 29.43% 1.28% xample.sudogame 10324 10324 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so checkValid(Board const&, int, int) 613 | 614 -- checkValid(Board const&, int, int) 615 | 616 |--95.50%-- __android_log_print 617 | |--0.68%-- [hit in function] 618 | | 619 | |--51.84%-- __android_log_buf_write 620 | | |--2.07%-- [hit in function] 621 | | | 622 | | |--30.74%-- libcutils.so[+c69d] 623 ... 624 625Report call graph in callee mode. We can also show how a function is called by 626other functions. 627 628 angler:/data/data/com.example.sudogame $./simpleperf report -g callee --symfs . >report 629 $adb shell run-as com.example.sudogame cat report >report 630 $cat report 631 … 632 28.43% 0.34% xample.sudogame 10324 10324 /system/lib/liblog.so __android_log_print 633 | 634 -- __android_log_print 635 | 636 |--97.82%-- checkValid(Board const&, int, int) 637 | |--0.13%-- [hit in function] 638 | | 639 | |--94.89%-- canFindSolution_r(Board&, int, int) 640 | | |--0.01%-- [hit in function] 641 | | | 642 ... 643 644## Profile java code 645Simpleperf only supports profiling native instructions in binaries in ELF 646format. If the java code is executed by interpreter, or with jit cache, it 647can’t be profiled by simpleperf. As Android supports Ahead-of-time compilation, 648it can compile java bytecode into native instructions with debug information. 649On devices with Android version <= M, we need root privilege to compile java 650bytecode with debug information. However, on devices with Android version >= N, 651we don't need root privilege to do so. 652 653### On Android N 654#### 1. Fully compile java code into native instructions. 655 656 $adb shell setprop debug.generate-debug-info true 657 $adb shell cmd package compile -f -m speed com.example.sudogame 658 // restart the app to take effect 659 660#### 2. Record perf.data 661 662 angler:/data/data/com.example.sudogame $./simpleperf record -p 11826 -g --symfs . --duration 30 663 simpleperf I 01-01 10:31:40 11859 11859 cmd_record.cpp:341] Samples recorded: 50576. Samples lost: 2139. 664 665#### 3. Report perf.data 666 667 angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report 668 angler:/data/data/com.example.sudogame $exit 669 angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp 670 angler:/ $exit 671 $adb pull /data/local/tmp/report . 672 $cat report 673 ... 674 21.14% 0.00% xample.sudogame 11826 11826 /data/app/com.example.sudogame-1/oat/arm/base.odex boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem) 675 | 676 -- boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem) 677 | 678 --99.99%-- void com.example.sudogame.GameView.startNewGame() 679 |--0.01%-- [hit in function] 680 | 681 |--99.87%-- void com.example.sudogame.GameModel.reInit() 682 | |--0.01%-- [hit in function] 683 | | 684 | |--89.65%-- boolean com.example.sudogame.GameModel.canFindSolution(int[][]) 685 | | | 686 | | |--99.95%-- Java_com_example_sudogame_GameModel_canFindSolution 687 | | | | 688 | | | |--99.49%-- canFindSolution(Board&) 689 | | | | |--0.01%-- [hit in function] 690 | | | | | 691 | | | | |--99.97%-- canFindSolution_r(Board&, int, int) 692 | | | | | canFindSolution_r(Board&, int, int) 693 ... 694 695### On Android M 696On M devices, We need root privilege to force Android fully compiling java code 697into native instructions in ELF binaries with debug information. We also need 698root privilege to read compiled native binaries (because installd writes them 699to a directory whose uid/gid is system:install). So profiling java code can 700only be done on rooted devices. 701 702 $adb root 703 $adb shell setprop dalvik.vm.dex2oat-flags -g 704 705 # Reinstall the app. 706 $adb install -r app-debug.apk 707 708 # Change to the app’s data directory. 709 $ adb root && adb shell 710 device# cd `run-as com.example.sudogame pwd` 711 712 # Record as root as simpleperf needs to read the generated native binary. 713 device#./simpleperf record -p 25636 -g --symfs . -f 1000 --duration 30 714 simpleperf I 01-02 07:18:20 27182 27182 cmd_record.cpp:323] Samples recorded: 23552. Samples lost: 39. 715 716### On Android L 717On L devices, we also need root privilege to compile the app with debug info 718and access the native binaries. 719 720 $adb root 721 $adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols 722 723 # Reinstall the app. 724 $adb install -r app-debug.apk 725 726## Profile using scripts 727Although using command line is flexible, it can be too complex. So we have 728python scripts to help running commands. 729 730### Record using app_profiler.py 731app_profiler.py is used to profile an Android application. It sets up profiling 732environment, downloads simpleperf and native libraries with debug information, 733runs simpleperf to generate perf.data, and pulls perf.data and binaries from 734device to host. 735It is configured by app_profiler.config. Below is an example. 736 737app_profiler.config: 738 739 app_package_name = “com.example.sudogame” 740 android_studio_project_dir = “/AndroidStudioProjects/SudoGame” # absolute path of the project 741 ... 742 record_options = "-e cpu-cycles:u -f 4000 -g --dump-symbols --duration 30" 743 ... 744 745run app_profiler.py: 746 747 $python app_profiler.py 748 ... 749 INFO:root:profiling is finished. 750 751It pulls generated perf.data on host, and collects binaries from device in 752binary_cache. 753 754### Report using report.py 755 756 $python report.py -g 757 758It generates a GUI interface to report data. 759 760### Process samples using simpleperf_report_lib.py 761simpleperf_report_lib.py provides an interface reading samples from perf.data. 762An example is report_sample.py. 763 764### Show flamegraph 765 766 $python report_sample.py >out.perf 767 $stackcollapse-perf.pl out.perf >out.folded 768 $./flamegraph.pl out.folded >a.svg 769 770### Visualize using pprof 771pprof is a tool for visualization and analysis of profiling data. It can 772be got from https://github.com/google/pprof. pprof_proto_generator.py can 773generate profiling data in a format acceptable by pprof. 774 775 $python pprof_proto_generator.py 776 $pprof -pdf pprof.profile 777 778### Annotate source code 779annotate.py reads perf.data and binaries in binary_cache. Then it knows which 780source file:line each sample hits. So it can annotate source code. annotate.py 781is configured by annotate.config. Below is an example. 782 783annotate.config: 784 785 ... 786 source_dirs = [“/AndroidStudio/SudoGame”] # It is a directory containing source code. 787 ... 788 789run annotate.py: 790 791 $python annotate.py 792 793It generates annotated_files directory. 794annotated_files/summary file contains summary information for each source file. 795An example is as below. 796 797 /AndroidStudioProjects/SudoGame/app/src/main/jni/sudo-game-jni.cpp: accumulated_period: 25.587937%, period: 1.250961% 798 function (checkValid(Board const&, int, int)): line 99, accumulated_period: 23.564356%, period: 0.908457% 799 function (canFindSolution_r(Board&, int, int)): line 135, accumulated_period: 22.260125%, period: 0.142359% 800 function (canFindSolution(Board&)): line 166, accumulated_period: 22.233101%, period: 0.000000% 801 function (Java_com_example_sudogame_GameModel_canFindSolution): line 470, accumulated_period: 21.983184%, period: 0.000000% 802 function (Java_com_example_sudogame_GameModel_initRandomBoard): line 430, accumulated_period: 2.226896%, period: 0.000000% 803 804 line 27: accumulated_period: 0.011729%, period: 0.000000% 805 line 32: accumulated_period: 0.004362%, period: 0.000000% 806 line 33: accumulated_period: 0.004427%, period: 0.000000% 807 line 36: accumulated_period: 0.003303%, period: 0.000000% 808 line 39: accumulated_period: 0.010367%, period: 0.004123% 809 line 41: accumulated_period: 0.162219%, period: 0.000000% 810 811annotated_files/ also contains annotated source files which are found by 812annotate.py. For example, part of checkValid() function in libsudo-game-jni.cpp 813is annotated as below. 814 815 /* [func] acc_p: 23.564356%, p: 0.908457% */static bool checkValid(const Board& board, int curR, int curC) { 816 /* acc_p: 0.037933%, p: 0.037933% */ int digit = board.digits[curR][curC]; 817 /* acc_p: 0.162355%, p: 0.162355% */ for (int r = 0; r < BOARD_ROWS; ++r) { 818 /* acc_p: 0.020880%, p: 0.020880% */ if (r == curR) { 819 /* acc_p: 0.034691%, p: 0.034691% */ continue; 820 } 821 /* acc_p: 0.176490%, p: 0.176490% */ if (board.digits[r][curC] == digit) { 822 /* acc_p: 14.957673%, p: 0.059022% */ LOGI("conflict (%d, %d) (%d, %d)", curR, curC, r, curC); 823 /* acc_p: 0.016296%, p: 0.016296% */ return false; 824 } 825 } 826