page.title=Testing Display Performance
page.image=images/cards/card-test-performance_2x.png
page.keywords=performance, fps, tools

@jd:body


<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>
    <ol>
      <li><a href="#measure">Measuring UI Performance</a>
        <ul>
          <li><a href="#aggregate">Aggregate frame stats</a></li>
          <li><a href="#timing-info">Precise frame timing info</a></li>
          <li><a href="#timing-dump">Simple frame timing dump</a></li>
          <li><a href="#collection-window">Controlling the window of stat collection</a></li>
          <li><a href="#diagnose">Diagnosing performance regressions</a></li>
          <li><a href="#resources">Additional resources</a></li>
        </ul>
      </li>
      <li><a href="#automate">Automating UI Performance Tests</a>
        <ul>
          <li><a href="#ui-tests">Setting up UI tests</a></li>
          <li><a href="#automated-tests">Setting up automated UI testing</a></li>
          <li><a href="#triage">Triaging and fixing observed problems</a></li>
        </ul>
      </li>
    </ol>
  </div>
</div>


<p>
  User interface (UI) performance testing ensures that your app not only meets its functional
  requirements, but that user interactions with it are buttery smooth, running at a
  consistent 60 frames per second (<a href=
  "https://www.youtube.com/watch?v=CaMTIgxCSqU&index=25&list=PLWz5rJ2EKKc9CBxr3BVjPTPoDPLdPIFCE">why
  60fps?</a>), without any dropped or delayed frames, or as we like to call it, <em>jank</em>. This
  document explains the tools available for measuring UI performance, and lays out an approach for
  integrating UI performance measurements into your testing practices.
</p>


<h2 id="measure">Measuring UI Performance</h2>

<p>
  In order to improve performance, you first need the ability to measure the performance of
  your system, and then to diagnose and identify problems that may arise from various parts of your
  pipeline.
</p>

<p>
  <em><a href="https://source.android.com/devices/tech/debug/dumpsys.html">dumpsys</a></em> is an
  Android tool that runs on the device and dumps interesting information about the status of system
  services. Passing the <em>gfxinfo</em> command to dumpsys provides output in logcat with
  performance information relating to frames of animation that occurred during the recording
  phase.
</p>

<pre>
> adb shell dumpsys gfxinfo &lt;PACKAGE_NAME&gt;
</pre>

<p>
  This command can produce multiple different variants of frame timing data.
</p>

<h3 id="aggregate">Aggregate frame stats</h3>

<p>
  With the M Preview, the command prints out an aggregated analysis of frame data to logcat,
  collected across the entire lifetime of the process. For example:
</p>

<pre class="noprettyprint">
Stats since: 752958278148ns
Total frames rendered: 82189
Janky frames: 35335 (42.99%)
90th percentile: 34ms
95th percentile: 42ms
99th percentile: 69ms
Number Missed Vsync: 4706
Number High input latency: 142
Number Slow UI thread: 17270
Number Slow bitmap uploads: 1542
Number Slow draw: 23342
</pre>

<p>
  These statistics convey, at a high level, the rendering performance of the app, as well
  as its stability across many frames.
</p>


<h3 id="timing-info">Precise frame timing info</h3>

<p>
  The M Preview also adds a new gfxinfo command, <em>framestats</em>, which provides
  extremely detailed frame timing information from recent frames, so that you can track down and
  debug problems more accurately.
</p>

<pre>
> adb shell dumpsys gfxinfo &lt;PACKAGE_NAME&gt; framestats
</pre>

<p>
  This command prints out frame timing information, with nanosecond timestamps, from the last 120
  frames produced by the app.
  Below is example raw output from <code>adb shell dumpsys gfxinfo
  &lt;PACKAGE_NAME&gt; framestats</code>:
</p>

<pre class="noprettyprint">
0,27965466202353,27965466202353,27965449758000,27965461202353,27965467153286,27965471442505,27965471925682,27965474025318,27965474588547,27965474860786,27965475078599,27965479796151,27965480589068,
0,27965482993342,27965482993342,27965465835000,27965477993342,27965483807401,27965486875630,27965487288443,27965489520682,27965490184380,27965490568703,27965491408078,27965496119641,27965496619641,
0,27965499784331,27965499784331,27965481404000,27965494784331,27965500785318,27965503736099,27965504201151,27965506776568,27965507298443,27965507515005,27965508405474,27965513495318,27965514061984,
0,27965516575320,27965516575320,27965497155000,27965511575320,27965517697349,27965521276151,27965521734797,27965524350474,27965524884536,27965525160578,27965526020891,27965531371203,27965532114484,
</pre>

<p>
  Each line of this output represents a frame produced by the app. Each line has a fixed number of
  columns describing time spent in each stage of the frame-producing pipeline. The next section
  describes this format in detail, including what each column represents.
</p>


<h4 id="fs-data-format">Framestats data format</h4>

<p>
  Since the block of data is output in CSV format, it's very straightforward to paste it into your
  spreadsheet tool of choice, or to collect and parse it with a script. The following list explains
  the format of the output data columns. All timestamps are in nanoseconds.
</p>

<ul>
  <li>FLAGS
    <ul>
      <li>Rows with a '0' for the FLAGS column can have their total frame time computed by
        subtracting the INTENDED_VSYNC column from the FRAME_COMPLETED column.
      </li>

      <li>If this is non-zero, the row should be ignored, as the frame has been determined to be
        an outlier from normal performance, where it is expected that layout &amp; draw take longer
        than 16ms. Here are a few reasons this could occur:
        <ul>
          <li>The window layout changed (such as the first frame of the application or after a
            rotation)
          </li>

          <li>It is also possible the frame was skipped, in which case some of the values will have
            garbage timestamps. A frame can be skipped if, for example, it is out-running 60fps or
            if nothing on-screen ended up being dirty; this is not necessarily a sign of a problem
            in the app.
          </li>
        </ul>
      </li>
    </ul>
  </li>

  <li>INTENDED_VSYNC
    <ul>
      <li>The intended start point for the frame. If this value is different from VSYNC, there
        was work occurring on the UI thread that prevented it from responding to the vsync signal
        in a timely fashion.
      </li>
    </ul>
  </li>

  <li>VSYNC
    <ul>
      <li>The time value that was used in all the vsync listeners and drawing for the frame
        (Choreographer frame callbacks, animations, View.getDrawingTime(), etc.)
      </li>

      <li>To understand more about VSYNC and how it influences your application, check out the
        <a href=
        "https://www.youtube.com/watch?v=1iaHxmfZGGc&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu&index=23">
        Understanding VSYNC</a> video.
      </li>
    </ul>
  </li>

  <li>OLDEST_INPUT_EVENT
    <ul>
      <li>The timestamp of the oldest input event in the input queue, or Long.MAX_VALUE if
        there were no input events for the frame.
      </li>

      <li>This value is primarily intended for platform work and has limited usefulness to app
        developers.
      </li>
    </ul>
  </li>

  <li>NEWEST_INPUT_EVENT
    <ul>
      <li>The timestamp of the newest input event in the input queue, or 0 if there were no
        input events for the frame.
      </li>

      <li>This value is primarily intended for platform work and has limited usefulness to app
        developers.
      </li>

      <li>However, it's possible to get a rough idea of how much latency the app is adding by
        looking at (FRAME_COMPLETED - NEWEST_INPUT_EVENT).
      </li>
    </ul>
  </li>

  <li>HANDLE_INPUT_START
    <ul>
      <li>The timestamp at which input events were dispatched to the application.
      </li>

      <li>By looking at the time between this and ANIMATION_START, it is possible to measure how
        long the application spent handling input events.
      </li>

      <li>If this number is high (&gt;2ms), it indicates the app is spending an unusually
        long time processing input events, such as in View.onTouchEvent(), which may indicate this
        work needs to be optimized or offloaded to a different thread. Note that there are some
        scenarios, such as click events that launch new activities, where it is
        expected and acceptable that this number is large.
      </li>
    </ul>
  </li>

  <li>ANIMATION_START
    <ul>
      <li>The timestamp at which animations registered with Choreographer were run.
      </li>

      <li>By looking at the time between this and PERFORM_TRAVERSALS_START, it is possible to
        determine how long it took to evaluate all the animators (ObjectAnimator,
        ViewPropertyAnimator, and Transitions being the common ones) that are running.
      </li>

      <li>If this number is high (&gt;2ms), check to see whether your app has written any custom
        animators, or what fields your ObjectAnimators are animating, and ensure they are
        appropriate for an animation.
      </li>

      <li>To learn more about Choreographer, check out the <a href=
        "https://developers.google.com/events/io/sessions/325418001">For Butter or Worse</a>
        video.
      </li>
    </ul>
  </li>

  <li>PERFORM_TRAVERSALS_START
    <ul>
      <li>If you subtract DRAW_START from this value, you can extract how long the layout
        &amp; measure phases took to complete. (Note that during a scroll or animation, you would
        hope this to be close to zero.)
      </li>

      <li>To learn more about the measure &amp; layout phases of the rendering pipeline, check
        out the <a href=
        "https://www.youtube.com/watch?v=we6poP0kw6E&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu&index=27">
        Invalidations, Layouts and Performance</a> video.
      </li>
    </ul>
  </li>

  <li>DRAW_START
    <ul>
      <li>The time at which the draw phase of performTraversals started. This is the start
        point of recording the display lists of any views that were invalidated.
      </li>

      <li>The time between this and SYNC_START is how long it took to call View.draw() on all
        the invalidated views in the tree.
      </li>

      <li>For more information on the drawing model, see <a href=
        "{@docRoot}guide/topics/graphics/hardware-accel.html#hardware-model">Hardware Acceleration</a>
        or the <a href=
        "https://www.youtube.com/watch?v=we6poP0kw6E&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu&index=27">
        Invalidations, Layouts and Performance</a> video.
      </li>
    </ul>
  </li>

  <li>SYNC_QUEUED
    <ul>
      <li>The time at which a sync request was sent to the RenderThread.
      </li>

      <li>This marks the point at which a message to start the sync
        phase was sent to the RenderThread. If the time between this and
        SYNC_START is substantial (&gt;0.1ms or so), it means that
        the RenderThread was busy working on a different frame. Internally
        this is used to differentiate between the frame doing too much work
        and exceeding the 16ms budget, and the frame being stalled due to
        the previous frame exceeding the 16ms budget.
      </li>
    </ul>
  </li>

  <li>SYNC_START
    <ul>
      <li>The time at which the sync phase of the drawing started.
      </li>

      <li>If the time between this and ISSUE_DRAW_COMMANDS_START is substantial (&gt;0.4ms or
        so), it typically indicates a lot of new Bitmaps were drawn which must be uploaded to the
        GPU.
      </li>

      <li>To understand more about the sync phase, check out the <a href=
        "https://www.youtube.com/watch?v=VzYkVL1n4M8&index=24&list=PLOU2XLYxmsIKEOXh5TwZEv89aofHzNCiu">
        Profile GPU Rendering</a> video.
      </li>
    </ul>
  </li>

  <li>ISSUE_DRAW_COMMANDS_START
    <ul>
      <li>The time at which the hardware renderer started issuing drawing commands to the GPU.
      </li>

      <li>The time between this and FRAME_COMPLETED gives a rough idea of how much GPU work the
        app is producing. Problems like too much overdraw or inefficient rendering effects show
        up here.
      </li>
    </ul>
  </li>

  <li>SWAP_BUFFERS
    <ul>
      <li>The time at which eglSwapBuffers was called; relatively uninteresting outside of
        platform work.
      </li>
    </ul>
  </li>

  <li>FRAME_COMPLETED
    <ul>
      <li>All done! The total time spent working on this frame can be computed by doing
        FRAME_COMPLETED - INTENDED_VSYNC.
      </li>
    </ul>
  </li>

</ul>

<p>
  You can use this data in different ways. One simple but useful visualization is a
  histogram showing the distribution of frame times (FRAME_COMPLETED - INTENDED_VSYNC) in
  different latency buckets; see the figure below. This graph tells us at a glance that most
  frames were very good - well below the 16ms deadline (depicted in red), but a few frames
  were significantly over the deadline. We can look at changes in this histogram over time
  to see wholesale shifts or new outliers being created.
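</p>

<p>
  As a sketch of how such a histogram might be computed, the following script parses saved
  framestats lines and buckets each frame's total time. This is an illustrative example, not part
  of the platform tooling; the function name and bucket sizes here are arbitrary choices.
</p>

```python
# Illustrative sketch: bucket framestats rows into a frame-time histogram.
# Column indices follow the framestats format described above; all raw
# values are nanosecond timestamps.
FLAGS, INTENDED_VSYNC, FRAME_COMPLETED = 0, 1, 13

def frame_time_histogram(csv_lines, bucket_ms=4, max_ms=32):
    """Count frames per latency bucket: [0-4ms), [4-8ms), ..., plus overflow."""
    buckets = [0] * (max_ms // bucket_ms + 1)
    for line in csv_lines:
        cols = [c for c in line.strip().split(",") if c]
        if not cols or int(cols[FLAGS]) != 0:
            continue  # non-zero FLAGS: outlier frame (layout change, skip, etc.)
        total_ms = (int(cols[FRAME_COMPLETED]) - int(cols[INTENDED_VSYNC])) / 1e6
        buckets[min(int(total_ms // bucket_ms), len(buckets) - 1)] += 1
    return buckets
```

<p>
  Run against the four sample rows above, every frame lands in the 12-16ms bucket, right at the
  edge of the 16ms deadline.
</p>

<p>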
  You can also graph input latency,
  time spent in layout, or other similar interesting metrics based on the many timestamps
  in the data.
</p>

<img src="{@docRoot}preview/images/perf-test-framestats.png">


<h3 id="timing-dump">Simple frame timing dump</h3>

<p>
  If <strong>Profile GPU rendering</strong> is set to <strong>In adb shell dumpsys gfxinfo</strong>
  in Developer Options, the <code>adb shell dumpsys gfxinfo</code> command prints out timing
  information for the most recent 120 frames, broken into a few different categories with
  tab-separated values. This data can be useful for indicating, at a high level, which parts of
  the drawing pipeline may be slow.
</p>

<p>
  Similar to <a href="#fs-data-format">framestats</a> above, it's very
  straightforward to paste the output into your spreadsheet tool of choice, or to collect and
  parse it with a script. The following graph shows a breakdown of where many frames produced by
  the app were spending their time.
</p>

<img src="{@docRoot}preview/images/perf-test-frame-latency.png">

<p>
  The result of running gfxinfo, copying the output, pasting it into a spreadsheet
  application, and graphing the data as stacked bars.
</p>

<p>
  Each vertical bar represents one frame of animation; its height represents the number of
  milliseconds it took to compute that frame of animation. Each colored segment of the bar
  represents a different stage of the rendering pipeline, so that you can see what parts of
  your application may be creating a bottleneck. For more information on understanding the
  rendering pipeline, and how to optimize for it, see the <a href=
  "https://www.youtube.com/watch?v=we6poP0kw6E&index=27&list=PLWz5rJ2EKKc9CBxr3BVjPTPoDPLdPIFCE">
  Invalidations, Layouts and Performance</a> video.
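</p>

<p>
  Because the dump is plain tab-separated numbers, a small script can summarize it as an
  alternative to a spreadsheet. The sketch below is hypothetical: it assumes each data row holds
  one frame's per-stage times in milliseconds, sums them, and counts frames that exceeded the
  16ms budget.
</p>

```python
# Hypothetical summary of a "Profile GPU rendering" dump: each data row is
# assumed to hold one frame's per-stage times (ms), tab-separated.
def summarize_profile(rows, budget_ms=16.0):
    """Return (frame_count, janky_count, worst_total_ms) for the dump."""
    totals = []
    for row in rows:
        fields = row.strip().split("\t")
        try:
            stage_times = [float(f) for f in fields]
        except ValueError:
            continue  # skip the header line and any malformed rows
        totals.append(sum(stage_times))
    janky = sum(1 for total in totals if total > budget_ms)
    worst = max(totals) if totals else 0.0
    return len(totals), janky, worst
```

<p>
  A quick summary like this gives a pass/fail signal before you graph the full breakdown.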
</p>


<h3 id="collection-window">Controlling the window of stat collection</h3>

<p>
  Both framestats and the simple frame timings gather data over a very short window - about
  two seconds' worth of rendering. In order to precisely control this window of time - for
  example, to constrain the data to a particular animation - you can reset all counters and
  aggregate statistics that have been gathered:
</p>

<pre>
> adb shell dumpsys gfxinfo &lt;PACKAGE_NAME&gt; reset
</pre>

<p>
  This can be used in conjunction with the dumping commands themselves to collect and
  reset at a regular cadence, capturing less-than-two-second windows of frames
  continuously.
</p>


<h3 id="diagnose">Diagnosing performance regressions</h3>

<p>
  Identifying regressions is a good first step toward tracking down problems and
  maintaining high application health. However, dumpsys only identifies the existence and
  relative severity of problems. You still need to diagnose the particular cause of the
  performance problems, and find appropriate ways to fix them. For that, it's highly
  recommended to use the <a href="{@docRoot}tools/help/systrace.html">systrace</a> tool.
</p>


<h3 id="resources">Additional resources</h3>

<p>
  For more information on how Android's rendering pipeline works, common problems that you
  can find there, and how to fix them, some of the following resources may be useful to
  you:
</p>

<ul>
  <li>Rendering Performance 101
  </li>
  <li>Why 60fps?
  </li>
  <li>Android UI and the GPU
  </li>
  <li>Invalidations, Layouts and Performance
  </li>
  <li>Analyzing UI Performance with Systrace
  </li>
</ul>


<h2 id="automate">Automating UI Performance Tests</h2>

<p>
  One approach to UI performance testing is to simply have a human tester perform a set of
  user operations on the target app, and either visually look for jank, or spend a very
  large amount of time using a tool-driven approach to find it. But this manual approach is
  fraught with peril - human ability to perceive frame rate changes varies tremendously,
  and it is also time consuming, tedious, and error prone.
</p>

<p>
  A more efficient approach is to log and analyze key performance metrics from automated UI
  tests. The Android M developer preview includes new logging capabilities which make it
  easy to determine the amount and severity of jank in your application's animations, and
  which can be used to build a rigorous process to determine your current performance and
  track future performance objectives.
</p>

<p>
  This article walks you through a recommended approach to using that data to automate your
  performance testing.
</p>

<p>
  This mostly breaks down into two key actions: first, identifying what you're
  testing and how you're testing it; and second, setting up and maintaining an
  automated testing environment.
</p>


<h3 id="ui-tests">Setting up UI tests</h3>

<p>
  Before you can get started with automated testing, it's important to make a few high-level
  decisions, in order to properly understand your test space and any needs you may have.
</p>

<h4>
  Identify key animations / flows to test
</h4>

<p>
  Remember that bad performance is most visible to users when it interrupts a smooth
  animation.
  As such, when identifying what types of UI actions to test, it's useful to
  focus on the key animations that users see most, or that are most important to their
  experience. For example, here are some common scenarios that may be useful to identify:
</p>

<ul>
  <li>Scrolling a primary ListView or RecyclerView
  </li>

  <li>Animations during async wait cycles
  </li>

  <li>Any animation that may have bitmap loading / manipulation in it
  </li>

  <li>Animations including alpha blending
  </li>

  <li>Custom View drawing with Canvas
  </li>
</ul>

<p>
  Work with engineers, designers, and product managers on your team to prioritize these key
  product animations for test coverage.
</p>

<h4>
  Define your future objectives and track against them
</h4>

<p>
  At a high level, it may be critical to identify your specific performance goals, and to
  focus on writing tests and collecting data around them. For example:
</p>

<ul>
  <li>Do you just want to begin tracking UI performance for the first time to learn more?
  </li>

  <li>Do you want to prevent regressions that might be introduced in the future?
  </li>

  <li>Are you at 90% of smooth frames today and want to get to 98% this quarter?
  </li>

  <li>Are you at 98% smooth frames and don't want to regress?
  </li>

  <li>Is your goal to improve performance on low-end devices?
  </li>
</ul>

<p>
  In all of these cases, you'll want historical tracking which shows performance across
  multiple versions of your application.
</p>

<h4>
  Identify devices to test on
</h4>

<p>
  Application performance varies depending on the device it's running on. Some devices may
  contain less memory, less powerful GPUs, or slower CPUs.
  This means that animations
  which perform well on one set of hardware may not perform well on others, and worse, may be
  bottlenecked in a different part of the pipeline. So, to account for this
  variation in what a user might see, pick a range of devices to execute tests on: both
  current high-end devices and low-end devices, tablets, and so on. Look for variation in CPU
  performance, RAM, screen density, size, and so on. Tests that pass on a high-end device
  may fail on a low-end device.
</p>

<h4>
  Basic frameworks for UI testing
</h4>

<p>
  Tool suites like <a href=
  "{@docRoot}training/testing/ui-testing/uiautomator-testing.html">UI Automator</a> and
  <a href="{@docRoot}training/testing/ui-testing/espresso-testing.html">Espresso</a> are
  built to help automate the action of a user moving through your application. These are simple
  frameworks which mimic user interaction with your device. To use these frameworks, you
  effectively create unique scripts which run through a set of user actions and play them
  out on the device itself.
</p>

<p>
  By combining these automated tests with <code>dumpsys gfxinfo</code>, you can quickly
  create a reproducible system that allows you to execute a test and measure the
  performance information of that particular condition.
</p>


<h3 id="automated-tests">Setting up automated UI testing</h3>

<p>
  Once you have the ability to execute a UI test, and a pipeline to gather the data from a
  single test, the next important step is to embrace a framework which can execute that
  test multiple times, across multiple devices, and aggregate the resulting performance
  data for further analysis by your development team.
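</p>

<p>
  As a minimal sketch of such a pipeline, the script below builds the host-side adb command
  sequence for one measured run: reset the stats, drive the UI test, then capture the aggregate
  output. The package, test class, and runner names are placeholders, and the exact
  instrumentation flags depend on your test setup.
</p>

```python
# Minimal host-side harness sketch (package/class/runner names are placeholders).
import subprocess

def gfxinfo_cmds(package, test_class, runner):
    """Build the adb command sequence for one measured test run."""
    return [
        # 1. Clear previously collected frame stats for the package.
        ["adb", "shell", "dumpsys", "gfxinfo", package, "reset"],
        # 2. Drive the UI test (UI Automator / Espresso) on the device.
        ["adb", "shell", "am", "instrument", "-w", "-e", "class", test_class, runner],
        # 3. Capture the stats gathered while the test ran.
        ["adb", "shell", "dumpsys", "gfxinfo", package],
    ]

def run_measured_test(package, test_class, runner):
    output = ""
    for cmd in gfxinfo_cmds(package, test_class, runner):
        output = subprocess.run(cmd, capture_output=True, text=True,
                                check=True).stdout
    return output  # aggregate gfxinfo stats from the final dump
```

<p>
  Wrapping each run this way makes the collection step repeatable across devices and builds.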
</p>

<h4>
  A framework for test automation
</h4>

<p>
  It's worth noting that UI testing frameworks (like <a href=
  "{@docRoot}training/testing/ui-testing/uiautomator-testing.html">UI Automator</a>)
  run on the target device or emulator directly, while the performance information gathered
  by <em>dumpsys gfxinfo</em> is driven by a host machine sending commands over ADB. To
  help bridge the automation of these separate entities, the <a href=
  "{@docRoot}tools/help/monkeyrunner_concepts.html">monkeyrunner</a> framework was
  developed: a scripting system that runs on your host machine, and which can issue commands to
  a set of connected devices, as well as receive data from them.
</p>

<p>
  At a minimum, a set of scripts that properly automates UI performance testing
  should be able to utilize monkeyrunner to accomplish the following tasks:
</p>

<ul>
  <li>Load &amp; launch a desired APK to a target device, devices, or emulator.
  </li>

  <li>Launch a UI Automator UI test, and allow it to be executed.
  </li>

  <li>Collect performance information through <em>dumpsys gfxinfo</em>.
  </li>

  <li>Aggregate information and display it back in a useful fashion to the developer.
  </li>
</ul>


<h3 id="triage">Triaging and fixing observed problems</h3>

<p>
  Once problem patterns or regressions are identified, the next step is identifying and
  applying the fix. If your automated test framework preserves precise timing breakdowns
  for frames, it can help you scrutinize recent suspicious code or layout changes (in the case
  of a regression), or narrow down the part of the system you're analyzing when you switch to
  manual investigation.
  For manual investigation, <a href=
  "{@docRoot}tools/help/systrace.html">systrace</a> is a great place to start; it shows
  precise timing information about every stage of the rendering pipeline, every thread and
  core in the system, as well as any custom event markers you define.
</p>

<h4>
  Properly profiling temporal timings
</h4>

<p>
  It is important to note the difficulties in obtaining and measuring timings that come from
  rendering performance. These numbers are, by nature, non-deterministic, and often
  fluctuate depending on the state of the system, the amount of memory available, thermal
  throttling, and the last time a solar flare hit your area of the earth. The point is that
  you can run the same test twice and get slightly different numbers that may be close to
  each other, but not exact.
</p>

<p>
  Properly gathering and profiling data in this manner means running the same test
  multiple times, and accumulating the results as an average or median value (for the
  sake of simplicity, let's call this a "batch"). This gives you a rough approximation of
  the performance of the test, without needing exact timings.
</p>

<p>
  Batches can be used between code changes to see the relative impact of those changes on
  performance. If the average frame time for the pre-change batch is larger than for the
  post-change batch, then you generally have an overall performance win for that
  particular change.
</p>

<p>
  This means that any automated UI testing you do should take this concept into
  consideration, and also account for any anomalies that might occur during a test. For
  example, if your application performance suddenly dips due to some device issue (that
  isn't caused by your application), then you may want to re-run the batch in order to get
  less chaotic timings.
</p>

<p>
  So, how many times should you run a test before the measurements become meaningful? Ten
  times should be the minimum, with higher numbers of runs, like 50 or 100, yielding more
  accurate results (of course, you're now trading off time for accuracy).
</p>
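<p>
  As an illustrative sketch of batch aggregation (not an official tool), the script below pulls
  the janky-frame percentage out of each run's aggregate <code>gfxinfo</code> output and reports
  the batch median, which is far more stable than any single run's number.
</p>

```python
# Illustrative batch aggregation: extract "Janky frames: N (P%)" from each
# run's aggregate gfxinfo dump and take the median across the batch.
import re
import statistics

JANKY_RE = re.compile(r"Janky frames: \d+ \(([\d.]+)%\)")

def janky_percent(gfxinfo_output):
    """Janky-frame percentage from one aggregate stats dump, or None."""
    match = JANKY_RE.search(gfxinfo_output)
    return float(match.group(1)) if match else None

def batch_median(dumps):
    """Median janky percentage across a batch of runs (10+ recommended)."""
    values = [p for p in (janky_percent(d) for d in dumps) if p is not None]
    return statistics.median(values) if values else None
```

<p>
  Comparing this median between a pre-change batch and a post-change batch gives a reasonably
  robust signal of whether a change helped or hurt.
</p>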