1page.title=Debugging ART Garbage Collection 2@jd:body 3 4<!-- 5 Copyright 2015 The Android Open Source Project 6 7 Licensed under the Apache License, Version 2.0 (the "License"); 8 you may not use this file except in compliance with the License. 9 You may obtain a copy of the License at 10 11 http://www.apache.org/licenses/LICENSE-2.0 12 13 Unless required by applicable law or agreed to in writing, software 14 distributed under the License is distributed on an "AS IS" BASIS, 15 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 See the License for the specific language governing permissions and 17 limitations under the License. 18--> 19 20 21<div id="qv-wrapper"> 22<div id="qv"> 23 <h2 id="Contents">In this document</h2> 24 <ol id="auto-toc"> 25 </ol> 26</div> 27</div> 28 29<p>This document describes how to debug Android Runtime (ART) Garbage Collection 30(GC) correctness and performance issues. It explains how to use GC verification 31options, identify solutions for GC verification failures, and measure and 32address GC performance problems.</p> 33 34<p>See <a href="index.html">ART and Dalvik</a>, the <a 35href="dex-format.html">Dalvik Executable format</a>, and the remaining pages 36within this <a href="index.html">ART and Dalvik</a> section to work with 37ART. See <a 38href="http://developer.android.com/guide/practices/verifying-apps-art.html">Verifying 39App Behavior on the Android Runtime (ART)</a> for additional help verifying app 40behavior.</p> 41 42<h2 id=art_gc_overview>ART GC overview</h2> 43 44<p>ART, introduced as a developer option in Android 4.4, is the default Android 45runtime for Android 5.0 and beyond. The Dalvik runtime is no longer maintained 46or available and its byte-code format is now used by ART. Please note this 47section merely summarizes ART’s GC. For additional information, see the <a 48href="https://www.google.com/events/io/io14videos/b750c8da-aebe-e311-b297-00155d5066d7">Android 49runtime</a> presentation conducted at Google I/O 2014. </p> 50 51<p>ART has a few different GC plans that consist of running different garbage 52collectors. The default plan is the CMS (concurrent mark sweep) plan, which 53uses mostly sticky CMS and partial CMS. Sticky CMS is ART’s non-moving 54generational garbage collector. It scans only the portion of the heap that was 55modified since the last GC and can reclaim only the objects allocated since the 56last GC. In addition to the CMS plan, ART performs heap compaction when an app 57changes process state to a jank-imperceptible process state (e.g. background or 58cached).</p> 59 60<p>Aside from the new garbage collectors, ART also introduces a new bitmap-based 61memory allocator called RosAlloc (runs of slots allocator). This new allocator 62has sharded locking and outperforms DlMalloc by adding thread local buffers for 63small allocation sizes.</p> 64 65<p>Compared to Dalvik, the ART CMS garbage collection plan has a number of 66improvements:</p> 67 68<ul> 69 <li>The number of pauses is reduced from two to one compared to Dalvik. 70Dalvik’s first pause, which did mostly root marking, is done concurrently in 71ART by getting the threads to mark their own roots, then resume running right away. 72 <li>Similarly to Dalvik, the ART GC also has a pause before the sweeping process. 73The key difference in this area is that some of the Dalvik phases during this 74pause are done concurrently in ART. These phases include 75<code>java.lang.ref.Reference</code> processing, system weak sweeping (e.g. jni 76weak globals, etc.), re-marking non-thread roots, and card pre-cleaning. The 77phases that are still done paused in ART are scanning the dirty cards and 78re-marking the thread roots, which helps reduce the pause time. 79 <li>The last area where the ART GC improves over Dalvik is with increased GC 80throughput enabled by the sticky CMS collector. Unlike normal generational GC, 81sticky CMS is non-moving. Instead of having a dedicated region for young 82objects, young objects are kept in an allocation stack, which is basically an 83array of <code>java.lang.Object</code>. This avoids moving objects required to 84maintain low pauses but has the disadvantage of having longer collections for 85heaps with complex object graphs. 86</ul> 87 88<p>The other main other area where the ART GC is different than Dalvik is the 89introduction of moving garbage collectors. The goal of moving GCs is to 90reduce memory usage of backgrounded apps through heap compaction. Currently, 91the event that triggers heap compaction is ActivityManager process-state 92changes. When an app goes to background, it notifies ART the process state is 93no longer jank “perceptible.” This enables ART do things that cause long 94application thread pauses, such as compaction and monitor deflation. The two 95current moving GCs that are in use are homogeneous space compaction and 96semi-space compaction.</p> 97 98<ul> 99 <li>Semi-space compaction moves objects between two tightly packed bump pointer 100spaces. This moving GC occurs on low-memory devices since it saves slightly 101more memory than homogeneous space compaction. The additional savings come 102mostly from having tightly packed objects, which avoid RosAlloc / DlMalloc 103allocator accounting overhead. Since CMS is still used in the foreground and it 104can’t collect from a bump pointer space, semi space requires another transition 105when the app is foregrounded. This is not ideal since it can cause a noticeable pause. 106 <li>Homogenous space compaction works by copying from one RosAlloc space to another 107RosAlloc space. This helps reduce memory usage by reducing heap fragmentation. 108This is currently the default compaction mode for non-low-memory devices. The 109main advantage that homogeneous space compaction has over semi-space compaction 110is not needing a heap transition when the app goes back to foreground. 111</ul> 112 113<h2 id=gc_verification_and_performance_options>GC verification and performance options</h2> 114 115<p>It is possible to change the GC type if you are an OEM. The process for doing 116this involves modifying system properties through adb. Keep in mind that these 117are only modifiable on non-user or rooted builds.</p> 118 119<h3 id=changing_the_gc_type>Changing the GC type</h3> 120 121<p>There are ways to change the GC plans that ART uses. The main way to change the 122foreground GC plan is by changing the <code>dalvik.vm.gctype</code> property or 123passing in an <code>-Xgc:</code> option. It is possible to pass in multiple GC 124options separated by commas.</p> 125 126<p>In order to derive the entire list of available <code>-Xgc</code> settings, 127it is possible to type <code>adb shell dalvikvm -help</code> to print the 128various runtime command-line options.</p> 129 130<p>Here is one example that changes the GC to semi space and turns on pre-GC heap 131verification: 132<code>adb shell setprop dalvik.vm.gctype SS,preverify</code></p> 133 134<ul> 135 <li><code>CMS</code>, which is also the default value, specifies the 136concurrent mark sweep GC plan. This plan consists of running sticky 137generational CMS, partial CMS, and full CMS. The allocator for this plan is 138RosAlloc for movable objects and DlMalloc for non-movable objects. 139 <li><code>SS</code> specifies the semi space GC plan. This plan has two semi 140spaces for movable objects and a DlMalloc space for non-movable objects. The 141movable object allocator defaults to a shared bump pointer allocator which uses 142atomic operations. However, if the <code>-XX:UseTLAB</code> flag is also passed 143in, the allocator uses thread local bump pointer allocation. 144 <li><code>GSS</code> specifies the generational semi space plan. This plan is 145very similar to the semi-space plan with the exception that older-lived objects 146are promoted into a large RosAlloc space. This has the advantage of needing to 147copy significantly fewer objects for typical use cases. 148</ul> 149 150<h3 id=verifying_the_heap>Verifying the heap</h3> 151 152<p>Heap verification is probably the most useful GC option for debugging 153GC-related errors or heap corruption. Enabling heap verification causes the GC 154to check the correctness of the heap at a few points during the garbage 155collection process. Heap verification shares the same options as the ones that 156change the GC type. If enabled, heap verification verifies the roots and 157ensures that reachable objects reference only other reachable objects. GC 158verification is enabled by passing in the following -<code>Xgc</code> values:</p> 159 160<ul> 161 <li>If enabled, <code>[no]preverify</code> performs heap verification before starting the GC. 162 <li>If enabled, <code>[no]presweepingverify</code> performs heap verification 163before starting the garbage collector sweeping 164process. 165 <li>If enabled, <code>[no]postverify</code> performs heap verification after 166the GC has finished sweeping. 167 <li><code>[no]preverify_rosalloc</code>, 168<code>[no]postsweepingverify_rosalloc</code>, 169<code>[no]postverify_rosalloc</code> are also additional GC options that verify 170only the state of RosAlloc’s internal accounting. The main things verified are 171that magic values match expected constants, and free blocks of memory are all 172registered in the <code>free_page_runs_</code> map. 173</ul> 174 175<h3 id=using_the_tlab_allocator_option>Using the TLAB allocator option</h3> 176 177<p>Currently, the only option that changes the allocator used without affecting 178the active GC type is the TLAB option. This option is not available through 179system properties but can be enabled by passing in -<code>XX:UseTLAB</code> to 180<code>dalvikvm</code>. This option enables faster allocation by having a 181shorter allocation code path. Since this option requires using either the SS or 182GSS GC types, which have rather long pauses, it is not enabled by default.</p> 183 184<h2 id=performance>Performance</h2> 185 186<p>There are two main tools that can be used to measure GC performance. GC timing 187dumps and systrace. The most visual way to measure GC performance problems 188would be to use systrace to determine which GCs are causing long pauses or 189preempting application threads. Although the ART GC is relatively efficient, 190there are still a few ways to get performance problems through excessive 191allocations or bad mutator behavior.</p> 192 193<h3 id=ergonomics>Ergonomics</h3> 194 195<p>Compared to Dalvik, ART has a few key differences regarding GC ergonomics. One 196of the major improvements compared to Dalvik is no longer having GC for alloc 197in cases where we start the concurrent GC later than needed. However, there is 198one downside to this, not blocking on the GC can cause the heap to grow more 199than Dalvik in some circumstances. Fortunately, ART has heap compaction, which 200mitigates this issue by defragmenting the heap when the process changes to a 201background process state.</p> 202 203<p>The CMS GC ergonomics have two types of GCs that are regularly run. Ideally, 204the GC ergonomics will run the generational sticky CMS more often than the 205partial CMS. The GC does sticky CMS until the throughput (calculated by bytes 206freed / second of GC duration) of the last GC is less than the mean throughput 207of partial CMS. When this occurs, the ergonomics plan the next concurrent GC to 208be a partial CMS instead of sticky CMS. After the partial CMS completes, the 209ergonomics changes the next GC back to sticky CMS. One key factor that makes 210the ergonomics work is that sticky CMS doesn’t adjust the heap footprint limit 211after it completes. This causes sticky CMS to happen more and more often until 212the throughput is lower than partial CMS, which ends up growing the heap.</p> 213 214<h3 id=using_sigquit_to_obtain_gc_performance_info>Using SIGQUIT to obtain GC performance info</h3> 215 216<p>It is possible to get GC performance timings for apps by sending SIGQUIT to 217already running apps or passing in -<code>XX:DumpGCPerformanceOnShutdown</code> 218to <code>dalvikvm</code> when starting a command line program. When an app gets 219the ANR request signal (SIGQUIT) it dumps information related to its locks, 220thread stacks, and GC performance.</p> 221 222<p>The way to get GC timing dumps is to use:<br> 223<code>$ adb shell kill -S QUIT <pid></code></p> 224 225<p>This creates a <code>traces.txt</code> file in <code>/data/anr/</code>. This 226file contains some ANR dumps as well as GC timings. You can locate the 227GC timings by searching for: “Dumping cumulative Gc timings” These timings will 228show a few things that may be of interest. It will show the histogram info for 229each GC type’s phases and pauses. The pauses are usually more important to look 230at. For example:</p> 231 232<pre> 233sticky concurrent mark sweep paused: Sum: 5.491ms 99% C.I. 1.464ms-2.133ms Avg: 1.830ms Max: 2.133ms 234</pre> 235 236<p><code>This</code> shows that the average pause was 1.83ms. This should be low enough to not 237cause missed frames in most applications and shouldn’t be a concern.</p> 238 239<p>Another area of interest is time to suspend. What time to suspend measures is 240how long it takes a thread to reach a suspend point after the GC requests that 241it suspends. This time is included in the GC pauses, so it is useful to 242determine if long pauses are caused by the GC being slow or the thread 243suspending slowly. Here is an example of what a normal time to suspend 244resembles on a Nexus 5:</p> 245 246<pre> 247suspend all histogram: Sum: 1.513ms 99% C.I. 3us-546.560us Avg: 47.281us Max: 601us 248</pre> 249 250<p>There are also a few other areas of interest, such as total time spent, GC 251throughput, etc. Examples:</p> 252 253<pre> 254Total time spent in GC: 502.251ms 255Mean GC size throughput: 92MB/s 256Mean GC object throughput: 1.54702e+06 objects/s 257</pre> 258 259<p>Here is an example of how to dump the GC timings of an already running app: 260 261<pre> 262$ adb shell kill -s QUIT <pid> 263$ adb pull /data/anr/traces.txt 264</pre> 265 266<p>At this point the GC timings are inside of traces.txt. Here is example output 267from Google maps:</p> 268 269<pre> 270Start Dumping histograms for 34 iterations for sticky concurrent mark sweep 271ScanGrayAllocSpaceObjects: Sum: 196.174ms 99% C.I. 0.011ms-11.615ms Avg: 1.442ms Max: 14.091ms 272FreeList: Sum: 140.457ms 99% C.I. 6us-1676.749us Avg: 128.505us Max: 9886us 273MarkRootsCheckpoint: Sum: 110.687ms 99% C.I. 0.056ms-9.515ms Avg: 1.627ms Max: 10.280ms 274SweepArray: Sum: 78.727ms 99% C.I. 0.121ms-11.780ms Avg: 2.315ms Max: 12.744ms 275ProcessMarkStack: Sum: 77.825ms 99% C.I. 1.323us-9120us Avg: 576.481us Max: 10185us 276(Paused)ScanGrayObjects: Sum: 32.538ms 99% C.I. 286us-3235.500us Avg: 986us Max: 3434us 277AllocSpaceClearCards: Sum: 30.592ms 99% C.I. 10us-2249.999us Avg: 224.941us Max: 4765us 278MarkConcurrentRoots: Sum: 30.245ms 99% C.I. 3us-3017.999us Avg: 444.779us Max: 3774us 279ReMarkRoots: Sum: 13.144ms 99% C.I. 66us-712us Avg: 386.588us Max: 712us 280ScanGrayImageSpaceObjects: Sum: 13.075ms 99% C.I. 29us-2538.999us Avg: 192.279us Max: 3080us 281MarkingPhase: Sum: 9.743ms 99% C.I. 170us-518us Avg: 286.558us Max: 518us 282SweepSystemWeaks: Sum: 8.046ms 99% C.I. 28us-479us Avg: 236.647us Max: 479us 283MarkNonThreadRoots: Sum: 5.215ms 99% C.I. 31us-698.999us Avg: 76.691us Max: 703us 284ImageModUnionClearCards: Sum: 2.708ms 99% C.I. 26us-92us Avg: 39.823us Max: 92us 285ScanGrayZygoteSpaceObjects: Sum: 2.488ms 99% C.I. 19us-250.499us Avg: 37.696us Max: 295us 286ResetStack: Sum: 2.226ms 99% C.I. 24us-449us Avg: 65.470us Max: 452us 287ZygoteModUnionClearCards: Sum: 2.124ms 99% C.I. 18us-233.999us Avg: 32.181us Max: 291us 288FinishPhase: Sum: 1.881ms 99% C.I. 31us-431.999us Avg: 55.323us Max: 466us 289RevokeAllThreadLocalAllocationStacks: Sum: 1.749ms 99% C.I. 8us-349us Avg: 51.441us Max: 377us 290EnqueueFinalizerReferences: Sum: 1.513ms 99% C.I. 3us-201us Avg: 44.500us Max: 201us 291ProcessReferences: Sum: 438us 99% C.I. 3us-212us Avg: 12.882us Max: 212us 292ProcessCards: Sum: 381us 99% C.I. 4us-17us Avg: 5.602us Max: 17us 293PreCleanCards: Sum: 363us 99% C.I. 8us-17us Avg: 10.676us Max: 17us 294ReclaimPhase: Sum: 357us 99% C.I. 7us-91.500us Avg: 10.500us Max: 93us 295(Paused)PausePhase: Sum: 312us 99% C.I. 7us-15us Avg: 9.176us Max: 15us 296SwapBitmaps: Sum: 166us 99% C.I. 4us-8us Avg: 4.882us Max: 8us 297(Paused)ScanGrayAllocSpaceObjects: Sum: 126us 99% C.I. 14us-112us Avg: 63us Max: 112us 298MarkRoots: Sum: 121us 99% C.I. 2us-7us Avg: 3.558us Max: 7us 299(Paused)ScanGrayImageSpaceObjects: Sum: 68us 99% C.I. 68us-68us Avg: 68us Max: 68us 300BindBitmaps: Sum: 50us 99% C.I. 1us-3us Avg: 1.470us Max: 3us 301UnBindBitmaps: Sum: 49us 99% C.I. 1us-3us Avg: 1.441us Max: 3us 302SwapStacks: Sum: 47us 99% C.I. 1us-3us Avg: 1.382us Max: 3us 303RecordFree: Sum: 42us 99% C.I. 1us-3us Avg: 1.235us Max: 3us 304ForwardSoftReferences: Sum: 37us 99% C.I. 1us-2us Avg: 1.121us Max: 2us 305InitializePhase: Sum: 36us 99% C.I. 1us-2us Avg: 1.058us Max: 2us 306FindDefaultSpaceBitmap: Sum: 32us 99% C.I. 250ns-1000ns Avg: 941ns Max: 1000ns 307(Paused)ProcessMarkStack: Sum: 5us 99% C.I. 250ns-3000ns Avg: 147ns Max: 3000ns 308PreSweepingGcVerification: Sum: 0 99% C.I. 0ns-0ns Avg: 0ns Max: 0ns 309Done Dumping histograms 310sticky concurrent mark sweep paused: Sum: 63.268ms 99% C.I. 0.308ms-8.405ms 311Avg: 1.860ms Max: 8.883ms 312sticky concurrent mark sweep total time: 763.787ms mean time: 22.464ms 313sticky concurrent mark sweep freed: 1072342 objects with total size 75MB 314sticky concurrent mark sweep throughput: 1.40543e+06/s / 98MB/s 315Total time spent in GC: 4.805s 316Mean GC size throughput: 18MB/s 317Mean GC object throughput: 330899 objects/s 318Total number of allocations 2015049 319Total bytes allocated 177MB 320Free memory 4MB 321Free memory until GC 4MB 322Free memory until OOME 425MB 323Total memory 90MB 324Max memory 512MB 325Zygote space size 4MB 326Total mutator paused time: 229.566ms 327Total time waiting for GC to complete: 187.655us 328</pre> 329 330<h2 id=tools_for_analyzing_gc_correctness_problems>Tools for analyzing GC correctness problems</h2> 331 332<p>There are various things that can cause crashes inside of ART. Crashes that 333occur reading or writing to object fields may indicate heap corruption. If the 334GC crashes when it is running, it could also point to heap corruption. There 335are various things that can cause heap corruption, the most common cause is 336probably incorrect app code. Fortunately, there are tools to debug GC and 337heap-related crashes. These include the heap verification options specified 338above, valgrind, and CheckJNI.</p> 339 340<h3 id=checkjni>CheckJNI</h3> 341 342<p>Another way to verify app behavior is to use CheckJNI. CheckJNI is a mode that 343adds additional JNI checks; these aren’t enabled by default due to performance 344reasons. The checks will catch a few errors that could cause heap corruption 345such as using invalid/stale local and global references. Here is how to enable 346CheckJNI:</p> 347 348<pre> 349$ adb shell setprop dalvik.vm.checkjni true 350</pre> 351 352<p>Forcecopy mode is another part of CheckJNI that is very useful for detecting 353writes past the end of array regions. When enabled, forcecopy causes the array 354access JNI functions to always return copies with red zones. A <em>red 355zone</em> is a region at the end/start of the returned pointer that has a 356special value, which is verified when the array is released. If the values in 357the red zone don’t match what is expected, this usually means a buffer overrun 358or underrun occurred. This would cause CheckJNI to abort. Here is how to enable 359forcecopy mode:</p> 360 361<pre> 362$ adb shell setprop dalvik.vm.jniopts forcecopy 363</pre> 364 365<p>One example of an error that CheckJNI should catch is writing past the end of 366an array obtained from <code>GetPrimitiveArrayCritical</code>. This operation 367would very likely corrupt the Java heap. If the write is 368within the CheckJNI red zone area, then CheckJNI would catch the issue when the 369corresponding <code>ReleasePrimitiveArrayCritical</code> is called. Otherwise, 370the write will end up corrupting some random object in 371the Java heap and possibly causing a future GC crash. If the corrupted memory 372is a reference field, then the GC may catch the error and print a “<em>Tried to 373mark <ptr> not contained by any spaces</em>” error.</p> 374 375<p>This error occurs when the GC attempts to mark an object for which it can’t 376find a space. After this check fails, the GC traverses the roots and tries to 377see if the invalid object is a root. From here, there are two options: The 378object is a root or a non-root object.</p> 379 380<h3 id=valgrind>Valgrind</h3> 381 382<p>The ART heap supports optional valgrind instrumentation, which provides a 383way to detect reads and writes to and from an invalid heap address. ART detects 384when the app is running under valgrind and inserts red zones before and after 385each object allocation. If there are any reads or writes to these red zones, 386valgrind prints an error. One example of when this could happen is if you read 387or write past the end of an array’s elements while using direct array access 388through JNI. Since the AOT compilers use implicit null checks, it is 389recommended to use eng builds for running valgrind. Another thing to note is 390that valgrind is orders of magnitude slower than normal execution.</p> 391 392<p>Here is an example use:</p> 393 394<pre> 395# build and install 396$ mmm external/valgrind 397$ adb remount && adb sync 398# disable selinux 399$ adb shell setenforce 0 400$ adb shell setprop wrap.com.android.calculator2 401"TMPDIR=/data/data/com.android.calculator2 logwrapper valgrind" 402# push symbols 403$ adb shell mkdir /data/local/symbols 404$ adb push $OUT/symbols /data/local/symbols 405$ adb logcat 406</pre> 407 408 409<h3 id=invalid_root_example>Invalid root example</h3> 410 411<p>In the case where the object is actually an invalid root, it will print some 412useful information: 413<code>art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:383] Tried to mark 0x2 414not contained by any spaces</code></p> 415 416<pre> 417art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:384] Attempting see if 418it's a bad root 419art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:485] Found invalid 420root: 0x2 421art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:486] 422Type=RootJavaFrame thread_id=1 location=Visiting method 'java.lang.Object 423com.google.gwt.corp.collections.JavaReadableJsArray.get(int)' at dex PC 0x0002 424(native PC 0xf19609d9) vreg=1 425</pre> 426 427<p>In this case, <code>vreg 1</code> inside of 428<code>com.google.gwt.corp.collections.JavaReadableJsArray.get</code> is 429supposed to contain a heap reference but actually contains an invalid pointer 430of address <code>0x2</code>. This is clearly an invalid root. The next step to 431debug this issue would be to use <code>oatdump</code> on the oat file and look 432at the method that has the invalid root. In this case, the error turned out to 433be a compiler bug in the x86 backend. Here is the changelist that fixed it: <a 434href="https://android-review.googlesource.com/#/c/133932/">https://android-review.googlesource.com/#/c/133932/</a></p> 435 436<h3 id=corrupted_object_example>Corrupted object example</h3> 437 438<p>In the case where the object isn’t a root, output similar to the following 439prints:</p> 440 441<pre> 44201-15 12:38:00.196 1217 1238 E art : Attempting see if it's a bad root 44301-15 12:38:00.196 1217 1238 F art : 444art/runtime/gc/collector/mark_sweep.cc:381] Can't mark invalid object 445</pre> 446 447<p>When heap corruption isn’t an invalid root, it is unfortunately hard to debug. 448This error message indicates that there was at least one object in the heap 449that was pointing to the invalid object.</p> 450