1===================================== 2Garbage Collection with LLVM 3===================================== 4 5.. contents:: 6 :local: 7 8Abstract 9======== 10 11This document covers how to integrate LLVM into a compiler for a language which 12supports garbage collection. **Note that LLVM itself does not provide a 13garbage collector.** You must provide your own. 14 15Quick Start 16============ 17 18First, you should pick a collector strategy. LLVM includes a number of built 19in ones, but you can also implement a loadable plugin with a custom definition. 20Note that the collector strategy is a description of how LLVM should generate 21code such that it interacts with your collector and runtime, not a description 22of the collector itself. 23 24Next, mark your generated functions as using your chosen collector strategy. 25From c++, you can call: 26 27.. code-block:: c++ 28 29 F.setGC(<collector description name>); 30 31 32This will produce IR like the following fragment: 33 34.. code-block:: llvm 35 36 define void @foo() gc "<collector description name>" { ... } 37 38 39When generating LLVM IR for your functions, you will need to: 40 41* Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` in place of standard load and 42 store instructions. These intrinsics are used to represent load and store 43 barriers. If you collector does not require such barriers, you can skip 44 this step. 45 46* Use the memory allocation routines provided by your garbage collector's 47 runtime library. 48 49* If your collector requires them, generate type maps according to your 50 runtime's binary interface. LLVM is not involved in the process. In 51 particular, the LLVM type system is not suitable for conveying such 52 information though the compiler. 53 54* Insert any coordination code required for interacting with your collector. 55 Many collectors require running application code to periodically check a 56 flag and conditionally call a runtime function. This is often referred to 57 as a safepoint poll. 58 59You will need to identify roots (i.e. references to heap objects your collector 60needs to know about) in your generated IR, so that LLVM can encode them into 61your final stack maps. Depending on the collector strategy chosen, this is 62accomplished by using either the ``@llvm.gcroot`` intrinsics or an 63``gc.statepoint`` relocation sequence. 64 65Don't forget to create a root for each intermediate value that is generated when 66evaluating an expression. In ``h(f(), g())``, the result of ``f()`` could 67easily be collected if evaluating ``g()`` triggers a collection. 68 69Finally, you need to link your runtime library with the generated program 70executable (for a static compiler) or ensure the appropriate symbols are 71available for the runtime linker (for a JIT compiler). 72 73 74Introduction 75============ 76 77What is Garbage Collection? 78--------------------------- 79 80Garbage collection is a widely used technique that frees the programmer from 81having to know the lifetimes of heap objects, making software easier to produce 82and maintain. Many programming languages rely on garbage collection for 83automatic memory management. There are two primary forms of garbage collection: 84conservative and accurate. 85 86Conservative garbage collection often does not require any special support from 87either the language or the compiler: it can handle non-type-safe programming 88languages (such as C/C++) and does not require any special information from the 89compiler. The `Boehm collector 90<http://www.hpl.hp.com/personal/Hans_Boehm/gc/>`__ is an example of a 91state-of-the-art conservative collector. 92 93Accurate garbage collection requires the ability to identify all pointers in the 94program at run-time (which requires that the source-language be type-safe in 95most cases). Identifying pointers at run-time requires compiler support to 96locate all places that hold live pointer variables at run-time, including the 97:ref:`processor stack and registers <gcroot>`. 98 99Conservative garbage collection is attractive because it does not require any 100special compiler support, but it does have problems. In particular, because the 101conservative garbage collector cannot *know* that a particular word in the 102machine is a pointer, it cannot move live objects in the heap (preventing the 103use of compacting and generational GC algorithms) and it can occasionally suffer 104from memory leaks due to integer values that happen to point to objects in the 105program. In addition, some aggressive compiler transformations can break 106conservative garbage collectors (though these seem rare in practice). 107 108Accurate garbage collectors do not suffer from any of these problems, but they 109can suffer from degraded scalar optimization of the program. In particular, 110because the runtime must be able to identify and update all pointers active in 111the program, some optimizations are less effective. In practice, however, the 112locality and performance benefits of using aggressive garbage collection 113techniques dominates any low-level losses. 114 115This document describes the mechanisms and interfaces provided by LLVM to 116support accurate garbage collection. 117 118Goals and non-goals 119------------------- 120 121LLVM's intermediate representation provides :ref:`garbage collection intrinsics 122<gc_intrinsics>` that offer support for a broad class of collector models. For 123instance, the intrinsics permit: 124 125* semi-space collectors 126 127* mark-sweep collectors 128 129* generational collectors 130 131* incremental collectors 132 133* concurrent collectors 134 135* cooperative collectors 136 137* reference counting 138 139We hope that the support built into the LLVM IR is sufficient to support a 140broad class of garbage collected languages including Scheme, ML, Java, C#, 141Perl, Python, Lua, Ruby, other scripting languages, and more. 142 143Note that LLVM **does not itself provide a garbage collector** --- this should 144be part of your language's runtime library. LLVM provides a framework for 145describing the garbage collectors requirements to the compiler. In particular, 146LLVM provides support for generating stack maps at call sites, polling for a 147safepoint, and emitting load and store barriers. You can also extend LLVM - 148possibly through a loadable :ref:`code generation plugins <plugin>` - to 149generate code and data structures which conforms to the *binary interface* 150specified by the *runtime library*. This is similar to the relationship between 151LLVM and DWARF debugging info, for example. The difference primarily lies in 152the lack of an established standard in the domain of garbage collection --- thus 153the need for a flexible extension mechanism. 154 155The aspects of the binary interface with which LLVM's GC support is 156concerned are: 157 158* Creation of GC safepoints within code where collection is allowed to execute 159 safely. 160 161* Computation of the stack map. For each safe point in the code, object 162 references within the stack frame must be identified so that the collector may 163 traverse and perhaps update them. 164 165* Write barriers when storing object references to the heap. These are commonly 166 used to optimize incremental scans in generational collectors. 167 168* Emission of read barriers when loading object references. These are useful 169 for interoperating with concurrent collectors. 170 171There are additional areas that LLVM does not directly address: 172 173* Registration of global roots with the runtime. 174 175* Registration of stack map entries with the runtime. 176 177* The functions used by the program to allocate memory, trigger a collection, 178 etc. 179 180* Computation or compilation of type maps, or registration of them with the 181 runtime. These are used to crawl the heap for object references. 182 183In general, LLVM's support for GC does not include features which can be 184adequately addressed with other features of the IR and does not specify a 185particular binary interface. On the plus side, this means that you should be 186able to integrate LLVM with an existing runtime. On the other hand, it can 187have the effect of leaving a lot of work for the developer of a novel 188language. We try to mitigate this by providing built in collector strategy 189descriptions that can work with many common collector designs and easy 190extension points. If you don't already have a specific binary interface 191you need to support, we recommend trying to use one of these built in collector 192strategies. 193 194.. _gc_intrinsics: 195 196LLVM IR Features 197================ 198 199This section describes the garbage collection facilities provided by the 200:doc:`LLVM intermediate representation <LangRef>`. The exact behavior of these 201IR features is specified by the selected :ref:`GC strategy description 202<plugin>`. 203 204Specifying GC code generation: ``gc "..."`` 205------------------------------------------- 206 207.. code-block:: llvm 208 209 define <returntype> @name(...) gc "name" { ... } 210 211The ``gc`` function attribute is used to specify the desired GC strategy to the 212compiler. Its programmatic equivalent is the ``setGC`` method of ``Function``. 213 214Setting ``gc "name"`` on a function triggers a search for a matching subclass 215of GCStrategy. Some collector strategies are built in. You can add others 216using either the loadable plugin mechanism, or by patching your copy of LLVM. 217It is the selected GC strategy which defines the exact nature of the code 218generated to support GC. If none is found, the compiler will raise an error. 219 220Specifying the GC style on a per-function basis allows LLVM to link together 221programs that use different garbage collection algorithms (or none at all). 222 223.. _gcroot: 224 225Identifying GC roots on the stack 226---------------------------------- 227 228LLVM currently supports two different mechanisms for describing references in 229compiled code at safepoints. ``llvm.gcroot`` is the older mechanism; 230``gc.statepoint`` has been added more recently. At the moment, you can choose 231either implementation (on a per :ref:`GC strategy <plugin>` basis). Longer 232term, we will probably either migrate away from ``llvm.gcroot`` entirely, or 233substantially merge their implementations. Note that most new development 234work is focused on ``gc.statepoint``. 235 236Using ``gc.statepoint`` 237^^^^^^^^^^^^^^^^^^^^^^^^ 238:doc:`This page <Statepoints>` contains detailed documentation for 239``gc.statepoint``. 240 241Using ``llvm.gcwrite`` 242^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 243 244.. code-block:: llvm 245 246 void @llvm.gcroot(i8** %ptrloc, i8* %metadata) 247 248The ``llvm.gcroot`` intrinsic is used to inform LLVM that a stack variable 249references an object on the heap and is to be tracked for garbage collection. 250The exact impact on generated code is specified by the Function's selected 251:ref:`GC strategy <plugin>`. All calls to ``llvm.gcroot`` **must** reside 252inside the first basic block. 253 254The first argument **must** be a value referring to an alloca instruction or a 255bitcast of an alloca. The second contains a pointer to metadata that should be 256associated with the pointer, and **must** be a constant or global value 257address. If your target collector uses tags, use a null pointer for metadata. 258 259A compiler which performs manual SSA construction **must** ensure that SSA 260values representing GC references are stored in to the alloca passed to the 261respective ``gcroot`` before every call site and reloaded after every call. 262A compiler which uses mem2reg to raise imperative code using ``alloca`` into 263SSA form need only add a call to ``@llvm.gcroot`` for those variables which 264are pointers into the GC heap. 265 266It is also important to mark intermediate values with ``llvm.gcroot``. For 267example, consider ``h(f(), g())``. Beware leaking the result of ``f()`` in the 268case that ``g()`` triggers a collection. Note, that stack variables must be 269initialized and marked with ``llvm.gcroot`` in function's prologue. 270 271The ``%metadata`` argument can be used to avoid requiring heap objects to have 272'isa' pointers or tag bits. [Appel89_, Goldberg91_, Tolmach94_] If specified, 273its value will be tracked along with the location of the pointer in the stack 274frame. 275 276Consider the following fragment of Java code: 277 278.. code-block:: java 279 280 { 281 Object X; // A null-initialized reference to an object 282 ... 283 } 284 285This block (which may be located in the middle of a function or in a loop nest), 286could be compiled to this LLVM code: 287 288.. code-block:: llvm 289 290 Entry: 291 ;; In the entry block for the function, allocate the 292 ;; stack space for X, which is an LLVM pointer. 293 %X = alloca %Object* 294 295 ;; Tell LLVM that the stack space is a stack root. 296 ;; Java has type-tags on objects, so we pass null as metadata. 297 %tmp = bitcast %Object** %X to i8** 298 call void @llvm.gcroot(i8** %tmp, i8* null) 299 ... 300 301 ;; "CodeBlock" is the block corresponding to the start 302 ;; of the scope above. 303 CodeBlock: 304 ;; Java null-initializes pointers. 305 store %Object* null, %Object** %X 306 307 ... 308 309 ;; As the pointer goes out of scope, store a null value into 310 ;; it, to indicate that the value is no longer live. 311 store %Object* null, %Object** %X 312 ... 313 314Reading and writing references in the heap 315------------------------------------------ 316 317Some collectors need to be informed when the mutator (the program that needs 318garbage collection) either reads a pointer from or writes a pointer to a field 319of a heap object. The code fragments inserted at these points are called *read 320barriers* and *write barriers*, respectively. The amount of code that needs to 321be executed is usually quite small and not on the critical path of any 322computation, so the overall performance impact of the barrier is tolerable. 323 324Barriers often require access to the *object pointer* rather than the *derived 325pointer* (which is a pointer to the field within the object). Accordingly, 326these intrinsics take both pointers as separate arguments for completeness. In 327this snippet, ``%object`` is the object pointer, and ``%derived`` is the derived 328pointer: 329 330.. code-block:: llvm 331 332 ;; An array type. 333 %class.Array = type { %class.Object, i32, [0 x %class.Object*] } 334 ... 335 336 ;; Load the object pointer from a gcroot. 337 %object = load %class.Array** %object_addr 338 339 ;; Compute the derived pointer. 340 %derived = getelementptr %object, i32 0, i32 2, i32 %n 341 342LLVM does not enforce this relationship between the object and derived pointer 343(although a particular :ref:`collector strategy <plugin>` might). However, it 344would be an unusual collector that violated it. 345 346The use of these intrinsics is naturally optional if the target GC does not 347require the corresponding barrier. The GC strategy used with such a collector 348should replace the intrinsic calls with the corresponding ``load`` or 349``store`` instruction if they are used. 350 351One known deficiency with the current design is that the barrier intrinsics do 352not include the size or alignment of the underlying operation performed. It is 353currently assumed that the operation is of pointer size and the alignment is 354assumed to be the target machine's default alignment. 355 356Write barrier: ``llvm.gcwrite`` 357^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 358 359.. code-block:: llvm 360 361 void @llvm.gcwrite(i8* %value, i8* %object, i8** %derived) 362 363For write barriers, LLVM provides the ``llvm.gcwrite`` intrinsic function. It 364has exactly the same semantics as a non-volatile ``store`` to the derived 365pointer (the third argument). The exact code generated is specified by the 366Function's selected :ref:`GC strategy <plugin>`. 367 368Many important algorithms require write barriers, including generational and 369concurrent collectors. Additionally, write barriers could be used to implement 370reference counting. 371 372Read barrier: ``llvm.gcread`` 373^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 374 375.. code-block:: llvm 376 377 i8* @llvm.gcread(i8* %object, i8** %derived) 378 379For read barriers, LLVM provides the ``llvm.gcread`` intrinsic function. It has 380exactly the same semantics as a non-volatile ``load`` from the derived pointer 381(the second argument). The exact code generated is specified by the Function's 382selected :ref:`GC strategy <plugin>`. 383 384Read barriers are needed by fewer algorithms than write barriers, and may have a 385greater performance impact since pointer reads are more frequent than writes. 386 387.. _plugin: 388 389.. _builtin-gc-strategies: 390 391Built In GC Strategies 392====================== 393 394LLVM includes built in support for several varieties of garbage collectors. 395 396The Shadow Stack GC 397---------------------- 398 399To use this collector strategy, mark your functions with: 400 401.. code-block:: c++ 402 403 F.setGC("shadow-stack"); 404 405Unlike many GC algorithms which rely on a cooperative code generator to compile 406stack maps, this algorithm carefully maintains a linked list of stack roots 407[:ref:`Henderson2002 <henderson02>`]. This so-called "shadow stack" mirrors the 408machine stack. Maintaining this data structure is slower than using a stack map 409compiled into the executable as constant data, but has a significant portability 410advantage because it requires no special support from the target code generator, 411and does not require tricky platform-specific code to crawl the machine stack. 412 413The tradeoff for this simplicity and portability is: 414 415* High overhead per function call. 416 417* Not thread-safe. 418 419Still, it's an easy way to get started. After your compiler and runtime are up 420and running, writing a :ref:`plugin <plugin>` will allow you to take advantage 421of :ref:`more advanced GC features <collector-algos>` of LLVM in order to 422improve performance. 423 424 425The shadow stack doesn't imply a memory allocation algorithm. A semispace 426collector or building atop ``malloc`` are great places to start, and can be 427implemented with very little code. 428 429When it comes time to collect, however, your runtime needs to traverse the stack 430roots, and for this it needs to integrate with the shadow stack. Luckily, doing 431so is very simple. (This code is heavily commented to help you understand the 432data structure, but there are only 20 lines of meaningful code.) 433 434.. code-block:: c++ 435 436 /// @brief The map for a single function's stack frame. One of these is 437 /// compiled as constant data into the executable for each function. 438 /// 439 /// Storage of metadata values is elided if the %metadata parameter to 440 /// @llvm.gcroot is null. 441 struct FrameMap { 442 int32_t NumRoots; //< Number of roots in stack frame. 443 int32_t NumMeta; //< Number of metadata entries. May be < NumRoots. 444 const void *Meta[0]; //< Metadata for each root. 445 }; 446 447 /// @brief A link in the dynamic shadow stack. One of these is embedded in 448 /// the stack frame of each function on the call stack. 449 struct StackEntry { 450 StackEntry *Next; //< Link to next stack entry (the caller's). 451 const FrameMap *Map; //< Pointer to constant FrameMap. 452 void *Roots[0]; //< Stack roots (in-place array). 453 }; 454 455 /// @brief The head of the singly-linked list of StackEntries. Functions push 456 /// and pop onto this in their prologue and epilogue. 457 /// 458 /// Since there is only a global list, this technique is not threadsafe. 459 StackEntry *llvm_gc_root_chain; 460 461 /// @brief Calls Visitor(root, meta) for each GC root on the stack. 462 /// root and meta are exactly the values passed to 463 /// @llvm.gcroot. 464 /// 465 /// Visitor could be a function to recursively mark live objects. Or it 466 /// might copy them to another heap or generation. 467 /// 468 /// @param Visitor A function to invoke for every GC root on the stack. 469 void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) { 470 for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) { 471 unsigned i = 0; 472 473 // For roots [0, NumMeta), the metadata pointer is in the FrameMap. 474 for (unsigned e = R->Map->NumMeta; i != e; ++i) 475 Visitor(&R->Roots[i], R->Map->Meta[i]); 476 477 // For roots [NumMeta, NumRoots), the metadata pointer is null. 478 for (unsigned e = R->Map->NumRoots; i != e; ++i) 479 Visitor(&R->Roots[i], NULL); 480 } 481 } 482 483 484The 'Erlang' and 'Ocaml' GCs 485----------------------------- 486 487LLVM ships with two example collectors which leverage the ``gcroot`` 488mechanisms. To our knowledge, these are not actually used by any language 489runtime, but they do provide a reasonable starting point for someone interested 490in writing an ``gcroot`` compatible GC plugin. In particular, these are the 491only in tree examples of how to produce a custom binary stack map format using 492a ``gcroot`` strategy. 493 494As there names imply, the binary format produced is intended to model that 495used by the Erlang and OCaml compilers respectively. 496 497.. _statepoint_example_gc: 498 499The Statepoint Example GC 500------------------------- 501 502.. code-block:: c++ 503 504 F.setGC("statepoint-example"); 505 506This GC provides an example of how one might use the infrastructure provided 507by ``gc.statepoint``. This example GC is compatible with the 508:ref:`PlaceSafepoints` and :ref:`RewriteStatepointsForGC` utility passes 509which simplify ``gc.statepoint`` sequence insertion. If you need to build a 510custom GC strategy around the ``gc.statepoints`` mechanisms, it is recommended 511that you use this one as a starting point. 512 513This GC strategy does not support read or write barriers. As a result, these 514intrinsics are lowered to normal loads and stores. 515 516The stack map format generated by this GC strategy can be found in the 517:ref:`stackmap-section` using a format documented :ref:`here 518<statepoint-stackmap-format>`. This format is intended to be the standard 519format supported by LLVM going forward. 520 521The CoreCLR GC 522------------------------- 523 524.. code-block:: c++ 525 526 F.setGC("coreclr"); 527 528This GC leverages the ``gc.statepoint`` mechanism to support the 529`CoreCLR <https://github.com/dotnet/coreclr>`__ runtime. 530 531Support for this GC strategy is a work in progress. This strategy will 532differ from 533:ref:`statepoint-example GC<statepoint_example_gc>` strategy in 534certain aspects like: 535 536* Base-pointers of interior pointers are not explicitly 537 tracked and reported. 538 539* A different format is used for encoding stack maps. 540 541* Safe-point polls are only needed before loop-back edges 542 and before tail-calls (not needed at function-entry). 543 544Custom GC Strategies 545==================== 546 547If none of the built in GC strategy descriptions met your needs above, you will 548need to define a custom GCStrategy and possibly, a custom LLVM pass to perform 549lowering. Your best example of where to start defining a custom GCStrategy 550would be to look at one of the built in strategies. 551 552You may be able to structure this additional code as a loadable plugin library. 553Loadable plugins are sufficient if all you need is to enable a different 554combination of built in functionality, but if you need to provide a custom 555lowering pass, you will need to build a patched version of LLVM. If you think 556you need a patched build, please ask for advice on llvm-dev. There may be an 557easy way we can extend the support to make it work for your use case without 558requiring a custom build. 559 560Collector Requirements 561---------------------- 562 563You should be able to leverage any existing collector library that includes the following elements: 564 565#. A memory allocator which exposes an allocation function your compiled 566 code can call. 567 568#. A binary format for the stack map. A stack map describes the location 569 of references at a safepoint and is used by precise collectors to identify 570 references within a stack frame on the machine stack. Note that collectors 571 which conservatively scan the stack don't require such a structure. 572 573#. A stack crawler to discover functions on the call stack, and enumerate the 574 references listed in the stack map for each call site. 575 576#. A mechanism for identifying references in global locations (e.g. global 577 variables). 578 579#. If you collector requires them, an LLVM IR implementation of your collectors 580 load and store barriers. Note that since many collectors don't require 581 barriers at all, LLVM defaults to lowering such barriers to normal loads 582 and stores unless you arrange otherwise. 583 584 585Implementing a collector plugin 586------------------------------- 587 588User code specifies which GC code generation to use with the ``gc`` function 589attribute or, equivalently, with the ``setGC`` method of ``Function``. 590 591To implement a GC plugin, it is necessary to subclass ``llvm::GCStrategy``, 592which can be accomplished in a few lines of boilerplate code. LLVM's 593infrastructure provides access to several important algorithms. For an 594uncontroversial collector, all that remains may be to compile LLVM's computed 595stack map to assembly code (using the binary representation expected by the 596runtime library). This can be accomplished in about 100 lines of code. 597 598This is not the appropriate place to implement a garbage collected heap or a 599garbage collector itself. That code should exist in the language's runtime 600library. The compiler plugin is responsible for generating code which conforms 601to the binary interface defined by library, most essentially the :ref:`stack map 602<stack-map>`. 603 604To subclass ``llvm::GCStrategy`` and register it with the compiler: 605 606.. code-block:: c++ 607 608 // lib/MyGC/MyGC.cpp - Example LLVM GC plugin 609 610 #include "llvm/CodeGen/GCStrategy.h" 611 #include "llvm/CodeGen/GCMetadata.h" 612 #include "llvm/Support/Compiler.h" 613 614 using namespace llvm; 615 616 namespace { 617 class LLVM_LIBRARY_VISIBILITY MyGC : public GCStrategy { 618 public: 619 MyGC() {} 620 }; 621 622 GCRegistry::Add<MyGC> 623 X("mygc", "My bespoke garbage collector."); 624 } 625 626This boilerplate collector does nothing. More specifically: 627 628* ``llvm.gcread`` calls are replaced with the corresponding ``load`` 629 instruction. 630 631* ``llvm.gcwrite`` calls are replaced with the corresponding ``store`` 632 instruction. 633 634* No safe points are added to the code. 635 636* The stack map is not compiled into the executable. 637 638Using the LLVM makefiles, this code 639can be compiled as a plugin using a simple makefile: 640 641.. code-block:: make 642 643 # lib/MyGC/Makefile 644 645 LEVEL := ../.. 646 LIBRARYNAME = MyGC 647 LOADABLE_MODULE = 1 648 649 include $(LEVEL)/Makefile.common 650 651Once the plugin is compiled, code using it may be compiled using ``llc 652-load=MyGC.so`` (though MyGC.so may have some other platform-specific 653extension): 654 655:: 656 657 $ cat sample.ll 658 define void @f() gc "mygc" { 659 entry: 660 ret void 661 } 662 $ llvm-as < sample.ll | llc -load=MyGC.so 663 664It is also possible to statically link the collector plugin into tools, such as 665a language-specific compiler front-end. 666 667.. _collector-algos: 668 669Overview of available features 670------------------------------ 671 672``GCStrategy`` provides a range of features through which a plugin may do useful 673work. Some of these are callbacks, some are algorithms that can be enabled, 674disabled, or customized. This matrix summarizes the supported (and planned) 675features and correlates them with the collection techniques which typically 676require them. 677 678.. |v| unicode:: 0x2714 679 :trim: 680 681.. |x| unicode:: 0x2718 682 :trim: 683 684+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 685| Algorithm | Done | Shadow | refcount | mark- | copying | incremental | threaded | concurrent | 686| | | stack | | sweep | | | | | 687+============+======+========+==========+=======+=========+=============+==========+============+ 688| stack map | |v| | | | |x| | |x| | |x| | |x| | |x| | 689+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 690| initialize | |v| | |x| | |x| | |x| | |x| | |x| | |x| | |x| | 691| roots | | | | | | | | | 692+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 693| derived | NO | | | | | | **N**\* | **N**\* | 694| pointers | | | | | | | | | 695+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 696| **custom | |v| | | | | | | | | 697| lowering** | | | | | | | | | 698+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 699| *gcroot* | |v| | |x| | |x| | | | | | | 700+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 701| *gcwrite* | |v| | | |x| | | | |x| | | |x| | 702+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 703| *gcread* | |v| | | | | | | | |x| | 704+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 705| **safe | | | | | | | | | 706| points** | | | | | | | | | 707+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 708| *in | |v| | | | |x| | |x| | |x| | |x| | |x| | 709| calls* | | | | | | | | | 710+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 711| *before | |v| | | | | | | |x| | |x| | 712| calls* | | | | | | | | | 713+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 714| *for | NO | | | | | | **N** | **N** | 715| loops* | | | | | | | | | 716+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 717| *before | |v| | | | | | | |x| | |x| | 718| escape* | | | | | | | | | 719+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 720| emit code | NO | | | | | | **N** | **N** | 721| at safe | | | | | | | | | 722| points | | | | | | | | | 723+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 724| **output** | | | | | | | | | 725+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 726| *assembly* | |v| | | | |x| | |x| | |x| | |x| | |x| | 727+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 728| *JIT* | NO | | | **?** | **?** | **?** | **?** | **?** | 729+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 730| *obj* | NO | | | **?** | **?** | **?** | **?** | **?** | 731+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 732| live | NO | | | **?** | **?** | **?** | **?** | **?** | 733| analysis | | | | | | | | | 734+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 735| register | NO | | | **?** | **?** | **?** | **?** | **?** | 736| map | | | | | | | | | 737+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 738| \* Derived pointers only pose a hasard to copying collections. | 739+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 740| **?** denotes a feature which could be utilized if available. | 741+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 742 743To be clear, the collection techniques above are defined as: 744 745Shadow Stack 746 The mutator carefully maintains a linked list of stack roots. 747 748Reference Counting 749 The mutator maintains a reference count for each object and frees an object 750 when its count falls to zero. 751 752Mark-Sweep 753 When the heap is exhausted, the collector marks reachable objects starting 754 from the roots, then deallocates unreachable objects in a sweep phase. 755 756Copying 757 As reachability analysis proceeds, the collector copies objects from one heap 758 area to another, compacting them in the process. Copying collectors enable 759 highly efficient "bump pointer" allocation and can improve locality of 760 reference. 761 762Incremental 763 (Including generational collectors.) Incremental collectors generally have all 764 the properties of a copying collector (regardless of whether the mature heap 765 is compacting), but bring the added complexity of requiring write barriers. 766 767Threaded 768 Denotes a multithreaded mutator; the collector must still stop the mutator 769 ("stop the world") before beginning reachability analysis. Stopping a 770 multithreaded mutator is a complicated problem. It generally requires highly 771 platform-specific code in the runtime, and the production of carefully 772 designed machine code at safe points. 773 774Concurrent 775 In this technique, the mutator and the collector run concurrently, with the 776 goal of eliminating pause times. In a *cooperative* collector, the mutator 777 further aids with collection should a pause occur, allowing collection to take 778 advantage of multiprocessor hosts. The "stop the world" problem of threaded 779 collectors is generally still present to a limited extent. Sophisticated 780 marking algorithms are necessary. Read barriers may be necessary. 781 782As the matrix indicates, LLVM's garbage collection infrastructure is already 783suitable for a wide variety of collectors, but does not currently extend to 784multithreaded programs. This will be added in the future as there is 785interest. 786 787.. _stack-map: 788 789Computing stack maps 790-------------------- 791 792LLVM automatically computes a stack map. One of the most important features 793of a ``GCStrategy`` is to compile this information into the executable in 794the binary representation expected by the runtime library. 795 796The stack map consists of the location and identity of each GC root in the 797each function in the module. For each root: 798 799* ``RootNum``: The index of the root. 800 801* ``StackOffset``: The offset of the object relative to the frame pointer. 802 803* ``RootMetadata``: The value passed as the ``%metadata`` parameter to the 804 ``@llvm.gcroot`` intrinsic. 805 806Also, for the function as a whole: 807 808* ``getFrameSize()``: The overall size of the function's initial stack frame, 809 not accounting for any dynamic allocation. 810 811* ``roots_size()``: The count of roots in the function. 812 813To access the stack map, use ``GCFunctionMetadata::roots_begin()`` and 814-``end()`` from the :ref:`GCMetadataPrinter <assembly>`: 815 816.. code-block:: c++ 817 818 for (iterator I = begin(), E = end(); I != E; ++I) { 819 GCFunctionInfo *FI = *I; 820 unsigned FrameSize = FI->getFrameSize(); 821 size_t RootCount = FI->roots_size(); 822 823 for (GCFunctionInfo::roots_iterator RI = FI->roots_begin(), 824 RE = FI->roots_end(); 825 RI != RE; ++RI) { 826 int RootNum = RI->Num; 827 int RootStackOffset = RI->StackOffset; 828 Constant *RootMetadata = RI->Metadata; 829 } 830 } 831 832If the ``llvm.gcroot`` intrinsic is eliminated before code generation by a 833custom lowering pass, LLVM will compute an empty stack map. This may be useful 834for collector plugins which implement reference counting or a shadow stack. 835 836.. _init-roots: 837 838Initializing roots to null: ``InitRoots`` 839----------------------------------------- 840 841.. code-block:: c++ 842 843 MyGC::MyGC() { 844 InitRoots = true; 845 } 846 847When set, LLVM will automatically initialize each root to ``null`` upon entry to 848the function. This prevents the GC's sweep phase from visiting uninitialized 849pointers, which will almost certainly cause it to crash. This initialization 850occurs before custom lowering, so the two may be used together. 851 852Since LLVM does not yet compute liveness information, there is no means of 853distinguishing an uninitialized stack root from an initialized one. Therefore, 854this feature should be used by all GC plugins. It is enabled by default. 855 856Custom lowering of intrinsics: ``CustomRoots``, ``CustomReadBarriers``, and ``CustomWriteBarriers`` 857--------------------------------------------------------------------------------------------------- 858 859For GCs which use barriers or unusual treatment of stack roots, these 860flags allow the collector to perform arbitrary transformations of the 861LLVM IR: 862 863.. code-block:: c++ 864 865 class MyGC : public GCStrategy { 866 public: 867 MyGC() { 868 CustomRoots = true; 869 CustomReadBarriers = true; 870 CustomWriteBarriers = true; 871 } 872 }; 873 874If any of these flags are set, LLVM suppresses its default lowering for 875the corresponding intrinsics. Instead, you must provide a custom Pass 876which lowers the intrinsics as desired. If you have opted in to custom 877lowering of a particular intrinsic your pass **must** eliminate all 878instances of the corresponding intrinsic in functions which opt in to 879your GC. The best example of such a pass is the ShadowStackGC and it's 880ShadowStackGCLowering pass. 881 882There is currently no way to register such a custom lowering pass 883without building a custom copy of LLVM. 884 885.. _safe-points: 886 887Generating safe points: ``NeededSafePoints`` 888-------------------------------------------- 889 890LLVM can compute four kinds of safe points: 891 892.. code-block:: c++ 893 894 namespace GC { 895 /// PointKind - The type of a collector-safe point. 896 /// 897 enum PointKind { 898 Loop, //< Instr is a loop (backwards branch). 899 Return, //< Instr is a return instruction. 900 PreCall, //< Instr is a call instruction. 901 PostCall //< Instr is the return address of a call. 902 }; 903 } 904 905A collector can request any combination of the four by setting the 906``NeededSafePoints`` mask: 907 908.. code-block:: c++ 909 910 MyGC::MyGC() { 911 NeededSafePoints = 1 << GC::Loop 912 | 1 << GC::Return 913 | 1 << GC::PreCall 914 | 1 << GC::PostCall; 915 } 916 917It can then use the following routines to access safe points. 918 919.. code-block:: c++ 920 921 for (iterator I = begin(), E = end(); I != E; ++I) { 922 GCFunctionInfo *MD = *I; 923 size_t PointCount = MD->size(); 924 925 for (GCFunctionInfo::iterator PI = MD->begin(), 926 PE = MD->end(); PI != PE; ++PI) { 927 GC::PointKind PointKind = PI->Kind; 928 unsigned PointNum = PI->Num; 929 } 930 } 931 932Almost every collector requires ``PostCall`` safe points, since these correspond 933to the moments when the function is suspended during a call to a subroutine. 934 935Threaded programs generally require ``Loop`` safe points to guarantee that the 936application will reach a safe point within a bounded amount of time, even if it 937is executing a long-running loop which contains no function calls. 938 939Threaded collectors may also require ``Return`` and ``PreCall`` safe points to 940implement "stop the world" techniques using self-modifying code, where it is 941important that the program not exit the function without reaching a safe point 942(because only the topmost function has been patched). 943 944.. _assembly: 945 946Emitting assembly code: ``GCMetadataPrinter`` 947--------------------------------------------- 948 949LLVM allows a plugin to print arbitrary assembly code before and after the rest 950of a module's assembly code. At the end of the module, the GC can compile the 951LLVM stack map into assembly code. (At the beginning, this information is not 952yet computed.) 953 954Since AsmWriter and CodeGen are separate components of LLVM, a separate abstract 955base class and registry is provided for printing assembly code, the 956``GCMetadaPrinter`` and ``GCMetadataPrinterRegistry``. The AsmWriter will look 957for such a subclass if the ``GCStrategy`` sets ``UsesMetadata``: 958 959.. code-block:: c++ 960 961 MyGC::MyGC() { 962 UsesMetadata = true; 963 } 964 965This separation allows JIT-only clients to be smaller. 966 967Note that LLVM does not currently have analogous APIs to support code generation 968in the JIT, nor using the object writers. 969 970.. code-block:: c++ 971 972 // lib/MyGC/MyGCPrinter.cpp - Example LLVM GC printer 973 974 #include "llvm/CodeGen/GCMetadataPrinter.h" 975 #include "llvm/Support/Compiler.h" 976 977 using namespace llvm; 978 979 namespace { 980 class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter { 981 public: 982 virtual void beginAssembly(AsmPrinter &AP); 983 984 virtual void finishAssembly(AsmPrinter &AP); 985 }; 986 987 GCMetadataPrinterRegistry::Add<MyGCPrinter> 988 X("mygc", "My bespoke garbage collector."); 989 } 990 991The collector should use ``AsmPrinter`` to print portable assembly code. The 992collector itself contains the stack map for the entire module, and may access 993the ``GCFunctionInfo`` using its own ``begin()`` and ``end()`` methods. Here's 994a realistic example: 995 996.. code-block:: c++ 997 998 #include "llvm/CodeGen/AsmPrinter.h" 999 #include "llvm/IR/Function.h" 1000 #include "llvm/IR/DataLayout.h" 1001 #include "llvm/Target/TargetAsmInfo.h" 1002 #include "llvm/Target/TargetMachine.h" 1003 1004 void MyGCPrinter::beginAssembly(AsmPrinter &AP) { 1005 // Nothing to do. 1006 } 1007 1008 void MyGCPrinter::finishAssembly(AsmPrinter &AP) { 1009 MCStreamer &OS = AP.OutStreamer; 1010 unsigned IntPtrSize = AP.TM.getSubtargetImpl()->getDataLayout()->getPointerSize(); 1011 1012 // Put this in the data section. 1013 OS.SwitchSection(AP.getObjFileLowering().getDataSection()); 1014 1015 // For each function... 1016 for (iterator FI = begin(), FE = end(); FI != FE; ++FI) { 1017 GCFunctionInfo &MD = **FI; 1018 1019 // A compact GC layout. Emit this data structure: 1020 // 1021 // struct { 1022 // int32_t PointCount; 1023 // void *SafePointAddress[PointCount]; 1024 // int32_t StackFrameSize; // in words 1025 // int32_t StackArity; 1026 // int32_t LiveCount; 1027 // int32_t LiveOffsets[LiveCount]; 1028 // } __gcmap_<FUNCTIONNAME>; 1029 1030 // Align to address width. 1031 AP.EmitAlignment(IntPtrSize == 4 ? 2 : 3); 1032 1033 // Emit PointCount. 1034 OS.AddComment("safe point count"); 1035 AP.EmitInt32(MD.size()); 1036 1037 // And each safe point... 1038 for (GCFunctionInfo::iterator PI = MD.begin(), 1039 PE = MD.end(); PI != PE; ++PI) { 1040 // Emit the address of the safe point. 1041 OS.AddComment("safe point address"); 1042 MCSymbol *Label = PI->Label; 1043 AP.EmitLabelPlusOffset(Label/*Hi*/, 0/*Offset*/, 4/*Size*/); 1044 } 1045 1046 // Stack information never change in safe points! Only print info from the 1047 // first call-site. 1048 GCFunctionInfo::iterator PI = MD.begin(); 1049 1050 // Emit the stack frame size. 1051 OS.AddComment("stack frame size (in words)"); 1052 AP.EmitInt32(MD.getFrameSize() / IntPtrSize); 1053 1054 // Emit stack arity, i.e. the number of stacked arguments. 1055 unsigned RegisteredArgs = IntPtrSize == 4 ? 5 : 6; 1056 unsigned StackArity = MD.getFunction().arg_size() > RegisteredArgs ? 1057 MD.getFunction().arg_size() - RegisteredArgs : 0; 1058 OS.AddComment("stack arity"); 1059 AP.EmitInt32(StackArity); 1060 1061 // Emit the number of live roots in the function. 1062 OS.AddComment("live root count"); 1063 AP.EmitInt32(MD.live_size(PI)); 1064 1065 // And for each live root... 1066 for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI), 1067 LE = MD.live_end(PI); 1068 LI != LE; ++LI) { 1069 // Emit live root's offset within the stack frame. 1070 OS.AddComment("stack index (offset / wordsize)"); 1071 AP.EmitInt32(LI->StackOffset); 1072 } 1073 } 1074 } 1075 1076References 1077========== 1078 1079.. _appel89: 1080 1081[Appel89] Runtime Tags Aren't Necessary. Andrew W. Appel. Lisp and Symbolic 1082Computation 19(7):703-705, July 1989. 1083 1084.. _goldberg91: 1085 1086[Goldberg91] Tag-free garbage collection for strongly typed programming 1087languages. Benjamin Goldberg. ACM SIGPLAN PLDI'91. 1088 1089.. _tolmach94: 1090 1091[Tolmach94] Tag-free garbage collection using explicit type parameters. Andrew 1092Tolmach. Proceedings of the 1994 ACM conference on LISP and functional 1093programming. 1094 1095.. _henderson02: 1096 1097[Henderson2002] `Accurate Garbage Collection in an Uncooperative Environment 1098<http://citeseer.ist.psu.edu/henderson02accurate.html>`__ 1099