Object allocation and lifetime in ICE
=====================================

This document discusses object lifetime and scoping issues, starting with
bitcode parsing and ending with ELF file emission.

Multithreaded translation model
-------------------------------

A single thread is responsible for parsing PNaCl bitcode (possibly concurrently
with downloading the bitcode file) and constructing the initial high-level ICE.
The result is a queue of Cfg pointers. The parser thread incrementally adds a
Cfg pointer to the queue after the Cfg is created, and then moves on to parse
the next function.

Multiple translation worker threads draw Cfg pointers from the queue as they
are added, so that several functions can be translated in parallel. The result
is a queue of assembler buffers, each of which consists of machine code plus
fixups.

A single thread is responsible for writing the assembler buffers to an ELF
file. It consumes the assembler buffers from the queue that the translation
threads write to.

This means that Cfgs are created by the parser thread and destroyed by a
translation thread (including Cfg nodes, instructions, and most kinds of
operands), and assembler buffers are created by a translation thread and
destroyed by the writer thread.

Deterministic execution
^^^^^^^^^^^^^^^^^^^^^^^

Although code randomization is a key aspect of security, deterministic and
repeatable translation is sometimes needed, e.g. for regression testing.
Multithreaded translation introduces sources of nondeterminism that may need to
be controlled.

* Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the
  translation queue in deterministic order. But since translation is
  multithreaded, FIFO order for the assembler buffer queue may not be
  deterministic. The writer thread would be responsible for reordering the
  buffers, potentially waiting for slower translations to complete even if
  other assembler buffers are available.

* Different translation threads may add new constant pool entries at different
  times. Some constant pool entries are emitted as read-only data. This
  includes floating-point constants for x86, as well as integer immediates
  randomized through constant pooling. These constant pool entries are emitted
  after all assembler buffers have been written. The writer needs to be able to
  sort them deterministically before emitting them.

Object lifetimes
----------------

Objects of type Constant, or a subclass of Constant, are pooled globally. The
pooling is managed by the GlobalContext class. Since Constants are added or
looked up by the translation threads as well as the parser thread, access to
the constant pools, and to GlobalContext in general, needs to be arbitrated by
locks. (If lock contention becomes a problem, we can maintain a thread-local
cache for Constant pool lookups.) Constants live across all function
translations, and are destroyed only at the end.

Several object types are scoped to the lifetime of the Cfg. These include
CfgNode, Inst, Variable, and any target-specific subclasses of Inst and
Operand. When the Cfg is destroyed, these scoped objects are destroyed as well.
To keep this cheap, the Cfg includes a slab allocator from which these objects
are allocated, and the objects should not contain fields with non-trivial
destructors. Most of these fields are POD, but in a couple of cases these
fields are STL containers. We deal with this, and avoid leaking memory, by
providing the container with an allocator that uses the Cfg-local slab
allocator. Since the container allocator generally needs to be stateless, we
store a pointer to the slab allocator in thread-local storage (TLS). This is
straightforward because on any of the threads, only one Cfg is active at a
time, and a given Cfg is only active in one thread at a time (either the parser
thread, or at most one translation thread, or the writer thread).
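
A minimal sketch of such a container allocator is shown below. The names
(CfgArena, CurrentArena, CfgLocalAllocator) and the toy arena are illustrative
assumptions rather than Subzero's actual classes; a real slab allocator would
bump-allocate out of large chunks instead of making one heap allocation per
request::

  #include <cassert>
  #include <cstddef>
  #include <memory>
  #include <vector>

  // Hypothetical stand-in for the Cfg-owned slab allocator.  Everything it
  // hands out is released at once when the arena (and thus the Cfg) dies.
  class CfgArena {
    std::vector<std::unique_ptr<char[]>> Slabs;

  public:
    void *allocate(std::size_t Size, std::size_t Align) {
      // Simplified: one heap block per request, padded so it can be aligned.
      Slabs.emplace_back(new char[Size + Align]);
      void *P = Slabs.back().get();
      std::size_t Space = Size + Align;
      return std::align(Align, Size, P, Space);
    }
  };

  // Arena of the Cfg currently active on this thread.  It is installed before
  // work on a Cfg begins and cleared when that work ends.
  thread_local CfgArena *CurrentArena = nullptr;

  // Stateless STL allocator that forwards to the TLS arena.  deallocate() is
  // a no-op because the memory is reclaimed wholesale with the Cfg.
  template <typename T> struct CfgLocalAllocator {
    using value_type = T;
    CfgLocalAllocator() = default;
    template <typename U> CfgLocalAllocator(const CfgLocalAllocator<U> &) {}

    T *allocate(std::size_t Num) {
      assert(CurrentArena && "no Cfg arena installed in TLS");
      return static_cast<T *>(
          CurrentArena->allocate(Num * sizeof(T), alignof(T)));
    }
    void deallocate(T *, std::size_t) {}
  };

  template <typename T, typename U>
  bool operator==(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) {
    return true;
  }
  template <typename T, typename U>
  bool operator!=(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) {
    return false;
  }

A Cfg-scoped object can then declare, for example, a vector of Inst pointers
that uses CfgLocalAllocator as its allocator. Even if that container's
destructor never runs, no memory is leaked, because its element storage comes
from the Cfg's slab and is reclaimed when the Cfg is destroyed.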

Even though there is a one-to-one correspondence between Cfgs and assembler
buffers, they need to use different allocators. This is because the translation
thread wants to destroy the Cfg and reclaim all of its memory after translation
completes, but possibly before the assembler buffer is written to the ELF file.
Ownership of the assembler buffer and its allocator is transferred to the
writer thread after translation completes, similar to the way ownership of the
Cfg and its allocator is transferred to the translation thread after parsing
completes.

Allocators and TLS
------------------

Building the Cfg, and transforming it, involves STL container operations that
may need to allocate additional memory in a stateless fashion. This requires
maintaining the proper slab allocator pointer in TLS.

When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's
slab allocator into its own TLS. This pointer is used as the Cfg is built
within the parser thread. After the Cfg is built, the parser thread clears its
allocator pointer, adds the new Cfg pointer to the translation queue, and
continues with the next function.

When a translation thread grabs a new Cfg pointer, it installs the Cfg's slab
allocator into its TLS and translates the function. When generating the
assembler buffer, it must take care not to use the Cfg's slab allocator. If
there is a slab allocator for the assembler buffer, a pointer to it can also be
installed in TLS if needed.

The translation thread destroys the Cfg when it is done translating, including
the Cfg's slab allocator, and clears the allocator pointer from its TLS.
Likewise, the writer thread destroys the assembler buffer when it is finished
with it.

Thread safety
-------------

The parse/translate/write stages of the translation pipeline are fairly
independent, with little opportunity for threads to interfere. The Subzero
design calls for all shared accesses to go through the GlobalContext, which
adds locking as appropriate. This includes the coarse-grained work queues for
Cfgs and assembler buffers. It also includes finer-grained access to constant
pool entries, as well as to the output streams used for verbose debugging
output.

If locked access to constant pools becomes a bottleneck, we can investigate
thread-local caches of constants (as mentioned earlier). It should also be
safe, though slightly less efficient, to allow duplicate copies of constants
across threads (which could be de-duplicated by the writer at the end).

We will use ThreadSanitizer to detect potential data races in the
implementation.
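
To make the constant pool arbitration concrete, here is a minimal sketch of a
locked lookup. The getConstantInt32 entry point, the map, and the mutex are
illustrative assumptions, not Subzero's actual GlobalContext interface::

  #include <cstdint>
  #include <memory>
  #include <mutex>
  #include <unordered_map>

  // Stand-in for Subzero's pooled constant operand type.
  class Constant {
  public:
    explicit Constant(int32_t Value) : Value(Value) {}
    int32_t getValue() const { return Value; }

  private:
    int32_t Value;
  };

  // Sketch of the locked pooling a GlobalContext-style class would provide.
  // The parser thread and the translation threads may call getConstantInt32()
  // concurrently, so lookup and insertion are arbitrated by a mutex.  Pooled
  // Constants live until the context is destroyed, after all functions have
  // been translated.
  class GlobalContext {
  public:
    Constant *getConstantInt32(int32_t Value) {
      std::lock_guard<std::mutex> Lock(ConstPoolLock);
      std::unique_ptr<Constant> &Slot = IntPool[Value];
      if (!Slot)
        Slot.reset(new Constant(Value));
      return Slot.get();
    }

  private:
    std::mutex ConstPoolLock;
    std::unordered_map<int32_t, std::unique_ptr<Constant>> IntPool;
  };

If this locked lookup became a bottleneck, the thread-local cache of Constants
mentioned earlier would sit in front of it, consulting the shared pool only on
a cache miss.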