# Memory management and object layout. Design.

## Overview

Panda Runtime should be scalable onto different devices/OSes, so we need some abstraction level over the OS memory management.
For now, all targets assume interaction with the user, so we have some limitations on the STW pause metric.
We have very limited memory resources on the IoT target, so we should maximize efforts on reducing memory overhead (fragmentation and object header size).

The main components of Panda memory management and object model:
* [Allocators](#allocators)
* [GC](#gc)
* [Object header](#object-header)

Panda runtime works/interacts with these memory types:
* internal memory for the runtime (ArenaAllocators for JIT, etc.)
* application memory (i.e., memory for objects created by the application)
* native memory via JNI/FFI
* memory for JITed code

![High-level design](./images/panda-mm-overview.png "Memory management high-level design")

There are several modes for memory management:
- base mode
  - allocators with some average metrics and profile-based configuration (if available)
  - some baseline GC with profile-based configuration (if available)
- performance
  - allocators with low allocation cost
  - low-pause/pauseless GC (for games) or GC with high throughput and acceptable STW pause (for non-games)
- power-saving mode
  - energy-efficient allocators (if possible)
  - special thresholds to improve power efficiency

The mode is chosen at startup time (we'll use profile info from the cloud for that).

## Object header

For the rationale, see [here](memory-management-overview.md).

### Requirements

* Support all required features from Runtime
* Similar design for two different platforms - high-end and low-end
* Compact Object Header for the low-end target

### Specification / Implementation

**Common ObjectHeader methods:**

* Get/Set Mark or Class word
* Get the size of the object header and the object itself
* Get/Generate an object hash

**Methods specific to the Class word:**

* Get different object fields
* Return object type
* Verify object
* Is it a subclass or not, is it an array or not, etc.
* Get field address

**Methods specific to the Mark word:**

* Object locked/unlocked
* Marked for GC or not
* Monitor functions (get monitor, notify, notify all, wait)
* Forwarded or not
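
Taken together, these methods suggest an accessor surface roughly like the following sketch (a rough illustration with hypothetical names and signatures, not the actual Runtime API):
```c++
#include <cstddef>
#include <cstdint>

// Illustrative only: a possible shape for the ObjectHeader accessors listed above.
class ObjectHeader {
public:
    // Common methods
    uint64_t GetMarkWord() const;
    void SetMarkWord(uint64_t word);
    void *GetClassWord() const;      // OOP to the metadata (class) object
    void SetClassWord(void *cls);
    size_t ObjectSize() const;       // size of the object, header included
    uint32_t GetHashCode();          // generates and stores the hash on first use

    // Mark word specific queries
    bool IsLocked() const;
    bool IsMarkedForGC() const;
    bool IsForwarded() const;
};
```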

The Mark word layout depends on the configuration and can have different sizes and layouts.
Here are all the possible configurations:

128-bit object header for high-end devices (64-bit pointers):
```
|--------------------------------------------------------------------------------------|--------------------|
| Object Header (128 bits)                                                              | State              |
|-----------------------------------------------------|--------------------------------|--------------------|
| Mark Word (64 bits)                                  | Class Word (64 bits)           |                    |
|-----------------------------------------------------|--------------------------------|--------------------|
| nothing:61 | GC:1 | state:00                         | OOP to metadata object         | Unlock             |
|-----------------------------------------------------|--------------------------------|--------------------|
| tId:29 | Lcount:32 | GC:1 | state:00                 | OOP to metadata object         | Lightweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Monitor:61 | GC:1 | state:01                         | OOP to metadata object         | Heavyweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Hash:61 | GC:1 | state:10                            | OOP to metadata object         | Hashed             |
|-----------------------------------------------------|--------------------------------|--------------------|
| Forwarding address:62 | state:11                     | OOP to metadata object         | GC                 |
|-----------------------------------------------------|--------------------------------|--------------------|
```
64-bit object header for high-end devices (32-bit pointers):
```
|--------------------------------------------------------------------------------------|--------------------|
| Object Header (64 bits)                                                               | State              |
|-----------------------------------------------------|--------------------------------|--------------------|
| Mark Word (32 bits)                                  | Class Word (32 bits)           |                    |
|-----------------------------------------------------|--------------------------------|--------------------|
| nothing:29 | GC:1 | state:00                         | OOP to metadata object         | Unlock             |
|-----------------------------------------------------|--------------------------------|--------------------|
| tId:13 | Lcount:16 | GC:1 | state:00                 | OOP to metadata object         | Lightweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Monitor:29 | GC:1 | state:01                         | OOP to metadata object         | Heavyweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Hash:29 | GC:1 | state:10                            | OOP to metadata object         | Hashed             |
|-----------------------------------------------------|--------------------------------|--------------------|
| Forwarding address:30 | state:11                     | OOP to metadata object         | GC                 |
|-----------------------------------------------------|--------------------------------|--------------------|
```
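
For illustration, a hedged sketch of how the low-order bits of the 32-bit Mark word above could be decoded, assuming the state occupies the two lowest bits and the GC bit sits just above them (all constants and helpers here are hypothetical, not Runtime definitions):
```c++
#include <cstdint>

// Hypothetical bit layout for the 32-bit Mark word: state in bits [1:0],
// GC bit in bit [2], the remaining payload (hash, tId/Lcount, monitor) above.
enum class MarkState : uint32_t { UNLOCK_OR_LIGHT_LOCK = 0b00, HEAVY_LOCK = 0b01, HASHED = 0b10, GC = 0b11 };

constexpr uint32_t STATE_MASK = 0b11U;
constexpr uint32_t GC_BIT = 1U << 2U;
constexpr uint32_t PAYLOAD_SHIFT = 3U;

inline MarkState GetState(uint32_t mark) { return static_cast<MarkState>(mark & STATE_MASK); }
inline bool IsMarkedForGC(uint32_t mark) { return (mark & GC_BIT) != 0; }
inline uint32_t GetHash(uint32_t mark) { return mark >> PAYLOAD_SHIFT; }  // valid only in the Hashed state
```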

However, we can also support the following version of the object header (the hash is stored just after the object in memory if the object was relocated):
```
|--------------------------------------------------------------------------------------|--------------------|
| Object Header (64 bits)                                                               | State              |
|-----------------------------------------------------|--------------------------------|--------------------|
| Mark Word (32 bits)                                  | Class Word (32 bits)           |                    |
|-----------------------------------------------------|--------------------------------|--------------------|
| nothing:28 | Hash:1 | GC:1 | state:00                | OOP to metadata object         | Unlock             |
|-----------------------------------------------------|--------------------------------|--------------------|
| tId:13 | LCount:15 | Hash:1 | GC:1 | state:00        | OOP to metadata object         | Lightweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Monitor:28 | Hash:1 | GC:1 | state:01                | OOP to metadata object         | Heavyweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Forwarding address:28 | Hash:1 | GC:1 | state:11     | OOP to metadata object         | GC                 |
|-----------------------------------------------------|--------------------------------|--------------------|
```
This scenario decreases the size of a Monitor instance, and we also don't need to save the hash anywhere during a Lightweight Lock.
Unfortunately, it requires extra memory after the GC has moved the object (to store the original hash value) and also requires extra GC work.
Still, this scenario is useful if we have an allocator and GC that keep such situations to a minimum.

32-bit object header for low-end devices:
```
|--------------------------------------------------------------------------------------|--------------------|
| Object Header (32 bits)                                                               | State              |
|-----------------------------------------------------|--------------------------------|--------------------|
| Mark Word (16 bits)                                  | Class Word (16 bits)           |                    |
|-----------------------------------------------------|--------------------------------|--------------------|
| nothing:13 | GC:1 | state:00                         | OOP to metadata object         | Unlock             |
|-----------------------------------------------------|--------------------------------|--------------------|
| thread Id:7 | Lock Count:6 | GC:1 | state:00         | OOP to metadata object         | Lightweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Monitor:13 | GC:1 | state:01                         | OOP to metadata object         | Heavyweight Lock   |
|-----------------------------------------------------|--------------------------------|--------------------|
| Hash:13 | GC:1 | state:10                            | OOP to metadata object         | Hashed             |
|-----------------------------------------------------|--------------------------------|--------------------|
| Forwarding address:14 | state:11                     | OOP to metadata object         | GC                 |
|-----------------------------------------------------|--------------------------------|--------------------|
```

States description:

Unlock - the object is not locked.

Lightweight Lock - the object is locked by a single thread.

Heavyweight Lock - there is contention for this object (several threads try to lock it).

Hashed - the object has been hashed, and the hash is stored inside the Mark word.

GC - the object has been moved by GC.
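
To illustrate how the Lightweight Lock encoding could be used, here is a hedged sketch of a lock fast path over the 32-bit Mark word from the 64-bit header above (bit positions and all names are assumptions for illustration only):
```c++
#include <atomic>
#include <cstdint>

// Assumed layout of the 32-bit Mark word: tId:13 | Lcount:16 | GC:1 | state:00
// -> state in bits [1:0], GC bit in bit [2], lock count in bits [18:3], thread id in bits [31:19].
constexpr uint32_t STATE_MASK = 0b11U;
constexpr uint32_t GC_BIT_MASK = 0b100U;
constexpr uint32_t LCOUNT_SHIFT = 3U;
constexpr uint32_t TID_SHIFT = 19U;

bool TryLightweightLock(std::atomic<uint32_t> &mark_word, uint32_t thread_id) {
    uint32_t old_word = mark_word.load(std::memory_order_relaxed);
    if ((old_word & STATE_MASK) != 0 || (old_word >> LCOUNT_SHIFT) != 0) {
        return false;  // not in the Unlock state: take the slow path (inflate to a monitor, etc.)
    }
    // Keep the GC bit, encode the owner thread id and a lock count of 1; the state stays 00.
    uint32_t new_word = (thread_id << TID_SHIFT) | (1U << LCOUNT_SHIFT) | (old_word & GC_BIT_MASK);
    return mark_word.compare_exchange_strong(old_word, new_word, std::memory_order_acquire);
}
```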

## String and array representation

Array:
```
+------------------------------------------------+
| Object Header (64 bits)                        |
|------------------------------------------------|
| Length (32 bits)                               |
|------------------------------------------------|
| Array payload                                  |
+------------------------------------------------+
```
String:

If we don't use string compression, each string has this structure:
```
+------------------------------------------------+
| Object Header (64 bits)                        |
|------------------------------------------------|
| Length (32 bits)                               |
|------------------------------------------------|
| String hash value (32 bits)                    |
|------------------------------------------------|
| String payload                                 |
+------------------------------------------------+
```
If we use string compression, each string has this structure:
```
+------------------------------------------------+
| Object Header (64 bits)                        |
|------------------------------------------------|
| Length (31 bits)                               |
|------------------------------------------------|
| Compressed bit (1 bit)                         |
|------------------------------------------------|
| String hash value (32 bits)                    |
|------------------------------------------------|
| String payload                                 |
+------------------------------------------------+
```
If the compressed bit is 1, the string has a compressed payload - 8 bits per element.

If the compressed bit is 0, the string is not compressed - its payload consists of 16-bit elements.

One idea for string representation is to use the hash state inside the Mark word as a container for the string hash value (of course, we should then save the object hash somewhere else if it is needed, or use the string hash value as the object hash value).

String:
```
+------------------------------------------------+
| String Hash | GC bit (1 bit) | Status (2 bits) | <--- Mark Word (32 bits)
|------------------------------------------------|
| Class Word (32 bits)                           |
|------------------------------------------------|
| Length (32 bits)                               |
|------------------------------------------------|
| String payload                                 |
+------------------------------------------------+
```

See the research [here](./memory-management-overview.md#possible-string-objects-size-reduction).
About JS strings and arrays, see [here](./memory-management-overview.md#js-strings-and-arrays).

## Allocators

Requirements:
- simple and efficient allocator for JIT
  - no need to manually clean up memory
  - efficient all-at-once deallocation to improve performance
- reasonable fragmentation
- scalable
- support for pool extension and reduction (i.e., we can add another memory chunk to the allocator, and it can give the chunk back to the global "pool" when it is empty)
- cache awareness
- (optional) power efficiency

All allocators should have these methods:
- a method which allocates `X` bytes
- a method which allocates `X` bytes with the specified alignment
- a method which frees the memory pointed to by a pointer (ArenaAllocator is an exception)

### Arena Allocator

It is a region-based allocator, i.e., all objects allocated in a region/arena can be efficiently deallocated all at once.
Deallocating a specific object has no effect in this allocator.

The JIT flow looks like this:
```
IR -> Optimizations -> Code
```

After code generation, all internal structures of the JIT should be deleted.
So, if we can keep JIT memory usage at a reasonable level, the Arena Allocator ideally fits the JIT's requirements for an allocator.
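
A minimal sketch of the region-based idea under these assumptions (a hypothetical arena class, not the actual Runtime allocator): allocation is a pointer bump inside the current arena, and the only real deallocation is releasing the whole arena at once.
```c++
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Illustrative arena: bump-pointer allocation, everything is released in the destructor.
class ArenaAllocator {
public:
    explicit ArenaAllocator(size_t size)
        : begin_(static_cast<uint8_t *>(std::malloc(size))), cur_(begin_), end_(begin_ + size) {}
    ~ArenaAllocator() { std::free(begin_); }  // all-at-once deallocation

    void *Alloc(size_t size, size_t alignment = alignof(std::max_align_t)) {
        auto aligned = reinterpret_cast<uintptr_t>(cur_ + alignment - 1) & ~(alignment - 1);
        if (aligned + size > reinterpret_cast<uintptr_t>(end_)) {
            return nullptr;  // a real implementation would attach one more arena here
        }
        cur_ = reinterpret_cast<uint8_t *>(aligned + size);
        return reinterpret_cast<void *>(aligned);
    }

    // Free(ptr) is intentionally absent: per-object deallocation has no effect.

private:
    uint8_t *begin_;
    uint8_t *cur_;
    uint8_t *end_;
};
```
Under this scheme, a JIT compilation session would create one arena, allocate all its internal structures from it, and drop the whole arena after code generation.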

### Code Allocator

Requirements:
- should allocate executable memory for JITed code

This allocator can be tuned for better performance.
For example, if we have some callgraph info, we can use it to allocate code for connected methods with a minimized potential cache-collision rate.

### Main allocator

Requirements:
- acceptable fragmentation
- acceptable allocation cost
- possibility to iterate over the heap
- scalable

Desired:
- flexible allocation size list (required to support profile-guided allocation to improve fragmentation and power efficiency)

#### Implementation details

Each allocator works over some pool.

Size classes (the numbers are informational only - they will be tuned after performance analysis):
- small (1 B - 4 KB)
- large (4 KB - 4 MB)
- humongous (4 MB - inf)

A size-segregated algorithm is used for the small size class to reduce fragmentation.
Small objects are joined into "runs" (not an individual element for each size, but a "container" with X elements of the same size in it):
```
+--------------------------------------+-----------------+-----------------+-----+-----------------+
| header for run of objects with size X| obj with size X | free mem size X | ... | obj with size X |
+--------------------------------------+-----------------+-----------------+-----+-----------------+
```

Large objects are not joined into "runs".

Humongous objects can be allocated just by proxying the request to the OS (but keeping a reference to the allocation somewhere) or by using a special allocator.

_Note: the points below apply to non-embedded targets._

Each thread maintains a cache for objects (at least for all objects of small size).
This should reduce the synchronization overhead.

Locking policy:
- locks should protect localized/categorized resources (for example, one lock for each size in the small size class)
- avoid holding locks during memory-related system calls (mmap, etc.)

#### Profile-guided allocation

We can use profile information about allocation sizes to improve the main allocator metrics.
If we see a very popular allocation size in the profile, we can add it as an explicit segregated size and reduce fragmentation.
To make it work, the allocator should support a dynamic size table (or should be able to choose from statically predefined ones).

### Energy efficiency in allocators

As shown in this [paper](https://www.cs.york.ac.uk/rts/docs/CODES-EMSOFT-CASES-2006/emsoft/p215.pdf), by changing
various settings of the allocator, it is possible to get a very energy-efficient allocator.
There is no universal approach in this paper, but we can try to mix the approach from this paper
with our profile-guided approach.

## Pools and OS interactions

All used memory is divided into chunks. The main allocator can extend its pool with these chunks.

For the cases when we can run into memory shortage, we should have some preallocated buffer which allows the Runtime to continue working while the GC is trying to free memory.

Note:
For IoT systems without an MMU, Pools should have a non-trivial implementation.

For some systems/languages, a context-scoped allocator will be implemented.
This allocator works over some arena, and after the program leaves the context, the arena is returned to the OS.
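
As a hedged illustration of the chunk-based pool extension described in this section (POSIX-only; the function names are hypothetical), requesting a chunk from the OS and returning it could look like this:
```c++
#include <sys/mman.h>
#include <cstddef>

// Illustrative chunk management for a pool: chunks are requested from the OS with mmap
// and returned with munmap when the allocator no longer needs them.
void *AcquireChunk(size_t chunk_size) {
    void *mem = mmap(nullptr, chunk_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (mem == MAP_FAILED) ? nullptr : mem;
}

void ReleaseChunk(void *chunk, size_t chunk_size) {
    munmap(chunk, chunk_size);  // the chunk becomes available to the OS (or a global pool) again
}
```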

## Spaces

- MemMapSpace, shared between these:
  - Code space (executable)
  - Compiler Internal Space (linked list of arenas)
  - Internal memory space for the non-compiler part of the runtime (including GC internals)
  - Object space
    - BumpPointerSpace
    - Regular object space
    - Humongous objects space
    - TLAB space (optional)
    - RegionSpace (optional, for some GCs)
    - Non-moving space
- MallocMemSpace
  - Humongous objects space (optional)

Logical GC spaces:
- young space (optional for some GCs)
- survivor space (optional)
- tenured space

## GC

The garbage collector (GC) automatically recycles memory that it can prove will never be used again.

GC development will be an iterative process.

Common requirements:
- precise GC (see [glossary](./glossary.md#memory-management-terms))
- GC should support various [modes](#overview) (performance, power-saving mode, normal mode)
- a GC suitable for a given mode shouldn't violate the requirements of that mode (see [here](#overview))

Requirements for Runtime:
- support for precise/exact roots
- GC barrier support in the Interpreter and JIT
- safepoint support in the Interpreter and JIT

Panda should support multiple GCs, since different GCs have different advantages (memory usage, throughput) on different benchmarks/applications.
So we should be able to use the optimal GC for each application.

### Epsilon GC

Epsilon GC does absolutely nothing but creates the impression that the Runtime has a GC, i.e., it supports all required GC interfaces and can be integrated into the Runtime.

Epsilon GC should be used only for debugging and profiling purposes, i.e., we can disable GC and measure in a "What if we don't have GC" mode.

### STW GC

Stop-The-World GC.

A non-generational, non-moving GC; during its work, all mutator threads should be at a safepoint.

1. Root scan
1. Mark
1. Sweep

### Concurrent Mark Sweep GC

Requirements:
- concurrent
- generational
- low CPU usage (high throughput)
- acceptable STW pause
- (optional) compaction

We need to conduct more performance analysis experiments to choose the optimal scheme, but for now let's consider these options:
- generational moving (optionally compacting) GC
- (optional) generational non-moving (optionally compacting) GC

Spaces (for the moving CMS):
```
+------------+------------+----------------------------+
| Eden/young | Survivor   | Tenured/old                |
|            | (optional) |                            |
+------------+------------+----------------------------+
```

The survivor space is optional and only for high-end targets.
Since one of the metrics for this GC is high throughput, most of the objects in Eden will have had enough time to die before a collection.
If we prioritize the energy-efficiency metric and the heap sizes are, on average, not gigantic, it seems that we should avoid using survivor spaces.
So we can support it optionally for experiments. As an alternative, we can introduce some average-age metadata for a run of small objects.

Minor GC (STW):
1. Root scan for the young gen; the CardTable is used for finding roots in the old gen (see the sketch after this list)
1. Mark Eden and move alive objects to the tenured (or survivor) space
1. Sweep Eden
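
A hedged sketch of the card table mentioned in the first step (the card size and all names are assumptions): the heap is divided into fixed-size cards, the write barrier dirties the card covering a written field, and the minor GC scans only dirty cards for old-to-young roots.
```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative card table: one byte per 512-byte card of the heap (the card size is an assumption).
class CardTable {
public:
    static constexpr size_t CARD_SIZE = 512;

    CardTable(uintptr_t heap_begin, size_t heap_size)
        : heap_begin_(heap_begin), cards_(heap_size / CARD_SIZE, 0) {}

    // Called from the write barrier: mark the card covering the written field as dirty.
    void MarkCard(uintptr_t field_addr) { cards_[(field_addr - heap_begin_) / CARD_SIZE] = 1; }

    // Minor GC root scan: visit only dirty cards to find old-to-young references, then clear them.
    template <typename Visitor>
    void VisitDirtyCards(Visitor visitor) {
        for (size_t i = 0; i < cards_.size(); ++i) {
            if (cards_[i] != 0) {
                visitor(heap_begin_ + i * CARD_SIZE, heap_begin_ + (i + 1) * CARD_SIZE);
                cards_[i] = 0;
            }
        }
    }

private:
    uintptr_t heap_begin_;
    std::vector<uint8_t> cards_;
};
```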

Note: we'll use adaptive thresholds for triggering Minor GC to minimize the STW pause.
Note #2: we can tune the minor GC by trying concurrent marking and re-marking, but it will require a copy of the card table.

Major GC:
1. Concurrent scan of static roots
1. Initial Mark - root scan (STW #1)
1. Concurrent Marking + Reference processor
1. Remark objects missed during concurrent marking (STW #2)
1. Concurrent Sweep + Finalizers
1. Reset

Reference processor - prevents issues with the wrong finalization order.

Note: If we don't have survivor spaces, we can implement a non-moving generational GC.

### Region based GC (main)

Requirements:
- concurrent
- generational
- acceptable stable STW pause
- (optional) compaction

Since the typical heap size for mobile applications is small, this GC can be considered a good choice for production.

The whole heap consists of memory regions of fixed size (the size can vary between regions, i.e., the size of memory region #K+1 can differ from the size of memory region #K).
```
+------------------+------------------+-----+------------------+
| Memory region #1 | Memory region #2 | ... | Memory region #N |
| young            | tenured          | ... | tenured          |
+------------------+------------------+-----+------------------+
```

Region types:
- young regions
- tenured regions
- humongous regions (for humongous objects)
- empty regions

Incoming references for each region are tracked via remembered sets:
- old-to-young references
- old-to-old references

Minor GC (only for young regions - STW):
1. Root scan for the young gen; remembered sets are used for finding roots in the old gen
1. Marking of the young gen + Reference processor + moving alive objects to the tenured space
1. Sweep + finalizers

The size of the young space is selected to satisfy the STW pause requirements.

Mixed GC - a minor GC with some tenured regions added to the young-gen region set after concurrent marking.
Concurrent marking (triggered when we reach some threshold for the tenured generation size):
1. Root scan (STW #1)
1. Concurrent marking + Reference processor
1. Re-mark - finishes marking and updates liveness statistics (STW #2)
1. Cleanup - reclaims empty regions and determines if we need mixed collections to reclaim tenured space. Tenured regions are selected using different thresholds.

Note: RSets can optionally be refined by special threads.

### Low-pause GC (deferred)

Requirements:
- stable low STW pause/pauseless
- (optional) incremental
- with compaction

No explicit minor GC.

Major GC:
1. Concurrent scan of static roots
1. Initial Mark - root scan (STW #1)
1. Concurrent Marking + Reference processor
1. Concurrent Sweep + Finalizers + Concurrent Copy & Compact
1. Reset

Note: a good choice for applications with a big heap or for applications where it is hard to provide a stable low pause with the Region based GC.

Note: compaction is target and mode dependent, so for low-memory devices we can consider [semi-space compaction](./glossary.md#memory-management-terms).
For a straightforward approach, we can consider some support from the OS to minimize the overlapping of semi-space compaction phases between applications.

### GC: interaction with Interpreter, JIT and AOT

#### Safepoints

Prerequisites:
* one HW register is reserved for the pointer to the ExecState (per-thread state); let's call it `RVState`
* the ExecState structure has a field with the address of some page used for safepoints, and we know the offset of this field, `SPaddrOffset`

In general, a safepoint will just be presented as some implicit or explicit load from `[RVState, SPaddrOffset]`.
For example, it can be something like this: `LDR R12, [RVState, #SPaddrOffset]`

Note: On some architectures it makes sense to use a store instead of a load because it requires fewer registers.

Note: If no MMU is available, it is allowed to use an explicit condition for the safepoint, i.e., something like this (pseudocode):
```
if (SafepointFlag == true) {
    call Runtime::SafepointHandler
}
```

When the GC wants to stop the world, it forces this by stopping all threads at a safepoint.
It protects some predefined safepoint memory page, which leads to segmentation faults in all execution threads when they do the load from this address.

Safepoints should be inserted at the beginning of each method and at the head of each loop.

For each safepoint, we should have a method that can provide the GC with information about objects on the stack.
The Interpreter already supports such info in its frames.
But for the JIT/compiler, it looks like we need some method generated (by the JIT/compiler) that can get all the necessary data for the safepoint.
This method can actually be just some code without a prologue and epilogue.
We'll jump to its beginning from the signal handler, and at the end we should jump back to the safepoint, so we should probably put it near the original code.

So the flow looks like this:

```
 ...
 | compiled/jitted code      | ------>
 | safepoint #X in the code  | ---seg fault--->
 | signal handler            | ---change return pc explicitly--->
 | method that prepares data about objects on stack for the #X safepoint and waits until STW ends | ---jump via encoded relative branch to safepoint--->
 | safepoint #X in the code  | ---normal execution--->
 | compiled/jitted code      | ------>
 ...
```

**Opens**:
* should we generate a method for each safepoint, or one for all safepoints at once?
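
Regardless of how that open question is resolved, the page-protection mechanism described above could be armed roughly as in this hedged, POSIX-only sketch (function names are illustrative):
```c++
#include <sys/mman.h>
#include <cstddef>

// Every compiled safepoint performs a load from the dedicated safepoint page.
// Arming the page makes each such load fault, so the signal handler can park the thread.
void ArmSafepoints(void *safepoint_page, size_t page_size) {
    mprotect(safepoint_page, page_size, PROT_NONE);
}

// After the stop-the-world phase, make the page readable again so the polls become no-ops.
void DisarmSafepoints(void *safepoint_page, size_t page_size) {
    mprotect(safepoint_page, page_size, PROT_READ);
}
```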

#### GC Barriers

A GC barrier is a block on writing to (write barrier) or reading from (read barrier) certain memory by the application code. GC barriers are used to ensure heap consistency and to optimize some GC flows.

##### GC Write Barriers

Heap inconsistency can happen when the GC reclaims an alive/reachable object.
I.e., these two conditions must both hold for an alive/reachable object to be reclaimed:
1. We store a reference to a white object into a black object
1. There are no paths from any gray object to that white object

Besides addressing the heap inconsistency problem, a write barrier can be used to maintain incoming references of a young generation or a region.

So we can solve these issues with a GC WRB (write barrier). A GC WRB can be _pre_ (inserted before the store) and _post_ (inserted after the store). These barriers are used **only** when we store a reference to an object into some field of another object.

The _pre_ barrier is usually used to solve the problem of alive objects lost during concurrent marking. Pseudocode (example):
```c++
if (UNLIKELY(concurrent_marking)) {
    auto pre_val = obj.field;
    if (pre_val != nullptr) {
        store_in_buff_to_mark(pre_val); // call function which stores the reference held in the field to process it later
    }
}
obj.field = new_val; // STORE for which the barrier is generated
```

The _post_ barrier can be used to solve the problem of tracking references from the tenured generation to the young generation (or inter-region references). In this case, we always know the external roots for the young generation space (or for a region). Pseudocode (abstract example, not a real one):
```c++
obj.field = new_val; // STORE for which the barrier is generated
if ((AddressOf(obj.field) not in [YOUNG_GENERATION_ADDR_BEG, YOUNG_GENERATION_ADDR_END]) &&
    (AddressOf(new_val) in [YOUNG_GENERATION_ADDR_BEG, YOUNG_GENERATION_ADDR_END])) {
    update_card(AddressOf(obj.field)); // call function which marks some memory range as containing roots for the young generation
}
```
Note: Sometimes we don't check whether the object and the stored reference are in different generations, because we get much less overhead this way.

##### GC Read Barriers

Read barriers are used during concurrent compaction in some GCs.
For example, we concurrently move an object from one place (`from-space`) to another (`to-space`).
At some moment we can have two instances of the same object.
So one of these conditions should hold if we want to keep the heap consistent:
1. All writes happen into the `to-space` instance of the object, but reads can happen from both the `from-space` and `to-space` instances
1. All writes and reads happen into/from the `to-space` instance
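
As a complement to the conditions above, a hedged sketch of a typical `to-space` read-barrier fast path (it assumes an ObjectHeader type with illustrative IsForwarded()/GetForwardingAddress() accessors; this is not the Runtime implementation):
```c++
// Illustrative read barrier: redirect loads to the to-space copy of a moved object.
ObjectHeader *LoadReference(ObjectHeader **slot) {
    ObjectHeader *obj = *slot;
    if (obj != nullptr && obj->IsForwarded()) {  // Mark word is in the GC state
        obj = obj->GetForwardingAddress();       // forwarding address stored in the Mark word
        *slot = obj;                             // optionally heal the slot so the check is not repeated
    }
    return obj;
}
```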

#### GC Barriers integration with Interpreter and compiler

From the Interpreter you could use the runtime interface methods:
```c++
static void PreBarrier(void *obj_field_addr, void *pre_val_addr);
static void PostBarrier(void *obj_field_addr, void *val_addr);
```
Note: for performance, we can put the address of the conditional flag for conditional barriers with a trivial condition (`if (*x) ...`) into the ExecState.

It is critical to make the compiler encode barriers very optimally; at least the fast path should be encoded efficiently.
There are several approaches for that:
 1. Describe the barrier using some meta-language or IR which can be interpreted/encoded by all compilers compatible with the runtime (it is currently not applicable for the runtime)
 1. (a lot of open questions here, so consider this as an idea) One compiler knows how to encode the barrier using runtime interfaces (see the next item) and could provide a more compiler-friendly interface to the other compilers for encoding GC barriers.
 1. The compiler knows for each barrier type how it should be encoded (see the pseudocode in `libpandabase/mem/gc_barrier.h`) and could use the runtime to get all required operands to do this.
Let's consider the encoding of the PRE_ barrier below:
    - get the barrier type via RuntimeInterface: `BarrierType GetPreType() const`
    - for this barrier type, get all needed operands provided by the Runtime via
      `BarrierOperand GCBarrierSet::GetBarrierOperand(BarrierPosition barrier_position, std::string_view name);`
      (you should use the operand/parameter names from the pseudocode provided in `enum BarrierType`)
    - encode the barrier code using the loaded operands and the pseudocode from `enum BarrierType`

## Memory sanitizers support

Panda Runtime should support [ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer).

Optional: [MSAN](https://github.com/google/sanitizers/wiki/MemorySanitizer)
(Note: not possible to use without a custom-built toolchain)

Desirable, but not easy to support: [HWASAN](https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html)