1===================================== 2Coroutines in LLVM 3===================================== 4 5.. contents:: 6 :local: 7 :depth: 3 8 9.. warning:: 10 This is a work in progress. Compatibility across LLVM releases is not 11 guaranteed. 12 13Introduction 14============ 15 16.. _coroutine handle: 17 18LLVM coroutines are functions that have one or more `suspend points`_. 19When a suspend point is reached, the execution of a coroutine is suspended and 20control is returned back to its caller. A suspended coroutine can be resumed 21to continue execution from the last suspend point or it can be destroyed. 22 23In the following example, we call function `f` (which may or may not be a 24coroutine itself) that returns a handle to a suspended coroutine 25(**coroutine handle**) that is used by `main` to resume the coroutine twice and 26then destroy it: 27 28.. code-block:: llvm 29 30 define i32 @main() { 31 entry: 32 %hdl = call i8* @f(i32 4) 33 call void @llvm.coro.resume(i8* %hdl) 34 call void @llvm.coro.resume(i8* %hdl) 35 call void @llvm.coro.destroy(i8* %hdl) 36 ret i32 0 37 } 38 39.. _coroutine frame: 40 41In addition to the function stack frame which exists when a coroutine is 42executing, there is an additional region of storage that contains objects that 43keep the coroutine state when a coroutine is suspended. This region of storage 44is called **coroutine frame**. It is created when a coroutine is called and 45destroyed when a coroutine runs to completion or destroyed by a call to 46the `coro.destroy`_ intrinsic. 47 48An LLVM coroutine is represented as an LLVM function that has calls to 49`coroutine intrinsics`_ defining the structure of the coroutine. 50After lowering, a coroutine is split into several 51functions that represent three different ways of how control can enter the 52coroutine: 53 541. a ramp function, which represents an initial invocation of the coroutine that 55 creates the coroutine frame and executes the coroutine code until it 56 encounters a suspend point or reaches the end of the function; 57 582. a coroutine resume function that is invoked when the coroutine is resumed; 59 603. a coroutine destroy function that is invoked when the coroutine is destroyed. 61 62.. note:: Splitting out resume and destroy functions are just one of the 63 possible ways of lowering the coroutine. We chose it for initial 64 implementation as it matches closely the mental model and results in 65 reasonably nice code. 66 67Coroutines by Example 68===================== 69 70Coroutine Representation 71------------------------ 72 73Let's look at an example of an LLVM coroutine with the behavior sketched 74by the following pseudo-code. 75 76.. code-block:: c++ 77 78 void *f(int n) { 79 for(;;) { 80 print(n++); 81 <suspend> // returns a coroutine handle on first suspend 82 } 83 } 84 85This coroutine calls some function `print` with value `n` as an argument and 86suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with 87a `main` shown in the previous section. It will call `print` with values 4, 5 88and 6 after which the coroutine will be destroyed. 89 90The LLVM IR for this coroutine looks like this: 91 92.. code-block:: llvm 93 94 define i8* @f(i32 %n) { 95 entry: 96 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null) 97 %size = call i32 @llvm.coro.size.i32() 98 %alloc = call i8* @malloc(i32 %size) 99 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc) 100 br label %loop 101 loop: 102 %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ] 103 %inc = add nsw i32 %n.val, 1 104 call void @print(i32 %n.val) 105 %0 = call i8 @llvm.coro.suspend(token none, i1 false) 106 switch i8 %0, label %suspend [i8 0, label %loop 107 i8 1, label %cleanup] 108 cleanup: 109 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl) 110 call void @free(i8* %mem) 111 br label %suspend 112 suspend: 113 %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false) 114 ret i8* %hdl 115 } 116 117The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is 118lowered to a constant representing the size required for the coroutine frame. 119The `coro.begin`_ intrinsic initializes the coroutine frame and returns the 120coroutine handle. The second parameter of `coro.begin` is given a block of memory 121to be used if the coroutine frame needs to be allocated dynamically. 122The `coro.id`_ intrinsic serves as coroutine identity useful in cases when the 123`coro.begin`_ intrinsic get duplicated by optimization passes such as 124jump-threading. 125 126The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic, 127given the coroutine handle, returns a pointer of the memory block to be freed or 128`null` if the coroutine frame was not allocated dynamically. The `cleanup` 129block is entered when coroutine runs to completion by itself or destroyed via 130call to the `coro.destroy`_ intrinsic. 131 132The `suspend` block contains code to be executed when coroutine runs to 133completion or suspended. The `coro.end`_ intrinsic marks the point where 134a coroutine needs to return control back to the caller if it is not an initial 135invocation of the coroutine. 136 137The `loop` blocks represents the body of the coroutine. The `coro.suspend`_ 138intrinsic in combination with the following switch indicates what happens to 139control flow when a coroutine is suspended (default case), resumed (case 0) or 140destroyed (case 1). 141 142Coroutine Transformation 143------------------------ 144 145One of the steps of coroutine lowering is building the coroutine frame. The 146def-use chains are analyzed to determine which objects need be kept alive across 147suspend points. In the coroutine shown in the previous section, use of virtual register 148`%n.val` is separated from the definition by a suspend point, therefore, it 149cannot reside on the stack frame since the latter goes away once the coroutine 150is suspended and control is returned back to the caller. An i32 slot is 151allocated in the coroutine frame and `%n.val` is spilled and reloaded from that 152slot as needed. 153 154We also store addresses of the resume and destroy functions so that the 155`coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine 156when its identity cannot be determined statically at compile time. For our 157example, the coroutine frame will be: 158 159.. code-block:: llvm 160 161 %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 } 162 163After resume and destroy parts are outlined, function `f` will contain only the 164code responsible for creation and initialization of the coroutine frame and 165execution of the coroutine until a suspend point is reached: 166 167.. code-block:: llvm 168 169 define i8* @f(i32 %n) { 170 entry: 171 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null) 172 %alloc = call noalias i8* @malloc(i32 24) 173 %0 = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc) 174 %frame = bitcast i8* %0 to %f.frame* 175 %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0 176 store void (%f.frame*)* @f.resume, void (%f.frame*)** %1 177 %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1 178 store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2 179 180 %inc = add nsw i32 %n, 1 181 %inc.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2 182 store i32 %inc, i32* %inc.spill.addr 183 call void @print(i32 %n) 184 185 ret i8* %frame 186 } 187 188Outlined resume part of the coroutine will reside in function `f.resume`: 189 190.. code-block:: llvm 191 192 define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) { 193 entry: 194 %inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2 195 %inc.spill = load i32, i32* %inc.spill.addr, align 4 196 %inc = add i32 %n.val, 1 197 store i32 %inc, i32* %inc.spill.addr, align 4 198 tail call void @print(i32 %inc) 199 ret void 200 } 201 202Whereas function `f.destroy` will contain the cleanup code for the coroutine: 203 204.. code-block:: llvm 205 206 define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) { 207 entry: 208 %0 = bitcast %f.frame* %frame.ptr.destroy to i8* 209 tail call void @free(i8* %0) 210 ret void 211 } 212 213Avoiding Heap Allocations 214------------------------- 215 216A particular coroutine usage pattern, which is illustrated by the `main` 217function in the overview section, where a coroutine is created, manipulated and 218destroyed by the same calling function, is common for coroutines implementing 219RAII idiom and is suitable for allocation elision optimization which avoid 220dynamic allocation by storing the coroutine frame as a static `alloca` in its 221caller. 222 223In the entry block, we will call `coro.alloc`_ intrinsic that will return `true` 224when dynamic allocation is required, and `false` if dynamic allocation is 225elided. 226 227.. code-block:: llvm 228 229 entry: 230 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null) 231 %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id) 232 br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin 233 dyn.alloc: 234 %size = call i32 @llvm.coro.size.i32() 235 %alloc = call i8* @CustomAlloc(i32 %size) 236 br label %coro.begin 237 coro.begin: 238 %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ] 239 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi) 240 241In the cleanup block, we will make freeing the coroutine frame conditional on 242`coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null` 243thus skipping the deallocation code: 244 245.. code-block:: llvm 246 247 cleanup: 248 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl) 249 %need.dyn.free = icmp ne i8* %mem, null 250 br i1 %need.dyn.free, label %dyn.free, label %if.end 251 dyn.free: 252 call void @CustomFree(i8* %mem) 253 br label %if.end 254 if.end: 255 ... 256 257With allocations and deallocations represented as described as above, after 258coroutine heap allocation elision optimization, the resulting main will be: 259 260.. code-block:: llvm 261 262 define i32 @main() { 263 entry: 264 call void @print(i32 4) 265 call void @print(i32 5) 266 call void @print(i32 6) 267 ret i32 0 268 } 269 270Multiple Suspend Points 271----------------------- 272 273Let's consider the coroutine that has more than one suspend point: 274 275.. code-block:: c++ 276 277 void *f(int n) { 278 for(;;) { 279 print(n++); 280 <suspend> 281 print(-n); 282 <suspend> 283 } 284 } 285 286Matching LLVM code would look like (with the rest of the code remaining the same 287as the code in the previous section): 288 289.. code-block:: llvm 290 291 loop: 292 %n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ] 293 call void @print(i32 %n.addr) #4 294 %2 = call i8 @llvm.coro.suspend(token none, i1 false) 295 switch i8 %2, label %suspend [i8 0, label %loop.resume 296 i8 1, label %cleanup] 297 loop.resume: 298 %inc = add nsw i32 %n.addr, 1 299 %sub = xor i32 %n.addr, -1 300 call void @print(i32 %sub) 301 %3 = call i8 @llvm.coro.suspend(token none, i1 false) 302 switch i8 %3, label %suspend [i8 0, label %loop 303 i8 1, label %cleanup] 304 305In this case, the coroutine frame would include a suspend index that will 306indicate at which suspend point the coroutine needs to resume. The resume 307function will use an index to jump to an appropriate basic block and will look 308as follows: 309 310.. code-block:: llvm 311 312 define internal fastcc void @f.Resume(%f.Frame* %FramePtr) { 313 entry.Resume: 314 %index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2 315 %index = load i8, i8* %index.addr, align 1 316 %switch = icmp eq i8 %index, 0 317 %n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3 318 %n = load i32, i32* %n.addr, align 4 319 br i1 %switch, label %loop.resume, label %loop 320 321 loop.resume: 322 %sub = xor i32 %n, -1 323 call void @print(i32 %sub) 324 br label %suspend 325 loop: 326 %inc = add nsw i32 %n, 1 327 store i32 %inc, i32* %n.addr, align 4 328 tail call void @print(i32 %inc) 329 br label %suspend 330 331 suspend: 332 %storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ] 333 store i8 %storemerge, i8* %index.addr, align 1 334 ret void 335 } 336 337If different cleanup code needs to get executed for different suspend points, 338a similar switch will be in the `f.destroy` function. 339 340.. note :: 341 342 Using suspend index in a coroutine state and having a switch in `f.resume` and 343 `f.destroy` is one of the possible implementation strategies. We explored 344 another option where a distinct `f.resume1`, `f.resume2`, etc. are created for 345 every suspend point, and instead of storing an index, the resume and destroy 346 function pointers are updated at every suspend. Early testing showed that the 347 current approach is easier on the optimizer than the latter so it is a 348 lowering strategy implemented at the moment. 349 350Distinct Save and Suspend 351------------------------- 352 353In the previous example, setting a resume index (or some other state change that 354needs to happen to prepare a coroutine for resumption) happens at the same time as 355a suspension of a coroutine. However, in certain cases, it is necessary to control 356when coroutine is prepared for resumption and when it is suspended. 357 358In the following example, a coroutine represents some activity that is driven 359by completions of asynchronous operations `async_op1` and `async_op2` which get 360a coroutine handle as a parameter and resume the coroutine once async 361operation is finished. 362 363.. code-block:: text 364 365 void g() { 366 for (;;) 367 if (cond()) { 368 async_op1(<coroutine-handle>); // will resume once async_op1 completes 369 <suspend> 370 do_one(); 371 } 372 else { 373 async_op2(<coroutine-handle>); // will resume once async_op2 completes 374 <suspend> 375 do_two(); 376 } 377 } 378 } 379 380In this case, coroutine should be ready for resumption prior to a call to 381`async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a 382point when coroutine should be ready for resumption (namely, when a resume index 383should be stored in the coroutine frame, so that it can be resumed at the 384correct resume point): 385 386.. code-block:: llvm 387 388 if.true: 389 %save1 = call token @llvm.coro.save(i8* %hdl) 390 call void @async_op1(i8* %hdl) 391 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false) 392 switch i8 %suspend1, label %suspend [i8 0, label %resume1 393 i8 1, label %cleanup] 394 if.false: 395 %save2 = call token @llvm.coro.save(i8* %hdl) 396 call void @async_op2(i8* %hdl) 397 %suspend2 = call i1 @llvm.coro.suspend(token %save2, i1 false) 398 switch i8 %suspend1, label %suspend [i8 0, label %resume2 399 i8 1, label %cleanup] 400 401.. _coroutine promise: 402 403Coroutine Promise 404----------------- 405 406A coroutine author or a frontend may designate a distinguished `alloca` that can 407be used to communicate with the coroutine. This distinguished alloca is called 408**coroutine promise** and is provided as the second parameter to the 409`coro.id`_ intrinsic. 410 411The following coroutine designates a 32 bit integer `promise` and uses it to 412store the current value produced by a coroutine. 413 414.. code-block:: llvm 415 416 define i8* @f(i32 %n) { 417 entry: 418 %promise = alloca i32 419 %pv = bitcast i32* %promise to i8* 420 %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null) 421 %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id) 422 br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin 423 dyn.alloc: 424 %size = call i32 @llvm.coro.size.i32() 425 %alloc = call i8* @malloc(i32 %size) 426 br label %coro.begin 427 coro.begin: 428 %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ] 429 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi) 430 br label %loop 431 loop: 432 %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ] 433 %inc = add nsw i32 %n.val, 1 434 store i32 %n.val, i32* %promise 435 %0 = call i8 @llvm.coro.suspend(token none, i1 false) 436 switch i8 %0, label %suspend [i8 0, label %loop 437 i8 1, label %cleanup] 438 cleanup: 439 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl) 440 call void @free(i8* %mem) 441 br label %suspend 442 suspend: 443 %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false) 444 ret i8* %hdl 445 } 446 447A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the 448coroutine promise. 449 450.. code-block:: llvm 451 452 define i32 @main() { 453 entry: 454 %hdl = call i8* @f(i32 4) 455 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false) 456 %promise.addr = bitcast i8* %promise.addr.raw to i32* 457 %val0 = load i32, i32* %promise.addr 458 call void @print(i32 %val0) 459 call void @llvm.coro.resume(i8* %hdl) 460 %val1 = load i32, i32* %promise.addr 461 call void @print(i32 %val1) 462 call void @llvm.coro.resume(i8* %hdl) 463 %val2 = load i32, i32* %promise.addr 464 call void @print(i32 %val2) 465 call void @llvm.coro.destroy(i8* %hdl) 466 ret i32 0 467 } 468 469After example in this section is compiled, result of the compilation will be: 470 471.. code-block:: llvm 472 473 define i32 @main() { 474 entry: 475 tail call void @print(i32 4) 476 tail call void @print(i32 5) 477 tail call void @print(i32 6) 478 ret i32 0 479 } 480 481.. _final: 482.. _final suspend: 483 484Final Suspend 485------------- 486 487A coroutine author or a frontend may designate a particular suspend to be final, 488by setting the second argument of the `coro.suspend`_ intrinsic to `true`. 489Such a suspend point has two properties: 490 491* it is possible to check whether a suspended coroutine is at the final suspend 492 point via `coro.done`_ intrinsic; 493 494* a resumption of a coroutine stopped at the final suspend point leads to 495 undefined behavior. The only possible action for a coroutine at a final 496 suspend point is destroying it via `coro.destroy`_ intrinsic. 497 498From the user perspective, the final suspend point represents an idea of a 499coroutine reaching the end. From the compiler perspective, it is an optimization 500opportunity for reducing number of resume points (and therefore switch cases) in 501the resume function. 502 503The following is an example of a function that keeps resuming the coroutine 504until the final suspend point is reached after which point the coroutine is 505destroyed: 506 507.. code-block:: llvm 508 509 define i32 @main() { 510 entry: 511 %hdl = call i8* @f(i32 4) 512 br label %while 513 while: 514 call void @llvm.coro.resume(i8* %hdl) 515 %done = call i1 @llvm.coro.done(i8* %hdl) 516 br i1 %done, label %end, label %while 517 end: 518 call void @llvm.coro.destroy(i8* %hdl) 519 ret i32 0 520 } 521 522Usually, final suspend point is a frontend injected suspend point that does not 523correspond to any explicitly authored suspend point of the high level language. 524For example, for a Python generator that has only one suspend point: 525 526.. code-block:: python 527 528 def coroutine(n): 529 for i in range(n): 530 yield i 531 532Python frontend would inject two more suspend points, so that the actual code 533looks like this: 534 535.. code-block:: c 536 537 void* coroutine(int n) { 538 int current_value; 539 <designate current_value to be coroutine promise> 540 <SUSPEND> // injected suspend point, so that the coroutine starts suspended 541 for (int i = 0; i < n; ++i) { 542 current_value = i; <SUSPEND>; // corresponds to "yield i" 543 } 544 <SUSPEND final=true> // injected final suspend point 545 } 546 547and python iterator `__next__` would look like: 548 549.. code-block:: c++ 550 551 int __next__(void* hdl) { 552 coro.resume(hdl); 553 if (coro.done(hdl)) throw StopIteration(); 554 return *(int*)coro.promise(hdl, 4, false); 555 } 556 557Intrinsics 558========== 559 560Coroutine Manipulation Intrinsics 561--------------------------------- 562 563Intrinsics described in this section are used to manipulate an existing 564coroutine. They can be used in any function which happen to have a pointer 565to a `coroutine frame`_ or a pointer to a `coroutine promise`_. 566 567.. _coro.destroy: 568 569'llvm.coro.destroy' Intrinsic 570^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 571 572Syntax: 573""""""" 574 575:: 576 577 declare void @llvm.coro.destroy(i8* <handle>) 578 579Overview: 580""""""""" 581 582The '``llvm.coro.destroy``' intrinsic destroys a suspended 583coroutine. 584 585Arguments: 586"""""""""" 587 588The argument is a coroutine handle to a suspended coroutine. 589 590Semantics: 591"""""""""" 592 593When possible, the `coro.destroy` intrinsic is replaced with a direct call to 594the coroutine destroy function. Otherwise it is replaced with an indirect call 595based on the function pointer for the destroy function stored in the coroutine 596frame. Destroying a coroutine that is not suspended leads to undefined behavior. 597 598.. _coro.resume: 599 600'llvm.coro.resume' Intrinsic 601^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 602 603:: 604 605 declare void @llvm.coro.resume(i8* <handle>) 606 607Overview: 608""""""""" 609 610The '``llvm.coro.resume``' intrinsic resumes a suspended coroutine. 611 612Arguments: 613"""""""""" 614 615The argument is a handle to a suspended coroutine. 616 617Semantics: 618"""""""""" 619 620When possible, the `coro.resume` intrinsic is replaced with a direct call to the 621coroutine resume function. Otherwise it is replaced with an indirect call based 622on the function pointer for the resume function stored in the coroutine frame. 623Resuming a coroutine that is not suspended leads to undefined behavior. 624 625.. _coro.done: 626 627'llvm.coro.done' Intrinsic 628^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 629 630:: 631 632 declare i1 @llvm.coro.done(i8* <handle>) 633 634Overview: 635""""""""" 636 637The '``llvm.coro.done``' intrinsic checks whether a suspended coroutine is at 638the final suspend point or not. 639 640Arguments: 641"""""""""" 642 643The argument is a handle to a suspended coroutine. 644 645Semantics: 646"""""""""" 647 648Using this intrinsic on a coroutine that does not have a `final suspend`_ point 649or on a coroutine that is not suspended leads to undefined behavior. 650 651.. _coro.promise: 652 653'llvm.coro.promise' Intrinsic 654^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 655 656:: 657 658 declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>) 659 660Overview: 661""""""""" 662 663The '``llvm.coro.promise``' intrinsic obtains a pointer to a 664`coroutine promise`_ given a coroutine handle and vice versa. 665 666Arguments: 667"""""""""" 668 669The first argument is a handle to a coroutine if `from` is false. Otherwise, 670it is a pointer to a coroutine promise. 671 672The second argument is an alignment requirements of the promise. 673If a frontend designated `%promise = alloca i32` as a promise, the alignment 674argument to `coro.promise` should be the alignment of `i32` on the target 675platform. If a frontend designated `%promise = alloca i32, align 16` as a 676promise, the alignment argument should be 16. 677This argument only accepts constants. 678 679The third argument is a boolean indicating a direction of the transformation. 680If `from` is true, the intrinsic returns a coroutine handle given a pointer 681to a promise. If `from` is false, the intrinsics return a pointer to a promise 682from a coroutine handle. This argument only accepts constants. 683 684Semantics: 685"""""""""" 686 687Using this intrinsic on a coroutine that does not have a coroutine promise 688leads to undefined behavior. It is possible to read and modify coroutine 689promise of the coroutine which is currently executing. The coroutine author and 690a coroutine user are responsible to makes sure there is no data races. 691 692Example: 693"""""""" 694 695.. code-block:: llvm 696 697 define i8* @f(i32 %n) { 698 entry: 699 %promise = alloca i32 700 %pv = bitcast i32* %promise to i8* 701 ; the second argument to coro.id points to the coroutine promise. 702 %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null) 703 ... 704 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc) 705 ... 706 store i32 42, i32* %promise ; store something into the promise 707 ... 708 ret i8* %hdl 709 } 710 711 define i32 @main() { 712 entry: 713 %hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle 714 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false) 715 %promise.addr = bitcast i8* %promise.addr.raw to i32* 716 %val = load i32, i32* %promise.addr ; load a value from the promise 717 call void @print(i32 %val) 718 call void @llvm.coro.destroy(i8* %hdl) 719 ret i32 0 720 } 721 722.. _coroutine intrinsics: 723 724Coroutine Structure Intrinsics 725------------------------------ 726Intrinsics described in this section are used within a coroutine to describe 727the coroutine structure. They should not be used outside of a coroutine. 728 729.. _coro.size: 730 731'llvm.coro.size' Intrinsic 732^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 733:: 734 735 declare i32 @llvm.coro.size.i32() 736 declare i64 @llvm.coro.size.i64() 737 738Overview: 739""""""""" 740 741The '``llvm.coro.size``' intrinsic returns the number of bytes 742required to store a `coroutine frame`_. 743 744Arguments: 745"""""""""" 746 747None 748 749Semantics: 750"""""""""" 751 752The `coro.size` intrinsic is lowered to a constant representing the size of 753the coroutine frame. 754 755.. _coro.begin: 756 757'llvm.coro.begin' Intrinsic 758^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 759:: 760 761 declare i8* @llvm.coro.begin(token <id>, i8* <mem>) 762 763Overview: 764""""""""" 765 766The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame. 767 768Arguments: 769"""""""""" 770 771The first argument is a token returned by a call to '``llvm.coro.id``' 772identifying the coroutine. 773 774The second argument is a pointer to a block of memory where coroutine frame 775will be stored if it is allocated dynamically. 776 777Semantics: 778"""""""""" 779 780Depending on the alignment requirements of the objects in the coroutine frame 781and/or on the codegen compactness reasons the pointer returned from `coro.begin` 782may be at offset to the `%mem` argument. (This could be beneficial if 783instructions that express relative access to data can be more compactly encoded 784with small positive and negative offsets). 785 786A frontend should emit exactly one `coro.begin` intrinsic per coroutine. 787 788.. _coro.free: 789 790'llvm.coro.free' Intrinsic 791^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 792:: 793 794 declare i8* @llvm.coro.free(token %id, i8* <frame>) 795 796Overview: 797""""""""" 798 799The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where 800coroutine frame is stored or `null` if this instance of a coroutine did not use 801dynamically allocated memory for its coroutine frame. 802 803Arguments: 804"""""""""" 805 806The first argument is a token returned by a call to '``llvm.coro.id``' 807identifying the coroutine. 808 809The second argument is a pointer to the coroutine frame. This should be the same 810pointer that was returned by prior `coro.begin` call. 811 812Example (custom deallocation function): 813""""""""""""""""""""""""""""""""""""""" 814 815.. code-block:: llvm 816 817 cleanup: 818 %mem = call i8* @llvm.coro.free(token %id, i8* %frame) 819 %mem_not_null = icmp ne i8* %mem, null 820 br i1 %mem_not_null, label %if.then, label %if.end 821 if.then: 822 call void @CustomFree(i8* %mem) 823 br label %if.end 824 if.end: 825 ret void 826 827Example (standard deallocation functions): 828"""""""""""""""""""""""""""""""""""""""""" 829 830.. code-block:: llvm 831 832 cleanup: 833 %mem = call i8* @llvm.coro.free(token %id, i8* %frame) 834 call void @free(i8* %mem) 835 ret void 836 837.. _coro.alloc: 838 839'llvm.coro.alloc' Intrinsic 840^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 841:: 842 843 declare i1 @llvm.coro.alloc(token <id>) 844 845Overview: 846""""""""" 847 848The '``llvm.coro.alloc``' intrinsic returns `true` if dynamic allocation is 849required to obtain a memory for the coroutine frame and `false` otherwise. 850 851Arguments: 852"""""""""" 853 854The first argument is a token returned by a call to '``llvm.coro.id``' 855identifying the coroutine. 856 857Semantics: 858"""""""""" 859 860A frontend should emit at most one `coro.alloc` intrinsic per coroutine. 861The intrinsic is used to suppress dynamic allocation of the coroutine frame 862when possible. 863 864Example: 865"""""""" 866 867.. code-block:: llvm 868 869 entry: 870 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null) 871 %dyn.alloc.required = call i1 @llvm.coro.alloc(token %id) 872 br i1 %dyn.alloc.required, label %coro.alloc, label %coro.begin 873 874 coro.alloc: 875 %frame.size = call i32 @llvm.coro.size() 876 %alloc = call i8* @MyAlloc(i32 %frame.size) 877 br label %coro.begin 878 879 coro.begin: 880 %phi = phi i8* [ null, %entry ], [ %alloc, %coro.alloc ] 881 %frame = call i8* @llvm.coro.begin(token %id, i8* %phi) 882 883.. _coro.noop: 884 885'llvm.coro.noop' Intrinsic 886^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 887:: 888 889 declare i8* @llvm.coro.noop() 890 891Overview: 892""""""""" 893 894The '``llvm.coro.noop``' intrinsic returns an address of the coroutine frame of 895a coroutine that does nothing when resumed or destroyed. 896 897Arguments: 898"""""""""" 899 900None 901 902Semantics: 903"""""""""" 904 905This intrinsic is lowered to refer to a private constant coroutine frame. The 906resume and destroy handlers for this frame are empty functions that do nothing. 907Note that in different translation units llvm.coro.noop may return different pointers. 908 909.. _coro.frame: 910 911'llvm.coro.frame' Intrinsic 912^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 913:: 914 915 declare i8* @llvm.coro.frame() 916 917Overview: 918""""""""" 919 920The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of 921the enclosing coroutine. 922 923Arguments: 924"""""""""" 925 926None 927 928Semantics: 929"""""""""" 930 931This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is 932a frontend convenience intrinsic that makes it easier to refer to the 933coroutine frame. 934 935.. _coro.id: 936 937'llvm.coro.id' Intrinsic 938^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 939:: 940 941 declare token @llvm.coro.id(i32 <align>, i8* <promise>, i8* <coroaddr>, 942 i8* <fnaddrs>) 943 944Overview: 945""""""""" 946 947The '``llvm.coro.id``' intrinsic returns a token identifying a coroutine. 948 949Arguments: 950"""""""""" 951 952The first argument provides information on the alignment of the memory returned 953by the allocation function and given to `coro.begin` by the first argument. If 954this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*). 955This argument only accepts constants. 956 957The second argument, if not `null`, designates a particular alloca instruction 958to be a `coroutine promise`_. 959 960The third argument is `null` coming out of the frontend. The CoroEarly pass sets 961this argument to point to the function this coro.id belongs to. 962 963The fourth argument is `null` before coroutine is split, and later is replaced 964to point to a private global constant array containing function pointers to 965outlined resume and destroy parts of the coroutine. 966 967 968Semantics: 969"""""""""" 970 971The purpose of this intrinsic is to tie together `coro.id`, `coro.alloc` and 972`coro.begin` belonging to the same coroutine to prevent optimization passes from 973duplicating any of these instructions unless entire body of the coroutine is 974duplicated. 975 976A frontend should emit exactly one `coro.id` intrinsic per coroutine. 977 978.. _coro.end: 979 980'llvm.coro.end' Intrinsic 981^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 982:: 983 984 declare i1 @llvm.coro.end(i8* <handle>, i1 <unwind>) 985 986Overview: 987""""""""" 988 989The '``llvm.coro.end``' marks the point where execution of the resume part of 990the coroutine should end and control should return to the caller. 991 992 993Arguments: 994"""""""""" 995 996The first argument should refer to the coroutine handle of the enclosing 997coroutine. A frontend is allowed to supply null as the first parameter, in this 998case `coro-early` pass will replace the null with an appropriate coroutine 999handle value. 1000 1001The second argument should be `true` if this coro.end is in the block that is 1002part of the unwind sequence leaving the coroutine body due to an exception and 1003`false` otherwise. 1004 1005Semantics: 1006"""""""""" 1007The purpose of this intrinsic is to allow frontends to mark the cleanup and 1008other code that is only relevant during the initial invocation of the coroutine 1009and should not be present in resume and destroy parts. 1010 1011This intrinsic is lowered when a coroutine is split into 1012the start, resume and destroy parts. In the start part, it is a no-op, 1013in resume and destroy parts, it is replaced with `ret void` instruction and 1014the rest of the block containing `coro.end` instruction is discarded. 1015In landing pads it is replaced with an appropriate instruction to unwind to 1016caller. The handling of coro.end differs depending on whether the target is 1017using landingpad or WinEH exception model. 1018 1019For landingpad based exception model, it is expected that frontend uses the 1020`coro.end`_ intrinsic as follows: 1021 1022.. code-block:: llvm 1023 1024 ehcleanup: 1025 %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true) 1026 br i1 %InResumePart, label %eh.resume, label %cleanup.cont 1027 1028 cleanup.cont: 1029 ; rest of the cleanup 1030 1031 eh.resume: 1032 %exn = load i8*, i8** %exn.slot, align 8 1033 %sel = load i32, i32* %ehselector.slot, align 4 1034 %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0 1035 %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1 1036 resume { i8*, i32 } %lpad.val29 1037 1038The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions, 1039thus leading to immediate unwind to the caller, whereas in start function it 1040is replaced with ``False``, thus allowing to proceed to the rest of the cleanup 1041code that is only needed during initial invocation of the coroutine. 1042 1043For Windows Exception handling model, a frontend should attach a funclet bundle 1044referring to an enclosing cleanuppad as follows: 1045 1046.. code-block:: llvm 1047 1048 ehcleanup: 1049 %tok = cleanuppad within none [] 1050 %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ] 1051 cleanupret from %tok unwind label %RestOfTheCleanup 1052 1053The `CoroSplit` pass, if the funclet bundle is present, will insert 1054``cleanupret from %tok unwind to caller`` before 1055the `coro.end`_ intrinsic and will remove the rest of the block. 1056 1057The following table summarizes the handling of `coro.end`_ intrinsic. 1058 1059+--------------------------+-------------------+-------------------------------+ 1060| | In Start Function | In Resume/Destroy Functions | 1061+--------------------------+-------------------+-------------------------------+ 1062|unwind=false | nothing |``ret void`` | 1063+------------+-------------+-------------------+-------------------------------+ 1064| | WinEH | nothing |``cleanupret unwind to caller``| 1065|unwind=true +-------------+-------------------+-------------------------------+ 1066| | Landingpad | nothing | nothing | 1067+------------+-------------+-------------------+-------------------------------+ 1068 1069.. _coro.suspend: 1070.. _suspend points: 1071 1072'llvm.coro.suspend' Intrinsic 1073^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1074:: 1075 1076 declare i8 @llvm.coro.suspend(token <save>, i1 <final>) 1077 1078Overview: 1079""""""""" 1080 1081The '``llvm.coro.suspend``' marks the point where execution of the coroutine 1082need to get suspended and control returned back to the caller. 1083Conditional branches consuming the result of this intrinsic lead to basic blocks 1084where coroutine should proceed when suspended (-1), resumed (0) or destroyed 1085(1). 1086 1087Arguments: 1088"""""""""" 1089 1090The first argument refers to a token of `coro.save` intrinsic that marks the 1091point when coroutine state is prepared for suspension. If `none` token is passed, 1092the intrinsic behaves as if there were a `coro.save` immediately preceding 1093the `coro.suspend` intrinsic. 1094 1095The second argument indicates whether this suspension point is `final`_. 1096The second argument only accepts constants. If more than one suspend point is 1097designated as final, the resume and destroy branches should lead to the same 1098basic blocks. 1099 1100Example (normal suspend point): 1101""""""""""""""""""""""""""""""" 1102 1103.. code-block:: llvm 1104 1105 %0 = call i8 @llvm.coro.suspend(token none, i1 false) 1106 switch i8 %0, label %suspend [i8 0, label %resume 1107 i8 1, label %cleanup] 1108 1109Example (final suspend point): 1110"""""""""""""""""""""""""""""" 1111 1112.. code-block:: llvm 1113 1114 while.end: 1115 %s.final = call i8 @llvm.coro.suspend(token none, i1 true) 1116 switch i8 %s.final, label %suspend [i8 0, label %trap 1117 i8 1, label %cleanup] 1118 trap: 1119 call void @llvm.trap() 1120 unreachable 1121 1122Semantics: 1123"""""""""" 1124 1125If a coroutine that was suspended at the suspend point marked by this intrinsic 1126is resumed via `coro.resume`_ the control will transfer to the basic block 1127of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the 1128basic block indicated by the 1-case. To suspend, coroutine proceed to the 1129default label. 1130 1131If suspend intrinsic is marked as final, it can consider the `true` branch 1132unreachable and can perform optimizations that can take advantage of that fact. 1133 1134.. _coro.save: 1135 1136'llvm.coro.save' Intrinsic 1137^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1138:: 1139 1140 declare token @llvm.coro.save(i8* <handle>) 1141 1142Overview: 1143""""""""" 1144 1145The '``llvm.coro.save``' marks the point where a coroutine need to update its 1146state to prepare for resumption to be considered suspended (and thus eligible 1147for resumption). 1148 1149Arguments: 1150"""""""""" 1151 1152The first argument points to a coroutine handle of the enclosing coroutine. 1153 1154Semantics: 1155"""""""""" 1156 1157Whatever coroutine state changes are required to enable resumption of 1158the coroutine from the corresponding suspend point should be done at the point 1159of `coro.save` intrinsic. 1160 1161Example: 1162"""""""" 1163 1164Separate save and suspend points are necessary when a coroutine is used to 1165represent an asynchronous control flow driven by callbacks representing 1166completions of asynchronous operations. 1167 1168In such a case, a coroutine should be ready for resumption prior to a call to 1169`async_op` function that may trigger resumption of a coroutine from the same or 1170a different thread possibly prior to `async_op` call returning control back 1171to the coroutine: 1172 1173.. code-block:: llvm 1174 1175 %save1 = call token @llvm.coro.save(i8* %hdl) 1176 call void @async_op1(i8* %hdl) 1177 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false) 1178 switch i8 %suspend1, label %suspend [i8 0, label %resume1 1179 i8 1, label %cleanup] 1180 1181.. _coro.param: 1182 1183'llvm.coro.param' Intrinsic 1184^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1185:: 1186 1187 declare i1 @llvm.coro.param(i8* <original>, i8* <copy>) 1188 1189Overview: 1190""""""""" 1191 1192The '``llvm.coro.param``' is used by a frontend to mark up the code used to 1193construct and destruct copies of the parameters. If the optimizer discovers that 1194a particular parameter copy is not used after any suspends, it can remove the 1195construction and destruction of the copy by replacing corresponding coro.param 1196with `i1 false` and replacing any use of the `copy` with the `original`. 1197 1198Arguments: 1199"""""""""" 1200 1201The first argument points to an `alloca` storing the value of a parameter to a 1202coroutine. 1203 1204The second argument points to an `alloca` storing the value of the copy of that 1205parameter. 1206 1207Semantics: 1208"""""""""" 1209 1210The optimizer is free to always replace this intrinsic with `i1 true`. 1211 1212The optimizer is also allowed to replace it with `i1 false` provided that the 1213parameter copy is only used prior to control flow reaching any of the suspend 1214points. The code that would be DCE'd if the `coro.param` is replaced with 1215`i1 false` is not considered to be a use of the parameter copy. 1216 1217The frontend can emit this intrinsic if its language rules allow for this 1218optimization. 1219 1220Example: 1221"""""""" 1222Consider the following example. A coroutine takes two parameters `a` and `b` 1223that has a destructor and a move constructor. 1224 1225.. code-block:: c++ 1226 1227 struct A { ~A(); A(A&&); bool foo(); void bar(); }; 1228 1229 task<int> f(A a, A b) { 1230 if (a.foo()) 1231 return 42; 1232 1233 a.bar(); 1234 co_await read_async(); // introduces suspend point 1235 b.bar(); 1236 } 1237 1238Note that, uses of `b` is used after a suspend point and thus must be copied 1239into a coroutine frame, whereas `a` does not have to, since it never used 1240after suspend. 1241 1242A frontend can create parameter copies for `a` and `b` as follows: 1243 1244.. code-block:: text 1245 1246 task<int> f(A a', A b') { 1247 a = alloca A; 1248 b = alloca A; 1249 // move parameters to its copies 1250 if (coro.param(a', a)) A::A(a, A&& a'); 1251 if (coro.param(b', b)) A::A(b, A&& b'); 1252 ... 1253 // destroy parameters copies 1254 if (coro.param(a', a)) A::~A(a); 1255 if (coro.param(b', b)) A::~A(b); 1256 } 1257 1258The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses 1259of `a` with `a'`, since it is not used after suspend. 1260 1261The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used 1262after suspend and therefore, it has to reside in the coroutine frame. 1263 1264Coroutine Transformation Passes 1265=============================== 1266CoroEarly 1267--------- 1268The pass CoroEarly lowers coroutine intrinsics that hide the details of the 1269structure of the coroutine frame, but, otherwise not needed to be preserved to 1270help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_, 1271and `coro.promise`_ intrinsics. 1272 1273.. _CoroSplit: 1274 1275CoroSplit 1276--------- 1277The pass CoroSplit buides coroutine frame and outlines resume and destroy parts 1278into separate functions. 1279 1280CoroElide 1281--------- 1282The pass CoroElide examines if the inlined coroutine is eligible for heap 1283allocation elision optimization. If so, it replaces 1284`coro.begin` intrinsic with an address of a coroutine frame placed on its caller 1285and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null` 1286respectively to remove the deallocation code. 1287This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct 1288calls to resume and destroy functions for a particular coroutine where possible. 1289 1290CoroCleanup 1291----------- 1292This pass runs late to lower all coroutine related intrinsics not replaced by 1293earlier passes. 1294 1295Areas Requiring Attention 1296========================= 1297#. A coroutine frame is bigger than it could be. Adding stack packing and stack 1298 coloring like optimization on the coroutine frame will result in tighter 1299 coroutine frames. 1300 1301#. Take advantage of the lifetime intrinsics for the data that goes into the 1302 coroutine frame. Leave lifetime intrinsics as is for the data that stays in 1303 allocas. 1304 1305#. The CoroElide optimization pass relies on coroutine ramp function to be 1306 inlined. It would be beneficial to split the ramp function further to 1307 increase the chance that it will get inlined into its caller. 1308 1309#. Design a convention that would make it possible to apply coroutine heap 1310 elision optimization across ABI boundaries. 1311 1312#. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows). 1313 1314#. Alignment is ignored by coro.begin and coro.free intrinsics. 1315 1316#. Make required changes to make sure that coroutine optimizations work with 1317 LTO. 1318 1319#. More tests, more tests, more tests 1320