Name

    NV_gpu_multicast

Name Strings

    GL_NV_gpu_multicast

Contact

    Joshua Schnarr, NVIDIA Corporation (jschnarr 'at' nvidia.com)
    Ingo Esser, NVIDIA Corporation (iesser 'at' nvidia.com)

Contributors

    Christoph Kubisch, NVIDIA
    Mark Kilgard, NVIDIA
    Robert Menzel, NVIDIA
    Kevin Lefebvre, NVIDIA
    Ralf Biermann, NVIDIA

Status

    Shipping in NVIDIA release 370.XX drivers and up.

Version

    Last Modified Date: April 2, 2019
    Revision: 7

Number

    OpenGL Extension #494

Dependencies

    This extension is written against the OpenGL 4.5 specification
    (Compatibility Profile), dated February 2, 2015.

    This extension requires ARB_copy_image.

    This extension interacts with ARB_sample_locations.

    This extension interacts with ARB_sparse_buffer.

    This extension requires EXT_direct_state_access.

    This extension interacts with EXT_bindable_uniform.

Overview

    This extension enables novel multi-GPU rendering techniques by providing application control
    over a group of linked GPUs with identical hardware configuration.

    Multi-GPU rendering techniques fall into two categories: implicit and explicit. Existing
    explicit approaches like WGL_NV_gpu_affinity have two main drawbacks: CPU overhead and
    application complexity. An application must manage one context per GPU and multi-pump the API
    stream. Implicit multi-GPU rendering techniques avoid these issues by broadcasting rendering
    from one context to multiple GPUs. Common implicit approaches include alternate-frame
    rendering (AFR), split-frame rendering (SFR) and multi-GPU anti-aliasing. They each have
    drawbacks. AFR scales nicely but interacts poorly with inter-frame dependencies. SFR can
    improve latency but has challenges with offscreen rendering and scaling of vertex processing.
    With multi-GPU anti-aliasing, each GPU renders the same content with alternate sample
    positions and the driver blends the result to improve quality. This also has issues with
    offscreen rendering and can conflict with other anti-aliasing techniques.

    These issues with implicit multi-GPU rendering all have the same root cause: the driver lacks
    adequate knowledge to accelerate every application. To resolve this, NV_gpu_multicast
    provides fine-grained, explicit application control over multiple GPUs with a single context.

    Key points:

    - One context controls multiple GPUs. Every GPU in the linked group can access every object.

    - Rendering is broadcast. Each draw is repeated across all GPUs in the linked group.

    - Each GPU gets its own instance of all framebuffers, allowing individualized output for each
      GPU. Input data can be customized for each GPU using buffers created with the storage flag
      PER_GPU_STORAGE_BIT_NV and a new API, MulticastBufferSubDataNV.

    - New interfaces provide mechanisms to transfer textures and buffers from one GPU to another.

New Procedures and Functions

    void RenderGpuMaskNV(bitfield mask);

    void MulticastBufferSubDataNV(
        bitfield gpuMask, uint buffer,
        intptr offset, sizeiptr size,
        const void *data);

    void MulticastCopyBufferSubDataNV(
        uint readGpu, bitfield writeGpuMask,
        uint readBuffer, uint writeBuffer,
        intptr readOffset, intptr writeOffset, sizeiptr size);

    void MulticastCopyImageSubDataNV(
        uint srcGpu, bitfield dstGpuMask,
        uint srcName, enum srcTarget,
        int srcLevel,
        int srcX, int srcY, int srcZ,
        uint dstName, enum dstTarget,
        int dstLevel,
        int dstX, int dstY, int dstZ,
        sizei srcWidth, sizei srcHeight, sizei srcDepth);

    void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu,
                                    int srcX0, int srcY0, int srcX1, int srcY1,
                                    int dstX0, int dstY0, int dstX1, int dstY1,
                                    bitfield mask, enum filter);

    void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start,
                                                 sizei count, const float *v);

    void MulticastBarrierNV(void);

    void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask);

    void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params);
    void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params);
    void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params);
    void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params);

New Tokens

    Accepted in the <flags> parameter of BufferStorage and NamedBufferStorageEXT:

        PER_GPU_STORAGE_BIT_NV                      0x0800

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and
    GetDoublev:

        MULTICAST_GPUS_NV                           0x92BA
        RENDER_GPU_MASK_NV                          0x9558

    Accepted as a value for <pname> for the TexParameter{if}, TexParameter{if}v,
    TextureParameter{if}, TextureParameter{if}v, MultiTexParameter{if}EXT and
    MultiTexParameter{if}vEXT commands and for the <pname> parameter of GetTexParameter{if}v,
    GetTextureParameter{if}vEXT and GetMultiTexParameter{if}vEXT:

        PER_GPU_STORAGE_NV                          0x9548

    Accepted by the <pname> parameter of GetMultisamplefv:

        MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV   0x9549

Additions to the OpenGL 4.5 Specification (Compatibility Profile)

    (Add a new chapter after chapter 19 "Compute Shaders")

    20 Multicast Rendering

    Some implementations support multiple linked GPUs driven by a single context. Often the
    distribution of work to individual GPUs is managed by the GL without client knowledge. This
    chapter specifies commands for explicitly distributing work across GPUs in a linked group.
    Rendering can be enabled or disabled for specific GPUs. Draw commands are multicast, or
    repeated across all enabled GPUs. Objects are shared by all GPUs; however, each GPU has its
    own instance (copy) of many resources, including framebuffers. When each GPU has its own
    instance of a resource, it is considered to have per-GPU storage. When all GPUs share a
    single instance of a resource, this is considered GPU-shared storage.

    The mechanism for linking GPUs is implementation specific, as is the mechanism for enabling
    multicast rendering support (if necessary). The number of GPUs usable for multicast rendering
    by a context can be queried by calling GetIntegerv with the symbolic constant
    MULTICAST_GPUS_NV. This number is constant for the lifetime of a context. Individual GPUs
    are identified using zero-based indices in the range [0, n-1], where n is the number of
    multicast GPUs. GPUs are also identified by bitmasks of the form 2^i, where i is the GPU
    index. A set of GPUs is specified by the union of masks for each GPU in the set.
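
    The index and bitmask conventions above can be sketched in a few lines of C. This is only
    an illustration: the helper names (gpu_bit, gpu_set_mask, all_gpus_mask, gpu_mask_is_valid)
    are hypothetical and not part of this extension, and a 32-bit bitfield is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmask identifying a single GPU (2^i for index i). */
static uint32_t gpu_bit(unsigned index)
{
    assert(index < 32u); /* assumption: a 32-bit bitfield names at most 32 GPUs */
    return (uint32_t)1u << index;
}

/* Hypothetical helper: union of the masks of every GPU index in the list. */
static uint32_t gpu_set_mask(const unsigned *indices, unsigned count)
{
    uint32_t mask = 0u;
    for (unsigned k = 0; k < count; ++k)
        mask |= gpu_bit(indices[k]);
    return mask;
}

/* Mask naming all n GPUs, i.e. (2^n)-1. */
static uint32_t all_gpus_mask(unsigned n)
{
    assert(n >= 1u && n <= 32u);
    return (n == 32u) ? 0xFFFFFFFFu : (((uint32_t)1u << n) - 1u);
}

/* A mask is usable when it is non-zero and names only GPUs in [0, n-1]. */
static int gpu_mask_is_valid(uint32_t mask, unsigned n)
{
    return mask != 0u && (mask & ~all_gpus_mask(n)) == 0u;
}
```

    With two multicast GPUs, for instance, gpu_bit(1) yields 0x2 and all_gpus_mask(2) yields
    0x3, the set containing both GPUs.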

    20.1 Controlling Individual GPUs

    Render commands are restricted to a specific set of GPUs with

      void RenderGpuMaskNV(bitfield mask);

    The following errors apply to RenderGpuMaskNV:

    INVALID_OPERATION is generated
     * if <mask> is zero,
     * if <mask> is not zero and <mask> is greater than or equal to 2^n, where n is equal
       to MULTICAST_GPUS_NV,
     * if issued between BeginConditionalRender and the corresponding EndConditionalRender.

    If the command does not generate an error, RENDER_GPU_MASK_NV is set to <mask>. The default
    value of RENDER_GPU_MASK_NV is (2^n)-1.

    Render commands are skipped for a GPU that is not present in RENDER_GPU_MASK_NV. Examples
    include draw calls, clears, compute dispatches, and copies or pixel path operations that
    write to a framebuffer (e.g. DrawPixels, BlitFramebuffer). For a full list of render
    commands see section 2.4 (page 26). MulticastBlitFramebufferNV is an exception to this
    policy: while it is a rendering command, it specifies its own source and destination GPUs.
    Note that buffer and texture updates are not affected by RENDER_GPU_MASK_NV.

    20.2 Multi-GPU Buffer Storage

    Like other resources, buffer objects can have two types of storage, per-GPU storage or
    GPU-shared storage. Per-GPU storage can be explicitly requested using the
    PER_GPU_STORAGE_BIT_NV flag with BufferStorage/NamedBufferStorageEXT. If this flag is not
    set, the type of storage used is undefined. The implementation may use either type and
    transition between them at any time. Client reads of a buffer with per-GPU storage may source
    from any GPU.

    The following rules apply to buffer objects with per-GPU storage:

     * When mapped, updates apply to all GPUs (only WRITE_ONLY access is supported).
     * When used as the write buffer for CopyBufferSubData or CopyNamedBufferSubData, writes
       apply to all GPUs.

    The following commands affect storage on all GPUs, even if the buffer object has per-GPU
    storage:

      BufferSubData, NamedBufferSubData, ClearBufferSubData, and ClearNamedBufferData

    An INVALID_VALUE error is generated if BufferStorage/NamedBufferStorageEXT is called with
    PER_GPU_STORAGE_BIT_NV set together with MAP_READ_BIT or SPARSE_STORAGE_BIT_ARB.

    To modify buffer object data on one or more GPUs, the client may use the command

      void MulticastBufferSubDataNV(
          bitfield gpuMask, uint buffer,
          intptr offset, sizeiptr size,
          const void *data);

    This command operates similarly to NamedBufferSubData, except that it updates the per-GPU
    buffer data on the set of GPUs defined by <gpuMask>. If <buffer> has GPU-shared storage,
    <gpuMask> is ignored and the shared instance of the buffer is updated.

    An INVALID_VALUE error is generated if <gpuMask> is zero or is greater than or equal to 2^n,
    where n is equal to MULTICAST_GPUS_NV.
    An INVALID_OPERATION error is generated if <buffer> is not the name of an existing buffer
    object.
    An INVALID_VALUE error is generated if <offset> or <size> is negative, or if <offset> + <size>
    is greater than the value of BUFFER_SIZE for the buffer object.
    An INVALID_OPERATION error is generated if any part of the specified buffer range is mapped
    with MapBufferRange or MapBuffer (see section 6.3), unless it was mapped with
    MAP_PERSISTENT_BIT set in the MapBufferRange access flags.
    An INVALID_OPERATION error is generated if the BUFFER_IMMUTABLE_STORAGE flag of the buffer
    object is TRUE and the value of BUFFER_STORAGE_FLAGS for the buffer does not have the
    DYNAMIC_STORAGE_BIT set.
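
    The MulticastBufferSubDataNV error rules above can be modeled as a small stand-alone check.
    The sketch below is not driver code: SketchError, SketchBuffer, sketch_buffer and
    check_multicast_buffer_sub_data are hypothetical names, the buffer state is reduced to the
    fields the rules consult, and the checks are assumed to run in the order listed (the
    extension does not define an order).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative error codes; a real implementation reports INVALID_VALUE etc. */
typedef enum { SKETCH_NO_ERROR, SKETCH_INVALID_VALUE, SKETCH_INVALID_OPERATION } SketchError;

/* Hypothetical snapshot of the buffer state that the error rules consult. */
typedef struct {
    int     exists;          /* <buffer> names an existing buffer object */
    int64_t size;            /* BUFFER_SIZE */
    int     immutable;       /* BUFFER_IMMUTABLE_STORAGE */
    int     dynamic_storage; /* DYNAMIC_STORAGE_BIT set in BUFFER_STORAGE_FLAGS */
    int     mapped;          /* some range is currently mapped */
    int     map_persistent;  /* mapped with MAP_PERSISTENT_BIT */
    int64_t map_offset;      /* start of the mapped range */
    int64_t map_length;      /* length of the mapped range */
} SketchBuffer;

/* Hypothetical constructor: an existing, immutable, unmapped buffer. */
static SketchBuffer sketch_buffer(int64_t size, int dynamic_storage)
{
    SketchBuffer b = { 1, size, 1, dynamic_storage, 0, 0, 0, 0 };
    return b;
}

static SketchError check_multicast_buffer_sub_data(uint32_t gpu_mask, unsigned gpu_count,
                                                   SketchBuffer buf,
                                                   int64_t offset, int64_t size)
{
    assert(gpu_count >= 1u && gpu_count <= 32u);

    /* <gpuMask> must be non-zero and below 2^n (64-bit math is safe for n = 32). */
    if (gpu_mask == 0u || (uint64_t)gpu_mask >= ((uint64_t)1 << gpu_count))
        return SKETCH_INVALID_VALUE;
    if (!buf.exists)
        return SKETCH_INVALID_OPERATION;
    /* offset + size > BUFFER_SIZE, written so the addition cannot overflow. */
    if (offset < 0 || size < 0 || offset > buf.size - size)
        return SKETCH_INVALID_VALUE;
    /* Any overlap with a non-persistent mapping is an error. */
    if (buf.mapped && !buf.map_persistent && size > 0 &&
        offset < buf.map_offset + buf.map_length && buf.map_offset < offset + size)
        return SKETCH_INVALID_OPERATION;
    if (buf.immutable && !buf.dynamic_storage)
        return SKETCH_INVALID_OPERATION;
    return SKETCH_NO_ERROR;
}
```

    Under these assumptions, updating a 64-byte dynamic buffer on GPU 0 of a two-GPU group with
    mask 0x1 passes every check, while mask 0x4 is rejected because it is not below 2^2.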

    To copy between buffers created with PER_GPU_STORAGE_BIT_NV, the client may use the command

      void MulticastCopyBufferSubDataNV(
          uint readGpu, bitfield writeGpuMask,
          uint readBuffer, uint writeBuffer,
          intptr readOffset, intptr writeOffset, sizeiptr size);

    This command operates similarly to CopyNamedBufferSubData, while adding control over the
    source and destination GPU(s). The read GPU index is specified by <readGpu> and
    the set of write GPUs is specified by the mask in <writeGpuMask>.

    Implementations may also support this command with buffers not created with
    PER_GPU_STORAGE_BIT_NV. This support can be determined with one test copy with an error check
    (see error discussion below). Note that a buffer created without PER_GPU_STORAGE_BIT_NV is
    considered to have undefined storage and the behavior of the command depends on the storage
    type (per-GPU or GPU-shared) currently used for <writeBuffer>. If <writeBuffer> is using
    GPU-shared storage, the normal error checks apply but the command behaves as if <writeGpuMask>
    includes all GPUs. If <writeBuffer> is using per-GPU storage, the command behaves as if
    PER_GPU_STORAGE_BIT_NV were set; however, performance may be reduced.

    The following error may apply to MulticastCopyBufferSubDataNV on some implementations and not
    on others. In earlier revisions of this extension the error was required; therefore,
    applications should perform a test copy using buffers without PER_GPU_STORAGE_BIT_NV before
    relying on that functionality:

    An INVALID_OPERATION error is generated if the value of BUFFER_STORAGE_FLAGS for <readBuffer>
    or <writeBuffer> does not have PER_GPU_STORAGE_BIT_NV set.

    The following errors apply to MulticastCopyBufferSubDataNV:

    An INVALID_OPERATION error is generated if <readBuffer> or <writeBuffer> is not the name of an
    existing buffer object.
    An INVALID_VALUE error is generated if any of <readOffset>, <writeOffset>, or <size> are
    negative, if <readOffset> + <size> exceeds the size of the source buffer object, or if
    <writeOffset> + <size> exceeds the size of the destination buffer object.
    An INVALID_OPERATION error is generated if either the source or destination buffer object is
    mapped, unless it was mapped with MAP_PERSISTENT_BIT set in the Map*BufferRange access
    flags.
    An INVALID_VALUE error is generated if <readGpu> is greater than or equal to
    MULTICAST_GPUS_NV.
    An INVALID_OPERATION error is generated if <writeGpuMask> is zero. An INVALID_VALUE error is
    generated if <writeGpuMask> is not zero and <writeGpuMask> is greater than or equal to 2^n,
    where n is equal to MULTICAST_GPUS_NV.
    An INVALID_VALUE error is generated if the source and destination are the same buffer object,
    <readGpu> is present in <writeGpuMask>, and the ranges [<readOffset>; <readOffset> + <size>)
    and [<writeOffset>; <writeOffset> + <size>) overlap.

    20.3 Multi-GPU Framebuffers and Textures

    All buffers in the default framebuffer as well as renderbuffers receive per-GPU storage. By
    default, storage for textures is undefined: it may be per-GPU or GPU-shared and can transition
    between the types at any time. Per-GPU storage can be specified via
    [Multi]Tex[ture]Parameter{if}[v] with PER_GPU_STORAGE_NV for the <pname> argument and TRUE for
    the value. For this storage parameter to take effect, it must be specified after the texture
    object is created and before the texture contents are defined by TexImage*, TexStorage* or
    TextureStorage*.

    20.3.1 Copying Image Data Between GPUs

    To copy texel data between GPUs, the client may use the command:

      void MulticastCopyImageSubDataNV(
          uint srcGpu, bitfield dstGpuMask,
          uint srcName, enum srcTarget,
          int srcLevel,
          int srcX, int srcY, int srcZ,
          uint dstName, enum dstTarget,
          int dstLevel,
          int dstX, int dstY, int dstZ,
          sizei srcWidth, sizei srcHeight, sizei srcDepth);

    This command operates equivalently to CopyImageSubData, except that it takes a source GPU and
    a destination GPU set defined by <srcGpu> and <dstGpuMask> (respectively). Texel data is
    copied from the source GPU to all destination GPUs. The following errors apply to
    MulticastCopyImageSubDataNV:

    INVALID_ENUM is generated
     * if either <srcTarget> or <dstTarget>
        - is not RENDERBUFFER or a valid non-proxy texture target,
        - is TEXTURE_BUFFER, or
        - is one of the cubemap face selectors described in table 3.17,
     * if the target does not match the type of the object.

    INVALID_OPERATION is generated
     * if either object is a texture and the texture is not complete,
     * if the source and destination formats are not compatible,
     * if the source and destination number of samples do not match,
     * if one image is compressed and the other is uncompressed and the
       block size of the compressed image is not equal to the texel size
       of the uncompressed image.

    INVALID_VALUE is generated
     * if <srcGpu> is greater than or equal to MULTICAST_GPUS_NV,
     * if <dstGpuMask> is zero,
     * if <dstGpuMask> is greater than or equal to 2^n, where n is equal to
       MULTICAST_GPUS_NV,
     * if either <srcName> or <dstName> does not correspond to a valid
       renderbuffer or texture object according to the corresponding
       target parameter, or
     * if the specified level is not a valid level for the image, or
     * if the dimensions of either subregion exceed the boundaries
       of the corresponding image object, or
     * if the image format is compressed and the dimensions of the
       subregion fail to meet the alignment constraints of the format.

    To copy pixel values from one GPU to another use the following command:

      void MulticastBlitFramebufferNV(uint srcGpu, uint dstGpu,
                                      int srcX0, int srcY0, int srcX1, int srcY1,
                                      int dstX0, int dstY0, int dstX1, int dstY1,
                                      bitfield mask, enum filter);

    This command operates equivalently to BlitNamedFramebuffer except that it takes a source GPU
    and a destination GPU defined by <srcGpu> and <dstGpu> (respectively). Pixel values are
    copied from the read framebuffer on the source GPU to the draw framebuffer on the destination
    GPU.

    In addition to the errors generated by BlitNamedFramebuffer (see listing starting on page
    634), calling MulticastBlitFramebufferNV will generate INVALID_VALUE if <srcGpu> or <dstGpu>
    is greater than or equal to MULTICAST_GPUS_NV.

    20.3.2 Per-GPU Sample Locations

    Programmable sample locations can be customized for each GPU and framebuffer using the
    following command:

      void MulticastFramebufferSampleLocationsfvNV(uint gpu, uint framebuffer, uint start,
                                                   sizei count, const float *v);

    An INVALID_OPERATION error is generated by MulticastFramebufferSampleLocationsfvNV if
    <framebuffer> is not the name of an existing framebuffer object.

    INVALID_VALUE is generated if the sum of <start> and <count> is greater than
    PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB.

    An INVALID_VALUE error is generated if <gpu> is greater than or equal to MULTICAST_GPUS_NV.

    This is equivalent to FramebufferSampleLocationsfvARB except that it sets
    MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV at the appropriate offset for the specified GPU.
    Just as with FramebufferSampleLocationsfvARB, FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB
    must be enabled for these sample locations to take effect. FramebufferSampleLocationsfvARB
    and NamedFramebufferSampleLocationsfvARB also set MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV
    but for the specified sample across all multicast GPUs. If <gpu> is 0,
    MulticastFramebufferSampleLocationsfvNV updates PROGRAMMABLE_SAMPLE_LOCATION_ARB in addition
    to MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV.

    The programmed sample locations can be retrieved using GetMultisamplefv with <pname> set to
    MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and indices calculated as follows:

      index_x = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i;
      index_y = gpu * PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB + 2 * sample_i + 1;

    20.4 Interactions with Other Copy Functions

    Many existing commands can be used to copy between resources with GPU-shared, per-GPU or
    undefined storage. Examples include ReadPixels, GetBufferSubData and TexImage2D with a pixel
    unpack buffer. The following table defines how the storage of the resource influences the
    behavior of these copies.

    Table 20.1 Behavior of Copy Commands with Multi-GPU Storage

    Source      Destination  Behavior
    ----------  -----------  -----------------------------------------------------------------------
    GPU-shared  GPU-shared   There is just one source and one destination. Copy from source to
                             destination.
    GPU-shared  per-GPU      There is a single source.
                             Copy it to the destination on all GPUs.
    GPU-shared  undefined    Either of the above behaviors for a GPU-shared source may apply.

    per-GPU     GPU-shared   Copy from the GPU with the lowest index set in RENDER_GPU_MASK_NV to
                             the shared destination.
    per-GPU     per-GPU      Implementations are encouraged to copy from source to destination
                             separately on each GPU. This is not required. If and when this is not
                             feasible, the copy should source from the GPU with the lowest index set
                             in RENDER_GPU_MASK_NV.
    per-GPU     undefined    Either of the above behaviors for a per-GPU source may apply.

    undefined   GPU-shared   Either of the above behaviors for a GPU-shared destination may apply.
    undefined   per-GPU      Either of the above behaviors for a per-GPU destination may apply.
    undefined   undefined    Any of the above behaviors may apply.

    20.5 Multi-GPU Synchronization

    MulticastCopyImageSubDataNV and MulticastCopyBufferSubDataNV each provide implicit
    synchronization with previous work on the source GPU. MulticastBlitFramebufferNV is
    different, providing implicit synchronization with previous work on the destination GPU.
    In both cases, synchronization of the copies can be achieved with calls to the barrier
    command:

      void MulticastBarrierNV(void);

    This is called to block all GPUs until all previous commands have been completed by all GPUs,
    and all writes have landed. To guarantee consistency, synchronization must be placed between
    any two accesses by multiple GPUs to the same memory when at least one of the accesses is a
    write. This includes accesses to both the source and the destination. The safest approach is
    to call MulticastBarrierNV immediately before and after each copy that involves multiple GPUs.

    GPU writes and reads to/from GPU-shared locations require synchronization as well.
    GPU writes such as transform feedback, shader image store, CopyTexImage, CopyBufferSubData are
    not automatically synchronized with writes by other GPUs. Neither are GPU reads such as
    texture fetches, shader image loads, CopyTexImage, etc. synchronized with writes by other
    GPUs. Existing barriers such as TextureBarrier and MemoryBarrier only provide consistency
    guarantees for rendering, writes and reads on a single GPU.

    In some cases it may be desirable to have one or more GPUs wait for an operation to complete
    on another GPU without synchronizing all GPUs with MulticastBarrierNV. This can be performed
    with the following command:

      void MulticastWaitSyncNV(uint signalGpu, bitfield waitGpuMask);

    INVALID_VALUE is generated
     * if <signalGpu> is greater than or equal to MULTICAST_GPUS_NV,
     * if <waitGpuMask> is zero,
     * if <waitGpuMask> is greater than or equal to 2^n, where n is equal to
       MULTICAST_GPUS_NV, or
     * if <signalGpu> is present in <waitGpuMask>.

    MulticastWaitSyncNV provides the same consistency guarantees as MulticastBarrierNV but only
    between the GPUs specified by <signalGpu> and <waitGpuMask> in a single direction. It forces
    the GPUs specified by <waitGpuMask> to wait until the GPU specified by <signalGpu> has
    completed all previous commands and writes associated with those commands.

    20.6 Multi-GPU Queries

    Queries are performed across all multicast GPUs. Each query object stores independent result
    values for each GPU.
    The result value for a specific GPU can be queried using one of the following commands:

      void MulticastGetQueryObjectivNV(uint gpu, uint id, enum pname, int *params);
      void MulticastGetQueryObjectuivNV(uint gpu, uint id, enum pname, uint *params);
      void MulticastGetQueryObjecti64vNV(uint gpu, uint id, enum pname, int64 *params);
      void MulticastGetQueryObjectui64vNV(uint gpu, uint id, enum pname, uint64 *params);

    The behavior of these commands matches the GetQueryObject* equivalent commands, except they
    return the result value for the specified GPU. A query may be available on one GPU but not on
    another, so it may be necessary to check QUERY_RESULT_AVAILABLE for each GPU. GetQueryObject*
    return query results and availability for GPU 0 only.

    In addition to the errors generated by GetQueryObject* (see the listing in section 4.2 on page
    49), calling MulticastGetQueryObject* will generate INVALID_VALUE if <gpu> is greater than or
    equal to MULTICAST_GPUS_NV.

Additions to Chapter 8 of the OpenGL 4.5 (Compatibility Profile) Specification
(Textures and Samplers)

    Modify Section 8.10 (Texture Parameters)

    Insert the following paragraph before Table 8.25 (Texture parameters and their values):

    If <pname> is PER_GPU_STORAGE_NV, then the state is stored in the texture, but only takes
    effect the next time storage is allocated for a texture using TexImage*, TexStorage* or
    TextureStorage*. If the value of TEXTURE_IMMUTABLE_FORMAT is TRUE, then PER_GPU_STORAGE_NV
    cannot be changed and an error is generated.

    Additions to Table 8.26 Texture parameters and their values

        Name                Type     Legal values
        ------------------  -------  ------------
        PER_GPU_STORAGE_NV  boolean  TRUE, FALSE

Additions to Chapter 10 of the OpenGL 4.5 (Compatibility Profile) Specification
(Vertex Specification and Drawing Commands)

    Modify Section 10.9 (Conditional Rendering)

    Replace the following text:

        If the result (SAMPLES_PASSED) of the query is zero, or if the result (ANY_SAMPLES_PASSED
        or ANY_SAMPLES_PASSED_CONSERVATIVE) is FALSE, all rendering commands described in
        section 2.4 are discarded and have no effect when issued between BeginConditionalRender
        and the corresponding EndConditionalRender

    with this text:

        For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is
        zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is FALSE,
        all rendering commands described in section 2.4 are discarded by this GPU and have no
        effect when issued between BeginConditionalRender and the corresponding
        EndConditionalRender

    Similarly replace the following:

        If the result (SAMPLES_PASSED) of the query is non-zero, or if the result
        (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is TRUE, such commands are not
        discarded.

    with this:

        For each active render GPU, if the result (SAMPLES_PASSED) of the query on that GPU is
        non-zero, or if the result (ANY_SAMPLES_PASSED or ANY_SAMPLES_PASSED_CONSERVATIVE) is
        TRUE, such commands are not discarded.

    Finally, replace all instances of "the GL" with "each active render GPU".
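
    The per-GPU wording above means that a conditional render block may execute on some GPUs
    and be discarded on others. The following C sketch models only that rule for a
    SAMPLES_PASSED query; it ignores the conditional render modes and is not driver code. The
    function name gpus_that_render and the array of per-GPU results are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the mask of GPUs that execute the commands issued between
 * BeginConditionalRender and EndConditionalRender: a GPU must be present in
 * the render mask and its own SAMPLES_PASSED result must be non-zero.
 */
static uint32_t gpus_that_render(uint32_t render_mask,
                                 const uint64_t *samples_passed,
                                 unsigned gpu_count)
{
    assert(gpu_count <= 32u); /* assumption: 32-bit GPU masks */

    uint32_t active = 0u;
    for (unsigned gpu = 0; gpu < gpu_count; ++gpu) {
        uint32_t bit = (uint32_t)1u << gpu;
        if ((render_mask & bit) != 0u && samples_passed[gpu] != 0u)
            active |= bit;
    }
    return active;
}
```

    For example, with a render mask of 0x3 and per-GPU results {0, 12}, only GPU 1 (mask 0x2)
    keeps the conditional commands.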

Additions to Chapter 14 of the OpenGL 4.5 (Compatibility Profile) Specification
(Fixed-Function Primitive Assembly and Rasterization)

    Modify Section 14.3.1 (Multisampling)

    Replace the following text:

        The location for sample <i> is taken from v[2*(i-start)] and v[2*(i-start)+1].

    with the following:

        These commands set the sample locations for all multicast GPUs in
        MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV. The location for sample <i> on
        gpu <g> is taken from v[g*N+2*(i-start)] and v[g*N+2*(i-start)+1].

    Replace the following error generated by GetMultisamplefv:

        An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB or
        PROGRAMMABLE_SAMPLE_LOCATION_ARB.

    with the following:

        An INVALID_ENUM error is generated if <pname> is not SAMPLE_LOCATION_ARB,
        PROGRAMMABLE_SAMPLE_LOCATION_ARB or MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV.

    Add the following to the list of errors generated by GetMultisamplefv:

        An INVALID_VALUE error is generated if <pname> is
        MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV and <index> is greater than or equal to the
        value of PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB multiplied by the value of
        MULTICAST_GPUS_NV.
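
    The index arithmetic from section 20.3.2 and the bound stated above can be combined in a
    small helper. This is a sketch that follows the formulas exactly as written in section
    20.3.2; the names SampleLocationIndices and multicast_sample_indices are hypothetical, and
    the extra check that a sample stays inside its own GPU's slice of the table is an
    assumption rather than a stated rule.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t x;     /* index_x from section 20.3.2 */
    uint32_t y;     /* index_y from section 20.3.2 */
    int      valid; /* non-zero when both indices pass the checks below */
} SampleLocationIndices;

static SampleLocationIndices multicast_sample_indices(unsigned gpu, unsigned sample_i,
                                                      unsigned table_size, unsigned gpu_count)
{
    assert(table_size > 0u && gpu_count > 0u);

    SampleLocationIndices r;
    r.x = gpu * table_size + 2u * sample_i;
    r.y = r.x + 1u;
    r.valid = gpu < gpu_count
              /* assumption: stay within this GPU's slice of the table */
              && 2u * sample_i + 1u < table_size
              /* stated bound: index must be below table size times GPU count */
              && r.y < table_size * gpu_count;
    return r;
}
```

    With a table size of 16 and two GPUs, sample 3 on GPU 1 maps to indices 22 and 23.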

    Replace the following pseudocode (in both locations):

        float *table = FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_ARB;
        sample_location.xy = (table[2*sample_i], table[2*sample_i+1]);

    with the following:

        float *table = MULTICAST_FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV;
        table += PROGRAMMABLE_SAMPLE_LOCATION_TABLE_SIZE_ARB * gpu;
        sample_location.xy = (table[2*sample_i], table[2*sample_i+1]);

Additions to the WGL/GLX/EGL/AGL Specifications

    None

Dependencies on ARB_sample_locations

    If ARB_sample_locations is not supported, section 20.3.2 and any references to
    MulticastFramebufferSampleLocationsfvNV and MULTICAST_PROGRAMMABLE_SAMPLE_LOCATION_NV should
    be removed. The modifications to Section 14.3.1 (Multisampling) should also be removed.

Dependencies on ARB_sparse_buffer

    If ARB_sparse_buffer is not supported, any reference to SPARSE_STORAGE_BIT_ARB should be
    removed.

Interactions with EXT_bindable_uniform

    When using the functionality of EXT_bindable_uniform and a per-GPU storage buffer is bound
    to a bindable location in a program object, client uniform updates apply to all GPUs.

    An INVALID_OPERATION error is generated if a buffer with PER_GPU_STORAGE_BIT_NV is bound to a
    program object's bindable location and GetUniformfv, GetUniformiv, GetUniformuiv or
    GetUniformdv is called.

Errors

    Relaxation of INVALID_ENUM errors
    ---------------------------------
    GetBooleanv, GetIntegerv, GetInteger64v, GetFloatv, and GetDoublev now accept new tokens as
    described in the "New Tokens" section.

New State

    Additions to Table 23.4 Rasterization

                                                      Initial
    Get Value                   Type    Get Command   Value    Description              Sec.  Attribute
    --------------------------  ------  -----------   -----    -----------------------  ----  ---------
    RENDER_GPU_MASK_NV          Z+      GetIntegerv   *        Mask of GPUs that have   20.1  -
                                                               writes enabled
    * See section 20.1

    Additions to Table 23.19 Textures (state per texture object)

                                                 Initial
    Get Value           Type  Get Command        Value    Description                Sec.
    ---------           ----  -----------        -------  -----------                ----
    PER_GPU_STORAGE_NV  B     GetTexParameter    FALSE    Per-GPU storage requested  20.3

    Additions to Table 23.30 Framebuffer (state per framebuffer object)

    Get Value                  Get Command       Type  Initial Value  Description          Sec.    Attribute
    ---------                  -----------       ----  -------------  -----------          ----    ---------
    MULTICAST_PROGRAMMABLE_-   GetMultisamplefv  *     (0.5,0.5)      Programmable sample  20.3.2  -
    SAMPLE_LOCATION_NV

    * The type here is "2* x n x 2 x R[0,1]" which is equivalent to PROGRAMMABLE_SAMPLE_LOCATION_ARB
      but with sample locations for all multicast GPUs (one after the other).

New Implementation Dependent State

    Add to Table 23.82, Implementation-Dependent Values, p. 784

                                                         Minimum
    Get Value                     Type    Get Command    Value    Description             Sec.  Attribute
    ----------------------------  ------  -------------  -----    ----------------------  ----  ---------
    MULTICAST_GPUS_NV             Z+      GetIntegerv    1        Number of linked GPUs   20.0  -
                                                                  usable for multicast

Backwards Compatibility

    This extension replaces NVX_linked_gpu_multicast. The enumerant values for MULTICAST_GPUS_NV
    and PER_GPU_STORAGE_BIT_NV match those of MAX_LGPU_GPUS_NVX and LGPU_SEPARATE_STORAGE_BIT_NVX
    (respectively). MulticastBufferSubDataNV, MulticastCopyImageSubDataNV and MulticastBarrierNV
    behave analogously to LGPUNamedBufferSubDataNVX, LGPUCopyImageSubDataNVX and LGPUInterlockNVX
    (respectively).

Sample Code

    Binocular stereo rendering example using NV_gpu_multicast with single GPU fallback:

        struct ViewData {
            GLint viewport_index;
            GLfloat mvp[16];
            GLfloat modelview[16];
        };
        ViewData leftViewData = { 0, {...}, {...} };
        ViewData rightViewData = { 1, {...}, {...} };

        // Each UBO holds the data for exactly one view
        const GLsizeiptr size = sizeof(ViewData);

        GLuint ubo[2];
        glCreateBuffers(2, &ubo[0]);

        if (has_NV_gpu_multicast) {
            // One buffer with per-GPU contents: left view on GPU 0, right view on GPU 1
            glNamedBufferStorage(ubo[0], size, NULL, GL_PER_GPU_STORAGE_BIT_NV | GL_DYNAMIC_STORAGE_BIT);
            glMulticastBufferSubDataNV(0x1, ubo[0], 0, size, &leftViewData);
            glMulticastBufferSubDataNV(0x2, ubo[0], 0, size, &rightViewData);
        } else {
            // Fallback: one buffer per view
            glNamedBufferStorage(ubo[0], size, &leftViewData, 0);
            glNamedBufferStorage(ubo[1], size, &rightViewData, 0);
        }

        glViewportIndexedf(0, 0, 0, 640, 480);   // left viewport
        glViewportIndexedf(1, 640, 0, 640, 480); // right viewport
        // Vertex shader sets gl_ViewportIndex according to viewport_index in UBO

        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        if (has_NV_gpu_multicast) {
            glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
            drawScene();
            // Make GPU 1 wait for glClear above to complete on GPU 0
            glMulticastWaitSyncNV(0, 0x2);
            // Copy right viewport from GPU 1 to GPU 0
            // (renderBuffer is the renderbuffer being rendered to, created elsewhere)
            glMulticastCopyImageSubDataNV(1, 0x1,
                                          renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
                                          renderBuffer, GL_RENDERBUFFER, 0, 640, 0, 0,
                                          640, 480, 1);
            // Make GPU 0 wait for GPU 1 copy to GPU 0
            glMulticastWaitSyncNV(1, 0x1);
        } else {
            glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[0]);
            drawScene();
            glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[1]);
            drawScene();
        }
        // Both viewports are now present in GPU 0's renderbuffer

Issues

    (1) Should we provide an explicit inter-GPU synchronization API? Will this make the
    implementation easier or harder for the driver and applications?

    RESOLVED. Yes.
    A naive implementation of implicit synchronization would simply synchronize the
    GPUs before and after each copy. Smart implicit synchronization would have to track all APIs
    that can modify buffers and textures, creating an excessive burden for driver implementation
    and maintenance. An application can track dependencies more easily and outperform a naive
    driver implementation using explicit synchronization.

    (2) How does this extension interact with queries (e.g. occlusion queries)?

    RESOLVED. Queries are performed separately on each GPU. The standard GetQueryObject* APIs
    return query results for GPU 0 only. However, GetQueryBufferObject* can be used to retrieve
    query results for all GPUs through a buffer with separate storage (PER_GPU_STORAGE_BIT_NV).

    (3) Are copy operations controlled by the render mask?

    RESOLVED. Copies which write to the framebuffer are considered render commands and implicitly
    controlled by the render mask. Copies between textures and buffers are not considered render
    commands so they are not influenced by the mask. If masked copies are desired, use
    MulticastCopyImageSubDataNV, MulticastCopyBufferSubDataNV or MulticastBlitFramebufferNV.
    These commands explicitly specify the GPU source and destination and are not influenced by the
    render mask.

    (4) What happens if the MulticastCopyBufferSubDataNV source and destination buffers are the
    same?

    RESOLVED. When the source and destination involve the same GPU, MulticastCopyBufferSubDataNV
    matches the behavior of CopyBufferSubData: overlapped copies are not allowed and an
    INVALID_VALUE error results. When the source and destination do not involve the same GPU,
    overlapping copies are allowed and no error is generated.

    (5) How does this extension interact with CopyTexImage2D?

    RESOLVED. The behavior depends on the storage type of the target. See section 20.4.
    Since CopyTexImage* sources from the framebuffer, the source always has per-GPU storage.

    (6) Should we provide a mechanism to modify viewports independently for each GPU?

    RESOLVED. No. This can be achieved using multicast UBOs and ARB_shader_viewport_layer_array.

    (7) Should we add a present API that automatically displays content from a specific GPU? It
    could abstract the transport mechanism, copying when necessary.

    RESOLVED. No. Transfers should be avoided to maximize performance and minimize latency.
    Minimizing transfers requires application awareness of display connectivity to assign
    rendering appropriately. Hiding transfers behind an API would also prevent some interesting
    multi-GPU rendering techniques (e.g. checkerboard-style split rendering).

    WGL_NV_bridged_display can be used to enable display from multiple GPUs without copies.

    (8) Should we expose the extension on single-GPU configurations?

    RESOLVED. Yes, this is recommended. It allows more code sharing between multi-GPU and
    single-GPU code paths. If there is only one GPU present MULTICAST_GPUS_NV will be 1. It
    may also be 1 if explicit GPU control is unavailable (e.g. if the active multi-GPU rendering
    mode prevents it). Note that in revisions 5 and prior of this extension the minimum for
    MULTICAST_GPUS_NV was 2.

    (9) Should glGet*BufferParameter* return the PER_GPU_STORAGE_BIT_NV bit when
    BUFFER_STORAGE_FLAGS is queried?

    RESOLVED. Yes. BUFFER_STORAGE_FLAGS must match the flags parameter input to *BufferStorage, as
    specified in table 6.3.

    (10) Can a query be complete/available on one GPU and not another?

    RESOLVED. Yes. Independent query completion is important for conditional rendering. It
    allows each GPU to begin conditional rendering in mode QUERY_WAIT without waiting on other
    GPUs.
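
    As an illustration of the code sharing mentioned in issue (8), the following is a minimal
    sketch of deriving GPU masks from the value of MULTICAST_GPUS_NV so that the same
    application path works when that value is 1. It assumes a mask is a 32-bit bitfield in which
    bit i selects GPU i (as in the sample code, where 0x1 selects GPU 0 and 0x2 selects GPU 1).
    The helper names all_gpus_mask, gpu_bit and other_gpus_mask are hypothetical and not part
    of this extension.

```c
#include <stdint.h>

/* Mask selecting every GPU in a group of `gpu_count` GPUs. */
static uint32_t all_gpus_mask(uint32_t gpu_count)
{
    if (gpu_count == 0)
        return 0u;
    if (gpu_count >= 32)
        return UINT32_MAX;  /* avoid an undefined shift by 32 or more */
    return (UINT32_C(1) << gpu_count) - 1u;
}

/* Single-bit mask for `gpu_index`, or 0 if the index is out of range. */
static uint32_t gpu_bit(uint32_t gpu_index, uint32_t gpu_count)
{
    if (gpu_index >= gpu_count || gpu_index >= 32)
        return 0u;
    return UINT32_C(1) << gpu_index;
}

/* Mask selecting every GPU except `gpu_index`. */
static uint32_t other_gpus_mask(uint32_t gpu_index, uint32_t gpu_count)
{
    return all_gpus_mask(gpu_count) & ~gpu_bit(gpu_index, gpu_count);
}
```

    With a GPU count of 1, other_gpus_mask(0, 1) is 0, so an application can skip inter-GPU
    copies and waits without a separate single-GPU code path.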

    (11) How can custom texel data be uploaded to each GPU for a given texture?

    The easiest way is to create staging textures with the custom texel data and then copy it
    to a texture with per-GPU storage using MulticastCopyImageSubDataNV.

    (12) Should we allow the waitGpuMask in MulticastWaitSyncNV to include the signal GPU?

    RESOLVED. No. There is no reason for a GPU to wait on itself. This is effectively a no-op in
    the command stream. Furthermore it is easy to confuse GPU indices and masks, so it is
    beneficial to explicitly generate an error in this case.

    (13) Will support for NVX_linked_gpu_multicast continue?

    RESOLVED. NVX_linked_gpu_multicast is deprecated and applications should switch to
    NV_gpu_multicast. However, implementations are encouraged to continue supporting
    NVX_linked_gpu_multicast for backwards compatibility.

    (14) Does RenderGpuMaskNV work with immediate mode rendering?

    RESOLVED. Yes, the render GPU mask applies to immediate mode rendering the same as other
    rendering. Note that RenderGpuMaskNV is not one of the commands allowed between Begin and End
    (see section 10.7.5) so the render mask must be set before Begin is called.

Revision History

    Rev.  Date      Author    Changes
    ----  --------  --------  -----------------------------------------------
     7    04/02/19  jschnarr  clarify that the interactions with uniform APIs only apply to
                              EXT_bindable_uniform (not ARB_uniform_buffer_object).
                              optionally allow MulticastCopyBufferSubDataNV with buffers lacking
                              per-GPU storage
     6    01/03/19  jschnarr  reduce MULTICAST_GPUS_NV minimum to 1
                              clarify that MULTICAST_GPUS_NV is constant for a context
     5    10/07/16  jschnarr  trivial typo fix
     4    07/21/16  mjk       registered
     3    06/15/16  jschnarr  R370 release