Name

    EXT_shader_image_load_store

Name Strings

    GL_EXT_shader_image_load_store

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)
    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Barthold Lichtenbelt, NVIDIA
    Bill Licea-Kane, AMD
    Eric Werness, NVIDIA
    Graham Sellers, AMD
    Greg Roth, NVIDIA
    Nick Haemel, AMD
    Pierre Boudier, AMD
    Piers Daniell, NVIDIA

Status

    Shipping.

Version

    Last Modified Date: 10/16/2013
    NVIDIA Revision: 7

Number

    386

Dependencies

    This extension is written against the OpenGL 3.2 specification
    (Compatibility Profile).

    This extension is written against version 1.50 (revision 09) of the OpenGL
    Shading Language Specification.

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension interacts trivially with OpenGL 3.2 (Core Profile).

    This extension interacts trivially with OpenGL 3.1,
    ARB_uniform_buffer_object, and EXT_bindable_uniform.

    This extension interacts trivially with ARB_draw_indirect.

    This extension interacts trivially with NV_vertex_buffer_unified_memory.

    This extension interacts trivially with OpenGL 3.2 and
    ARB_texture_multisample.

    This extension interacts trivially with OpenGL 4.0 and ARB_sample_shading.

    This extension interacts trivially with OpenGL 4.0 and
    ARB_texture_cube_map_array.

    This extension interacts trivially with OpenGL 3.3 and
    ARB_texture_rgb10_a2ui.

    This extension interacts trivially with NV_shader_buffer_load.

    This extension interacts trivially with OpenGL 4.0, ARB_gpu_shader5, and
    NV_gpu_shader5.

    This extension interacts trivially with OpenGL 4.0 and
    ARB_tessellation_shader.

    This extension interacts trivially with EXT_depth_bounds_test.

    This extension interacts with EXT_separate_shader_objects.

    This extension interacts with NV_gpu_program5.

Overview

    This extension provides GLSL built-in functions allowing shaders to load
    from, store to, and perform atomic read-modify-write operations to a
    single level of a texture object from any shader stage. These built-in
    functions are named imageLoad(), imageStore(), and imageAtomic*(),
    respectively, and accept integer texel coordinates to identify the texel
    accessed. The extension adds the notion of "image units" to the OpenGL
    API, to which texture levels are bound for access by the GLSL built-in
    functions. To allow shaders to specify the image unit to access, GLSL
    provides a new set of data types ("image*") similar to samplers. Each
    image variable is assigned an integer value to identify an image unit to
    access, which is specified using Uniform*() APIs in a manner similar to
    samplers. For implementations supporting the NV_gpu_program5 extensions,
    assembly language instructions to perform image loads, stores, and atomics
    are also provided.

    This extension also provides the capability to explicitly enable "early"
    per-fragment tests, where operations like depth and stencil testing are
    performed prior to fragment shader execution. In unextended OpenGL,
    fragment shaders never have any side effects and implementations can
    sometimes perform per-fragment tests and discard some fragments prior to
    executing the fragment shader. Since this extension allows fragment
    shaders to write to texture and buffer object memory using the built-in
    image functions, such optimizations could lead to non-deterministic
    results. To avoid this, implementations supporting this extension may not
    perform such optimizations on shaders having such side effects.
    However, enabling early per-fragment tests guarantees that such tests
    will be performed prior to fragment shader execution, and ensures that
    image stores and atomics will not be performed by fragment shader
    invocations where these per-fragment tests fail.

    Finally, this extension provides both a GLSL built-in function and an
    OpenGL API function allowing applications some control over the ordering
    of image loads, stores, and atomics relative to other OpenGL pipeline
    operations accessing the same memory. Because the extension provides the
    ability to perform random accesses to texture or buffer object memory,
    such accesses are not easily tracked by the OpenGL driver. To avoid the
    need for heavy-handed synchronization at the driver level, this extension
    requires manual synchronization. The MemoryBarrierEXT() OpenGL API
    function allows applications to specify a bitfield indicating the set of
    OpenGL API operations to synchronize relative to shader memory access.
    The memoryBarrier() GLSL built-in function provides a synchronization
    point within a given shader invocation to ensure that all memory accesses
    performed prior to the synchronization point complete prior to any started
    after the synchronization point.
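    As a rough illustration of the API-side bitfield mentioned above, the
    following C sketch composes <barriers> values for two situations. The
    token values are the ones this extension lists under "New Tokens"; the
    helper names are hypothetical, and the MemoryBarrierEXT call itself
    appears only in a comment because it assumes a current GL context and
    a loader that exposes the entry point.

```c
#include <stdint.h>

/* Token values as listed under "New Tokens" (GL_ prefix added for C). */
#define GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT 0x00000001u
#define GL_ELEMENT_ARRAY_BARRIER_BIT_EXT       0x00000002u
#define GL_SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT 0x00000020u

/* Hypothetical helper: bits to pass when a shader has written vertex and
   index data that a later draw call will fetch as vertex arrays. */
static uint32_t barriers_for_vertex_pulling(void)
{
    return GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT |
           GL_ELEMENT_ARRAY_BARRIER_BIT_EXT;
}

/* Hypothetical helper: bits to pass when a later pass reads, through
   image loads, what an earlier pass wrote through image stores. */
static uint32_t barriers_for_image_reuse(void)
{
    return GL_SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT;
}

/* With a current context, the call between the two passes would look like:
 *     glMemoryBarrierEXT(barriers_for_vertex_pulling());
 */
```

    Keeping the bit selection in small named helpers is only a design
    choice for readability; passing the tokens directly is equivalent.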

New Procedures and Functions

    void BindImageTextureEXT(uint index, uint texture, int level,
                             boolean layered, int layer, enum access,
                             int format);

    void MemoryBarrierEXT(bitfield barriers);

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MAX_IMAGE_UNITS_EXT                                0x8F38
        MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS_EXT  0x8F39
        MAX_IMAGE_SAMPLES_EXT                              0x906D

    Accepted by the <target> parameter of GetIntegeri_v and GetBooleani_v:

        IMAGE_BINDING_NAME_EXT     0x8F3A
        IMAGE_BINDING_LEVEL_EXT    0x8F3B
        IMAGE_BINDING_LAYERED_EXT  0x8F3C
        IMAGE_BINDING_LAYER_EXT    0x8F3D
        IMAGE_BINDING_ACCESS_EXT   0x8F3E
        IMAGE_BINDING_FORMAT_EXT   0x906E

    Accepted by the <barriers> parameter of MemoryBarrierEXT:

        VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT  0x00000001
        ELEMENT_ARRAY_BARRIER_BIT_EXT        0x00000002
        UNIFORM_BARRIER_BIT_EXT              0x00000004
        TEXTURE_FETCH_BARRIER_BIT_EXT        0x00000008
        SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT  0x00000020
        COMMAND_BARRIER_BIT_EXT              0x00000040
        PIXEL_BUFFER_BARRIER_BIT_EXT         0x00000080
        TEXTURE_UPDATE_BARRIER_BIT_EXT       0x00000100
        BUFFER_UPDATE_BARRIER_BIT_EXT        0x00000200
        FRAMEBUFFER_BARRIER_BIT_EXT          0x00000400
        TRANSFORM_FEEDBACK_BARRIER_BIT_EXT   0x00000800
        ATOMIC_COUNTER_BARRIER_BIT_EXT       0x00001000
        ALL_BARRIER_BITS_EXT                 0xFFFFFFFF

    Returned by the <type> parameter of GetActiveUniform:

        IMAGE_1D_EXT                                 0x904C
        IMAGE_2D_EXT                                 0x904D
        IMAGE_3D_EXT                                 0x904E
        IMAGE_2D_RECT_EXT                            0x904F
        IMAGE_CUBE_EXT                               0x9050
        IMAGE_BUFFER_EXT                             0x9051
        IMAGE_1D_ARRAY_EXT                           0x9052
        IMAGE_2D_ARRAY_EXT                           0x9053
        IMAGE_CUBE_MAP_ARRAY_EXT                     0x9054
        IMAGE_2D_MULTISAMPLE_EXT                     0x9055
        IMAGE_2D_MULTISAMPLE_ARRAY_EXT               0x9056
        INT_IMAGE_1D_EXT                             0x9057
        INT_IMAGE_2D_EXT                             0x9058
        INT_IMAGE_3D_EXT                             0x9059
        INT_IMAGE_2D_RECT_EXT                        0x905A
        INT_IMAGE_CUBE_EXT                           0x905B
        INT_IMAGE_BUFFER_EXT                         0x905C
        INT_IMAGE_1D_ARRAY_EXT                       0x905D
        INT_IMAGE_2D_ARRAY_EXT                       0x905E
        INT_IMAGE_CUBE_MAP_ARRAY_EXT                 0x905F
        INT_IMAGE_2D_MULTISAMPLE_EXT                 0x9060
        INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT           0x9061
        UNSIGNED_INT_IMAGE_1D_EXT                    0x9062
        UNSIGNED_INT_IMAGE_2D_EXT                    0x9063
        UNSIGNED_INT_IMAGE_3D_EXT                    0x9064
        UNSIGNED_INT_IMAGE_2D_RECT_EXT               0x9065
        UNSIGNED_INT_IMAGE_CUBE_EXT                  0x9066
        UNSIGNED_INT_IMAGE_BUFFER_EXT                0x9067
        UNSIGNED_INT_IMAGE_1D_ARRAY_EXT              0x9068
        UNSIGNED_INT_IMAGE_2D_ARRAY_EXT              0x9069
        UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY_EXT        0x906A
        UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_EXT        0x906B
        UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT  0x906C


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    (Add new types to table 2.13, pp. 96-98)

      Type Name                                    Keyword
      ------------------------------               -------------------------
      IMAGE_1D_EXT                                 image1D
      IMAGE_2D_EXT                                 image2D
      IMAGE_3D_EXT                                 image3D
      IMAGE_2D_RECT_EXT                            image2DRect
      IMAGE_CUBE_EXT                               imageCube
      IMAGE_BUFFER_EXT                             imageBuffer
      IMAGE_1D_ARRAY_EXT                           image1DArray
      IMAGE_2D_ARRAY_EXT                           image2DArray
      IMAGE_CUBE_MAP_ARRAY_EXT                     imageCubeArray
      IMAGE_2D_MULTISAMPLE_EXT                     image2DMS
      IMAGE_2D_MULTISAMPLE_ARRAY_EXT               image2DMSArray
      INT_IMAGE_1D_EXT                             iimage1D
      INT_IMAGE_2D_EXT                             iimage2D
      INT_IMAGE_3D_EXT                             iimage3D
      INT_IMAGE_2D_RECT_EXT                        iimage2DRect
      INT_IMAGE_CUBE_EXT                           iimageCube
      INT_IMAGE_BUFFER_EXT                         iimageBuffer
      INT_IMAGE_1D_ARRAY_EXT                       iimage1DArray
      INT_IMAGE_2D_ARRAY_EXT                       iimage2DArray
      INT_IMAGE_CUBE_MAP_ARRAY_EXT                 iimageCubeArray
      INT_IMAGE_2D_MULTISAMPLE_EXT                 iimage2DMS
      INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT           iimage2DMSArray
      UNSIGNED_INT_IMAGE_1D_EXT                    uimage1D
      UNSIGNED_INT_IMAGE_2D_EXT                    uimage2D
      UNSIGNED_INT_IMAGE_3D_EXT                    uimage3D
      UNSIGNED_INT_IMAGE_2D_RECT_EXT               uimage2DRect
      UNSIGNED_INT_IMAGE_CUBE_EXT                  uimageCube
      UNSIGNED_INT_IMAGE_BUFFER_EXT                uimageBuffer
      UNSIGNED_INT_IMAGE_1D_ARRAY_EXT              uimage1DArray
      UNSIGNED_INT_IMAGE_2D_ARRAY_EXT              uimage2DArray
      UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY_EXT        uimageCubeArray
      UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_EXT        uimage2DMS
      UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT  uimage2DMSArray


    (Add a new subsection after Section 2.14.5, Samplers, p. 106)

    Section 2.14.X, Images

    Images are special uniforms used in the OpenGL Shading Language to
    identify a level of a texture to be read or written using image load,
    store, and atomic built-in functions in the manner described in Section
    3.9.X. The value of an image uniform is an integer specifying the image
    unit accessed. Image units are numbered beginning at zero, and there is
    an implementation-dependent number of available image units
    (MAX_IMAGE_UNITS_EXT). The error INVALID_VALUE is generated if a
    Uniform1i{v} call is used to set an image uniform to a value less than
    zero or greater than or equal to MAX_IMAGE_UNITS_EXT. Note that image
    units used for image variables are independent of the texture image
    units used for sampler variables; the number of units provided by the
    implementation may differ. Textures are bound independently and
    separately to image and texture image units.

    The type of an image variable must match the texture target of the image
    currently bound to the image unit; otherwise the result of the load/
    store/atomic operation is undefined (see Section 4.1.X of the OpenGL
    Shading Language specification for more detail).

    The location of an image variable needs to be queried with
    GetUniformLocation, just like any uniform variable. Image values need to
    be set by calling Uniform1i{v}. Loading image variables with any of the
    other Uniform entry points is not allowed and will result in an
    INVALID_OPERATION error.
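    The error rules for setting an image uniform can be modeled on the CPU
    as in the sketch below. This is an illustration of the checks described
    above, not driver code: the enum, the function name, and the
    <max_image_units> parameter are hypothetical, and the order in which
    the two errors are tested is an assumption, since the text does not
    state one. The numeric error codes are the standard GL values.

```c
/* Standard GL error codes. */
#define GL_NO_ERROR          0
#define GL_INVALID_VALUE     0x0501
#define GL_INVALID_OPERATION 0x0502

/* Hypothetical classification of the Uniform* entry point being used. */
typedef enum {
    UNIFORM_1I,     /* Uniform1i  */
    UNIFORM_1IV,    /* Uniform1iv */
    UNIFORM_OTHER   /* any other Uniform* command */
} UniformEntryPoint;

/* Returns the error an image-uniform update is described as generating.
   Assumption: the entry-point check is performed before the range check. */
static unsigned check_image_uniform_update(UniformEntryPoint entry_point,
                                           int value,
                                           int max_image_units)
{
    if (entry_point != UNIFORM_1I && entry_point != UNIFORM_1IV)
        return GL_INVALID_OPERATION;
    if (value < 0 || value >= max_image_units)
        return GL_INVALID_VALUE;
    return GL_NO_ERROR;
}
```

    In an application, the corresponding sequence is a GetUniformLocation
    query followed by Uniform1i with the chosen image unit index.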

    Unlike samplers, there is no limit on the number of active image variables
    that may be used by a program or by any particular shader. However, given
    that there is an implementation-dependent limit on the number of unique
    image units, the actual number of images that may be used by all shaders
    in a program is limited.


    (Add a new subsection after Section 2.14.7, Shader Execution, p. 109)

    Section 2.14.X, Shader Memory Access

    Shaders may perform random-access reads and writes to texture or buffer
    object memory using built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification. The ability to
    perform such random-access reads and writes in a system that may be highly
    pipelined results in ordering and synchronization issues discussed in the
    sections below.


    Shader Memory Access Ordering

    The order in which texture or buffer object memory is read or written by
    shaders is largely undefined. For some shader types (vertex, tessellation
    evaluation, and in some cases, fragment), even the number of shader
    invocations that might perform loads and stores is undefined. In
    particular, the following rules apply:

      * While a vertex or tessellation evaluation shader will be executed at
        least once for each unique vertex specified by the application (vertex
        shaders) or generated by the tessellation primitive generator
        (tessellation evaluation shaders), it may be executed more than once
        for implementation-dependent reasons. Additionally, if the same
        vertex is specified multiple times in a collection of primitives
        (e.g., repeating an index in DrawElements), the vertex shader might be
        run only once.

      * For each fragment generated by the GL, the number of fragment shader
        invocations depends on a number of factors.
        If the fragment fails the pixel ownership test (Section 4.1.1), the
        fragment shader may not be executed. Otherwise, if the framebuffer
        has no multisample buffer (SAMPLE_BUFFERS is zero), the fragment
        shader will be invoked exactly once. If the fragment shader
        specifies per-sample shading, the fragment shader will be run once
        per covered sample. Otherwise, the number of fragment shader
        invocations is undefined, but must be in the range [1,<N>], where
        <N> is the number of samples covered by the fragment.

      * If a fragment shader is invoked to process fragments or samples not
        covered by a primitive being rasterized to facilitate the
        approximation of derivatives for texture lookups, stores and atomics
        have no effect.

      * The relative order of invocations of the same shader type is
        undefined. A store issued by a shader when working on primitive B
        might complete prior to a store for primitive A, even if primitive A
        is specified prior to primitive B. This applies even to fragment
        shaders; while fragment shader outputs are written to the framebuffer
        in primitive order, stores executed by fragment shader invocations are
        not.

      * The relative order of invocations of different shader types is largely
        undefined. However, when executing a shader whose inputs are
        generated from a previous programmable stage, the shader invocations
        from the previous stage are guaranteed to have executed far enough to
        generate final values for all next-stage inputs. That implies shader
        completion for all stages except geometry; geometry shaders are
        guaranteed only to have executed far enough to emit all needed
        vertices.

    The above limitations on shader invocation order also make some forms of
    synchronization between shader invocations within a single set of
    primitives unimplementable.
    For example, having one invocation poll memory written by another
    invocation assumes that the other invocation has been launched and can
    complete its writes. The only case where such a guarantee is made is
    when the inputs of one shader invocation are generated from the outputs
    of a shader invocation in a previous stage.

    Stores issued to different memory locations within a single shader
    invocation may not be visible to other invocations in the order they were
    performed. The built-in function memoryBarrier() may be used to provide
    stronger ordering of reads and writes performed by a single invocation.
    Calling memoryBarrier() guarantees that any memory transactions issued by
    the shader invocation prior to the call complete prior to the memory
    transactions issued after the call. Memory barriers may be needed for
    algorithms that require multiple invocations to access the same memory and
    require the operations to be performed in a partially-defined relative
    order. For example, if one shader invocation does a series of
    writes, followed by a memoryBarrier() call, followed by another write,
    then another invocation that sees the results of the final write will also
    see the previous writes. Without the memory barrier, the final write may
    be visible before the previous writes.

    The atomic memory transaction built-in functions may be used to read and
    write a given memory address atomically. While atomic built-in functions
    issued by multiple shader invocations are executed in undefined order
    relative to each other, these functions perform both a read and a write of
    a memory address and guarantee that no other memory transaction will write
    to the underlying memory between the read and write. Atomics allow
    shaders to use shared global addresses for mutual exclusion or as
    counters, among other uses.
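    The write / memoryBarrier() / write pattern described above can be
    sketched as a GLSL fragment shader held in a C string, the form in
    which an application would hand it to ShaderSource. Everything specific
    here is an assumption made for illustration: the image names "payload"
    and "ready", the stored values, the choice of a fragment shader, and
    the exact qualifier spelling (layout(size1x32), coherent), which
    follows the GLSL side of this extension but has not been run through a
    compiler here.

```c
#include <string.h>

/* Producer shader: data store, barrier, then flag store. */
static const char kProducerShader[] =
    "#version 150\n"
    "#extension GL_EXT_shader_image_load_store : enable\n"
    "layout(size1x32) coherent uniform uimage2D payload;\n"
    "layout(size1x32) coherent uniform uimage2D ready;\n"
    "out vec4 fragColor;\n"
    "void main() {\n"
    "    ivec2 texel = ivec2(gl_FragCoord.xy);\n"
    "    /* Write that a consumer must be able to observe first. */\n"
    "    imageStore(payload, texel, uvec4(42u, 0u, 0u, 0u));\n"
    "    /* Stores issued above complete before stores issued below. */\n"
    "    memoryBarrier();\n"
    "    /* Final write: whoever sees this flag also sees the payload. */\n"
    "    imageStore(ready, texel, uvec4(1u, 0u, 0u, 0u));\n"
    "    fragColor = vec4(0.0);\n"
    "}\n";

/* Source length in bytes, excluding the terminating NUL, for callers that
   pass an explicit length array to ShaderSource. */
static int producer_shader_length(void)
{
    return (int)strlen(kProducerShader);
}
```

    Without the memoryBarrier() line, an invocation reading "ready" could
    observe the flag before the payload store becomes visible, which is
    the hazard the paragraph above describes.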


    Shader Memory Access Synchronization

    Data written to textures or buffer objects by a shader invocation may
    eventually be read by other shader invocations, sourced by other fixed
    pipeline stages, or read back by the application. When applications write
    to buffer objects or textures using API commands such as TexSubImage* or
    BufferSubData, the GL implementation knows when and where writes occur and
    can perform implicit synchronization to ensure that operations requested
    before the update see the original data and that subsequent operations see
    the modified data. Without logic to track the target address of each
    shader instruction performing a store, automatic synchronization of stores
    performed by a shader invocation would require the GL implementation to
    make worst-case assumptions at significant performance cost. To permit
    cases where textures or buffers may be read or written in different
    pipeline stages without the overhead of automatic synchronization, buffer
    object and texture stores performed by shaders are not automatically
    synchronized with other GL operations using the same memory.

    Explicit synchronization is required to ensure that the effects of buffer
    and texture data stores performed by shaders will be visible to subsequent
    operations using the same objects and will not overwrite data still to be
    read by previously requested operations. Without manual synchronization,
    shader stores for a "new" primitive may complete before processing of an
    "old" primitive completes. Additionally, stores for an "old" primitive
    might not be completed before processing of a "new" primitive starts. The
    command

      void MemoryBarrierEXT(bitfield barriers)

    defines a barrier ordering the memory transactions issued prior to the
    command relative to those issued after the barrier.
    For the purposes of this ordering, memory transactions performed by
    shaders are considered to be issued by the rendering command that
    triggered the execution of the shader. <barriers> is a bitfield
    indicating the set of operations that are synchronized with shader
    stores; the bits used in <barriers> are as follows:

      - VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT: If set, vertex data sourced from
        buffer objects after the barrier will reflect data written by shaders
        prior to the barrier. The set of buffer objects affected by this bit
        is derived from the buffer object bindings or GPU addresses used for
        generic vertex attributes (VERTEX_ATTRIB_ARRAY_BUFFER bindings,
        VERTEX_ATTRIB_ARRAY_ADDRESS from NV_vertex_buffer_unified_memory), as
        well as those for arrays of named vertex attributes (e.g., vertex,
        color, normal).

      - ELEMENT_ARRAY_BARRIER_BIT_EXT: If set, vertex array indices sourced from
        buffer objects after the barrier will reflect data written by shaders
        prior to the barrier. The buffer objects affected by this bit are
        derived from the ELEMENT_ARRAY_BUFFER binding and the
        NV_vertex_buffer_unified_memory ELEMENT_ARRAY_ADDRESS address.

      - UNIFORM_BARRIER_BIT_EXT: Shader uniforms and assembly program parameters
        sourced from buffer objects after the barrier will reflect data
        written by shaders prior to the barrier.

      - TEXTURE_FETCH_BARRIER_BIT_EXT: Texture fetches from shaders, including
        fetches from buffer object memory via buffer textures, after the
        barrier will reflect data written by shaders prior to the barrier.

      - SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT: Memory accesses using shader image
        load, store, and atomic built-in functions issued after the barrier
        will reflect data written by shaders prior to the barrier.
        Additionally, image stores and atomics issued after the barrier will
        not execute until all memory accesses (e.g., loads, stores, texture
        fetches, vertex fetches) initiated prior to the barrier complete.

      - COMMAND_BARRIER_BIT_EXT: Command data sourced from buffer objects by
        Draw*Indirect commands after the barrier will reflect data written by
        shaders prior to the barrier. The buffer objects affected by this bit
        are derived from the DRAW_INDIRECT_BUFFER_EXT binding and the GPU
        address DRAW_INDIRECT_ADDRESS_EXT.

      - PIXEL_BUFFER_BARRIER_BIT_EXT: Reads/writes of buffer objects via the
        PACK/UNPACK_BUFFER bindings (ReadPixels, TexSubImage, etc.) after the
        barrier will reflect data written by shaders prior to the barrier.
        Additionally, buffer object writes issued after the barrier will wait
        on the completion of all shader writes initiated prior to the barrier.

      - TEXTURE_UPDATE_BARRIER_BIT_EXT: Writes to a texture via Tex(Sub)Image*,
        CopyTex(Sub)Image*, CompressedTex(Sub)Image*, and reads via
        GetTexImage after the barrier will reflect data written by shaders
        prior to the barrier. Additionally, texture writes from these
        commands issued after the barrier will not execute until all shader
        writes initiated prior to the barrier complete.

      - BUFFER_UPDATE_BARRIER_BIT_EXT: Reads/writes via Buffer(Sub)Data,
        MapBuffer(Range), CopyBufferSubData, ProgramBufferParameters, and
        GetBufferSubData after the barrier will reflect data written by
        shaders prior to the barrier. Additionally, writes via these commands
        issued after the barrier will wait on the completion of all shader
        writes initiated prior to the barrier.

      - FRAMEBUFFER_BARRIER_BIT_EXT: Reads and writes via framebuffer object
        attachments after the barrier will reflect data written by shaders
        prior to the barrier.
        Additionally, framebuffer writes issued after the barrier will wait
        on the completion of all shader writes issued prior to the barrier.

      - TRANSFORM_FEEDBACK_BARRIER_BIT_EXT: Writes via transform feedback
        bindings after the barrier will reflect data written by shaders prior
        to the barrier. Additionally, transform feedback writes issued after
        the barrier will wait on the completion of all shader writes issued
        prior to the barrier.

      - ATOMIC_COUNTER_BARRIER_BIT_EXT: Accesses to atomic counters after the
        barrier will reflect writes prior to the barrier.

    If <barriers> is ALL_BARRIER_BITS_EXT, shader memory accesses will be
    synchronized relative to all the operations described above.

    Implementations may cache buffer object and texture image memory that
    could be written by shaders in multiple caches; for example, there may be
    separate caches for texture, vertex fetching, and one or more caches for
    shader memory accesses. Implementations are not required to keep these
    caches coherent with shader memory writes. Stores issued by one
    invocation may not be immediately observable by other pipeline stages or
    other shader invocations because the value stored may remain in a cache
    local to the processor executing the store, or because data overwritten by
    the store is still in a cache elsewhere in the system. When
    MemoryBarrierEXT is called, the GL flushes and/or invalidates any caches
    relevant to the operations specified by the <barriers> parameter to
    ensure consistent ordering of operations across the barrier.

    To allow for independent shader invocations to communicate by reads and
    writes to a common memory address, image variables in the OpenGL Shading
    Language may be declared as "coherent".
    Buffer object or texture image memory accessed through such variables
    may be cached only if caches are automatically updated due to stores
    issued by any other shader invocation. If the same address is accessed
    using both coherent and non-coherent variables, the accesses using
    variables declared as coherent will observe the results stored using
    coherent variables in other invocations. Using variables declared as
    "coherent" guarantees only that the results of stores will be
    immediately visible to shader invocations using similarly-declared
    variables; calling MemoryBarrierEXT is required to ensure that the
    stores are visible to other operations.

    The following guidelines may be helpful in choosing when to use coherent
    memory accesses and when to use barriers.

      - Data that are read-only or constant may be accessed without using
        coherent variables or calling MemoryBarrierEXT(). Updates to the
        read-only data via API calls such as BufferSubData will invalidate
        shader caches implicitly as required.

      - Data that are shared between shader invocations at a fine granularity
        (e.g., written by one invocation, consumed by another invocation) should
        use coherent variables to read and write the shared data.

      - Data written by one shader invocation and consumed by other shader
        invocations launched as a result of its execution ("dependent
        invocations") should use coherent variables in the producing shader
        invocation and call memoryBarrier() after the last write. The consuming
        shader invocation should also use coherent variables.

      - Data written to image variables in one rendering pass and read by the
        shader in a later pass need not use coherent variables or
        memoryBarrier(). Calling MemoryBarrierEXT() with the
        SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT set in <barriers> between passes is
        necessary.

      - Data written by the shader in one rendering pass and read by another
        mechanism (e.g., vertex or index buffer pulling) in a later pass need
        not use coherent variables or memoryBarrier(). Calling
        MemoryBarrierEXT() with the appropriate bits set in <barriers> between
        passes is necessary.


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    (insert new section immediately before Section 3.8, Texturing, p. 210)

    Section 3.X, Early Per-Fragment Tests

    Once fragments are produced by rasterization (sections 3.4 through 3.8), a
    number of per-fragment operations may be performed prior to fragment
    shader execution. If a fragment is discarded during any of these
    operations, it will not be processed by any subsequent stage, including
    fragment shader execution.

    Up to six operations are performed on each fragment, in the following
    order:

      * the pixel ownership test, described in section 4.1.1;

      * the scissor test, described in section 4.1.2;

      * the depth bounds test, described in section 4.1.X (of the
        EXT_depth_bounds_test specification);

      * the stencil test, described in section 4.1.5;

      * the depth buffer test, described in section 4.1.6; and

      * occlusion query sample counting, described in section 4.1.7.

    The pixel ownership and scissor tests are always performed.

    The other operations are performed if and only if early fragment tests are
    enabled in the active fragment shader (section 3.12.2). When early
    per-fragment operations are enabled, the depth bounds test, stencil test,
    depth buffer test, and occlusion query sample counting operations are
    performed prior to fragment shader execution, and the stencil buffer,
    depth buffer, and occlusion query sample counts will be updated
    accordingly.
    When early per-fragment operations are enabled, these operations will
    not be performed again after fragment shader execution. When there is
    no active program, the active program has no fragment shader, or the
    active program was linked with early fragment tests disabled, these
    operations are performed only after fragment shader execution, in the
    order described in chapter 4.

    If early fragment tests are enabled, any depth value computed by the
    fragment shader has no effect. Additionally, the depth buffer, stencil
    buffer, and occlusion query sample counts may be updated even for
    fragments or samples that would be discarded after fragment shader
    execution due to per-fragment operations such as alpha-to-coverage or
    alpha tests.


    (Add new section after Section 3.9.19, Texture Application, p. 268)

    Section 3.9.X, Texture Image Loads and Stores

    The contents of a texture may be made available for shaders to read and
    write by binding the texture to one of a collection of image units. The
    GL implementation provides an array of image units numbered beginning with
    zero, with the total number of image units provided given by the
    implementation-dependent constant MAX_IMAGE_UNITS_EXT. Unlike texture
    image units, image units do not have a separate attachment for each
    texture target; each image unit may have only one texture bound at
    a time.

    A texture may be bound to an image unit for use by image loads and stores
    by calling:

      void BindImageTextureEXT(uint index, uint texture, int level,
                               boolean layered, int layer, enum access,
                               int format);

    where <index> identifies the image unit, <texture> is the name of the
    texture, and <level> selects a single level of the texture. If <texture>
    is zero, <level> is ignored and the texture currently bound to image unit
    <index> is unbound.
    If <index> is less than zero or greater than or equal to
    MAX_IMAGE_UNITS_EXT, or if <texture> is not the name of an existing
    texture object, the error INVALID_VALUE is generated.

    If the texture identified by <texture> is a one-dimensional array,
    two-dimensional array, three-dimensional, cube map, cube map array, or
    two-dimensional multisample array texture, it is possible to bind either
    the entire texture level or a single layer or face of the texture level.
    If <layered> is TRUE, the entire level is bound. If <layered> is FALSE,
    only the single layer identified by <layer> will be bound. When <layered>
    is FALSE, the single bound layer is treated as a different texture target
    for image accesses:

      * one-dimensional array texture layers are treated as one-dimensional
        textures;

      * two-dimensional array, three-dimensional, cube map, and cube map
        array texture layers are treated as two-dimensional textures; and

      * two-dimensional multisample array textures are treated as
        two-dimensional multisample textures.

    For cube map textures where <layered> is FALSE, the face is taken by
    mapping the layer number to a face according to table 4.13. For cube map
    array textures where <layered> is FALSE, the selected layer number is
    mapped to a texture layer and cube face using the following equations and
    mapping <face> to a face according to table 4.13.

        layer = floor(layer_orig / 6)
        face  = layer_orig - (layer * 6)

    <format> specifies the format that the elements of the image will be
    treated as when doing formatted stores, as described later in this
    section. This is referred to as the "image unit format". This must be one
    of the formats listed in Table X.2; otherwise the error INVALID_VALUE is
    generated.

    <access> specifies whether the texture bound to the image will be treated
    as READ_ONLY, WRITE_ONLY, or READ_WRITE.
    If a shader reads from an image unit with a texture bound as WRITE_ONLY,
    or writes to an image unit with a texture bound as READ_ONLY, the
    results of that shader operation are undefined and may lead to
    application termination.

    If a texture object bound to one or more image units is deleted by
    DeleteTextures, it is detached from each such image unit, as though
    BindImageTextureEXT were called with <index> identifying the image unit and
    <texture> set to zero.

    When a shader accesses the texture bound to an image unit using a built-in
    image load, store, or atomic function, it identifies a single texel by
    providing a one-, two-, or three-dimensional coordinate. Multisample
    texture accesses also specify a sample number. A coordinate vector is
    mapped to an individual texel tau_i, tau_i_j, or tau_i_j_k according to
    the target of the texture bound to the image unit using Table X.1. As
    noted above, single-layer bindings of array or cube map textures are
    considered to use a texture target corresponding to the bound layer,
    rather than that of the full texture.

                                                        face/
                                      i     j     k     layer
                                      --    --    --    -----
      TEXTURE_1D                      x     -     -     -
      TEXTURE_2D                      x     y     -     -
      TEXTURE_3D                      x     y     z     -
      TEXTURE_RECTANGLE               x     y     -     -
      TEXTURE_CUBE_MAP                x     y     -     z
      TEXTURE_BUFFER                  x     -     -     -
      TEXTURE_1D_ARRAY                x     -     -     y
      TEXTURE_2D_ARRAY                x     y     -     z
      TEXTURE_CUBE_MAP_ARRAY          x     y     -     z
      TEXTURE_2D_MULTISAMPLE          x     y     -     -
      TEXTURE_2D_MULTISAMPLE_ARRAY    x     y     -     z

      Table X.1, Mapping of image load, store, and atomic texel coordinate
      components to texel numbers.

    If the texture target has layers or cube map faces, the layer or face
    number is taken from the <layer> argument of BindImageTextureEXT if the
    texture is bound with <layered> set to FALSE, or from the coordinate
    identified by Table X.1 otherwise.
    For cube map and cube map array textures with <layered> set to TRUE, the
    coordinate is mapped to a layer and face in the same manner as the <layer>
    argument of BindImageTextureEXT.

    If the individual texel identified for an image load, store, or atomic
    operation doesn't exist, the access is treated as invalid. Invalid image
    loads will return zero. Invalid image stores will have no effect.
    Invalid image atomics will not update any texture bound to the image unit
    and will return zero. An access is considered invalid if:

      * no texture is bound to the selected image unit;

      * the texture bound to the selected image unit is incomplete;

      * the texture level bound to the image unit is less than the base
        level or greater than the maximum level of the texture;

      * the texture bound to the image unit is bordered;

      * the internal format of the texture bound to the image unit is not
        found in Table X.2;

      * the internal format of the texture is incompatible with the specified
        <format> according to Table X.2;

      * the texture bound to the image unit has layers, is bound with
        <layered> set to TRUE, and the selected layer or cube map face doesn't
        exist;

      * the selected texel tau_i, tau_i_j, or tau_i_j_k doesn't exist;

      * the <x>, <y>, or <z> coordinate is not listed in the selected row of
        Table X.1 and is non-zero;

      * the texture bound to the image unit has layers, is bound with
        <layered> set to FALSE, and the corresponding coordinate in the
        face/layer column of Table X.1 is non-zero;

      * the image has more samples than the implementation-dependent value of
        MAX_IMAGE_SAMPLES_EXT; or

      * the access is a load and the format is not compatible with the
        "size" layout qualifier of the image uniform.
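
    The layer/face mapping for cube map array textures, together with the
    "selected layer or cube map face doesn't exist" condition in the list
    above, can be sketched in C as follows. This is only an illustrative
    model, not part of the extension: the CubeLayerFace type, the
    split_cube_layer() helper, and the <num_cubes> parameter (the number of
    cube layers in the array) are hypothetical names introduced here.

```c
/* Hypothetical result type for this sketch; not defined by the extension. */
typedef struct {
    int layer;  /* cube map array layer                               */
    int face;   /* 0..5, to be mapped to a cube face via table 4.13   */
    int valid;  /* non-zero if the selected layer/face exists         */
} CubeLayerFace;

/* Split a combined layer number into (layer, face) using
 *     layer = floor(layer_orig / 6)
 *     face  = layer_orig - (layer * 6)
 * num_cubes is an assumed parameter giving the number of cube layers. */
static CubeLayerFace split_cube_layer(int layer_orig, int num_cubes)
{
    CubeLayerFace r = { 0, 0, 0 };

    if (layer_orig < 0)
        return r;                       /* no such layer: access is invalid */

    r.layer = layer_orig / 6;           /* equals floor() for values >= 0   */
    r.face  = layer_orig - (r.layer * 6);
    r.valid = (r.layer < num_cubes);
    return r;
}
```

    For example, with two cube layers in the array, a combined layer number
    of 7 selects layer 1, face 1, while a combined layer number of 12 selects
    a layer that does not exist, so the access would be treated as invalid.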

    For textures with multiple samples per texel, the sample selected for an
    image load, store, or atomic is undefined if the <sample> coordinate is
    negative or greater than or equal to the number of samples in the
    texture.

    If a shader performs an image load, store, or atomic operation using an
    image variable declared as an array, and if the index used to select an
    individual element is negative or greater than or equal to the size
    of the array, the results of the operation are undefined but may not lead
    to termination.

    Accesses to textures bound to image units perform format conversions
    based on the <format> argument specified when the image is bound. Loads
    always return a value as a vec4, ivec4, or uvec4, and stores always take
    the source data as a vec4, ivec4, or uvec4. Data is converted to/from the
    specified format as if it were passed through a TexImage2D or GetTexImage
    command with <format> and <type> as RGBA and FLOAT for vec4 data, with
    <format> and <type> as RGBA_INTEGER and INT for ivec4 data, or with
    <format> and <type> as RGBA_INTEGER and UNSIGNED_INT for uvec4 data.
    Unused components are filled in with (0,0,0,1) (where "1" is either a
    float or integer depending on the format).

    The formats that are supported for image loads are dependent on the
    layout(size*) qualifier of the image uniform. The following formats
    are supported for image loads:

      - size1x8:  R8I, R8UI
      - size1x16: R16I, R16UI
      - size1x32: R32F, R32I, R32UI
      - size2x32: RG32F, RG32I, RG32UI
      - size4x32: RGBA32F, RGBA32I, RGBA32UI

    Image stores support all formats in Table X.2.

    Table X.2 specifies how each format is stored in memory, which must be
    made explicit because a single image can be viewed with multiple formats
    according to the <format> argument.
    The "R", "G", "B", and "A" columns
    indicate which bits of which 32-bit word correspond to that component.
    For example, an entry of "1[15:0]" indicates that the selected component
    uses sixteen bits with its most significant bit in bit 15 of the second
    word of memory and its least significant bit in bit 0. Floating-point
    textures with 32-bit components are stored using the IEEE standard
    representation; textures with 10-, 11-, or 16-bit floating-point
    components are stored according to Sections 2.1.2 and 2.1.3.

    The "equivalence" column of Table X.2 defines a set of equivalence
    classes for formats, such that if the internal format of a texture level
    is in the same equivalence class as the <format> argument to
    BindImageTextureEXT then the image may be viewed with that format.
    Otherwise, the access is considered invalid as described above.

      Internal format   Equivalence  R         G         B         A
      ---------------   -----------  -------   -------   -------   -------
      RGBA32F           4x32         0[31:0]   1[31:0]   2[31:0]   3[31:0]
      RGBA16F           2x32         0[15:0]   0[31:16]  1[15:0]   1[31:16]
      RG32F             2x32         0[31:0]   1[31:0]
      RG16F             1x32         0[15:0]   0[31:16]
      R11F_G11F_B10F    1x32         0[10:0]   0[21:11]  0[31:22]
      R32F              1x32         0[31:0]
      R16F              1x16         0[15:0]

      RGBA32UI          4x32         0[31:0]   1[31:0]   2[31:0]   3[31:0]
      RGBA16UI          2x32         0[15:0]   0[31:16]  1[15:0]   1[31:16]
      RGB10_A2UI        1x32         0[9:0]    0[19:10]  0[29:20]  0[31:30]
      RGBA8UI           1x32         0[7:0]    0[15:8]   0[23:16]  0[31:24]
      RG32UI            2x32         0[31:0]   1[31:0]
      RG16UI            1x32         0[15:0]   0[31:16]
      RG8UI             1x16         0[7:0]    0[15:8]
      R32UI             1x32         0[31:0]
      R16UI             1x16         0[15:0]
      R8UI              1x8          0[7:0]

      RGBA32I           4x32         0[31:0]   1[31:0]   2[31:0]   3[31:0]
      RGBA16I           2x32         0[15:0]   0[31:16]  1[15:0]   1[31:16]
      RGBA8I            1x32         0[7:0]    0[15:8]   0[23:16]  0[31:24]
      RG32I             2x32         0[31:0]   1[31:0]
      RG16I             1x32         0[15:0]   0[31:16]
      RG8I              1x16         0[7:0]    0[15:8]
      R32I              1x32         0[31:0]
      R16I              1x16         0[15:0]
      R8I               1x8          0[7:0]

      RGBA16            2x32         0[15:0]   0[31:16]  1[15:0]   1[31:16]
      RGB10_A2          1x32         0[9:0]    0[19:10]  0[29:20]  0[31:30]
      RGBA8             1x32         0[7:0]    0[15:8]   0[23:16]  0[31:24]
      RG16              1x32         0[15:0]   0[31:16]
      RG8               1x16         0[7:0]    0[15:8]
      R16               1x16         0[15:0]
      R8                1x8          0[7:0]

      RGBA16_SNORM      2x32         0[15:0]   0[31:16]  1[15:0]   1[31:16]
      RGBA8_SNORM       1x32         0[7:0]    0[15:8]   0[23:16]  0[31:24]
      RG16_SNORM        1x32         0[15:0]   0[31:16]
      RG8_SNORM         1x16         0[7:0]    0[15:8]
      R16_SNORM         1x16         0[15:0]
      R8_SNORM          1x8          0[7:0]

      Table X.2, Supported texture formats, component packing, and
      equivalence classes for formatted image accesses.

    Implementations may support a limited combined number of image units and
    active fragment shader outputs (section 4.2.1). A link error will be
    generated if the number of active image uniforms used in all shaders and
    the number of active fragment shader outputs exceeds the
    implementation-dependent value
    (MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS_EXT).


    Modify Section 3.12.2, Shader Execution, p. 274

    (add new unnumbered subsection at the end of the section, p. 279)

    Early Fragment Tests

    An explicit control is provided to allow fragment shaders to enable early
    fragment tests. If the fragment shader specifies the
    "early_fragment_tests" layout qualifier, the per-fragment tests described
    in Section 3.X will be performed prior to fragment shader execution.
    Otherwise, they will be performed after fragment shader execution.


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Framebuffer)

    None.


Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    Modify Section 5.4.1, Commands Not Usable In Display Lists (p.
    358)

    (add "MemoryBarrierEXT" to the list of commands not allowed in a display
    list, in the "Buffer objects" paragraph)


Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.


New Implementation Dependent State

                                                      Minimum
    Get Value                  Type  Get Command      Value    Description                Sec.   Attrib
    ---------                  ----  -----------      -------  -----------                ----   ------
    MAX_IMAGE_UNITS_EXT        Z+    GetIntegerv      8        number of units for        3.9.X  -
                                                               image load/store/atom
    MAX_COMBINED_IMAGE_UNITS_  Z+    GetIntegerv      8        limit on active image      3.9.X  -
      AND_FRAGMENT_OUTPUTS_EXT                                 units + fragment outputs
    MAX_IMAGE_SAMPLES_EXT      Z     GetIntegerv      0        max allowed samples        3.9.X  -
                                                               for a texture level
                                                               bound to an image unit

New State

    Add a new Table 6.X, Image Stage (state per image unit)

    Get Value                  Type   Get Command    Initial Value  Sec    Attribute
    ---------                  ----   -----------    -------------  ---    ---------
    IMAGE_BINDING_NAME_EXT     8*xZ+  GetIntegeri_v  0              3.9.X  none
    IMAGE_BINDING_LEVEL_EXT    8*xZ+  GetIntegeri_v  0              3.9.X  none
    IMAGE_BINDING_LAYERED_EXT  8*xB   GetBooleani_v  FALSE          3.9.X  none
    IMAGE_BINDING_LAYER_EXT    8*xZ+  GetIntegeri_v  0              3.9.X  none
    IMAGE_BINDING_ACCESS_EXT   8*xZ3  GetIntegeri_v  READ_ONLY      3.9.X  none
    IMAGE_BINDING_FORMAT_EXT   8*xZ+  GetIntegeri_v  R8             3.9.X  none


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.


Additions to the AGL/GLX/WGL Specifications

    None.


GLX Protocol

    !!! TBD !!!


Modifications to the OpenGL Shading Language Specification, Version 1.50

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_EXT_shader_image_load_store : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_EXT_shader_image_load_store 1


    Modify Section 3.6, Keywords, p. 14

    (add the following to the list of keywords, p. 14)

      coherent
      volatile
      restrict

      image1D           iimage1D           uimage1D
      image2D           iimage2D           uimage2D
      image3D           iimage3D           uimage3D
      image2DRect       iimage2DRect       uimage2DRect
      imageCube         iimageCube         uimageCube
      imageBuffer       iimageBuffer       uimageBuffer
      image1DArray      iimage1DArray      uimage1DArray
      image2DArray      iimage2DArray      uimage2DArray
      imageCubeArray    iimageCubeArray    uimageCubeArray
      image2DMS         iimage2DMS         uimage2DMS
      image2DMSArray    iimage2DMSArray    uimage2DMSArray

    (remove from the list of reserved keywords, p. 15)

      volatile


    (Insert a new section immediately after Section 4.1.7, Samplers, p. 23)

    Section 4.1.X, Images

    Like samplers, images are opaque handles to one-, two-, or
    three-dimensional images corresponding to all or a portion of a single
    level of a texture image bound to an image unit. There are distinct
    image variable types for each texture target, and for each of float,
    integer, and unsigned integer data types. Image accesses should use
    an image type that matches the target of the texture whose level is
    bound to the image unit, or for non-layered bindings of 3D or array
    images should use the image type that matches the dimensionality of
    the layer of the image (i.e. a layer of 3D, 2DArray, Cube, or
    CubeArray should use image2D, a layer of 1DArray should use image1D,
    and a layer of 2DMSArray should use image2DMS).
    If the image target type
    does not match the bound image in this manner, if the data type does not
    match the bound image, or if the "size" layout qualifier does not match
    the image unit format as described in Section 3.9.X of the OpenGL
    Specification, the results of image accesses are undefined but may not
    include program termination.

    Image variables are used in the image load, store, and atomic functions
    described in Section 8.X, "Image Functions" to specify an image to access.
    They can only be declared as function parameters or uniform variables (see
    Section 4.3.5 "Uniform"). Except for array indexing, structure field
    selection, and parentheses, images are not allowed to be operands in
    expressions. Images may be aggregated into arrays within a shader (using
    square brackets [ ]) and can be indexed with general integer expressions.
    The results of accessing an image array with an out-of-bounds index are
    undefined. Images cannot be treated as l-values; hence, they cannot be
    used as out or inout function parameters, nor can they be assigned into.
    As uniforms, they are initialized only with the OpenGL API; they cannot be
    declared with an initializer in a shader. As function parameters, images
    may only be passed to image parameters of matching type.


    Modify Section 4.3, Storage Qualifiers, p. 29

    (add new qualifiers to the first table, p.
    29)

      Qualifier      Meaning
      ------------   -------------------------------------------------
      coherent       memory variable where reads and writes are coherent
                     with reads and writes from other shader invocations

      volatile       memory variable whose underlying value may be
                     changed at any point during shader execution by
                     some source other than the current shader invocation

      restrict       memory variable where use of that variable is the
                     only way to read and write the underlying memory
                     in the relevant shader stage


    Modify Section 4.3.2, Constant Qualifier (p. 30)

    (add after last paragraph of section)

    Because image variables cannot be built from constant expressions, the
    "const" qualifier may not be used to create a compile-time constant image
    variable. However, the "const" qualifier may be used to declare image
    variables whose image data are treated as constant, as described in
    Section 4.3.X.


    Modify Section 4.3.8.1 (Input Layout Qualifiers), p. 39

    Remove "only" from the sentence:

      Fragment shaders can have an input layout only for redeclaring the
      built-in variable gl_FragCoord...

    Add to the end of the section:

    Fragment shaders also allow an input layout qualifier on the qualifier
    "in". The only valid layout qualifier is:

      layout-qualifier-id
        early_fragment_tests

    to indicate that fragment tests will be performed before fragment shader
    execution, as described in Section 3.12.2 of the OpenGL Specification.
    For example,

      layout(early_fragment_tests) in;


    (Insert immediately after Section 4.3.8.3, Uniform Block Layout
    Qualifiers, p. 40)

    Section 4.3.8.X, Image Qualifiers

    Layout qualifiers can be used for image variable declarations.
    The layout
    qualifier identifiers for image variable declarations are

      layout-qualifier-id
        size1x8
        size1x16
        size1x32
        size2x32
        size4x32

    The "size" identifiers indicate the set of image formats that the image
    variable can be used to access. Only one "size" identifier may be
    specified for any variable declaration. A layout of "size1x8" is illegal
    for image variables associated with floating-point data types.

    All image variable declarations, including function parameter
    declarations, must specify a "size" layout qualifier. It is an error to
    declare an image uniform variable or function parameter without a size
    qualifier.


    (Insert immediately after Section 4.3.9, Interpolation, p. 42)

    Section 4.3.X, Memory Access Qualifiers

    The "coherent", "volatile", "restrict", and "const" storage qualifiers can
    be specified in image variable declarations to control memory accesses
    using the declared variables.

    Memory accesses to image variables declared using the "coherent" storage
    qualifier are performed coherently with similar accesses from other shader
    invocations. In particular, when reading a variable declared as
    "coherent", the values returned will reflect the results of previously
    completed writes performed by other shader invocations. When writing a
    variable declared as "coherent", the values written will be reflected in
    subsequent coherent reads performed by other shader invocations. As
    described in Section 2.20.X of the OpenGL Specification, shader memory
    reads and writes complete in a largely undefined order. The built-in
    function memoryBarrier() can be used if needed to guarantee the completion
    and relative ordering of memory accesses performed by a single shader
    invocation.

    When accessing memory using variables not declared as "coherent", the
    memory accessed by a shader may be cached by the implementation to service
    future accesses to the same address. Memory stores may be cached in such
    a way that the values written may not be visible to other shader
    invocations accessing the same memory. The implementation may cache the
    values fetched by memory reads and return the same values to any shader
    invocation accessing the same memory, even if the underlying memory has
    been modified since the first memory read. While variables not declared
    as "coherent" may not be useful for communicating between shader
    invocations, using non-coherent accesses may result in higher performance.

    Memory accesses to image variables declared using the "volatile" storage
    qualifier must treat the underlying memory as though it could be read or
    written at any point during shader execution by some source other than the
    executing shader invocation. When a volatile variable is read, its value
    must be re-fetched from the underlying memory, even if the shader
    invocation performing the read had already fetched its value from the same
    memory once. When a volatile variable is written, its value must be
    written to the underlying memory, even if the compiler can conclusively
    determine that its value will be overwritten by a subsequent write. Since
    the external source reading or writing a "volatile" variable may be
    another shader invocation, variables declared as "volatile" are
    automatically treated as coherent.

    Memory accesses to image variables declared using the "restrict" storage
    qualifier may be compiled assuming that the variable used to perform the
    memory access is the only way to access the underlying memory using the
    shader stage in question.
    This allows the compiler to coalesce or reorder
    loads and stores using "restrict"-qualified image variables in ways that
    wouldn't be permitted for image variables not so qualified, because the
    compiler can assume that the underlying image won't be read or written by
    other code. Applications are responsible for ensuring that image memory
    referenced by variables qualified with "restrict" will not be referenced
    using other variables in the same scope; otherwise, accesses to
    "restrict"-qualified variables will have undefined results.

    Memory accesses to image variables declared using the "const" storage
    qualifier may only read the underlying memory, which is treated as
    read-only. It is an error to pass an image variable qualified with
    "const" to imageStore() or imageAtomic*().

    In image variable declarations, the "coherent", "volatile", "restrict",
    and "const" qualifiers can be positioned anywhere in the declaration,
    either before or after the data type of the variable being qualified.
    Qualifiers before the type name apply to the image data referenced by the
    image variable; qualifiers after the type name apply to the image variable
    itself. It is an error to specify "restrict" prior to the type name, as
    "restrict" can only qualify the image variable itself.

    The "coherent", "volatile", and "restrict" storage qualifiers may only be
    used on image variables, and may not be used on variables of any other
    type. "const" may be used in declarations with non-image variable types,
    as described in Section 4.3.2.

    The values of variables qualified with "coherent", "volatile", "restrict",
    or "const" may not be assigned to function parameters lacking such
    qualifiers. It is legal to add qualifiers in a function call, but not to
    remove them.

      vec4 funcA(layout(size4x32) image2D restrict a) { ...
      }
      vec4 funcB(layout(size4x32) image2D a) { ... }
      layout(size4x32) uniform image2D img1;
      layout(size4x32) coherent uniform image2D img2;

      funcA(img1);        // OK, adding "restrict" is allowed
      funcB(img2);        // illegal, stripping "coherent" is not


    (Insert a new numbered section at the end of Chapter 8, Built-in
    Functions, p. 69)

    Section 8.X, Image Functions

    Variables using one of the image data types may be used in the built-in
    shader image memory functions defined in this section to read and write
    individual texels of a texture. Each image variable is an integer scalar
    that references an image unit, which has a texture image attached.

    When image memory functions access memory, an individual texel in the
    image is identified using an i, (i,j), or (i,j,k) coordinate corresponding
    to the values of <coord>. For image2DMS and image2DMSArray variables (and
    the corresponding int/unsigned int types) corresponding to multisample
    textures, each texel may have multiple samples and an individual sample is
    identified using the integer <sample> parameter. The coordinates and
    sample number are used to select an individual texel in the manner
    described in Section 3.9.X of the OpenGL specification.

    Loads and stores support float, integer, and unsigned integer types. The
    data types "gimage*" serve as placeholders meaning either "image*",
    "iimage*", or "uimage*" in the same way as "gvec" or "gsampler".

    The "IMAGE_INFO" in the prototypes below is a placeholder representing
    33 separate functions, each for a different type of image variable.
    The
    "IMAGE_INFO" placeholder is replaced by one of the following argument
    lists:

      gimage1D image, int coord
      gimage2D image, ivec2 coord
      gimage3D image, ivec3 coord
      gimage2DRect image, ivec2 coord
      gimageCube image, ivec3 coord
      gimageBuffer image, int coord
      gimage1DArray image, ivec2 coord
      gimage2DArray image, ivec3 coord
      gimageCubeArray image, ivec3 coord
      gimage2DMS image, ivec2 coord, int sample
      gimage2DMSArray image, ivec3 coord, int sample

    (Note that each of the "gimage*" lines represents one of three different
    image variable types.)

    Syntax:

      gvec4 imageLoad(const IMAGE_INFO);

    Description:

    Loads the texel at the coordinate <coord> from the image unit specified
    by <image>. For multisample loads, the sample number is given by
    <sample>. When <image>, <coord>, and <sample> identify a valid texel,
    the bits used to represent the selected texel in memory are converted to
    a vec4, ivec4, or uvec4 in the manner described in Section 3.9.X of the
    OpenGL Specification and returned.


    Syntax:

      void imageStore(IMAGE_INFO, gvec4 data);

    Description:

    Stores the value of <data> into the texel at the coordinate <coord> from
    the image specified by <image>. For multisample stores, the sample number
    is given by <sample>. When <image>, <coord>, and <sample> identify a
    valid texel, the bits used to represent <data> are converted to the format
    of the image unit in the manner described in Section 3.9.X of the OpenGL
    Specification and stored to the specified texel.
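
    The memory layout that a store produces can be illustrated with the
    RGBA16UI row of Table X.2 (R = 0[15:0], G = 0[31:16], B = 1[15:0],
    A = 1[31:16]). The C sketch below is a model of that packing only; the
    pack_rgba16ui() name is hypothetical, and it assumes each component
    already fits in 16 bits, since the handling of out-of-range values is
    not modelled here.

```c
#include <stdint.h>

/* Pack four unsigned components into the two 32-bit words used by an
 * RGBA16UI texel, following the Table X.2 notation "word[msb:lsb]":
 *     R = 0[15:0], G = 0[31:16], B = 1[15:0], A = 1[31:16]
 * Assumption for this sketch: every component is in the range 0..65535. */
static void pack_rgba16ui(const uint32_t rgba[4], uint32_t words[2])
{
    words[0] = (rgba[0] & 0xFFFFu) | ((rgba[1] & 0xFFFFu) << 16);
    words[1] = (rgba[2] & 0xFFFFu) | ((rgba[3] & 0xFFFFu) << 16);
}
```

    Because RGBA16UI and RG32UI are both in the "2x32" equivalence class,
    the two words produced here are the values that a view of the same
    texel through an RG32UI image unit format would see in its R and G
    components.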


    Syntax:

      uint imageAtomicAdd(IMAGE_INFO, uint data);
      int imageAtomicAdd(IMAGE_INFO, int data);

      uint imageAtomicMin(IMAGE_INFO, uint data);
      int imageAtomicMin(IMAGE_INFO, int data);

      uint imageAtomicMax(IMAGE_INFO, uint data);
      int imageAtomicMax(IMAGE_INFO, int data);

      uint imageAtomicIncWrap(IMAGE_INFO, uint wrap);

      uint imageAtomicDecWrap(IMAGE_INFO, uint wrap);

      uint imageAtomicAnd(IMAGE_INFO, uint data);
      int imageAtomicAnd(IMAGE_INFO, int data);

      uint imageAtomicOr(IMAGE_INFO, uint data);
      int imageAtomicOr(IMAGE_INFO, int data);

      uint imageAtomicXor(IMAGE_INFO, uint data);
      int imageAtomicXor(IMAGE_INFO, int data);

      uint imageAtomicExchange(IMAGE_INFO, uint data);
      int imageAtomicExchange(IMAGE_INFO, int data);

      uint imageAtomicCompSwap(IMAGE_INFO, uint compare, uint data);
      int imageAtomicCompSwap(IMAGE_INFO, int compare, int data);

    Description:

    These functions perform atomic operations on individual texels or samples
    of an image variable. Atomic memory operations read a value from the
    selected texel, compute a new value using one of the operations described
    below, write the new value to the selected texel, and return the
    original value read. The contents of the texel being updated by the
    atomic operation are guaranteed not to be updated by any other image store
    or atomic function between the time the original value is read and the
    time the new value is written.

    As with image load and store functions, <image>, <coord>, and <sample>
    specify the individual texel to operate on. The method for
    identifying the individual texel operated on from <image>, <coord>, and
    <sample>, and the method for reading and writing the texel are specified
    in Section 3.9.X of the OpenGL specification.
    The format of the image
    unit must be in the "1x32" equivalence class in Table X.2 in Section 3.9.X
    of the OpenGL specification, otherwise the atomic operation is invalid.

    imageAtomicAdd() computes a new value by adding the value of <data> to the
    contents of the selected texel. These functions support 32-bit unsigned
    integer operands and 32-bit signed integer operands.

    imageAtomicMin() computes a new value by taking the minimum of the value
    of <data> and the contents of the selected texel. These functions support
    32-bit signed and unsigned integer operands.

    imageAtomicMax() computes a new value by taking the maximum of the value
    of <data> and the contents of the selected texel. These functions support
    32-bit signed and unsigned integer operands.

    imageAtomicIncWrap() computes a new value by adding one to the contents of
    the selected texel, and then forcing the result to zero if and only if the
    incremented value is greater than or equal to <wrap>. These functions
    support only 32-bit unsigned integer operands.

    imageAtomicDecWrap() computes a new value by subtracting one from the
    contents of the selected texel, and then forcing the result to <wrap>-1 if
    the original value read from the selected texel was either zero or greater
    than <wrap>. These functions support only 32-bit unsigned integer
    operands.

    imageAtomicAnd() computes a new value by performing a bitwise and of the
    value of <data> and the contents of the selected texel. These functions
    support 32-bit signed and unsigned integer operands.

    imageAtomicOr() computes a new value by performing a bitwise or of the
    value of <data> and the contents of the selected texel. These functions
    support 32-bit signed and unsigned integer operands.
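
    The wrapping rules for imageAtomicIncWrap() and imageAtomicDecWrap() are
    easy to misread, so the following C sketch models only the "compute a
    new value" step of each. It is not atomic and is not an implementation
    of the built-in functions; the helper names inc_wrap_new_value() and
    dec_wrap_new_value() are hypothetical. Recall that the built-ins return
    the original texel value, not the new value computed here.

```c
#include <stdint.h>

/* New texel value for imageAtomicIncWrap(): add one, then force the result
 * to zero if the incremented value is greater than or equal to <wrap>.
 * The addition uses unsigned 32-bit arithmetic in this sketch. */
static uint32_t inc_wrap_new_value(uint32_t old_value, uint32_t wrap)
{
    uint32_t incremented = old_value + 1u;
    return (incremented >= wrap) ? 0u : incremented;
}

/* New texel value for imageAtomicDecWrap(): subtract one, then force the
 * result to <wrap>-1 if the original value was zero or greater than <wrap>. */
static uint32_t dec_wrap_new_value(uint32_t old_value, uint32_t wrap)
{
    if (old_value == 0u || old_value > wrap)
        return wrap - 1u;
    return old_value - 1u;
}
```

    With <wrap> set to 5, repeated increments cycle a texel through
    0, 1, 2, 3, 4, 0, ... and repeated decrements cycle it through
    4, 3, 2, 1, 0, 4, ...; a texel that starts above <wrap> is forced to
    <wrap>-1 by the first decrement.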

    imageAtomicXor() computes a new value by performing a bitwise exclusive or
    of the value of <data> and the contents of the selected texel. These
    functions support 32-bit signed and unsigned integer operands.

    imageAtomicExchange() computes a new value by simply copying the value of
    <data>. These functions support 32-bit signed and unsigned integer
    operands.

    imageAtomicCompSwap() compares the value of <compare> and the contents of
    the selected texel. If the values are equal, the new value is given by
    <data>; otherwise, it is taken from the original value loaded from the
    texel. These functions support 32-bit signed and unsigned integer
    operands.


    (Insert another new numbered section at the end of Chapter 8, Built-in
    Functions, p. 69)

    Section 8.Y, Shader Memory Functions

    Shaders of all types may read and write the contents of textures and
    buffer objects using image variables. While the order of reads and writes
    within a single shader invocation is well-defined, the relative order of
    reads and writes to a single shared memory address from multiple separate
    invocations is largely undefined.

    Syntax:

      void memoryBarrier(void);

    Description:

    memoryBarrier() can be used to control the ordering of memory transactions
    issued by a shader invocation. When called, it will wait on the
    completion of all memory accesses resulting from the use of image
    variables prior to calling the function. When all memory operations have
    been flushed, memoryBarrier() returns to the caller with no other effect.
    When this function returns, the results of any memory stores performed
    using coherent variables prior to the call will be visible to
    any future coherent memory access to the same addresses from other shader
    invocations.
    In particular, the values written and flushed this way in
    one shader stage are guaranteed to be visible to coherent memory accesses
    performed by shader invocations in subsequent stages when those
    invocations were triggered by the execution of the original shader
    invocation (e.g., fragment shader invocations for a primitive resulting
    from a particular geometry shader invocation).


    Modify Section 9, Shading Language Grammar (p. 105)

    !!! TBD: Add grammar constructs for memory access qualifiers, allowing
    memory access qualifiers before or after the type in a variable
    declaration.


Errors

    INVALID_VALUE is generated by Uniform1i{v} if the location refers to an
    image variable and the value specified is less than zero or greater than
    or equal to MAX_IMAGE_UNITS_EXT.

    INVALID_OPERATION is generated by Uniform* functions other than
    Uniform1i{v} if the location refers to an image variable.

    INVALID_VALUE is generated by BindImageTextureEXT if <index> is less than
    zero or greater than or equal to MAX_IMAGE_UNITS_EXT.

    INVALID_VALUE is generated by BindImageTextureEXT if <texture> is not the
    name of an existing texture object.

    INVALID_VALUE is generated by BindImageTextureEXT if <format> is not a
    legal format.


Dependencies on OpenGL 3.2 (Core Profile)

    If only the core profile of OpenGL 3.2 is supported, references to buffer
    objects for conventional vertex attributes and to the Begin and RasterPos
    commands should be removed.

Dependencies on OpenGL 3.1, ARB_uniform_buffer_object, and
EXT_bindable_uniform

    If OpenGL 3.1, ARB_uniform_buffer_object, and EXT_bindable_uniform are not
    supported, references to UNIFORM_BARRIER_BIT should be removed.

Dependencies on ARB_draw_indirect

    If ARB_draw_indirect is not supported, references to COMMAND_BARRIER_BIT_EXT
    should be removed.

Dependencies on NV_vertex_buffer_unified_memory

    If NV_vertex_buffer_unified_memory is not supported, references to that
    extension and GPU addresses in the discussion of
    VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT and ELEMENT_ARRAY_BARRIER_BIT_EXT should
    be removed.

Dependencies on OpenGL 3.2 and ARB_texture_multisample

    If OpenGL 3.2 and ARB_texture_multisample are not supported, references to
    multisample textures should be removed.

Dependencies on OpenGL 4.0 and ARB_sample_shading

    If OpenGL 4.0 or ARB_sample_shading is supported, the discussion of the
    number of shader invocations for a given fragment in the "Shader Memory
    Access" section of the specification should be updated to discuss the
    sample shading enable and the minimum sample shading factor provided in
    that extension.

Dependencies on OpenGL 4.0 and ARB_texture_cube_map_array

    If OpenGL 4.0 or ARB_texture_cube_map_array are not supported, references
    to cube map array textures should be removed.

Dependencies on OpenGL 3.3 and ARB_texture_rgb10_a2ui

    If OpenGL 3.3 or ARB_texture_rgb10_a2ui are not supported, references to
    the RGB10_A2UI texture format should be removed.

Dependencies on NV_shader_buffer_load

    If NV_shader_buffer_load is supported, the new section 2.14.X (Shader
    Memory Access) should be combined with "Section 2.20.X, Shader Memory
    Access" from NV_shader_buffer_load.

Dependencies on OpenGL 4.0, ARB_gpu_shader5, and NV_gpu_shader5

    If OpenGL 4.0, ARB_gpu_shader5, and NV_gpu_shader5 are not supported, the
    modifications to the OpenGL Shading Language Specification should be
    removed.
1456 1457Dependencies on OpenGL 4.0 and ARB_tessellation_shader 1458 1459 If OpenGL 4.0 and ARB_tessellation_shader are not supported, references to 1460 tessellation control and evaluation shaders should be removed. 1461 1462Dependencies on EXT_shader_atomic_counters 1463 1464 If EXT_shader_atomic_counters is not supported, remove references to 1465 ATOMIC_COUNTER_BARRIER_BIT_EXT. 1466 1467Dependencies on EXT_depth_bounds_test 1468 1469 If EXT_depth_bounds_test is not supported, references to the depth bounds 1470 test should be removed. 1471 1472Dependencies on EXT_separate_shader_objects 1473 1474 If EXT_separate_shader_objects is supported, early depth tests are enabled 1475 if and only if (a) there is an active program for the fragment shader 1476 stage and (b) the fragment shader in that program enables early depth 1477 tests using a layout qualifier. 1478 1479Dependencies on NV_gpu_program5 1480 1481 If NV_gpu_program5 is supported, the following edits are made to extend 1482 the assembly programming model documented in the NV_gpu_program4 extension 1483 and extended by NV_gpu_program5. No "OPTION" line is required; the 1484 following capability is implied by NV_gpu_program5 program headers such as 1485 "!!NVfp5.0". 1486 1487 If NV_gpu_program5 is not supported, the contents of this dependencies 1488 section should be ignored. 

    Section 2.X.2, Program Grammar

    (add the following rules to the grammar)

      <namingStatement>       ::= <IMAGE_statement>

      <IMAGE_statement>       ::= "IMAGE" <establishName> <imageSingleInit>
                                | "IMAGE" <establishName> <optArraySize>
                                  <imageMultipleInit>

      <imageSingleInit>       ::= "=" <imageUseDS>

      <imageMultipleInit>     ::= "=" "{" <imageItemList> "}"

      <imageItemList>         ::= <imageUseDM>
                                | <imageUseDM> "," <imageItemList>

      <imageUseDS>            ::= "image" <arrayMemAbs>

      <imageUseDM>            ::= <imageUseDS>
                                | "image" <arrayRange>


      <instruction>           ::= <ImageInstruction>

      <ImageInstruction>      ::= <LOADIMop_instruction>
                                | <STOREIMop_instruction>
                                | <ATOMIMop_instruction>

      <LOADIMop_instruction>  ::= <LOADIMop> <opModifiers> <instResult> ","
                                  <instOperandV> "," <imageAccess>

      <STOREIMop_instruction> ::= <STOREIMop> <opModifiers> <imageUnit> ","
                                  <instOperandV> "," <instOperandV> ","
                                  <imageTarget>

      <ATOMIMop_instruction>  ::= <ATOMIMop> <opModifiers> <instResult> ","
                                  <instOperandV> "," <instOperandV> ","
                                  <imageAccess>

      <LOADIMop>              ::= "LOADIM"
      <STOREIMop>             ::= "STOREIM"
      <ATOMIMop>              ::= "ATOMIM"

      <imageAccess>           ::= <imageUnit> "," <imageTarget>

      <imageUnit>             ::= "image" <arrayMemAbs>
                                | <imageVarName> <optArrayMem>

      <imageTarget>           ::= "1D"
                                | "2D"
                                | "3D"
                                | "RECT"
                                | "CUBE"
                                | "BUFFER"
                                | "ARRAY1D"
                                | "ARRAY2D"
                                | "ARRAYCUBE"
                                | "2DMS"
                                | "ARRAY2DMS"

    Section 2.X.3.X, Program Image Variables

    Program image variables are used as constants during program execution
    and refer to the image objects bound to one or more image units. All
    image variables have associated bindings and are read-only during
    program execution.
    Image variables retain their values across program
    invocations, and the set of image units to which they refer is
    constant. The texture object a variable refers to may be changed by
    binding a new texture object to the corresponding image unit. Image
    variables may only be used to identify a texture object in image
    instructions, and may not be used as operands in any other instruction.
    Image variables may be declared explicitly via the <IMAGE_statement>
    grammar rule, or implicitly by using an image unit binding in an
    instruction.

    Image variables may be declared as arrays, but the list of image
    units assigned to the array must increase consecutively.

      Binding          Components  Underlying State
      ---------------  ----------  ------------------------------------------
      image[a]         x           image object bound to image unit a
      image[a..b]      x           image objects bound to image units a
                                   through b

      Table X.12.2: Image Unit Bindings. <a> and <b> indicate image unit
      numbers.

    If an image binding matches "image[a]", the image variable is filled
    with a single integer referring to image unit <a>.

    If an image binding matches "image[a..b]", the image variable is
    filled with an array of integers referring to image units <a> through
    <b>, inclusive. A program will fail to compile if <a> or <b> is
    negative or greater than or equal to the number of image units
    supported, or if <a> is greater than <b>.


    Modify Section 2.X.4, Program Execution Environment

      Instr-      Modifiers
      uction  V  F I C S H D  Out Inputs    Description
      ------- -- - - - - - -  --- --------  --------------------------------
      ATOMIM  50 - - X - - -  s   v,vs,i    atomic image operation
      LOADIM  50 - - X X - F  v   vs,i      image load
      MEMBAR  50 - - - - - -  -   -         memory barrier
      STOREIM 50 X X - - - F  -   i,v,vs    image store

      ...

    The input and output columns describe the formats of the operands and
    results of the instruction.

      i: IMAGE variable, read-only


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (add to Table X.14 of the NV_gpu_program4 specification.)

      Modifier  Description
      --------  ---------------------------------------------------
      COH       Mark LOADIM and STOREIM operations as coherent
      VOL       Mark LOADIM and STOREIM operations as volatile

    For image load and store operations, the "COH" modifier controls whether
    the operation is performed in a manner guaranteed to be coherent with
    loads and stores performed by other shader invocations.

    For image load and store operations, the "VOL" modifier controls whether
    the operation should treat the contents of the image accessed as volatile,
    where the underlying image contents may be changed at any point during
    shader execution by some source other than the current shader thread.


    Section 2.X.8.Z, LOADIM: Image Load

    The LOADIM instruction takes the components of a single signed integer
    vector operand and uses them as coordinates to perform an unformatted
    image load from the texture bound to the image unit specified by
    <imageUnit>. Unformatted loads read the data from memory without
    converting from the image unit format, by copying raw bits from memory
    to the destination variable according to the bit layouts described in
    Table X.2, where word 0 is written to the .x component, word 1 to .y,
    etc.

    Eleven image targets are supported: 1D, 2D, 3D, RECT, CUBE, BUFFER,
    ARRAY1D, ARRAY2D, ARRAYCUBE, 2DMS, and ARRAY2DMS. The texel coordinate
    is a one-, two- or three-dimensional vector, taken from the <x>, <y>, and
    <z> components of the operand.
    For the 2DMS and ARRAY2DMS, the texel
    coordinate is a two- or three-dimensional vector, taken from the <x>,
    <y>, and <z> components of the operand, and a sample number is taken from
    the <w> component of the operand.

      coords = VectorLoad(op0);
      if (target == 1D || target == BUFFER) {
        coords.y = 0;
      }
      if (target == 1D || target == 2D ||
          target == BUFFER || target == RECT ||
          target == 2DMS) {
        coords.z = 0;
      }
      if (target != 2DMS && target != ARRAY2DMS) {
        coords.w = 0;
      }
      result = ImageLoad(image, coords);

    When an image load uses the "S8", "U8", "S16", "U16", "F32", "S32", or
    "U32" storage modifiers, the <x> component of the result contains the
    loaded value and the <y>, <z>, and <w> components of the result are zero,
    zero, and one (int or float, depending on the type of the opModifier),
    respectively. For "S8" and "S16" modifiers, the loaded value is
    sign-extended; for "U8" and "U16", the loaded value is zero-extended. When
    an image load uses the "F32X2", "S32X2", or "U32X2" storage modifiers,
    the <x> and <y> components of the result contain the loaded values and
    the <z>, and <w> components of the result are zero and one, respectively.
    When an image load uses the "F32X4", "S32X4", or "U32X4" storage
    modifiers, all four components of the result contain the loaded values.
    If the image load is invalid for any of the reasons described in Section
    3.9.X, the result vector will be undefined.

    LOADIM supports no base data type modifiers, but requires exactly one
    storage modifier. An image load is treated as invalid unless the storage
    modifier matches the image unit format, as described in Table X.3. The
    base data type of the result vector is derived from the storage modifier.
    The single operand is always interpreted as a signed integer vector.
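
    The coordinate handling and result rules for LOADIM described above can
    be sketched in C. This is an illustrative host-side model only, not part
    of the extension: the names ImageTarget, IVec4, loadim_coords,
    sign_extend, and loadim_result_1x are hypothetical, and only the
    one-component signed integer case of the result rules is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Image targets named by the <imageTarget> grammar rule (hypothetical enum). */
typedef enum {
    TGT_1D, TGT_2D, TGT_3D, TGT_RECT, TGT_CUBE, TGT_BUFFER,
    TGT_ARRAY1D, TGT_ARRAY2D, TGT_ARRAYCUBE, TGT_2DMS, TGT_ARRAY2DMS
} ImageTarget;

typedef struct { int32_t x, y, z, w; } IVec4;

/* Zero the coordinate components that the target does not use, following
 * the same three tests as the LOADIM pseudocode. */
static IVec4 loadim_coords(ImageTarget target, IVec4 c)
{
    if (target == TGT_1D || target == TGT_BUFFER)
        c.y = 0;
    if (target == TGT_1D || target == TGT_2D || target == TGT_BUFFER ||
        target == TGT_RECT || target == TGT_2DMS)
        c.z = 0;
    if (target != TGT_2DMS && target != TGT_ARRAY2DMS)
        c.w = 0;
    return c;
}

/* Sign-extend the low <bits> bits of a raw word, as for "S8" and "S16". */
static int32_t sign_extend(uint32_t raw, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    uint32_t mask = (1u << bits) - 1u;
    int32_t v = (int32_t)(raw & mask);
    if (raw & (1u << (bits - 1)))
        v -= (int32_t)(mask + 1u);   /* subtract 2^bits for negative values */
    return v;
}

/* Result vector for a one-component integer load: (value, 0, 0, 1). */
static IVec4 loadim_result_1x(int32_t value)
{
    IVec4 r = { value, 0, 0, 1 };
    return r;
}
```

    In this model, a 1D access with operand (5, 6, 7, 8) uses coordinates
    (5, 0, 0, 0), while a 2DMS access keeps <x>, <y>, and the sample number
    in <w>.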

      Data Type    Supported Modifiers
      ---------    -------------------
      4x32         F32X4, S32X4, U32X4
      2x32         F32X2, S32X2, U32X2
      1x32         F32, S32, U32
      1x16         S16, U16
      1x8          S8, U8

      Table X.3, Supported Storage Modifiers. Unformatted image operations
      are considered invalid unless the storage modifier is compatible with
      the "Data Type" entry for the image unit format, as described in Table
      X.2.


    Section 2.X.8.Z, STOREIM: Image Store

    The STOREIM instruction takes the components of the second signed integer
    vector operand and uses them as coordinates to perform a formatted or
    unformatted image store to the texture bound to the image unit specified
    by <imageUnit>, using the data specified in the first vector operand. The
    store is performed in the manner described in Section 3.9.X.

    Eleven image targets are supported: 1D, 2D, 3D, RECT, CUBE, BUFFER,
    ARRAY1D, ARRAY2D, ARRAYCUBE, 2DMS, and ARRAY2DMS. The texel coordinate
    is a one-, two- or three-dimensional vector, taken from the <x>, <y>, and
    <z> components of the operand. For the 2DMS and ARRAY2DMS targets, the
    texel coordinate is a two- or three-dimensional vector, taken from the
    <x>, <y>, and <z> components of the operand, and a sample number is taken
    from the <w> component of the operand.

      data = VectorLoad(op0);
      coords = VectorLoad(op1);
      if (target == 1D || target == BUFFER) {
        coords.y = 0;
      }
      if (target == 1D || target == 2D ||
          target == BUFFER || target == RECT ||
          target == 2DMS) {
        coords.z = 0;
      }
      if (target != 2DMS && target != ARRAY2DMS) {
        coords.w = 0;
      }
      ImageStore(image, coords, data);

    STOREIM supports an optional base data type or storage modifier. If a
    storage modifier is specified, the store is unformatted; otherwise, it is
    formatted. Formatted stores operate as described in Section 3.9.X.
    Unformatted stores write the data to memory without converting to the
    image unit format, by copying raw bits from the source variable to
    memory according to the bit layouts described in Table X.2, where word
    0 is taken from the <x> component, word 1 from <y>, etc.

    An unformatted image store is treated as invalid unless the
    storage modifier matches the image unit format, as described in Table
    X.3. When performing an unformatted store using the "S8", "U8", "S16", or
    "U16" modifiers, all bits but the least significant eight or sixteen
    are dropped as part of the store. When performing a formatted store,
    the first operand will be converted to the image unit format as part
    of the store.

    The base data type of the first vector operand is derived from the data
    type or storage modifier. The second operand is always interpreted as a
    signed integer vector.


    Section 2.X.8.Z, ATOMIM: Image Atomic Memory Operation

    The ATOMIM instruction takes the components of the second signed integer
    vector operand, uses them as coordinates to perform an unformatted image
    load from the texture bound to the image unit specified by <imageUnit>,
    performs a computation using the loaded value and the first vector
    operand, performs an unformatted store of the result of the computation to
    the same texel, and then returns the loaded value in the vector result.
    The atomic operation is performed in the manner described in Section
    3.9.X.

    The ATOMIM instruction has two required instruction modifiers. The atomic
    modifier specifies the type of computation to be performed. The storage
    modifier specifies the size and data type of the operand read from the
    image unit and the base data type of the operation used to compute the
    value to be written back.
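
    The load, compute, store, and return-old-value sequence described above
    can be sketched in C for the "U32" storage modifier. This is a
    single-threaded illustration only and provides no real atomicity; the
    names AtomicOp and atomim_u32 are hypothetical, the enumerants follow
    the atomic modifiers listed in Table X.4, and EXCH is assumed to write
    the <x> component of the first operand.

```c
#include <assert.h>
#include <stdint.h>

/* One enumerant per ATOMIM atomic modifier (hypothetical names). */
typedef enum {
    ATOM_ADD, ATOM_MIN, ATOM_MAX, ATOM_IWRAP, ATOM_DWRAP,
    ATOM_AND, ATOM_OR, ATOM_XOR, ATOM_EXCH, ATOM_CSWAP
} AtomicOp;

/* Model one ATOMIM.U32 operation on a single texel.  data_x and data_y are
 * the <x> and <y> components of the first operand; data_y is only used by
 * CSWAP.  Returns the value loaded from the texel before the update. */
static uint32_t atomim_u32(uint32_t *texel, AtomicOp op,
                           uint32_t data_x, uint32_t data_y)
{
    assert(texel != 0);
    uint32_t old = *texel;       /* unformatted load */
    uint32_t writeval = old;

    switch (op) {
    case ATOM_ADD:   writeval = data_x + old;                       break;
    case ATOM_MIN:   writeval = (data_x < old) ? data_x : old;      break;
    case ATOM_MAX:   writeval = (data_x > old) ? data_x : old;      break;
    case ATOM_IWRAP: writeval = (old >= data_x) ? 0u : old + 1u;    break;
    case ATOM_DWRAP: writeval = (old == 0u || old > data_x)
                                    ? data_x : old - 1u;            break;
    case ATOM_AND:   writeval = data_x & old;                       break;
    case ATOM_OR:    writeval = data_x | old;                       break;
    case ATOM_XOR:   writeval = data_x ^ old;                       break;
    case ATOM_EXCH:  writeval = data_x;                             break;
    case ATOM_CSWAP: writeval = (old == data_x) ? data_y : old;     break;
    }

    *texel = writeval;           /* unformatted store to the same texel */
    return old;                  /* ATOMIM returns the loaded value */
}
```

    For example, with a texel holding 7, an IWRAP with operand 7 returns 7
    and leaves 0 in memory, and a following DWRAP with operand 7 returns 0
    and leaves 7.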

      atomic     storage
      modifier   modifiers  operation
      --------   ---------  --------------------------------------
      ADD        U32, S32   compute a sum
      MIN        U32, S32   compute minimum
      MAX        U32, S32   compute maximum
      IWRAP      U32        increment memory, wrapping at operand
      DWRAP      U32        decrement memory, wrapping at operand
      AND        U32, S32   compute bit-wise AND
      OR         U32, S32   compute bit-wise OR
      XOR        U32, S32   compute bit-wise XOR
      EXCH       U32, S32   exchange memory with operand
      CSWAP      U32, S32   compare-and-swap

      Table X.4, Supported atomic and storage modifiers for the ATOMIM
      instruction.

    Not all storage modifiers are supported by ATOMIM, and the set of
    modifiers allowed for any given instruction depends on the atomic modifier
    specified. Table X.4 enumerates the set of atomic modifiers supported by
    the ATOMIM instruction, and the storage modifiers allowed for each.

      data = VectorLoad(op0);
      coords = VectorLoad(op1);
      if (target == 1D || target == BUFFER) {
        coords.y = 0;
      }
      if (target == 1D || target == 2D ||
          target == BUFFER || target == RECT ||
          target == 2DMS) {
        coords.z = 0;
      }
      if (target != 2DMS && target != ARRAY2DMS) {
        coords.w = 0;
      }
      result = ImageLoad(image, coords);
      switch (atomicModifier) {
      case ADD:
        writeval = data.x + result;
        break;
      case MIN:
        writeval = min(data.x, result);
        break;
      case MAX:
        writeval = max(data.x, result);
        break;
      case IWRAP:
        writeval = (result >= data.x) ? 0 : result+1;
        break;
      case DWRAP:
        writeval = (result == 0 || result > data.x) ? data.x : result-1;
        break;
      case AND:
        writeval = data.x & result;
        break;
      case OR:
        writeval = data.x | result;
        break;
      case XOR:
        writeval = data.x ^ result;
        break;
      case EXCH:
        writeval = data.x;
        break;
      case CSWAP:
        if (result == data.x) {
          writeval = data.y;
        } else {
          writeval = result;
        }
        break;
      }
      ImageStore(image, coords, writeval);

    ATOMIM performs a scalar atomic operation. The <y>, <z>, and <w>
    components of the result vector are undefined.

    ATOMIM supports no base data type modifiers, but requires exactly one
    storage and one atomic modifier. An image atomic is treated as invalid
    unless the storage modifier matches the format of the texture bound to the
    image unit, as described in Table X.3. The base data type of the result
    and the first operand is derived from the storage modifier. The second
    operand is always interpreted as a signed integer vector.


    Section 2.X.8.Z, MEMBAR: Memory Barrier

    The MEMBAR instruction synchronizes memory transactions to ensure that
    memory transactions resulting from any instruction executed by the thread
    prior to the MEMBAR instruction complete prior to any memory transactions
    issued after the instruction.

    MEMBAR has no operands and generates no result.

    Modify Section 3.9.X, Texture Image Loads and Stores, as added above.

    (Add a separate paragraph and table describing how the four-component
    coordinate vector used in image load, store, and atomic opcodes is mapped
    to individual texels.)

    When a program accesses the texture bound to an image unit using the
    LOADIM, STOREIM, or ATOMIM opcodes, it provides a four-component
    coordinate vector used to select individual texels or samples.
    This
    (x,y,z,w) vector is used to select an individual texel tau_i, tau_i_j, or
    tau_i_j_k according to the target of the texture bound to the image unit
    using Table X.5. As noted above, single-layer bindings of array or cube
    map textures are considered to use a texture target corresponding to the
    bound layer, rather than that of the full texture.

                                                   face/
                                      i   j   k   layer  sample
                                      --  --  --  -----  ------
      TEXTURE_1D                      x   -   -     -      -
      TEXTURE_2D                      x   y   -     -      -
      TEXTURE_3D                      x   y   z     -      -
      TEXTURE_RECTANGLE               x   y   -     -      -
      TEXTURE_CUBE_MAP                x   y   -     z      -
      TEXTURE_BUFFER                  x   -   -     -      -
      TEXTURE_1D_ARRAY                x   -   -     z      -
      TEXTURE_2D_ARRAY                x   y   -     z      -
      TEXTURE_CUBE_MAP_ARRAY_ARB      x   y   -     z      -
      TEXTURE_2D_MULTISAMPLE          x   y   -     -      w
      TEXTURE_2D_MULTISAMPLE_ARRAY    x   y   -     z      w

      Table X.5, Mapping of image load, store, and atomic texel coordinate
      components to texel numbers.


Issues

    (1) How are the format and type of the load/store determined?

      RESOLVED: There is a natural desire to load and store using a
      canonical 4-vector in the shader with hardware converting to/from a
      format compatible with the bound image, to be consistent with how
      texture loads and fragment shader outputs currently behave. There is
      also good reason to allow some flexibility in the format used for image
      accesses being different from the internal format of the texture level.
      We allow format conversions to and from any format that image units
      support. We make the format be selected when the image is bound to an
      image unit, and define which image unit formats can be used for which
      texture level internal formats. For example, it is legal to access an
      image whose internal format is RGBA8 with an image unit format of
      R32UI.

    (2) What set of texture formats should be supported for image loads and
        stores?

      RESOLVED: We allow textures to be bound to image units if and only if
      the implementation supports formatted stores for the texture format.
      Any texture formats not explicitly enumerated in this extension may not
      be bound to an image unit, although future extensions may add new
      formats to the set of supported formats.

      In particular, this extension supports one-, two-, and four-component
      textures with 8-, 16-, and 32-bit components, including floating-point,
      signed integer, unsigned integer, as well as signed and unsigned
      normalized formats. Additionally, a small number of other formats are
      supported, including the 11/11/10 RGB format from EXT_packed_float and
      10/10/10/2 unsigned normalized RGBA.

    (3) Should we generally support image loads and stores for three-component
        "RGB" formats?

      RESOLVED: Not in this extension. If an application needs to perform
      image loads and stores on a three-component texture, it could use an
      equivalent RGBA format and ignore the alpha component. The
      EXT_texture_swizzle extension could be used to make the values returned
      by texture lookups appear identical to those of an RGB texture, if
      required.

    (4) Should textures be unbound from image units when they are deleted?

      RESOLVED: Yes, this matches behavior of existing bind points.

    (5) Should we support image loads and stores for the deprecated LUMINANCE,
        LUMINANCE_ALPHA, and ALPHA formats?

      RESOLVED: No, only support the RGBA-style formats. EXT_texture_swizzle
      can be used to mimic luminance and alpha if required.

    (6) Should we support 64-bit atomics on images? Should we support atomics
        at all on formats with 8-, 16-, 64-, or 128-bit texels?

      RESOLVED: No, we will only support 32-bit atomic operations on images.

    (7) How do shader image loads and stores interact with texture
        completeness?
        What happens if you bind a texture with inconsistent
        mipmaps?

      RESOLVED: The image unit is treated as if nothing were bound, where
      all accesses are treated as invalid.

    (8) What happens if the value passed to Uniform1i to specify the image
        unit corresponding to an image variable refers to a non-existent image
        unit (i.e., is negative or greater than or equal to the number of
        image units supported)?

      RESOLVED: Values referring to invalid image units will be rejected and
      produce an INVALID_VALUE error.

    (9) Should we provide counting rules for image variable use in different
        shaders like we have for samplers? In particular, there are limits
        on the amount of state, the number of active samplers in each shader
        stage, and the sum of the active sampler counts in each stage.

      RESOLVED: No. It was considered sufficient to have just a limit on the
      total number of image units in the implementation (i.e., the number of
      distinct values that the variable can be set to).

    (10) Can this extension be used to load and store values into a buffer
         object? Into a renderbuffer?

      RESOLVED: Yes, indirectly. The TEXTURE_BUFFER target provided by
      OpenGL 3.1 and the EXT_texture_buffer_object extension allows an
      application to create a one-dimensional buffer texture using the data
      store of a buffer object. This buffer texture may be bound to an image
      unit and accessed with an imageBuffer variable in the Shading Language.

      This extension adds support for image accesses to multisample textures,
      but not renderbuffers. Note that with the ARB_texture_multisample
      extension, there is no longer a good reason to use renderbuffers.
      Existing 2D or rectangle targets already provided a superset of
      single-sample renderbuffer functionality; the new ARB extension provides
      a superset of multisample renderbuffer functionality.

    (11) What amount of automatic synchronization is provided for image loads
         and stores? In particular, is the use of MemoryBarrierEXT() required
         to ensure consistent ordering relative to other GL operations? Or is
         some other mechanism (e.g., unbinding a texture from an image unit
         and then binding it to a texture image unit) sufficient?

      RESOLVED: Use of MemoryBarrierEXT is required, and there is no
      automatic synchronization when images are bound or unbound.

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which images might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          texels of all bound images could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Since normal OpenGL operation is pipelined, idling would result in a
      significant performance impact since pipelining would otherwise allow
      fragment shader execution for draw call N while simultaneously
      performing vertex shader execution for draw call N+1.

    (12) Should image loads and stores be allowed for all shader types?

      RESOLVED: Yes, it seems useful.

      Note that some shader types pose specific implementation complexities
      (e.g., reuse of vertices in vertex shaders, number of fragment shader
      invocations in multisample modes, relative order of execution within and
      between shader groups). We have explicitly specified several cases where
      the invocation count and execution order are undefined. While these
      cases may be a problem for some algorithms, we expect that many
      algorithms will not be adversely impacted.

    (13) Should an implementation be required to throw INVALID_OPERATION
         errors if the dimension of the texture coordinates implied by the
         image variable type doesn't match the structure of the texture
         level/layer bound to the corresponding image unit? If not, what
         happens in such a mismatch?

      RESOLVED: No. The results of image accesses are undefined.

    (14) Should shader image variable types include a "format" implying the
         data type accepted/returned by shader image loads and stores? For
         example, an image variable corresponding to a 2D texture with format
         of RGBA32F might have a type "image2Dvec4", with the "vec4"
         indicating that the image data lines up with a four-component
         floating-point vector.

      RESOLVED: No. Separate types are provided for float vs. int vs.
      unsigned int, but not for each image format.

    (15) If shader image variable types include information on the texel
         components returned or written by shader image accesses, should an
         implementation be required to enforce errors if the variable type is
         incompatible with the format of the referenced texture? If not, or
         if the image variable type doesn't include format information, what
         happens in case of a mismatch between the texture format and the
         shader access format?

      RESOLVED: We aren't including types in the variable that correspond
      to the image format, so an error check in the driver is not possible.

      If an individual load, store, or atomic uses a data type incompatible
      with the texture bound to the image unit, loads will return and stores
      will write undefined values.

    (16) Is it possible to bind the "default texture" (numbered zero) for a
         given texture target to an image unit?

      RESOLVED: No. Passing zero to BindImageTextureEXT unbinds any texture
      currently bound to the selected image unit.
If this ability were 2062 provided, it would also be necessary to provide some mechanism to 2063 specify a texture target because there is a separate default "zero" 2064 texture for each target. 2065 2066 Note that existing framebuffer objects have a similar behavior; default 2067 textures can't be attached to an FBO. 2068 2069 (17) May bordered textures be used with image loads and stores? 2070 2071 RESOLVED: No. 2072 2073 (18) Should we have defined behavior if invalid coordinates are passed to 2074 an image load, store, or atomic operation? If so, what happens? 2075 2076 RESOLVED: Yes. We define the behavior to return zeroes on a load and 2077 atomic and to have no effect on any bound texture on stores and 2078 atomics. 2079 2080 (19) Should we have a limit on the total number of combined image units 2081 and draw buffers, and if so, what should that be? 2082 2083 RESOLVED: Yes, some hardware requires this. The program will fail to 2084 link. 2085 2086 (20) What happens if a shader specifies an image store or atomic operation 2087 for killed/discarded pixels? 2088 2089 RESOLVED: For GLSL shaders that execute a "discard" instruction, any 2090 image stores or atomics performed before executing the discard will 2091 behave normally. When the "discard" instruction is executed, the shader 2092 invocation will be terminated and will perform no further image store or 2093 atomic operations. 2094 2095 For assembly shaders (NV_gpu_program5) that execute a "KIL" instruction, 2096 any image stores or atomics performed before executing the KIL will 2097 behave normally. Unlike GLSL's "discard", the "KIL" instruction does 2098 not terminate program invocations. However, any image store or atomic 2099 operations performed after the KIL instruction do not update memory, and 2100 the value returned by atomic operations is undefined. 2101 2102 (21) When enabling early depth tests in a program, what happens if a 2103 fragment fails one of the tests (e.g., depth test)? 

      RESOLVED: The specification indicates that the fragment shader is not
      executed. Implementations might still end up running the fragment
      shader for implementation-dependent reasons. For example, the fragment
      shader may be run in order to approximate derivatives for neighboring
      pixels that did pass all per-fragment tests. In these cases,
      implementations must guarantee that image stores have no effect.

    (22) If implementations run fragment shaders for fragments that aren't
         covered by the primitive or fail early depth tests (e.g., "helper
         pixels"), how does that interact with stores and atomics?

      RESOLVED: The current OpenGL specification has no formal notion of
      "helper" pixels. In practice, implementations may run fragment shaders
      for pixels near the boundaries of rasterized primitives to allow
      derivatives to be approximated by differencing. Typically, these shader
      invocations have no effect. While they may produce outputs, the outputs
      for these pixels will be discarded without affecting the framebuffer.
      The spec basically treats these shader invocations as though they don't
      exist.

      If such a shader invocation performs store or atomic operations, we need
      to define what happens. In our definition, stores will have no effect,
      atomics will not update memory, and the values returned by atomics will
      be undefined. The fact that these invocations don't affect memory is
      consistent with the notion of helper pixel shader invocations not
      existing.

      However, it is possible to write a fragment shader where flow control
      depends on the (undefined) values returned by the atomic. In this case,
      the undefined values returned for helper pixels could result in very
      long execution time (appearing to be a hang) or an infinite loop.
To 2136 avoid hangs in such cases, it is possible to use the fragment shader 2137 input sample mask to identify helper pixels: 2138 2139 // If the input sample mask is non-zero, at least one sample is 2140 // covered and the invocation should be treated as a real invocation. 2141 // If the sample mask is zero, nothing is covered and this should be 2142 // treated as a helper pixel. If more than 32 samples are supported, 2143 // additional words of gl_SampleMaskIn would need to be checked. 2144 if (gl_SampleMaskIn[0] != 0) { 2145 // "real" pixel, perform atomic operations 2146 } else { 2147 // "helper" pixel, skip atomics 2148 } 2149 2150 It may be desirable to formalize the notion of helper pixels in a future 2151 addition to the shading language. 2152 2153 (23) What API should we use to specify early depth tests? 2154 2155 RESOLVED: Use a layout qualifier in a fragment shader rather than 2156 having a separate program parameter or other piece of GL state. 2157 2158 (24) For formatted loads where the format doesn't include some component, 2159 what values are filled in? (0,0,0,1)? (0,0,0,0)? 2160 2161 RESOLVED: Prefer (0,0,0,1) to match other APIs. 2162 2163 (25) How does the combined-image-and-fragment-output limit interact with 2164 separate shader objects? For example, an application may want to 2165 share a single image unit between two shader stages and not have it 2166 count twice against the limit. 2167 2168 RESOLVED: The known implementations of this extension do not have this 2169 issue, so we chose not to include any spec language. Perhaps a 2170 Begin-time error could be specified in the future if this limit is 2171 exceeded. 2172 2173 (26) What sort of qualifiers should we provide relevant to memory 2174 referenced by image variables? 2175 2176 RESOLVED: We will support the qualifiers "coherent", "volatile", 2177 "restrict", and "const" to be used in image variable declarations. 

      "coherent" is used to ensure that memory accesses from different shader
      invocations are cached coherently (i.e., one invocation will be able to
      observe writes from another when the other invocation's writes
      complete). Because of this requirement, accesses through
      "coherent"-qualified image variables may be slower than accesses through
      otherwise equivalent unqualified variables.

      "volatile" behaves as in C, and may be needed if an algorithm
      requires reading image memory that may be written asynchronously by
      other shader invocations.

      "restrict" behaves as in the C99 standard, and can be used to indicate
      that no other image variable points to the same underlying data. This
      permits optimizations that would otherwise be impossible if the compiler
      has to assume that a pair of images might end up pointing to the same
      data. For example, in standard C/C++, a code sequence like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment because a[0] or a[1] might
      point at the same data as b[0]. With restrict, the compiler can assume
      that b[0] is not modified by any of the instructions and load it just
      once. The same considerations apply to accesses using imageLoad(),
      imageStore(), and imageAtomic*() builtins.

      "const" behaves as in C, and indicates that the image memory should be
      treated as read-only. Note that the use of "const" in image variable
      declarations is different from the normal "const" qualifier, as it
      treats the image data referenced by the variable as constant.

    (27) How should shaders be able to express qualifiers for image variables?

      RESOLVED: This extension borrows from C/C++ syntax rules where a
      qualifier may be specified before or after the type.
      For example,

        layout(size4x32) const uniform image2D imageVariable;

      declares an image uniform whose image data are treated as read-only. We
      permit qualifiers to be provided either before or after the type name
      (image2D). The position of the qualifier is meaningful. Qualifiers
      before the type name apply to the data referenced by the variable.
      Qualifiers after the type name apply to the variable itself.

      The closest C/C++ equivalent would turn declarations like:

        layout(size4x32) const uniform image2D firstImage;
        layout(size4x32) uniform image2D const secondImage;

      into:

        const struct image2D_data * firstImage;
        struct image2D_data * const secondImage;

      where "image2D" is replaced with "struct image2D_data *". In this
      model, the former declares <firstImage> to be a pointer to constant
      image data. The latter declares <secondImage> to be a constant pointer
      to non-constant image data.

      For "coherent", "volatile", and "const", the qualifier should typically
      go before the image type. For "restrict", the qualifier must go after
      the image type, since "restrict" applies to the pointer, not the data
      being pointed to.

      Note that a qualifier could theoretically be specified both before and
      after the type name, such as:

        const image2D const imageVariable;

      which would declare <imageVariable> to be constant and to reference
      constant image data. In this extension, declaring an image variable to
      be constant isn't meaningful, as such variables can never be used as
      l-values.

    (28) What is the meaning of "restrict" on a system that might run either
         multiple invocations of the same shader simultaneously, or multiple
         invocations of different shaders (vertex and fragment)
         simultaneously?

      RESOLVED: When an image variable is qualified with "restrict", the only
      guarantee is that no other image variable in the same shader invocation
      references the same underlying image data. There is no guarantee that
      the same image couldn't be referenced by another invocation of the same
      shader, or by an invocation of a different shader.

      The main function of "restrict" is to allow compilers to generate more
      efficient code for a single shader invocation than they could if they
      had to conservatively assume that accesses to other images could touch
      the same image data.

    (29) What is the purpose of the memoryBarrier() built-in function?

      RESOLVED: The memoryBarrier() function can be used to ensure that, if
      another shader invocation or some other observer examines image memory
      being written by a shader, the accesses appear in a predictable order.
      For example, consider the following code:

        uniform imageBuffer buf1;
        uniform imageBuffer buf2;
        int offset1, offset2;
        vec4 data1, data2;
        imageStore(buf1, offset1, data1);
        imageStore(buf2, offset2, data2);

      This specification doesn't require that writes be committed to memory in
      the order specified in the shader. It is possible that another shader
      invocation or some other observer would see <data2> before seeing
      <data1>. If an algorithm involved multiple shader invocations with one
      possibly needing to wait on data written by another, observing <data2>
      in the second shader would not ensure that <data1> has been written.
      However, if memoryBarrier() were used, as in the following code, the
      second shader would have such a guarantee.

        imageStore(buf1, offset1, data1);
        memoryBarrier();
        imageStore(buf2, offset2, data2);

    (30) What happens if the texel identified by the coordinates given to an
         image load, store, or atomic built-in doesn't exist?
         (i.e., coordinates are out of bounds)

      RESOLVED: Image loads return zero. Stores do not update image memory.
      Atomics do not update image memory and return zero. These same
      considerations apply if no texture is bound to an image unit, if the
      texture is incomplete, and under various other conditions. We do not
      ever apply wrap modes on image operations.

    (31) Why do we have a <format> parameter on BindImageTextureEXT?

      RESOLVED: It allows some amount of bit-casting, to view a texture with
      one format using another format. This parameter allows applications to
      work around several limitations of the specification:

        * Image loads do not support all formats supported for stores. In
          particular, the only formats supported are 1x8, 1x16, 1x32, 2x32,
          and 4x32. Using the <format> parameter allows an application to
          view an RGBA8 texture as "R32UI" and examine the component bits
          itself.

        * Image atomics are single-component 32-bit operations. The ability
          to view some other formats as "size1x32" allows atomic operations to
          be done on some multi-component formats, such as RGBA8.

    (32) Do we support image atomics on multi-component texture formats?

      RESOLVED: Only using the formats in the "size1x32" equivalence class,
      and then only as 32-bit scalar integer operations. Atomics do not
      operate on a component-by-component basis in this extension.

    (33) What happens if early fragment testing is enabled, the early depth
         test passes, and a fragment shader that computes a new depth value is
         executed?

      RESOLVED: The depth value produced by the fragment shader has no effect
      if early depth and stencil tests are enabled. The depth value computed
      by a fragment shader is used only by the post-fragment shader stencil
      and depth tests, and those tests never have any effect when early
      fragment tests are enabled.
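
      As an illustration of the bit-casting described in issues (31) and
      (32), the following C sketch shows one way to reason about an RGBA8
      texel that is viewed as a single 32-bit unsigned integer. The type and
      function names are hypothetical, and the component order (R in the
      least significant byte) is an assumption that this specification text
      does not state; it should be checked against the target implementation.

```c
#include <stdint.h>

/* Hypothetical helper type: one RGBA8 texel split into 8-bit components. */
typedef struct {
    uint8_t r, g, b, a;
} rgba8_texel;

/* ASSUMPTION: R occupies bits 0..7, G bits 8..15, B bits 16..23 and
 * A bits 24..31 of the 32-bit word. This ordering is illustrative only. */
static rgba8_texel unpack_rgba8(uint32_t word)
{
    rgba8_texel t;
    t.r = (uint8_t)(word & 0xFFu);
    t.g = (uint8_t)((word >> 8) & 0xFFu);
    t.b = (uint8_t)((word >> 16) & 0xFFu);
    t.a = (uint8_t)((word >> 24) & 0xFFu);
    return t;
}

/* Inverse of unpack_rgba8() under the same assumed component order. */
static uint32_t pack_rgba8(rgba8_texel t)
{
    return (uint32_t)t.r
         | ((uint32_t)t.g << 8)
         | ((uint32_t)t.b << 16)
         | ((uint32_t)t.a << 24);
}
```

      Because image atomics act on the whole 32-bit word, an atomic on such
      a view replaces or combines all four components at once; it does not
      operate on a component-by-component basis.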

    (34) How do early fragment tests interact with occlusion queries?

      RESOLVED: When early fragment tests are enabled, sample counting for
      occlusion queries also happens prior to fragment shader execution.
      Enabling early fragment tests can change the overall sample count,
      because samples killed by alpha test and alpha to coverage will still be
      counted if early fragment tests are enabled.

    (35) If we provide support for multiple active program objects (e.g., one
         containing a vertex shader, another containing a fragment shader, as
         in EXT_separate_shader_objects), how will early fragment tests be
         handled?

      RESOLVED: The early fragment test enable should be taken from the
      active program object corresponding to the fragment shader stage.

    (36) When specifying a coordinate vector to specify a texel for a
         TEXTURE_1D_ARRAY target, what coordinate is used to specify the
         layer?

      RESOLVED: For GLSL functions, a two-component vector is specified and
      the second (y) component is used to select a layer. When using the
      LOADIM, STOREIM, and ATOMIM NV_gpu_program5 assembly opcodes, a
      four-component vector is provided and the third (z) component selects
      the layer.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     7    10/16/13  pbrown    Update issue (20) to clarify that any image
                              stores and atomics issued before a "discard" do
                              have an effect. Update issue (22) to better
                              define the behavior of stores and atomics on
                              "helper" pixels and to suggest a workaround for
                              shaders that need to use values returned by
                              atomics (undefined for helper pixels) in flow
                              control constructs.

     6    12/12/10  pbrown    Fix minor errata reported by spec reviewers
                              (bugs 6870 and 6991).

     5    09/17/10  pbrown    Clean up the spec language specifying the
                              mapping of coordinates to texels according to
                              the texture target. For 1D arrays, GLSL wants
                              the layer in the second component of a
                              two-component vector while NV_gpu_program5 wants
                              it in the third component of a four-component
                              vector. Also clarify that single-layer bindings
                              of an array or cube map texture use a target
                              appropriate to the bound layer.

     4    03/23/10  pbrown    Add interaction with EXT_separate_shader_objects.
                              Update issues section to include some issues
                              left behind in NV_gpu_shader5 when specs were
                              refactored.

     3    03/21/10  pbrown    Update spec overview, interactions, and issues
                              sections; miscellaneous minor clarifications.

     2    03/16/10  pbrown    Add a separate #extension line for this
                              extension; needed since the extension became
                              packaged separately from ARB_gpu_shader5. Added
                              C99-like "restrict" qualifier to indicate that
                              an image variable won't share underlying image
                              contents with any other variable. Added support
                              for "const" qualifiers on images to indicate
                              read-only image data. Added language describing
                              the significance of the position of image
                              variable qualifiers. Clarified rules on use of
                              image variables as function parameters; adding
                              qualifiers is OK, stripping them off is not.
                              Updated image layout qualifier section to
                              clarify that "size" layout qualifiers are
                              required on both uniform and function parameter
                              declarations. Added "const" qualifier on the
                              image argument in imageLoad() prototypes.
                              Updated extension names in dependency sections.
                              Add support for stores to the RGB10_A2 texture
                              format from OpenGL 3.3. Add several issues.

     1              jbolz     Internal revisions.