Name

    ARB_shader_image_load_store

Name Strings

    GL_ARB_shader_image_load_store

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)
    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Barthold Lichtenbelt, NVIDIA
    Bill Licea-Kane, AMD
    Eric Werness, NVIDIA
    Graham Sellers, AMD
    Greg Roth, NVIDIA
    Nick Haemel, AMD
    Pierre Boudier, AMD
    Piers Daniell, NVIDIA

Notice

    Copyright (c) 2011-2014 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete. Approved by the ARB on 2011/06/20.
    Approved by the Khronos Promoters on 2011/07/29.

Version

    Last Modified Date: September 11, 2014
    Revision: 35

Number

    ARB Extension #115

Dependencies

    This extension is written against the OpenGL 3.2 specification
    (Compatibility Profile).

    This extension is written against version 1.50 (revision 09) of the OpenGL
    Shading Language Specification.

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension interacts trivially with OpenGL 3.2 (Core Profile).

    This extension interacts trivially with OpenGL 3.1,
    ARB_uniform_buffer_object, and EXT_bindable_uniform.

    This extension interacts trivially with ARB_draw_indirect.

    This extension interacts trivially with NV_vertex_buffer_unified_memory.

    This extension interacts with NV_parameter_buffer_object.

    This extension interacts trivially with OpenGL 3.2 and
    ARB_texture_multisample.

    This extension interacts trivially with OpenGL 4.0 and ARB_sample_shading.

    This extension interacts trivially with OpenGL 4.0 and
    ARB_texture_cube_map_array.

    This extension interacts trivially with OpenGL 3.3 and
    ARB_texture_rgb10_a2ui.

    This extension interacts trivially with NV_shader_buffer_load.

    This extension interacts trivially with OpenGL 4.0, ARB_gpu_shader5, and
    NV_gpu_shader5.

    This extension interacts trivially with OpenGL 4.0 and
    ARB_tessellation_shader.

    This extension interacts trivially with EXT_depth_bounds_test.

    This extension interacts with ARB_separate_shader_objects.

    This extension interacts with EXT_shader_image_load_store.

Overview

    This extension provides GLSL built-in functions allowing shaders to load
    from, store to, and perform atomic read-modify-write operations to a
    single level of a texture object from any shader stage. These built-in
    functions are named imageLoad(), imageStore(), and imageAtomic*(),
    respectively, and accept integer texel coordinates to identify the texel
    accessed. The extension adds the notion of "image units" to the OpenGL
    API, to which texture levels are bound for access by the GLSL built-in
    functions. To allow shaders to specify the image unit to access, GLSL
    provides a new set of data types ("image*") similar to samplers. Each
    image variable is assigned an integer value to identify an image unit to
    access, which is specified using Uniform*() APIs in a manner similar to
    samplers.

    This extension also provides the capability to explicitly enable "early"
    per-fragment tests, where operations like depth and stencil testing are
    performed prior to fragment shader execution.
    In unextended OpenGL, fragment shaders never have any side effects and
    implementations can sometimes perform per-fragment tests and discard some
    fragments prior to executing the fragment shader. Since this extension
    allows fragment shaders to write to texture and buffer object memory using
    the built-in image functions, such optimizations could lead to
    non-deterministic results. To avoid this, implementations supporting this
    extension may not perform such optimizations on shaders having such side
    effects. However, enabling early per-fragment tests guarantees that such
    tests will be performed prior to fragment shader execution, and ensures
    that image stores and atomics will not be performed by fragment shader
    invocations where these per-fragment tests fail.

    Finally, this extension provides both a GLSL built-in function and an
    OpenGL API function allowing applications some control over the ordering
    of image loads, stores, and atomics relative to other OpenGL pipeline
    operations accessing the same memory. Because the extension provides the
    ability to perform random accesses to texture or buffer object memory,
    such accesses are not easily tracked by the OpenGL driver. To avoid the
    need for heavy-handed synchronization at the driver level, this extension
    requires manual synchronization. The MemoryBarrier() OpenGL API
    function allows applications to specify a bitfield indicating the set of
    OpenGL API operations to synchronize relative to shader memory access.
    The memoryBarrier() GLSL built-in function provides a synchronization
    point within a given shader invocation to ensure that all memory accesses
    performed prior to the synchronization point complete prior to any started
    after the synchronization point.
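
    As a rough illustration of the bitfield passed to MemoryBarrier(), the C
    sketch below ORs together barrier bits according to how data written by
    shaders will be consumed afterwards. The numeric values are taken from
    the New Tokens section of this document; the GL_ prefix, the
    consumer_flags structure, and barrier_bits_for() are hypothetical names
    introduced only for this example, and the code makes no OpenGL calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as listed under "New Tokens"; the GL_ prefix is an assumption. */
#define GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT 0x00000001u
#define GL_TEXTURE_FETCH_BARRIER_BIT       0x00000008u
#define GL_SHADER_IMAGE_ACCESS_BARRIER_BIT 0x00000020u

/* Hypothetical description of how later commands read shader-written data. */
struct consumer_flags {
    bool as_vertex_attribs;  /* sourced as vertex attribute arrays */
    bool as_texture_fetch;   /* read through texture fetches */
    bool as_image_access;    /* read through image loads or atomics */
};

/* Hypothetical helper: build the <barriers> argument for MemoryBarrier(). */
unsigned int barrier_bits_for(struct consumer_flags c)
{
    unsigned int bits = 0u;
    if (c.as_vertex_attribs) bits |= GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT;
    if (c.as_texture_fetch)  bits |= GL_TEXTURE_FETCH_BARRIER_BIT;
    if (c.as_image_access)   bits |= GL_SHADER_IMAGE_ACCESS_BARRIER_BIT;
    return bits;
}
```

    Under these assumptions, data that is later consumed both as vertex
    attributes and through image loads yields the value 0x00000021, which
    an application would pass to MemoryBarrier() between the two passes.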

New Procedures and Functions

    void BindImageTexture(uint unit, uint texture, int level,
                          boolean layered, int layer, enum access,
                          enum format);

    void MemoryBarrier(bitfield barriers);

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, GetDoublev, and GetInteger64v:

        MAX_IMAGE_UNITS                                 0x8F38
        MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS   0x8F39
        MAX_IMAGE_SAMPLES                               0x906D
        MAX_VERTEX_IMAGE_UNIFORMS                       0x90CA
        MAX_TESS_CONTROL_IMAGE_UNIFORMS                 0x90CB
        MAX_TESS_EVALUATION_IMAGE_UNIFORMS              0x90CC
        MAX_GEOMETRY_IMAGE_UNIFORMS                     0x90CD
        MAX_FRAGMENT_IMAGE_UNIFORMS                     0x90CE
        MAX_COMBINED_IMAGE_UNIFORMS                     0x90CF

    Accepted by the <target> parameter of GetIntegeri_v and GetBooleani_v:

        IMAGE_BINDING_NAME                              0x8F3A
        IMAGE_BINDING_LEVEL                             0x8F3B
        IMAGE_BINDING_LAYERED                           0x8F3C
        IMAGE_BINDING_LAYER                             0x8F3D
        IMAGE_BINDING_ACCESS                            0x8F3E
        IMAGE_BINDING_FORMAT                            0x906E

    Accepted by the <barriers> parameter of MemoryBarrier:

        VERTEX_ATTRIB_ARRAY_BARRIER_BIT                 0x00000001
        ELEMENT_ARRAY_BARRIER_BIT                       0x00000002
        UNIFORM_BARRIER_BIT                             0x00000004
        TEXTURE_FETCH_BARRIER_BIT                       0x00000008
        SHADER_IMAGE_ACCESS_BARRIER_BIT                 0x00000020
        COMMAND_BARRIER_BIT                             0x00000040
        PIXEL_BUFFER_BARRIER_BIT                        0x00000080
        TEXTURE_UPDATE_BARRIER_BIT                      0x00000100
        BUFFER_UPDATE_BARRIER_BIT                       0x00000200
        FRAMEBUFFER_BARRIER_BIT                         0x00000400
        TRANSFORM_FEEDBACK_BARRIER_BIT                  0x00000800
        ATOMIC_COUNTER_BARRIER_BIT                      0x00001000
        ALL_BARRIER_BITS                                0xFFFFFFFF

    Returned by the <type> parameter of GetActiveUniform:

        IMAGE_1D                                        0x904C
        IMAGE_2D                                        0x904D
        IMAGE_3D                                        0x904E
        IMAGE_2D_RECT                                   0x904F
        IMAGE_CUBE                                      0x9050
        IMAGE_BUFFER                                    0x9051
        IMAGE_1D_ARRAY                                  0x9052
        IMAGE_2D_ARRAY                                  0x9053
        IMAGE_CUBE_MAP_ARRAY                            0x9054
        IMAGE_2D_MULTISAMPLE                            0x9055
        IMAGE_2D_MULTISAMPLE_ARRAY                      0x9056
        INT_IMAGE_1D                                    0x9057
        INT_IMAGE_2D                                    0x9058
        INT_IMAGE_3D                                    0x9059
        INT_IMAGE_2D_RECT                               0x905A
        INT_IMAGE_CUBE                                  0x905B
        INT_IMAGE_BUFFER                                0x905C
        INT_IMAGE_1D_ARRAY                              0x905D
        INT_IMAGE_2D_ARRAY                              0x905E
        INT_IMAGE_CUBE_MAP_ARRAY                        0x905F
        INT_IMAGE_2D_MULTISAMPLE                        0x9060
        INT_IMAGE_2D_MULTISAMPLE_ARRAY                  0x9061
        UNSIGNED_INT_IMAGE_1D                           0x9062
        UNSIGNED_INT_IMAGE_2D                           0x9063
        UNSIGNED_INT_IMAGE_3D                           0x9064
        UNSIGNED_INT_IMAGE_2D_RECT                      0x9065
        UNSIGNED_INT_IMAGE_CUBE                         0x9066
        UNSIGNED_INT_IMAGE_BUFFER                       0x9067
        UNSIGNED_INT_IMAGE_1D_ARRAY                     0x9068
        UNSIGNED_INT_IMAGE_2D_ARRAY                     0x9069
        UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY               0x906A
        UNSIGNED_INT_IMAGE_2D_MULTISAMPLE               0x906B
        UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY         0x906C

    Accepted by the <value> parameter of GetTexParameteriv, GetTexParameterfv,
    GetTexParameterIiv, and GetTexParameterIuiv:

        IMAGE_FORMAT_COMPATIBILITY_TYPE                 0x90C7

    Returned in the <data> parameter of GetTexParameteriv, GetTexParameterfv,
    GetTexParameterIiv, and GetTexParameterIuiv when <value> is
    IMAGE_FORMAT_COMPATIBILITY_TYPE:

        IMAGE_FORMAT_COMPATIBILITY_BY_SIZE              0x90C8
        IMAGE_FORMAT_COMPATIBILITY_BY_CLASS             0x90C9


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.14.4, Uniform Variables, p. 89

    (modify second paragraph, p. 90) Sets of uniforms, except for samplers
    and images, can be grouped into uniform blocks. ...

    (Add new types to table 2.13, pp.
    96-98)

      Type Name                                 Keyword
      ------------------------------            -------------------------
      IMAGE_1D                                  image1D
      IMAGE_2D                                  image2D
      IMAGE_3D                                  image3D
      IMAGE_2D_RECT                             image2DRect
      IMAGE_CUBE                                imageCube
      IMAGE_BUFFER                              imageBuffer
      IMAGE_1D_ARRAY                            image1DArray
      IMAGE_2D_ARRAY                            image2DArray
      IMAGE_CUBE_MAP_ARRAY                      imageCubeArray
      IMAGE_2D_MULTISAMPLE                      image2DMS
      IMAGE_2D_MULTISAMPLE_ARRAY                image2DMSArray
      INT_IMAGE_1D                              iimage1D
      INT_IMAGE_2D                              iimage2D
      INT_IMAGE_3D                              iimage3D
      INT_IMAGE_2D_RECT                         iimage2DRect
      INT_IMAGE_CUBE                            iimageCube
      INT_IMAGE_BUFFER                          iimageBuffer
      INT_IMAGE_1D_ARRAY                        iimage1DArray
      INT_IMAGE_2D_ARRAY                        iimage2DArray
      INT_IMAGE_CUBE_MAP_ARRAY                  iimageCubeArray
      INT_IMAGE_2D_MULTISAMPLE                  iimage2DMS
      INT_IMAGE_2D_MULTISAMPLE_ARRAY            iimage2DMSArray
      UNSIGNED_INT_IMAGE_1D                     uimage1D
      UNSIGNED_INT_IMAGE_2D                     uimage2D
      UNSIGNED_INT_IMAGE_3D                     uimage3D
      UNSIGNED_INT_IMAGE_2D_RECT                uimage2DRect
      UNSIGNED_INT_IMAGE_CUBE                   uimageCube
      UNSIGNED_INT_IMAGE_BUFFER                 uimageBuffer
      UNSIGNED_INT_IMAGE_1D_ARRAY               uimage1DArray
      UNSIGNED_INT_IMAGE_2D_ARRAY               uimage2DArray
      UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY         uimageCubeArray
      UNSIGNED_INT_IMAGE_2D_MULTISAMPLE         uimage2DMS
      UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY   uimage2DMSArray


    (Add a new subsection after Section 2.14.5, Samplers, p. 106)

    Section 2.14.X, Images

    Images are special uniforms used in the OpenGL Shading Language to
    identify a level of a texture to be read or written using image load,
    store, and atomic built-in functions in the manner described in Section
    3.9.X. The value of an image uniform is an integer specifying the image
    unit accessed. Image units are numbered beginning at zero, and there is
    an implementation-dependent number of available image units
    (MAX_IMAGE_UNITS).
    The error INVALID_VALUE is generated if a Uniform1i{v} call is used to set
    an image uniform to a value less than zero or greater than or equal to
    MAX_IMAGE_UNITS. Note that image units used for image variables are
    independent of the texture image units used for sampler variables; the
    number of units provided by the implementation may differ. Textures are
    bound independently and separately to image and texture image units.

    The type of an image variable must match the texture target of the image
    currently bound to the image unit, otherwise the result of a load, store,
    or atomic operation is undefined (see Section 4.1.X of the OpenGL
    Shading Language specification for more detail).

    The location of an image variable needs to be queried with
    GetUniformLocation, just like any uniform variable. Image values need to
    be set by calling Uniform1i{v}. Loading image variables with any of the
    other Uniform entry points is not allowed and will result in an
    INVALID_OPERATION error.

    Unlike samplers, there is no limit on the number of active image variables
    that may be used by a program or by any particular shader. However, given
    that there is an implementation-dependent limit on the number of unique
    image units, the actual number of images that may be used by all shaders
    in a program is limited.


    Modify Section 2.14.7, Shader Execution, p. 109

    (Add a new unnumbered subsection before "Shader Inputs", p. 113)

    Image Access

    Shaders have the ability to read and write to textures using image
    uniforms.
    The maximum number of image uniforms available to each individual shader
    stage is the value of the corresponding implementation-dependent constant:

      * MAX_VERTEX_IMAGE_UNIFORMS (vertex shaders),
      * MAX_TESS_CONTROL_IMAGE_UNIFORMS (tessellation control shaders),
      * MAX_TESS_EVALUATION_IMAGE_UNIFORMS (tessellation evaluation shaders),
      * MAX_GEOMETRY_IMAGE_UNIFORMS (geometry shaders), and
      * MAX_FRAGMENT_IMAGE_UNIFORMS (fragment shaders).

    All active shaders combined cannot use more than the value of
    MAX_COMBINED_IMAGE_UNIFORMS image uniforms. If more than one shader stage
    accesses the same image uniform, each such access counts separately
    against the MAX_COMBINED_IMAGE_UNIFORMS limit.


    (Add a new numbered subsection after Section 2.14.7, Shader Execution,
    p. 109)

    Section 2.14.X, Shader Memory Access

    Shaders may perform random-access reads and writes to texture or buffer
    object memory using built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification. The ability to
    perform such random-access reads and writes in systems that may be highly
    pipelined results in ordering and synchronization issues discussed in the
    sections below.


    Shader Memory Access Ordering

    The order in which texture or buffer object memory is read or written by
    shaders is largely undefined. For some shader types (vertex, tessellation
    evaluation, and in some cases, fragment), even the number of shader
    invocations that might perform loads and stores is undefined.
    In particular, the following rules apply:

      * While a vertex or tessellation evaluation shader will be executed at
        least once for each unique vertex specified by the application (vertex
        shaders) or generated by the tessellation primitive generator
        (tessellation evaluation shaders), it may be executed more than once
        for implementation-dependent reasons.
        Additionally, if the same vertex is specified multiple times in a
        collection of primitives (e.g., repeating an index in DrawElements),
        the vertex shader might be run only once.

      * For each fragment generated by the GL, the number of fragment shader
        invocations depends on a number of factors. If the fragment fails the
        pixel ownership test (Section 4.1.1), the fragment shader may not be
        executed. Otherwise, if the framebuffer has no multisample buffer
        (SAMPLE_BUFFERS is zero), the fragment shader will be invoked exactly
        once. If the fragment shader specifies per-sample shading, the
        fragment shader will be run once per covered sample. Otherwise, the
        number of fragment shader invocations is undefined, but must be in the
        range [1,<N>], where <N> is the number of samples covered by the
        fragment.

      * If a fragment shader is invoked to process fragments or samples not
        covered by a primitive being rasterized to facilitate the
        approximation of derivatives for texture lookups, stores and atomics
        have no effect.

      * The relative order of invocations of the same shader type is
        undefined. A store issued by a shader when working on primitive B
        might complete prior to a store for primitive A, even if primitive A
        is specified prior to primitive B. This applies even to fragment
        shaders; while fragment shader outputs are written to the framebuffer
        in primitive order, stores executed by fragment shader invocations are
        not.

      * The relative order of invocations of different shader types is largely
        undefined. However, when executing a shader whose inputs are
        generated from a previous programmable stage, the shader invocations
        from the previous stage are guaranteed to have executed far enough to
        generate final values for all next-stage inputs.
        That implies shader completion for all stages except geometry;
        geometry shaders are guaranteed only to have executed far enough to
        emit all needed vertices.

    The above limitations on shader invocation order also make some forms of
    synchronization between shader invocations within a single set of
    primitives unimplementable. For example, having one invocation poll
    memory written by another invocation assumes that the other invocation has
    been launched and can complete its writes. The only case where such a
    guarantee is made is when the inputs of one shader invocation are
    generated from the outputs of a shader invocation in a previous stage.

    Stores issued to different memory locations within a single shader
    invocation may not be visible to other invocations in the order they were
    performed. The built-in function memoryBarrier() may be used to provide
    stronger ordering of reads and writes performed by a single invocation.
    Calling memoryBarrier() guarantees that any memory transactions issued by
    the shader invocation prior to the call complete prior to the memory
    transactions issued after the call. Memory barriers may be needed for
    algorithms that require multiple invocations to access the same memory and
    require the operations to be performed in a partially-defined relative
    order. For example, if one shader invocation does a series of
    writes, followed by a memoryBarrier() call, followed by another write,
    then another invocation that sees the results of the final write will also
    see the previous writes. Without the memory barrier, the final write may
    be visible before the previous writes.

    The atomic memory transaction built-in functions may be used to read and
    write a given memory address atomically.
    While atomic built-in functions issued by multiple shader invocations are
    executed in undefined order relative to each other, these functions
    perform both a read and a write of a memory address and guarantee that no
    other memory transaction will write to the underlying memory between the
    read and write. Atomics allow shaders to use shared global addresses for
    mutual exclusion or as counters, among other uses.


    Shader Memory Access Synchronization

    Data written to textures or buffer objects by a shader invocation may
    eventually be read by other shader invocations, sourced by other fixed
    pipeline stages, or read back by the application. When applications write
    to buffer objects or textures using API commands such as TexSubImage* or
    BufferSubData, the GL implementation knows when and where writes occur and
    can perform implicit synchronization to ensure that operations requested
    before the update see the original data and that subsequent operations see
    the modified data. Without logic to track the target address of each
    shader instruction performing a store, automatic synchronization of stores
    performed by a shader invocation would require the GL implementation to
    make worst-case assumptions at significant performance cost. To permit
    cases where textures or buffers may be read or written in different
    pipeline stages without the overhead of automatic synchronization, buffer
    object and texture stores performed by shaders are not automatically
    synchronized with other GL operations using the same memory.

    Explicit synchronization is required to ensure that the effects of buffer
    and texture data stores performed by shaders will be visible to subsequent
    operations using the same objects and will not overwrite data still to be
    read by previously requested operations.
    Without manual synchronization, shader stores for a "new" primitive may
    complete before processing of an "old" primitive completes. Additionally,
    stores for an "old" primitive might not be completed before processing of
    a "new" primitive starts. The command

      void MemoryBarrier(bitfield barriers)

    defines a barrier ordering the memory transactions issued prior to the
    command relative to those issued after the barrier. For the purposes of
    this ordering, memory transactions performed by shaders are considered to
    be issued by the rendering command that triggered the execution of the
    shader. <barriers> is a bitfield indicating the set of operations that
    are synchronized with shader stores; the bits used in <barriers> are as
    follows:

    - VERTEX_ATTRIB_ARRAY_BARRIER_BIT: If set, vertex data sourced from
      buffer objects after the barrier will reflect data written by shaders
      prior to the barrier. The set of buffer objects affected by this bit
      is derived from the buffer object bindings or GPU addresses used for
      generic vertex attributes (VERTEX_ATTRIB_ARRAY_BUFFER bindings,
      VERTEX_ATTRIB_ARRAY_ADDRESS from NV_vertex_buffer_unified_memory), as
      well as those for arrays of named vertex attributes (e.g., vertex,
      color, normal).

    - ELEMENT_ARRAY_BARRIER_BIT: If set, vertex array indices sourced from
      buffer objects after the barrier will reflect data written by shaders
      prior to the barrier. The buffer objects affected by this bit are
      derived from the ELEMENT_ARRAY_BUFFER binding and the
      NV_vertex_buffer_unified_memory ELEMENT_ARRAY_ADDRESS address.

    - UNIFORM_BARRIER_BIT: Shader uniforms and assembly program parameters
      sourced from buffer objects after the barrier will reflect data
      written by shaders prior to the barrier.

    - TEXTURE_FETCH_BARRIER_BIT: Texture fetches from shaders, including
      fetches from buffer object memory via buffer textures, after the
      barrier will reflect data written by shaders prior to the barrier.

    - SHADER_IMAGE_ACCESS_BARRIER_BIT: Memory accesses using shader image
      load, store, and atomic built-in functions issued after the barrier
      will reflect data written by shaders prior to the barrier.
      Additionally, image stores and atomics issued after the barrier will
      not execute until all memory accesses (e.g., loads, stores, texture
      fetches, vertex fetches) initiated prior to the barrier complete.

    - COMMAND_BARRIER_BIT: Command data sourced from buffer objects by
      Draw*Indirect commands after the barrier will reflect data written by
      shaders prior to the barrier. The buffer objects affected by this bit
      are derived from the DRAW_INDIRECT_BUFFER binding and the GPU
      address DRAW_INDIRECT_ADDRESS_NV.

    - PIXEL_BUFFER_BARRIER_BIT: Reads/writes of buffer objects via the
      PACK/UNPACK_BUFFER bindings (ReadPixels, TexSubImage, etc.) after the
      barrier will reflect data written by shaders prior to the barrier.
      Additionally, buffer object writes issued after the barrier will wait
      on the completion of all shader writes initiated prior to the barrier.

    - TEXTURE_UPDATE_BARRIER_BIT: Writes to a texture via Tex(Sub)Image*,
      CopyTex(Sub)Image*, CompressedTex(Sub)Image*, and reads via
      GetTexImage after the barrier will reflect data written by shaders
      prior to the barrier. Additionally, texture writes from these
      commands issued after the barrier will not execute until all shader
      writes initiated prior to the barrier complete.

    - BUFFER_UPDATE_BARRIER_BIT: Reads/writes via Buffer(Sub)Data,
      CopyBufferSubData, ProgramBufferParametersNV, and GetBufferSubData, or
      to buffer object memory mapped by MapBuffer(Range) after the barrier
      will reflect data written by shaders prior to the barrier.
      Additionally, writes via these commands issued after the barrier will
      wait on the completion of any shader writes to the same memory
      initiated prior to the barrier.

    - FRAMEBUFFER_BARRIER_BIT: Reads and writes via framebuffer object
      attachments after the barrier will reflect data written by shaders
      prior to the barrier. Additionally, framebuffer writes issued after
      the barrier will wait on the completion of all shader writes issued
      prior to the barrier.

    - TRANSFORM_FEEDBACK_BARRIER_BIT: Writes via transform feedback
      bindings after the barrier will reflect data written by shaders prior
      to the barrier. Additionally, transform feedback writes issued after
      the barrier will wait on the completion of all shader writes issued
      prior to the barrier.

    - ATOMIC_COUNTER_BARRIER_BIT: Accesses to atomic counters after the
      barrier will reflect writes prior to the barrier.

    If <barriers> is ALL_BARRIER_BITS, shader memory accesses will be
    synchronized relative to all the operations described above.

    Implementations may cache buffer object and texture image memory that
    could be written by shaders in multiple caches; for example, there may be
    separate caches for texture, vertex fetching, and one or more caches for
    shader memory accesses. Implementations are not required to keep these
    caches coherent with shader memory writes.
    Stores issued by one invocation may not be immediately observable by other
    pipeline stages or other shader invocations because the value stored may
    remain in a cache local to the processor executing the store, or because
    data overwritten by the store is still in a cache elsewhere in the system.
    When MemoryBarrier is called, the GL flushes and/or invalidates any caches
    relevant to the operations specified by the <barriers> parameter to ensure
    consistent ordering of operations across the barrier.

    To allow for independent shader invocations to communicate by reads and
    writes to a common memory address, image variables in the OpenGL Shading
    Language may be declared as "coherent". Buffer object or texture image
    memory accessed through such variables may be cached only if caches are
    automatically updated due to stores issued by any other shader invocation.
    If the same address is accessed using both coherent and non-coherent
    variables, the accesses using variables declared as coherent will observe
    the results stored using coherent variables in other invocations. Using
    variables declared as "coherent" guarantees only that the results of
    stores will be immediately visible to shader invocations using
    similarly-declared variables; calling MemoryBarrier is required to ensure
    that the stores are visible to other operations.

    The following guidelines may be helpful in choosing when to use coherent
    memory accesses and when to use barriers.

    - Data that are read-only or constant may be accessed without using
      coherent variables or calling MemoryBarrier(). Updates to the
      read-only data via API calls such as BufferSubData will invalidate
      shader caches implicitly as required.

    - Data that are shared between shader invocations at a fine granularity
      (e.g., written by one invocation, consumed by another invocation) should
      use coherent variables to read and write the shared data.

    - Data written by one shader invocation and consumed by other shader
      invocations launched as a result of its execution ("dependent
      invocations") should use coherent variables in the producing shader
      invocation and call memoryBarrier() after the last write. The consuming
      shader invocation should also use coherent variables.

    - Data written to image variables in one rendering pass and read by the
      shader in a later pass need not use coherent variables or
      memoryBarrier(). Calling MemoryBarrier() with the
      SHADER_IMAGE_ACCESS_BARRIER_BIT set in <barriers> between passes is
      necessary.

    - Data written by the shader in one rendering pass and read by another
      mechanism (e.g., vertex or index buffer pulling) in a later pass need
      not use coherent variables or memoryBarrier(). Calling
      MemoryBarrier() with the appropriate bits set in <barriers> between
      passes is necessary.


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    (insert new section immediately before Section 3.9, Texturing, p. 210)

    Section 3.X, Early Per-Fragment Tests

    Once fragments are produced by rasterization (sections 3.4 through 3.8), a
    number of per-fragment operations may be performed prior to fragment
    shader execution. If a fragment is discarded during any of these
    operations, it will not be processed by any subsequent stage, including
    fragment shader execution.

    Up to six operations are performed on each fragment, in the following
    order:

      * the pixel ownership test, described in section 4.1.1;

      * the scissor test, described in section 4.1.2;

      * the depth bounds test, described in section 4.1.X (of the
        EXT_depth_bounds_test specification);

      * the stencil test, described in section 4.1.5;

      * the depth buffer test, described in section 4.1.6; and

      * occlusion query sample counting, described in section 4.1.7.

    The pixel ownership and scissor tests are always performed.

    The other operations are performed if and only if early fragment tests are
    enabled in the active fragment shader (section 3.12.2). When early
    per-fragment operations are enabled, the depth bounds test, stencil test,
    depth buffer test, and occlusion query sample counting operations are
    performed prior to fragment shader execution, and the stencil buffer,
    depth buffer, and occlusion query sample counts will be updated
    accordingly. When early per-fragment operations are enabled, these
    operations will not be performed again after fragment shader execution.
    When there is no active program, the active program has no fragment
    shader, or the active program was linked with early fragment tests
    disabled, these operations are performed only after fragment shader
    execution, in the order described in chapter 4.

    If early fragment tests are enabled, any depth value computed by the
    fragment shader has no effect. Additionally, the depth buffer, stencil
    buffer, and occlusion query sample counts may be updated even for
    fragments or samples that would be discarded after fragment shader
    execution due to per-fragment operations such as alpha-to-coverage or
    alpha tests.


    (Add new section after Section 3.9.19, Texture Application, p.
    268)

    Section 3.9.X, Texture Image Loads and Stores

    The contents of a texture may be made available for shaders to read and
    write by binding the texture to one of a collection of image units. The
    GL implementation provides an array of image units numbered beginning with
    zero, with the total number of image units provided given by the
    implementation-dependent constant MAX_IMAGE_UNITS. Unlike texture image
    units, image units do not have a separate attachment for each texture
    target; each image unit may have only one texture bound at a time.

    A texture may be bound to an image unit for use by image loads and stores
    by calling:

      void BindImageTexture(uint unit, uint texture, int level,
                            boolean layered, int layer, enum access,
                            enum format);

    where <unit> identifies the image unit, <texture> is the name of the
    texture, and <level> selects a single level of the texture. If <texture>
    is zero, any texture currently bound to image unit <unit> is unbound. If
    <unit> is greater than or equal to the value of MAX_IMAGE_UNITS, if
    <level> or <layer> is less than zero, or if <texture> is not the name of
    an existing texture object, the error INVALID_VALUE is generated.

    If the texture identified by <texture> is a one-dimensional array,
    two-dimensional array, three-dimensional, cube map, cube map array, or
    two-dimensional multisample array texture, it is possible to bind either
    the entire texture level or a single layer or face of the texture level.
    If <layered> is TRUE, the entire level is bound. If <layered> is FALSE,
    only the single layer identified by <layer> will be bound.
When <layered> 696 is FALSE, the single bound layer is treated as a different texture target 697 for image accesses: 698 699 * one-dimensional array texture layers are treated as one-dimensional 700 textures; 701 702 * two-dimensional array, three-dimensional, cube map, cube map array 703 texture layers are treated as two-dimensional textures; and 704 705 * two-dimensional multisample array textures are treated as 706 two-dimensional multisample textures. 707 708 For cube map textures where <layered> is FALSE, the face is taken by 709 mapping the layer number to a face according to table 4.13. For cube map 710 array textures where <layered> is FALSE, the selected layer number is 711 mapped to a texture layer and cube face using the following equations and 712 mapping <face> to a face according to table 4.13. 713 714 layer = floor(layer_orig / 6) 715 face = layer_orig - (layer * 6) 716 717 If the texture identified by <texture> does not have multiple layers or 718 faces, the entire texture level is bound, regardless of the values 719 specified by <layered> and <layer>. 720 721 <format> specifies the format that the elements of the image will be 722 treated as when doing formatted stores, as described later in this 723 section. This is referred to as the "image unit format". This must be one 724 of the formats listed in Table X.2; otherwise, the error INVALID_VALUE is 725 generated. 726 727 <access> specifies whether the texture bound to the image will be treated 728 as READ_ONLY, WRITE_ONLY, or READ_WRITE. If a shader reads from an image 729 unit with a texture bound as WRITE_ONLY, or writes to an image unit with a 730 texture bound as READ_ONLY, the results of that shader operation are 731 undefined and may lead to application termination. 
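The cube map array layer mapping given by the equations above can be sketched in C as follows. This is a non-normative illustration only; the type CubeLayerFace and the function split_cube_array_layer are hypothetical names that do not appear in this extension, and the sketch assumes a non-negative layer number (BindImageTexture rejects negative values of <layer>).

```c
#include <assert.h>

/* Hypothetical helper type; not part of the extension. */
typedef struct {
    int layer; /* layer of the cube map array */
    int face;  /* 0..5, to be mapped to a cube face via table 4.13 */
} CubeLayerFace;

/* Split the <layer> value of a non-layered cube map array binding
 * into a texture layer and a face index:
 *   layer = floor(layer_orig / 6)
 *   face  = layer_orig - (layer * 6)
 */
static CubeLayerFace split_cube_array_layer(int layer_orig)
{
    CubeLayerFace result;

    /* Negative layers are an INVALID_VALUE error in BindImageTexture. */
    assert(layer_orig >= 0);

    /* Integer division equals floor() for non-negative operands. */
    result.layer = layer_orig / 6;
    result.face = layer_orig - (result.layer * 6);
    return result;
}
```

For example, a <layer> value of 13 selects face index 1 of layer 2 under this sketch.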
732 733 If a texture object bound to one or more image units is deleted by 734 DeleteTextures, it is detached from each such image unit, as though 735 BindImageTexture were called with <unit> identifying the image unit and 736 <texture> set to zero. 737 738 When a shader accesses the texture bound to an image unit using a built-in 739 image load, store, or atomic function, it identifies a single texel by 740 providing a one-, two-, or three-dimensional coordinate. Multisample 741 texture accesses also specify a sample number. A coordinate vector is 742 mapped to an individual texel tau_i, tau_i_j, or tau_i_j_k according to 743 the target of the texture bound to the image unit using Table X.1. As 744 noted above, single-layer bindings of array or cube map textures are 745 considered to use a texture target corresponding to the bound layer, 746 rather than that of the full texture. 747 748 Face/ 749 i j k layer 750 -- -- -- ----- 751 TEXTURE_1D x - - - 752 TEXTURE_2D x y - - 753 TEXTURE_3D x y z - 754 TEXTURE_RECTANGLE x y - - 755 TEXTURE_CUBE_MAP x y - z 756 TEXTURE_BUFFER x - - - 757 TEXTURE_1D_ARRAY x - - y 758 TEXTURE_2D_ARRAY x y - z 759 TEXTURE_CUBE_MAP_ARRAY x y - z 760 TEXTURE_2D_MULTISAMPLE x y - - 761 TEXTURE_2D_MULTISAMPLE_ARRAY x y - z 762 763 Table X.1, Mapping of image load, store, and atomic texel coordinate 764 components to texel numbers. 765 766 If the texture target has layers or cube map faces, the layer or face 767 number is taken from the <layer> argument of BindImageTexture if the 768 texture is bound with <layered> set to FALSE, or from the coordinate 769 identified by Table X.1 otherwise. For cube map and cube map array 770 textures with <layered> set to TRUE, the coordinate is mapped to a layer 771 and face in the same manner as described for the <layer> argument of 772 BindImageTexture. 773 774 If the individual texel identified for an image load, store, or atomic 775 operation doesn't exist, the access is treated as invalid. 
Invalid image 776 loads will return zero. Invalid image stores will have no effect. 777 Invalid image atomics will not update any texture bound to the image unit 778 and will return zero. An access is considered invalid if: 779 780 * no texture is bound to the selected image unit; 781 782 * the texture bound to the selected image unit is incomplete; 783 784 * the texture level bound to the image unit is less than the base 785 level or greater than the maximum level of the texture; 786 787 * [[compatibility profile only]] the texture bound to the image unit is 788 bordered; 789 790 * the internal format of the texture bound to the image unit is not 791 found in Table X.2; 792 793 * the internal format of the texture bound to the image unit is 794 incompatible with the specified <format> according to Table X.3; 795 796 * the texture bound to the image unit has layers, and the selected layer 797 or cube map face doesn't exist; 798 799 * the selected texel tau_i, tau_i_j, or tau_i_j_k doesn't exist; or 800 801 * the image has more samples than the implementation-dependent value of 802 MAX_IMAGE_SAMPLES. 803 804 Additionally, there are a number of cases where image load, store, or 805 atomic operations are considered to involve a format mismatch. In such 806 cases, undefined values will be returned by image loads and atomic 807 operations and undefined values will be written by stores and atomic 808 operations. 
A format mismatch will occur if: 809 810 * the type of image variable used to access the image unit does not 811 match the target of a texture bound to the image unit with <layered> 812 set to TRUE; 813 814 * the type of image variable used to access the image unit does not 815 match the target corresponding to a single layer of a multi-layer 816 texture target bound to the image unit with <layered> set to FALSE; 817 818 * the type of image variable used to access the image unit has a 819 component data type (floating-point, signed integer, unsigned integer) 820 incompatible with the format of the image unit; 821 822 * the format layout qualifier for an image variable used for an image 823 load or atomic operation does not match the format of the image unit, 824 according to Table X.2; or 825 826 * the image variable used for an image store has a format layout 827 qualifier, and that qualifier does not match the format of the image 828 unit, according to Table X.2. 829 830 For textures with multiple samples per texel, the sample selected for an 831 image load, store, or atomic is undefined if the <sample> coordinate is 832 negative or greater than or equal to the number of samples in the 833 texture. 834 835 If a shader performs an image load, store, or atomic operation using an 836 image variable declared as an array, and if the index used to select an 837 individual element is negative or greater than or equal to the size 838 of the array, the results of the operation are undefined but may not lead 839 to termination. 840 841 Accesses to textures bound to image units do format conversions based on 842 the <format> argument specified when the image is bound. Loads always 843 return a value as a vec4, ivec4, or uvec4, and stores always take the 844 source data as a vec4, ivec4, or uvec4. 
Data are converted to/from the 845 specified format according to the process described for a TexImage2D or 846 GetTexImage command with <format> and <type> as RGBA and FLOAT for vec4 847 data, with <format> and <type> as RGBA_INTEGER and INT for ivec4 data, or 848 with <format> and <type> as RGBA_INTEGER and UNSIGNED_INT for uvec4 data. 849 Unused components are filled in with (0,0,0,1) (where "1" is either a 850 floating-point or integer value, depending on the format). 851 852 Any image variable used for shader loads or atomic memory operations must 853 be declared with a format layout qualifier matching the format of its 854 associated image unit, as enumerated in Table X.2. Otherwise, the access 855 is considered to involve a format mismatch, as described above. Image 856 variables used exclusively for image stores need not include a format 857 layout qualifier, but any declared qualifier must match the image unit 858 format to avoid a format mismatch. 859 860 Image Unit Format Format Qualifier 861 ----------------- ---------------- 862 RGBA32F rgba32f 863 RGBA16F rgba16f 864 RG32F rg32f 865 RG16F rg16f 866 R11F_G11F_B10F r11f_g11f_b10f 867 R32F r32f 868 R16F r16f 869 870 RGBA32UI rgba32ui 871 RGBA16UI rgba16ui 872 RGB10_A2UI rgb10_a2ui 873 RGBA8UI rgba8ui 874 RG32UI rg32ui 875 RG16UI rg16ui 876 RG8UI rg8ui 877 R32UI r32ui 878 R16UI r16ui 879 R8UI r8ui 880 881 RGBA32I rgba32i 882 RGBA16I rgba16i 883 RGBA8I rgba8i 884 RG32I rg32i 885 RG16I rg16i 886 RG8I rg8i 887 R32I r32i 888 R16I r16i 889 R8I r8i 890 891 RGBA16 rgba16 892 RGB10_A2 rgb10_a2 893 RGBA8 rgba8 894 RG16 rg16 895 RG8 rg8 896 R16 r16 897 R8 r8 898 899 RGBA16_SNORM rgba16_snorm 900 RGBA8_SNORM rgba8_snorm 901 RG16_SNORM rg16_snorm 902 RG8_SNORM rg8_snorm 903 R16_SNORM r16_snorm 904 R8_SNORM r8_snorm 905 906 Table X.2, Supported image unit formats, with equivalent format 907 layout qualifiers. 
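The component fill rule stated before Table X.2 (unused components are filled in with (0,0,0,1)) can be sketched in C for the floating-point case. This is a non-normative illustration; the function expand_loaded_texel is a hypothetical name, and the sketch assumes the texel components have already been converted to float. It models only the fill step, not the full format conversion.

```c
#include <assert.h>

/* Hypothetical helper; not part of the extension.
 *
 * Expand the <count> components present in an image format (1 for R*,
 * 2 for RG*, 3 for R11F_G11F_B10F, 4 for RGBA*) to the four-component
 * value returned by an image load. Missing components take their
 * values from (0,0,0,1).
 */
static void expand_loaded_texel(const float *src, int count, float out[4])
{
    static const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    int i;

    assert(count >= 1 && count <= 4);

    for (i = 0; i < 4; ++i) {
        out[i] = (i < count) ? src[i] : defaults[i];
    }
}
```

Under this sketch, a load from a two-component format holding (0.25, 0.5) yields (0.25, 0.5, 0, 1).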
908 909 When a texture is bound to an image unit, the <format> parameter for the 910 image unit need not exactly match the texture internal format as long as 911 the formats are considered compatible. A pair of formats is considered 912 to match in size if the corresponding entries in the "size" column of 913 Table X.3 are identical. A pair of formats is considered to match by 914 class if the corresponding entries in the "class" column of Table X.3 are 915 identical. For textures allocated by the GL, an image unit format is 916 compatible with a texture internal format if they match by size. For 917 textures allocated outside the GL, format compatibility is determined by 918 matching by size or by class, in an implementation-dependent manner. The 919 matching criterion used for a given texture may be determined by calling 920 GetTexParameter with <value> set to IMAGE_FORMAT_COMPATIBILITY_TYPE, with 921 return values of IMAGE_FORMAT_COMPATIBILITY_BY_SIZE and 922 IMAGE_FORMAT_COMPATIBILITY_BY_CLASS, specifying matches by size and 923 class, respectively. 924 925 When the format associated with an image unit does not exactly match the 926 internal format of the texture bound to the image unit, image loads, 927 stores, and atomic operations re-interpret the memory holding the 928 components of an accessed texel according to the format of the image unit. 929 The re-interpretation for image loads and the read portion of image 930 atomics is performed as though data were copied from the texel of the 931 bound texture to a similar texel represented in the format of the image 932 unit. Similarly, the re-interpretation for image stores and the write 933 portion of image atomics is performed as though data were copied from a 934 texel represented in the format of the image unit to the texel in the 935 bound texture. 
In both cases, this copy operation would be performed by: 936 937 * reading the texel from the source format to scratch memory according 938 to the process described for GetTexImage (section 6.1.4), using 939 default pixel storage modes and <format> and <type> parameters 940 corresponding to the source format in Table X.3; and 941 942 * writing the texel from scratch memory to the destination format 943 according to the process described for TexSubImage3D (section 3.9.2), 944 using default pixel storage modes and <format> and <type> parameters 945 corresponding to the destination format in Table X.3. 946 947 [[compatibility profile only: No pixel transfer operations are performed 948 during this conversion.]] 949 950 Image Format Size Class Pixel Format/Type 951 -------------- ---- ----- ----------------------------------------- 952 RGBA32F 128 4x32 RGBA, FLOAT 953 RGBA16F 64 4x16 RGBA, HALF_FLOAT 954 RG32F 64 2x32 RG, FLOAT 955 RG16F 32 2x16 RG, HALF_FLOAT 956 R11F_G11F_B10F 32 (a) RGB, UNSIGNED_INT_10F_11F_11F_REV 957 R32F 32 1x32 RED, FLOAT 958 R16F 16 1x16 RED, HALF_FLOAT 959 960 RGBA32UI 128 4x32 RGBA_INTEGER, UNSIGNED_INT 961 RGBA16UI 64 4x16 RGBA_INTEGER, UNSIGNED_SHORT 962 RGB10_A2UI 32 (b) RGBA_INTEGER, UNSIGNED_INT_2_10_10_10_REV 963 RGBA8UI 32 4x8 RGBA_INTEGER, UNSIGNED_BYTE 964 RG32UI 64 2x32 RG_INTEGER, UNSIGNED_INT 965 RG16UI 32 2x16 RG_INTEGER, UNSIGNED_SHORT 966 RG8UI 16 2x8 RG_INTEGER, UNSIGNED_BYTE 967 R32UI 32 1x32 RED_INTEGER, UNSIGNED_INT 968 R16UI 16 1x16 RED_INTEGER, UNSIGNED_SHORT 969 R8UI 8 1x8 RED_INTEGER, UNSIGNED_BYTE 970 971 RGBA32I 128 4x32 RGBA_INTEGER, INT 972 RGBA16I 64 4x16 RGBA_INTEGER, SHORT 973 RGBA8I 32 4x8 RGBA_INTEGER, BYTE 974 RG32I 64 2x32 RG_INTEGER, INT 975 RG16I 32 2x16 RG_INTEGER, SHORT 976 RG8I 16 2x8 RG_INTEGER, BYTE 977 R32I 32 1x32 RED_INTEGER, INT 978 R16I 16 1x16 RED_INTEGER, SHORT 979 R8I 8 1x8 RED_INTEGER, BYTE 980 981 RGBA16 64 4x16 RGBA, UNSIGNED_SHORT 982 RGB10_A2 32 (b) RGBA, UNSIGNED_INT_2_10_10_10_REV 983 
RGBA8 32 4x8 RGBA, UNSIGNED_BYTE 984 RG16 32 2x16 RG, UNSIGNED_SHORT 985 RG8 16 2x8 RG, UNSIGNED_BYTE 986 R16 16 1x16 RED, UNSIGNED_SHORT 987 R8 8 1x8 RED, UNSIGNED_BYTE 988 989 RGBA16_SNORM 64 4x16 RGBA, SHORT 990 RGBA8_SNORM 32 4x8 RGBA, BYTE 991 RG16_SNORM 32 2x16 RG, SHORT 992 RG8_SNORM 16 2x8 RG, BYTE 993 R16_SNORM 16 1x16 RED, SHORT 994 R8_SNORM 8 1x8 RED, BYTE 995 996 Table X.3, Texel sizes, compatibility classes, and pixel format/type 997 combinations for each image format. Class (a) is for 11/11/10 packed 998 floating-point formats; class (b) is for 10/10/10/2 packed formats. 999 1000 Implementations may support a limited combined number of image units and 1001 active fragment shader outputs (section 4.2.1). A link error will be 1002 generated if the sum of the number of active image uniforms used in all shaders and 1003 the number of active fragment shader outputs exceeds the implementation- 1004 dependent value (MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS). 1005 1006 1007 Modify Section 3.12.2, Shader Execution, p. 274 1008 1009 (add new unnumbered subsection at the end of the section, p. 279) 1010 1011 Early Fragment Tests 1012 1013 An explicit control is provided to allow fragment shaders to enable early 1014 fragment tests. If the fragment shader specifies the 1015 "early_fragment_tests" layout qualifier, the per-fragment tests described 1016 in Section 3.X will be performed prior to fragment shader execution. 1017 Otherwise, they will be performed after fragment shader execution. 1018 1019 1020Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification 1021(Per-Fragment Operations and the Framebuffer) 1022 1023 None. 1024 1025 1026Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification 1027(Special Functions) 1028 1029 Modify Section 5.4.1, Commands Not Usable In Display Lists (p. 
358) 1030 1031 (add "MemoryBarrier" to the list of commands not allowed in a display 1032 list, in the "Buffer objects" paragraph) 1033 1034 1035Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification 1036(State and State Requests) 1037 1038 Modify Section 6.1.3, Enumerated Queries (p. 369) 1039 1040 (modify 2nd paragraph, p. 370) ... <value> must be TEXTURE_RESIDENT, 1041 IMAGE_FORMAT_COMPATIBILITY_TYPE, or one of the symbolic values in table 1042 3.22. 1043 1044 1045New Implementation Dependent State 1046 1047 Minimum 1048 Get Value Type Get Command Value Description Sec. Attrib 1049 --------- ---- ----------- -------- ----------------------- ---- ------ 1050 MAX_IMAGE_UNITS Z+ GetIntegerv 8 number of units for 3.9.X - 1051 image load/store/atom 1052 MAX_COMBINED_IMAGE_UNITS_ Z+ GetIntegerv 8 limit on active image 3.9.X - 1053 AND_FRAGMENT_OUTPUTS units + fragment outputs 1054 MAX_IMAGE_SAMPLES Z GetIntegerv 0 max allowed samples 3.9.X - 1055 for a texture level 1056 bound to an image unit 1057 MAX_VERTEX_IMAGE_ Z+ GetIntegerv 0 number of image variables 2.14.7 1058 UNIFORMS in vertex shaders 1059 MAX_TESS_CONTROL_IMAGE_ Z+ GetIntegerv 0 number of image variables 2.14.7 1060 UNIFORMS in tess. control shaders 1061 MAX_TESS_EVALUATION_IMAGE_ Z+ GetIntegerv 0 number of image variables 2.14.7 1062 UNIFORMS in tess. eval. shaders 1063 MAX_GEOMETRY_IMAGE_ Z+ GetIntegerv 0 number of image variables 2.14.7 1064 UNIFORMS in geometry shaders 1065 MAX_FRAGMENT_IMAGE_ Z+ GetIntegerv 8 number of image variables 2.14.7 1066 UNIFORMS in fragment shaders 1067 MAX_COMBINED_IMAGE_ Z+ GetIntegerv 8 number of image variables 2.14.7 1068 UNIFORMS in all shaders 1069 1070New State 1071 1072 Add to Table 6.22, Textures (state per texture object), p. 
414 1073 1074 Get Value Type Get Command Initial Value Description Sec Attribute 1075 --------------------- ----- ----------- ------------- ------------------------ ----- --------- 1076 IMAGE_FORMAT_ Z_2 GetTexParam- see 3.9.x compatibility rules for 3.9.X texture 1077 COMPATIBILITY_TYPE eteriv texture use with image 1078 units 1079 1080 Add a new Table 6.X, Image Stage (state per image unit) 1081 1082 Get Value Type Get Command Initial Value Description Sec Attribute 1083 --------------------- ---- ----------- ------------- ------------------------ ----- --------- 1084 IMAGE_BINDING_NAME 8*xZ+ GetIntegeri_v 0 name of bound texture 3.9.X none 1085 object 1086 IMAGE_BINDING_LEVEL 8*xZ+ GetIntegeri_v 0 level of bound texture 3.9.X none 1087 object 1088 IMAGE_BINDING_LAYERED 8*xB GetBooleani_v FALSE texture object bound w/ 3.9.X none 1089 multiple layers 1090 IMAGE_BINDING_LAYER 8*xZ+ GetIntegeri_v 0 layer of bound texture 3.9.X none 1091 object, if not layered 1092 IMAGE_BINDING_ACCESS 8*xZ3 GetIntegeri_v READ_ONLY read and/or write access 3.9.X none 1093 for bound texture 1094 IMAGE_BINDING_FORMAT 8*xZ+ GetIntegeri_v R8 format used for accesses 3.9.X none 1095 to bound texture 1096 1097 1098Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile) 1099Specification (Invariance) 1100 1101 Modify Section A.1, Repeatability (p. 454) 1102 1103 (add a new sentence to the end of the first paragraph, p. 454) ... For 1104 any given GL and framebuffer state vector .. whenever the command is 1105 executed on that initial GL and framebuffer state. This repeatability 1106 requirement doesn't apply when using shaders containing side effects 1107 (image stores, image atomic operations, atomic counter operations), 1108 because these memory operations are not guaranteed to be processed in a 1109 defined order. 1110 1111 Modify Section A.3, Invariance (p. 455) 1112 1113 (add new language to the end of the section, p. 
457) 1114 1115 If a sequence of GL commands specifies primitives to be rendered with 1116 shaders containing side effects (image stores, image atomic operations, 1117 atomic counter operations), invariance rules are relaxed. In particular, 1118 Rule 1, Corollary 3, and Rule 4 do not apply in the presence of shader 1119 side effects. 1120 1121 The following weaker versions of Rule 1 and 4 apply to GL commands 1122 involving shader side effects: 1123 1124 Rule 6: For any given GL and framebuffer state vector, and for any 1125 given GL command, the contents of any framebuffer state not directly or 1126 indirectly affected by results of shader image stores, atomic 1127 operations, or atomic counter operations must be identical each time the 1128 command is executed on that initial GL and framebuffer state. 1129 1130 Rule 7: The same vertex or fragment shader will produce the same result 1131 when run multiple times with the same input as long as: 1132 1133 * shader invocations do not use image atomic operations or atomic 1134 counters; 1135 1136 * no framebuffer memory is written to more than once by image stores, 1137 unless all such stores write the same value; and 1138 1139 * no shader invocation, or other operation performed to process the 1140 sequence of commands, reads memory written to by an image store. 1141 1142 When any sequence of GL commands triggers shader invocations that perform 1143 image stores, atomic operations, or atomic counter operations, and 1144 subsequent GL commands read the memory written by those shader 1145 invocations, these operations must be explicitly synchronized. For more 1146 details, see Section 2.14.X, Shader Memory Access. 1147 1148 1149 Modify Section A.3, Invariance (p. 455) 1150 1151 1152 Add Section A.5, Shader Image Load, Store, and Atomic Invariance (p. 
457) 1153 1154 1155 1156Additions to Appendix D of the OpenGL 3.2 (Compatibility Profile) 1157Specification (Shared Objects and Multiple Contexts) 1158 1159 Modify Section D.3, Propagating State Changes, p. 467 1160 1161 (add to list of bullets at the end of the section, p. 467) 1162 1163 * Rendering commands that trigger shader invocations, where the shader 1164 performs image stores or atomic operations. 1165 1166 1167Additions to the AGL/GLX/WGL Specifications 1168 1169 None. 1170 1171 1172GLX Protocol 1173 1174 !!! TBD !!! 1175 1176 NOTE TO PROTOCOL CREATORS: Don't attempt to use the same protocol for 1177 BindImageTexture and BindImageTextureEXT (from 1178 EXT_shader_image_load_store). BindImageTexture throws an error on 1179 negative <level> and <layer> values; BindImageTextureEXT does not. 1180 1181 1182Modifications to the OpenGL Shading Language Specification, Version 1.50 1183 1184 Including the following line in a shader can be used to control the 1185 language features described in this extension: 1186 1187 #extension GL_ARB_shader_image_load_store : <behavior> 1188 1189 where <behavior> is as specified in section 3.3. 1190 1191 New preprocessor #defines are added to the OpenGL Shading Language: 1192 1193 #define GL_ARB_shader_image_load_store 1 1194 1195 1196 Modify Section 3.6, Keywords, p. 14 1197 1198 (add the following to the list of keywords, p. 14) 1199 1200 coherent 1201 volatile 1202 restrict 1203 readonly 1204 writeonly 1205 1206 image1D iimage1D uimage1D 1207 image2D iimage2D uimage2D 1208 image3D iimage3D uimage3D 1209 image2DRect iimage2DRect uimage2DRect 1210 imageCube iimageCube uimageCube 1211 imageBuffer iimageBuffer uimageBuffer 1212 image1DArray iimage1DArray uimage1DArray 1213 image2DArray iimage2DArray uimage2DArray 1214 imageCubeArray iimageCubeArray uimageCubeArray 1215 image2DMS iimage2DMS uimage2DMS 1216 image2DMSArray iimage2DMSArray uimage2DMSArray 1217 1218 (remove from the list of reserved keywords, p. 
15) 1219 1220 volatile 1221 <all others above that are also reserved keywords> 1222 1223 Add all these types into the basic types table, in the opaque sections, 1224 along with their corresponding texture types. 1225 1226 1227 (Insert a new section immediately after Section 4.1.7, Samplers, p. 23) 1228 1229 Section 4.1.X, Images 1230 1231 Like samplers, images are opaque handles to one-, two-, or 1232 three-dimensional images corresponding to all or a portion of a single 1233 level of a texture image bound to an image unit. There are distinct 1234 image variable types for each texture target, and for each of float, 1235 integer, and unsigned integer data types. Image accesses should use 1236 an image type that matches the target of the texture whose level is 1237 bound to the image unit, or for non-layered bindings of 3D or array 1238 images should use the image type that matches the dimensionality of 1239 the layer of the image (i.e. a layer of 3D, 2DArray, Cube, or 1240 CubeArray should use image2D, a layer of 1DArray should use image1D, 1241 and a layer of 2DMSArray should use image2DMS). If the image target type 1242 does not match the bound image in this manner, if the data type does not 1243 match the bound image, or if the format layout qualifier does not match 1244 the image unit format as described in Section 3.9.X of the OpenGL 1245 Specification, the results of image accesses are undefined but cannot 1246 include program termination. 1247 1248 Image variables are used in the image load, store, and atomic functions 1249 described in Section 8.X, "Image Functions" to specify an image to access. 1250 They can only be declared as function parameters or uniform variables (see 1251 Section 4.3.5 "Uniform"). Except for array indexing, structure field 1252 selection, and parentheses, images are not allowed to be operands in 1253 expressions. 
Images may be aggregated into arrays within a shader (using 1254 square brackets [ ]) and can be indexed with general integer expressions. 1255 The results of accessing an image array with an out-of-bounds index are 1256 undefined. Images cannot be treated as l-values; hence, they cannot be 1257 used as out or inout function parameters, nor can they be assigned into. 1258 As uniforms, they are initialized only with the OpenGL API; they cannot be 1259 declared with an initializer in a shader. As function parameters, images 1260 may only be passed to image parameters of matching type. 1261 1262 1263 Add Memory Qualifier Table to Section 4.3, Storage Qualifiers, p. 29 1264 1265 Only variables declared as image types (the basic opaque types with 1266 "image" in their keyword) can be qualified with a memory qualifier. 1267 1268 Variables declared as image types can be qualified with one or more of the 1269 following memory qualifiers: 1270 1271 Qualifier Meaning 1272 ------------ ------------------------------------------------- 1273 coherent memory variable where reads and writes are coherent 1274 with reads and writes from other shader invocations 1275 1276 volatile memory variable whose underlying value may be 1277 changed at any point during shader execution by 1278 some source other than the current shader invocation 1279 1280 restrict memory variable where use of that variable is the 1281 only way to read and write the underlying memory 1282 in the relevant shader stage 1283 1284 readonly memory variable that can be used to read the 1285 underlying memory, but cannot be used to write the 1286 underlying memory 1287 1288 writeonly memory variable that can be used to write the 1289 underlying memory, but cannot be used to read the 1290 underlying memory 1291 1292 1293 Modify Section 4.3.2, Constant Qualifier (p. 
30) 1294 1295 (add after last paragraph of section) 1296 1297 Because image variables can not be built from constant expressions, the 1298 "const" qualifier may not be used to create a compile-time constant image 1299 variable. 1300 1301 Modify Section 4.3.8.1 (Input Layout Qualifiers), p. 39 1302 1303 Remove "only" from the sentence: 1304 1305 Fragment shaders can have an input layout only for redeclaring the 1306 built-in variable gl_FragCoord... 1307 1308 Add to the end of the section: 1309 1310 Fragment shaders also allow the following layout qualifier on "in" only 1311 (not with variable declarations): 1312 1313 layout-qualifier-id 1314 early_fragment_tests 1315 1316 to request that fragment tests be performed before fragment shader 1317 execution, as described in Section 3.12.2 of the OpenGL Specification. 1318 For example, 1319 1320 layout(early_fragment_tests) in; 1321 1322 Specifying this will make per-fragment tests be performed before fragment 1323 shader execution. If this is not declared, per-fragment tests will be 1324 performed after fragment shader execution. 1325 1326 (Insert immediately after Section 4.3.8.3, Uniform Block Layout 1327 Qualifiers, p. 40) 1328 1329 Section 4.3.8.X, Image Layout Qualifiers 1330 1331 Format layout qualifiers can be used on image variable declarations (those 1332 declared with a basic type having 'image' in its keyword). 
The format 1333 layout qualifier identifiers for image variable declarations are 1334 1335 <layout-qualifier-id>: 1336 <float-image-format-qualifier> 1337 <int-image-format-qualifier> 1338 <uint-image-format-qualifier> 1339 1340 <float-image-format-qualifier>: 1341 rgba32f 1342 rgba16f 1343 rg32f 1344 rg16f 1345 r11f_g11f_b10f 1346 r32f 1347 r16f 1348 rgba16 1349 rgb10_a2 1350 rgba8 1351 rg16 1352 rg8 1353 r16 1354 r8 1355 rgba16_snorm 1356 rgba8_snorm 1357 rg16_snorm 1358 rg8_snorm 1359 r16_snorm 1360 r8_snorm 1361 1362 <int-image-format-qualifier>: 1363 rgba32i 1364 rgba16i 1365 rgba8i 1366 rg32i 1367 rg16i 1368 rg8i 1369 r32i 1370 r16i 1371 r8i 1372 1373 <uint-image-format-qualifier>: 1374 rgba32ui 1375 rgba16ui 1376 rgb10_a2ui 1377 rgba8ui 1378 rg32ui 1379 rg16ui 1380 rg8ui 1381 r32ui 1382 r16ui 1383 r8ui 1384 1385 A format layout qualifier specifies the image format associated with a 1386 declared image variable. Only one format qualifier may be specified for 1387 any image variable declaration. For image variables with floating-point 1388 component types (image*), signed integer component types (iimage*), or 1389 unsigned integer component types (uimage*), the format qualifier used must 1390 match the <float-image-format-qualifier>, <int-image-format-qualifier>, or 1391 <uint-image-format-qualifier> grammar rules, respectively. It is an error 1392 to declare an image variable where the format qualifier does not match the 1393 image variable type. 1394 1395 Any image variable used for image loads or atomic operations must specify 1396 a format layout qualifier; it is an error to pass an image uniform 1397 variable or function parameter declared without a format layout qualifier 1398 to an image load or atomic function. 1399 1400 Uniforms not qualified with "writeonly" must have a format layout qualifier. 
1401 Note that an image variable passed to a function for read access cannot be 1402 declared as "writeonly" and hence must have been declared with a format 1403 layout qualifier. 1404 1405 (Insert immediately after Section 4.3.9, Interpolation, p. 42) 1406 1407 Section 6.1.1 Function Calling Conventions 1408 1409 Add "memory qualifier" as one of the qualifiers that can be used as a formal 1410 "parameter-qualifier". 1411 1412 Section 4.3.X, Memory Access Qualifiers 1413 1414 The "coherent", "volatile", "restrict", and "const" storage qualifiers can 1415 be specified in image variable declarations to control memory accesses 1416 using the declared variables. 1417 1418 Memory accesses to image variables declared using the "coherent" storage 1419 qualifier are performed coherently with similar accesses from other shader 1420 invocations. In particular, when reading a variable declared as 1421 "coherent", the values returned will reflect the results of previously 1422 completed writes performed by other shader invocations. When writing a 1423 variable declared as "coherent", the values written will be reflected in 1424 subsequent coherent reads performed by other shader invocations. As 1425 described in the Section 2.20.X of the OpenGL Specification, shader memory 1426 reads and writes complete in a largely undefined order. The built-in 1427 function memoryBarrier() can be used if needed to guarantee the completion 1428 and relative ordering of memory accesses performed by a single shader 1429 invocation. 1430 1431 When accessing memory using variables not declared as "coherent", the 1432 memory accessed by a shader may be cached by the implementation to service 1433 future accesses to the same address. Memory stores may be cached in such 1434 a way that the values written may not be visible to other shader 1435 invocations accessing the same memory. 
The implementation may cache the 1436 values fetched by memory reads and return the same values to any shader 1437 invocation accessing the same memory, even if the underlying memory has 1438 been modified since the first memory read. While variables not declared 1439 as "coherent" may not be useful for communicating between shader 1440 invocations, using non-coherent accesses may result in higher performance. 1441 1442 Memory accesses to image variables declared using the "volatile" storage 1443 qualifier must treat the underlying memory as though it could be read or 1444 written at any point during shader execution by some source other than the 1445 executing shader invocation. When a volatile variable is read, its value 1446 must be re-fetched from the underlying memory, even if the shader 1447 invocation performing the read had previously fetched its value from the 1448 same memory. When a volatile variable is written, its value must be 1449 written to the underlying memory, even if the compiler can conclusively 1450 determine that its value will be overwritten by a subsequent write. Since 1451 the external source reading or writing a "volatile" variable may be 1452 another shader invocation, variables declared as "volatile" are 1453 automatically treated as coherent. 1454 1455 Memory accesses to image variables declared using the "restrict" storage 1456 qualifier may be compiled assuming that the variable used to perform the 1457 memory access is the only way to access the underlying memory using the 1458 shader stage in question. This allows the compiler to coalesce or reorder 1459 loads and stores using "restrict"-qualified image variables in ways that 1460 wouldn't be permitted for image variables not so qualified, because the 1461 compiler can assume that the underlying image won't be read or written by 1462 other code. 
Applications are responsible for ensuring that image memory 1463 referenced by variables qualified with "restrict" will not be referenced 1464 using other variables in the same scope; otherwise, accesses to 1465 "restrict"-qualified variables will have undefined results. 1466 1467 Memory accesses to image variables declared using the "readonly" qualifier 1468 may only read the underlying memory, which is treated as read-only memory 1469 and cannot be written to. It is an error to pass an image variable qualified 1470 with "readonly" to imageStore() or other built-in functions that modify 1471 image memory. 1472 1473 Memory accesses to image variables declared using the "writeonly" qualifier 1474 may only write the underlying memory; the underlying memory cannot be read. 1475 It is an error to pass an image variable qualified with "writeonly" to 1476 imageLoad() or other built-in functions that read image memory. 1477 1478 The values of image variables qualified with "coherent", "volatile", 1479 "restrict", "readonly", or "writeonly" may not be passed to functions 1480 whose formal parameters lack such qualifiers. (See section 6.1 'Function 1481 Definitions' for more detail on function calling.) It is legal to have 1482 additional qualifiers on a formal parameter, but not to have fewer. 1483 1484 vec4 funcA(layout(rgba32f) image2D restrict a) { ... } 1485 vec4 funcB(layout(rgba32f) image2D a) { ... } 1486 layout(rgba32f) uniform image2D img1; 1487 layout(rgba32f) coherent uniform image2D img2; 1488 1489 funcA(img1); // OK, adding "restrict" is allowed 1490 funcB(img2); // illegal, stripping "coherent" is not 1491 1492 Layout qualifiers cannot be used on formal function parameters, but they are 1493 not included in parameter matching. 
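
    The qualifier-matching rule above (a formal parameter may add memory
    qualifiers but may not drop any) can be sketched as a subset test on bit
    flags.  This is only an illustrative model in C, not part of the
    extension: the QUAL_* flag names and can_pass_image() are hypothetical,
    and the model deliberately ignores the rule that "volatile" implies
    "coherent".

```c
#include <stdbool.h>

/* Hypothetical bit flags for the memory qualifiers; these names are
 * illustrative only and are not defined by the extension. */
enum {
    QUAL_COHERENT  = 1 << 0,
    QUAL_VOLATILE  = 1 << 1,
    QUAL_RESTRICT  = 1 << 2,
    QUAL_READONLY  = 1 << 3,
    QUAL_WRITEONLY = 1 << 4
};

/* An image argument may be passed only if every qualifier it carries also
 * appears on the formal parameter.  Extra qualifiers on the formal
 * parameter are allowed; missing ones are not. */
bool can_pass_image(unsigned arg_quals, unsigned formal_quals)
{
    return (arg_quals & ~formal_quals) == 0u;
}
```

    With this model, the funcA(img1) call in the example corresponds to
    can_pass_image(0, QUAL_RESTRICT), which is accepted, and the funcB(img2)
    call corresponds to can_pass_image(QUAL_COHERENT, 0), which is rejected.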

    Note that the use of "const" in an image variable declaration qualifies
    the const-ness of the variable being declared, not the image it refers
    to: the qualifier "readonly" qualifies the image memory (as accessed
    through that variable), while "const" qualifies the variable itself.

    Modify Section 7.4, Built-In Constants, p. 74

    (Add the following new constants.)

      const int gl_MaxImageUnits = 8;
      const int gl_MaxCombinedImageUnitsAndFragmentOutputs = 8;
      const int gl_MaxImageSamples = 0;
      const int gl_MaxVertexImageUniforms = 0;
      const int gl_MaxTessControlImageUniforms = 0;
      const int gl_MaxTessEvaluationImageUniforms = 0;
      const int gl_MaxGeometryImageUniforms = 0;
      const int gl_MaxFragmentImageUniforms = 8;
      const int gl_MaxCombinedImageUniforms = 8;


    (Insert a new numbered section at the end of Chapter 8, Built-in
    Functions, p. 69)

    Section 8.X, Image Functions

    Variables using one of the image data types may be used in the built-in
    shader image memory functions defined in this section to read and write
    individual texels of a texture.  Each image variable references an image
    unit, which has a texture image attached.

    When image memory functions access memory, an individual texel in the
    image is identified using an i, (i,j), or (i,j,k) coordinate corresponding
    to the values of <coord>.  For image2DMS and image2DMSArray variables (and
    the corresponding int/unsigned int types) corresponding to multisample
    textures, each texel may have multiple samples and an individual sample is
    identified using the integer <sample> parameter.  The coordinates and
    sample number are used to select an individual texel in the manner
    described in Section 3.9.X of the OpenGL specification.

    Loads and stores support float, integer, and unsigned integer types.
The 1535 data types "gimage*" serve as placeholders meaning either "image*", 1536 "iimage*", or "uimage*" in the same way as "gvec" or "gsampler". 1537 1538 The "IMAGE_INFO" in the prototypes below is a placeholder representing 1539 33 separate functions, each for a different type of image variable. The 1540 "IMAGE_INFO" placeholder is replaced by one of the following parameter 1541 lists: 1542 1543 gimage1D image, int coord 1544 gimage2D image, ivec2 coord 1545 gimage3D image, ivec3 coord 1546 gimage2DRect image, ivec2 coord 1547 gimageCube image, ivec3 coord 1548 gimageBuffer image, int coord 1549 gimage1DArray image, ivec2 coord 1550 gimage2DArray image, ivec3 coord 1551 gimageCubeArray image, ivec3 coord 1552 gimage2DMS image, ivec2 coord, int sample 1553 gimage2DMSArray image, ivec3 coord, int sample 1554 1555 (Note that each of the "gimage*" lines represents one of three different 1556 image variable types.) 1557 1558 Syntax: 1559 1560 gvec4 imageLoad(readonly IMAGE_INFO); 1561 1562 Description: 1563 1564 Loads the texel at the coordinate <coord> from the image unit specified 1565 by <image>. For multisample loads, the sample number is given by 1566 <sample>. When <image>, <coord>, and <sample> identify a valid texel, 1567 the bits used to represent the selected texel in memory are converted to 1568 a vec4, ivec4, or uvec4 in the manner described in Section 3.9.X of the 1569 OpenGL Specification and returned. 1570 1571 1572 Syntax: 1573 1574 void imageStore(writeonly IMAGE_INFO, gvec4 data); 1575 1576 Description: 1577 1578 Stores the value of <data> into the texel at the coordinate <coord> from 1579 the image specified by <image>. For multisample stores, the sample number 1580 is given by <sample>. When <image>, <coord>, and <sample> identify a 1581 valid texel, the bits used to represent <data> are converted to the format 1582 of the image unit in the manner described in Section 3.9.X of the OpenGL 1583 Specification and stored to the specified texel. 
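
    As a rough CPU-side model of the two functions above, the following C
    sketch mimics imageLoad() and imageStore() on a two-dimensional "r32ui"
    image.  It is an illustration under stated assumptions, not the
    specified conversion path of Section 3.9.X: the type and function names
    are hypothetical, invalid coordinates are assumed to behave as resolved
    in issue (18) (loads return zeroes, stores have no effect), and missing
    components are assumed to be filled as (0,0,0,1) per issue (24).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a GLSL uvec4 and a uimage2D bound with r32ui. */
typedef struct { uint32_t x, y, z, w; } uvec4_model;

typedef struct {
    int width, height;
    uint32_t *texels;          /* width * height values, row-major */
} image2d_r32ui_model;

/* A texel exists only if both coordinates lie inside the image. */
int texel_is_valid(const image2d_r32ui_model *img, int i, int j)
{
    return i >= 0 && j >= 0 && i < img->width && j < img->height;
}

/* Models imageLoad(): invalid accesses return zeroes (assumed, issue 18);
 * valid accesses return the red component and fill the rest as (0,0,1). */
uvec4_model model_image_load(const image2d_r32ui_model *img, int i, int j)
{
    uvec4_model r = {0u, 0u, 0u, 0u};
    if (texel_is_valid(img, i, j)) {
        r.x = img->texels[(size_t)j * (size_t)img->width + (size_t)i];
        r.w = 1u;
    }
    return r;
}

/* Models imageStore(): a one-component format keeps only the first
 * component of <data>; invalid accesses are silently dropped. */
void model_image_store(image2d_r32ui_model *img, int i, int j,
                       uvec4_model data)
{
    if (texel_is_valid(img, i, j))
        img->texels[(size_t)j * (size_t)img->width + (size_t)i] = data.x;
}
```

    The single bounds check shared by both functions reflects the wording
    above: conversion and access happen only "when <image>, <coord>, and
    <sample> identify a valid texel".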
1584 1585 1586 Syntax: 1587 1588 uint imageAtomicAdd(IMAGE_INFO, uint data); 1589 int imageAtomicAdd(IMAGE_INFO, int data); 1590 1591 uint imageAtomicMin(IMAGE_INFO, uint data); 1592 int imageAtomicMin(IMAGE_INFO, int data); 1593 1594 uint imageAtomicMax(IMAGE_INFO, uint data); 1595 int imageAtomicMax(IMAGE_INFO, int data); 1596 1597 uint imageAtomicAnd(IMAGE_INFO, uint data); 1598 int imageAtomicAnd(IMAGE_INFO, int data); 1599 1600 uint imageAtomicOr(IMAGE_INFO, uint data); 1601 int imageAtomicOr(IMAGE_INFO, int data); 1602 1603 uint imageAtomicXor(IMAGE_INFO, uint data); 1604 int imageAtomicXor(IMAGE_INFO, int data); 1605 1606 uint imageAtomicExchange(IMAGE_INFO, uint data); 1607 int imageAtomicExchange(IMAGE_INFO, int data); 1608 1609 uint imageAtomicCompSwap(IMAGE_INFO, uint compare, uint data); 1610 int imageAtomicCompSwap(IMAGE_INFO, int compare, int data); 1611 1612 Description: 1613 1614 These functions perform atomic operations on individual texels or samples 1615 of an image variable. Atomic memory operations read a value from the 1616 selected texel, compute a new value using one of the operations described 1617 below, write the new value to the selected texel, and return the 1618 original value read. The contents of the texel being updated by the 1619 atomic operation are guaranteed not to be updated by any other image store 1620 or atomic function between the time the original value is read and the 1621 time the new value is written. 1622 1623 As with image load and store functions, <image>, <coord>, and <sample> 1624 specify the individual texel to operate on. The method for 1625 identifying the individual texel operated on from <image>, <coord>, and 1626 <sample>, and the method for reading and writing the texel are specified 1627 in Section 3.9.X of the OpenGL specification. 
Atomic memory operations 1628 are supported on only a subset of all image variable types; <image> must 1629 be either: 1630 1631 * an image variable with signed integer components (iimage*) and a 1632 format qualifier of "r32i", or 1633 1634 * an image variable with unsigned integer components (uimage*) and a 1635 format qualifier of "r32ui". 1636 1637 imageAtomicAdd() computes a new value by adding the value of <data> to the 1638 contents of the selected texel. These functions support 32-bit unsigned 1639 integer operands and 32-bit signed integer operands. 1640 1641 imageAtomicMin() computes a new value by taking the minimum of the value 1642 of <data> and the contents of the selected texel. These functions support 1643 32-bit signed and unsigned integer operands. 1644 1645 imageAtomicMax() computes a new value by taking the maximum of the value 1646 of <data> and the contents of the selected texel. These functions support 1647 32-bit signed and unsigned integer operands. 1648 1649 imageAtomicAnd() computes a new value by performing a bitwise and of the 1650 value of <data> and the contents of the selected texel. These functions 1651 support 32-bit signed and unsigned integer operands. 1652 1653 imageAtomicOr() computes a new value by performing a bitwise or of the 1654 value of <data> and the contents of the selected texel. These functions 1655 support 32-bit signed and unsigned integer operands. 1656 1657 imageAtomicXor() computes a new value by performing a bitwise exclusive or 1658 of the value of <data> and the contents of the selected texel. These 1659 functions support 32-bit signed and unsigned integer operands. 1660 1661 imageAtomicExchange() computes a new value by simply copying the value of 1662 <data>. These functions support 32-bit signed and unsigned integer 1663 operands. 1664 1665 imageAtomicCompSwap() compares the value of <compare> and the contents of 1666 the selected texel. 
If the values are equal, the new value is given by 1667 <data>; otherwise, it is taken from the original value loaded from the 1668 texel. These functions support 32-bit signed and unsigned integer 1669 operands. 1670 1671 1672 (Insert another new numbered section at the end of Chapter 8, Built-in 1673 Functions, p. 69) 1674 1675 Section 8.Y, Shader Memory Control Functions 1676 1677 Shaders of all types may read and write the contents of textures and 1678 buffer objects using image variables. While the order of reads and writes 1679 visible to a single shader invocation is well-defined, the relative order 1680 of reads and writes to a single shared memory address from multiple 1681 separate shader invocations is largely undefined. Additionally, the order 1682 of accesses to multiple memory addresses performed by a single shader 1683 invocation, as observed by other shader invocations, is also undefined. 1684 1685 Syntax: 1686 1687 void memoryBarrier(void); 1688 1689 Description: 1690 1691 memoryBarrier() can be used to control the ordering of memory transactions 1692 issued by a single shader invocation. When called, memoryBarrier() will 1693 wait on the completion of all memory accesses resulting from the use of 1694 image variables or atomic counters and then return to the caller with no 1695 other effect. When this function returns, the results of any memory 1696 stores performed using coherent variables performed prior to the call will 1697 be visible to any future coherent memory access to the same addresses from 1698 other shader invocations. In particular, the values written this way in 1699 one shader stage are guaranteed to be visible to coherent memory accesses 1700 performed by shader invocations in subsequent stages when those 1701 invocations were triggered by the execution of the original shader 1702 invocation (e.g., fragment shader invocations for a primitive resulting 1703 from a particular geometry shader invocation). 
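
    The imageAtomic*() semantics of Section 8.X above (read the texel,
    compute a new value, write it back, and return the original value) have
    a close CPU-side analogue in C11 <stdatomic.h>.  The sketch below is an
    analogy only, assuming a single 32-bit unsigned texel; the model_*
    function names are hypothetical and nothing here is GLSL or part of the
    extension.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Analogue of imageAtomicAdd(): returns the value held before the add. */
uint32_t model_atomic_add(_Atomic uint32_t *texel, uint32_t data)
{
    return atomic_fetch_add(texel, data);
}

/* Analogue of imageAtomicMax(): C11 has no fetch-max, so retry a
 * compare-exchange until the stored value is at least <data>. */
uint32_t model_atomic_max(_Atomic uint32_t *texel, uint32_t data)
{
    uint32_t old = atomic_load(texel);
    while (old < data &&
           !atomic_compare_exchange_weak(texel, &old, data)) {
        /* On failure 'old' now holds the current value; try again. */
    }
    return old;
}

/* Analogue of imageAtomicCompSwap(): writes <data> only if the texel
 * equals <compare>; in both cases the original value is returned. */
uint32_t model_atomic_comp_swap(_Atomic uint32_t *texel,
                                uint32_t compare, uint32_t data)
{
    uint32_t expected = compare;
    atomic_compare_exchange_strong(texel, &expected, data);
    return expected;   /* equals the value that was in *texel */
}
```

    In every case the caller learns the pre-operation value, which is what
    lets a shader tell whether its compare-and-swap succeeded: the swap took
    effect exactly when the returned value equals <compare>.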
1704 1705 1706 Modify Section 9, Shading Language Grammar (p. 105) 1707 1708 !!! TBD: Add grammar constructs for memory access qualifiers. 1709 1710 1711Errors 1712 1713 INVALID_VALUE is generated by Uniform1i{v} if the location refers to an 1714 image variable and the value specified is less than zero or greater than 1715 or equal to the value of MAX_IMAGE_UNITS. 1716 1717 INVALID_OPERATION is generated by Uniform* functions other than 1718 Uniform1i{v} if the location refers to an image variable. 1719 1720 INVALID_VALUE is generated by BindImageTexture if <unit> is greater 1721 than or equal to the value of MAX_IMAGE_UNITS. 1722 1723 INVALID_VALUE is generated by BindImageTexture if <texture> is not the 1724 name of an existing texture object. 1725 1726 INVALID_VALUE is generated by BindImageTexture if <format> is not a 1727 legal format. 1728 1729 1730Dependencies on OpenGL 3.2 (Core Profile) 1731 1732 If only the core profile of OpenGL 3.2 is supported, references to buffer 1733 objects for conventional vertex attributes and to the Begin and RasterPos 1734 commands should be removed. 1735 1736Dependencies on OpenGL 3.1, ARB_uniform_buffer_object, and 1737EXT_bindable_uniform 1738 1739 If OpenGL 3.1, ARB_uniform_buffer_object, and EXT_bindable_uniform are not 1740 supported, references to UNIFORM_BARRIER_BIT should be removed. 1741 1742Dependencies on ARB_draw_indirect 1743 1744 If ARB_draw_indirect is not supported, references to COMMAND_BARRIER_BIT 1745 should be removed. 1746 1747Dependencies on NV_vertex_buffer_unified_memory 1748 1749 If NV_vertex_buffer_unified_memory is not supported, references to that 1750 extension and GPU addresses in the discussion of 1751 VERTEX_ATTRIB_ARRAY_BARRIER_BIT and ELEMENT_ARRAY_BARRIER_BIT should 1752 be removed. 
1753 1754Dependencies on NV_parameter_buffer_object 1755 1756 If NV_parameter_buffer_object is not supported, references to 1757 ProgramBufferParametersNV in the discussion of BUFFER_UPDATE_PARAMETER_BIT 1758 should be removed. 1759 1760Dependencies on OpenGL 3.2 and ARB_texture_multisample 1761 1762 If OpenGL 3.2 and ARB_texture_multisample are not supported, references to 1763 multisample textures should be removed. 1764 1765Dependencies on OpenGL 4.0 and ARB_sample_shading 1766 1767 If OpenGL 4.0 or ARB_sample_shading is supported, the discussion of the 1768 number of shader invocations for a given fragment in the "Shader Memory 1769 Access" section of the specification should be updated to discuss the 1770 sample shading enable and the minimum sample shading factor provided in 1771 that extension. 1772 1773Dependencies on OpenGL 4.0 and ARB_texture_cube_map_array 1774 1775 If OpenGL 4.0 or ARB_texture_cube_map_array are not supported, references 1776 to cube map array textures should be removed. 1777 1778Dependencies on OpenGL 3.3 and ARB_texture_rgb10_a2ui 1779 1780 If OpenGL 3.3 or ARB_texture_rgb10_a2ui are not supported, references to 1781 the RGB10_A2UI texture format should be removed. 1782 1783Dependencies on NV_shader_buffer_load 1784 1785 If NV_shader_buffer_load is supported, the new section 2.14.X (Shader 1786 Memory Access) should be combined with "Section 2.20.X, Shader Memory 1787 Access" from NV_shader_buffer_load. 1788 1789Dependencies on OpenGL 4.0, ARB_gpu_shader5, and NV_gpu_shader5 1790 1791 If OpenGL 4.0, ARB_gpu_shader5, and NV_gpu_shader5 are not supported, the 1792 modifications to the OpenGL Shading Language Specification should be 1793 removed. 1794 1795Dependencies on OpenGL 4.0 and ARB_tessellation_shader 1796 1797 If OpenGL 4.0 and ARB_tessellation_shader are not supported, references to 1798 tessellation control and evaluation shaders should be removed. 

Dependencies on EXT_shader_atomic_counters and ARB_shader_atomic_counters

    If EXT_shader_atomic_counters is not supported, remove references to
    atomic counters and ATOMIC_COUNTER_BARRIER_BIT.

Dependencies on EXT_depth_bounds_test

    If EXT_depth_bounds_test is not supported, references to the depth bounds
    test should be removed.

Dependencies on ARB_separate_shader_objects

    If ARB_separate_shader_objects is supported, early depth tests are enabled
    if and only if (a) there is an active program for the fragment shader
    stage and (b) the fragment shader in that program enables early depth
    tests using a layout qualifier.

Dependencies on EXT_shader_image_load_store

    This extension and EXT_shader_image_load_store provide nearly identical
    functionality.

    If both extensions are enabled in the shading language, the "size*" layout
    qualifiers are treated as format qualifiers, and are mapped to equivalent
    format qualifiers in the table below, according to the type of image
    variable.  Additionally, if both extensions are enabled in the shading
    language, size/format layout qualifiers need not be specified for image
    variables used exclusively for stores.

                    image*    iimage*   uimage*
                    --------  --------  --------
      size1x8       n/a       r8i       r8ui
      size1x16      r16f      r16i      r16ui
      size1x32      r32f      r32i      r32ui
      size2x32      rg32f     rg32i     rg32ui
      size4x32      rgba32f   rgba32i   rgba32ui

Issues

    (0) How does this extension differ from the similar
        EXT_shader_image_load_store?

      RESOLVED:  The functionality provided by this extension is very similar
      to that provided by EXT_shader_image_load_store.  There are some
      functional differences.

      * "size" layout qualifiers replaced with "format" qualifiers.

      * Image loads aren't restricted to "1x8", "1x16", "1x32", "2x32", and
        "4x32" formats.
        Instead, each supported image format has a layout
        qualifier, and values loaded from images are converted to a
        vec4/ivec4/uvec4 representation appropriate for the image format.

      * For textures not allocated by the GL (e.g., images shared from other
        external APIs), implementations need not support image unit formats
        that don't match the texture format, unless they are in the same
        "class", which is generally the case only if component counts and
        sizes are exactly the same.

      * Image variables used exclusively for image stores need not declare a
        format qualifier.

      * Added the built-in GLSL constants "gl_MaxImageUnits",
        "gl_MaxCombinedImageUnitsAndFragmentOutputs", and
        "gl_MaxImageSamples".

      * BindImageTexture generates INVALID_VALUE if <level> or <layer> is
        negative.

      * The <format> parameter of BindImageTexture was changed from an "int"
        to an "enum".  In the EXT, <format> copied TexImage*'s
        <internalformat> parameter, which is an "int" because that's how it
        was defined in OpenGL 1.0 (where the parameter was called
        <components> and the now-deprecated "1", "2", "3", and "4" formats
        were the only ones supported).

      * Added implementation-dependent limits on the number of active
        image uniforms (MAX_*_IMAGE_UNIFORMS) for each stage, and combined
        across all stages.  Also added corresponding GLSL constants
        "gl_Max*ImageUniforms".

      * The atomicIncWrap() and atomicDecWrap() built-in functions present
        in the EXT have been removed.

      * The <index> parameter of BindImageTextureEXT has been renamed to
        <unit> for BindImageTexture.

    (1) How are the format and type of the load/store determined?

      RESOLVED:  There is a natural desire to load and store using a
      canonical 4-vector in the shader with hardware converting to/from a
      format compatible with the bound image, to be consistent with how
      texture loads and fragment shader outputs currently behave.  There is
      also good reason to allow some flexibility in the format used for image
      accesses being different from the internal format of the texture level.
      We allow format conversions to and from any format that image units
      support.  We make the format be selected when the image is bound to an
      image unit, and define which image unit formats can be used for which
      texture level internal formats.  For example, it is legal to access an
      image whose internal format is RGBA8 with an image unit format of
      R32UI.

    (2) What set of texture formats should be supported for image loads and
        stores?

      RESOLVED:  We allow textures to be bound to image units using only a
      subset of supported formats, to limit the amount of hardware support
      required for image operations.  Any texture formats not explicitly
      enumerated in this extension may not be bound to an image unit, although
      future extensions may add new formats to the set of supported formats.

      In particular, this extension supports one-, two-, and four-component
      textures with 8-, 16-, and 32-bit components, including floating-point,
      signed integer, unsigned integer, as well as signed and unsigned
      normalized formats.  Additionally, a small number of other formats are
      supported, including the 11/11/10 RGB format from EXT_packed_float and
      10/10/10/2 unsigned normalized RGBA.

    (3) Should we generally support image loads and stores for
        three-component "RGB" formats?

      RESOLVED:  Not in this extension.
      If an application needs to perform
      image loads and stores on a three-component texture, it could use an
      equivalent RGBA format and ignore the alpha component.  The
      EXT_texture_swizzle extension could be used to make the values returned
      by texture appear identical to an RGB texture, if required.

    (4) Should textures be unbound from image units when they are deleted?

      RESOLVED:  Yes, this matches behavior of existing bind points.

    (5) Should we support image loads and stores for the deprecated LUMINANCE,
        LUMINANCE_ALPHA, and ALPHA formats?

      RESOLVED:  No, only support the RGBA-style formats.  EXT_texture_swizzle
      can be used to mimic luminance and alpha if required.

    (6) Should we support 64-bit atomics on images?  Should we support atomics
        at all on formats with 8-, 16-, 64-, or 128-bit texels?

      RESOLVED:  No, we will only support 32-bit atomic operations on images.

    (7) How do shader image loads and stores interact with texture
        completeness?  What happens if you bind a texture with inconsistent
        mipmaps?

      RESOLVED:  The image unit is treated as if nothing were bound, where
      all accesses are treated as invalid.

    (8) What happens if the value passed to Uniform1i to specify the image
        unit corresponding to an image variable refers to a non-existent image
        unit (i.e., is negative or greater than or equal to the number of
        image units supported)?

      RESOLVED:  Values referring to invalid image units will be rejected and
      produce an INVALID_VALUE error.

    (9) Should we provide counting rules for image variable use in different
        shaders like we have for samplers?  In particular, there are limits
        on the amount of state, the number of active samplers in each shader
        stage, and the sum of the active sampler counts in each stage.

      RESOLVED:  Yes, we provide a similar set of limits.
MAX_IMAGE_UNITS 1963 specifies the number of image bindings. MAX_{VERTEX,...}_IMAGE_UNIFORMS 1964 specifies the maximum number of active image uniforms in each shader 1965 stage. MAX_COMBINED_IMAGE_UNIFORMS specifies a limit on the sum of the 1966 number of active image uniforms in all stages of a program. 1967 1968 (10) Can this extension be used to load and store values into a buffer 1969 object? Into a renderbuffer? 1970 1971 RESOLVED: Yes, indirectly. The BUFFER_TEXTURE target provided by 1972 OpenGL 3.0 and the EXT_texture_buffer_object extension allows an 1973 application to create a one-dimensional buffer texture using the data 1974 store of a buffer object. This buffer texture may be bound to an image 1975 unit and accessed with an imageBuffer variable in the Shading Language. 1976 1977 This extension adds support for image accesses to multisample textures, 1978 but not renderbuffers. Note that with the ARB_texture_multisample 1979 extension, there is no longer a good reason to use renderbuffers. 1980 Existing 2D or rectangle targets already provided a superset of single- 1981 sample renderbuffer functionality; the new ARB extension provides a 1982 superset of multisample renderbuffer functionality. 1983 1984 (11) What amount of automatic synchronization is provided for image loads 1985 and stores? In particular, is the use of MemoryBarrier() required 1986 to ensure consistent ordering relative to other GL operations? Or is 1987 some other mechanism (e.g., unbinding a texture from an image unit 1988 and then binding it to a texture image unit) sufficient? 1989 1990 RESOLVED: Use of MemoryBarrier is required, and there is no 1991 automatic synchronization when images are bound or unbound. 

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which images might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          texels of all bound images could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Because normal OpenGL operation is pipelined, idling would have a
      significant performance impact:  pipelining would otherwise allow
      fragment shader execution for draw call N to overlap vertex shader
      execution for draw call N+1.

    (12) Should image loads and stores be allowed for all shader types?

      RESOLVED:  Yes, it seems useful.

      Note that some shader types pose specific implementation complexities
      (e.g., reuse of vertices in vertex shaders, number of fragment shader
      invocations in multisample modes, relative order of execution within and
      between shader groups).  We explicitly specify several cases where
      the invocation count and execution order are undefined.  While these
      cases may be a problem for some algorithms, we expect that many
      algorithms will not be adversely impacted.

    (13) Should an implementation be required to generate INVALID_OPERATION
         errors if the dimension of the texture coordinates implied by the
         image variable type doesn't match the structure of the texture
         level/layer bound to the corresponding image unit?  If not, what
         happens in such a mismatch?

      RESOLVED:  No.  The results of image accesses are undefined.

    (14) Should shader image variable types include a "format" implying the
         data type accepted/returned by shader image loads and stores?
         For
         example, an image variable corresponding to a 2D texture with format
         of RGBA32F might have a type "image2Dvec4", with the "vec4"
         indicating that the image data lines up with a four-component
         floating-point vector.

      RESOLVED:  No.  Separate types are provided for float vs. int vs.
      unsigned int, but not for each image format.  However, format qualifiers
      associated with image variables can (and in many cases must) be used to
      associate a format with an image variable.

    (15) If shader image variable types include information on the texel
         components returned or written by shader image accesses, should an
         implementation be required to enforce errors if the variable type is
         incompatible with the format of the referenced texture?  If not, or
         if the image variable type doesn't include format information, what
         happens in case of a mismatch between the texture format and the
         shader access format?

      RESOLVED:  We aren't including types in the variable that correspond
      to the image format, so an error check in the driver is not possible.

      If an individual load, store, or atomic uses a data type incompatible
      with the texture bound to the image unit, loads will return and stores
      will write undefined values.

    (16) Is it possible to bind the "default texture" (numbered zero) for a
         given texture target to an image unit?

      RESOLVED:  No.  Passing zero to BindImageTexture unbinds any texture
      currently bound to the selected image unit.  If this ability were
      provided, it would also be necessary to provide some mechanism to
      specify a texture target because there is a separate default "zero"
      texture for each target.

      Note that existing framebuffer objects have a similar behavior; default
      textures can't be attached to an FBO.

    (17) May bordered textures be used with image loads and stores?

      RESOLVED:  No.
2073 2074 (18) Should we have defined behavior if invalid coordinates are passed to 2075 an image load, store, or atomic operation? If so, what happens? 2076 2077 RESOLVED: Yes. We define the behavior to return zeroes on a load and 2078 atomic and to have no effect on any bound texture on stores and 2079 atomics. 2080 2081 (19) Should we have a limit on the total number of combined image units 2082 and draw buffers, and if so, what should that be? 2083 2084 RESOLVED: Yes, some hardware requires this. The program will fail to 2085 link. 2086 2087 (20) What happens if a shader specifies an image store or atomic operation 2088 for killed/discarded pixels? 2089 2090 RESOLVED: For GLSL shaders that execute a "discard" instruction, any 2091 image stores or atomics performed before executing the discard will 2092 behave normally. When the "discard" instruction is executed, the shader 2093 invocation will be terminated and will perform no further image store or 2094 atomic operations. 2095 2096 (21) When enabling early depth tests in a program, what happens if a 2097 fragment fails one of the tests (e.g., depth test)? 2098 2099 RESOLVED: The specification indicates that the fragment shader is not 2100 executed. Implementations might still end up running fragment shader 2101 for implementation-dependent reasons. For example, the fragment shader 2102 may be run in order to approximate derivatives for neighboring pixels 2103 that did pass all per-fragment tests. In these cases, implementations 2104 must guarantee that image stores have no effect. 2105 2106 (22) If implementations run fragment shaders for fragments that aren't 2107 covered by the primitive or fail early depth tests (e.g., "helper 2108 pixels"), how does that interact with stores and atomics? 2109 2110 RESOLVED: The current OpenGL specification has no formal notion of 2111 "helper" pixels. 
      In practice, implementations may run fragment shaders
      for pixels near the boundaries of rasterized primitives to allow
      derivatives to be approximated by differencing.  Typically, these shader
      invocations have no effect.  While they may produce outputs, the outputs
      for these pixels will be discarded without affecting the framebuffer.
      The spec basically treats these shader invocations as though they don't
      exist.

      If such a shader invocation performs store or atomic operations, we need
      to define what happens.  In our definition, stores will have no effect,
      atomics will not update memory, and the values returned by atomics will
      be undefined.  The fact that these invocations don't affect memory is
      consistent with the notion of helper pixel shader invocations not
      existing.

      However, it is possible to write a fragment shader where flow control
      depends on the (undefined) values returned by the atomic.  In this case,
      the undefined values returned for helper pixels could result in very
      long execution time (appearing to be a hang) or an infinite loop.  To
      avoid hangs in such cases, it is possible to use the fragment shader
      input sample mask to identify helper pixels:

        // If the input sample mask is non-zero, at least one sample is
        // covered and the invocation should be treated as a real invocation.
        // If the sample mask is zero, nothing is covered and this should be
        // treated as a helper pixel.  If more than 32 samples are supported,
        // additional words of gl_SampleMaskIn would need to be checked.
        if (gl_SampleMaskIn[0] != 0) {
          // "real" pixel, perform atomic operations
        } else {
          // "helper" pixel, skip atomics
        }

      It may be desirable to formalize the notion of helper pixels in a future
      addition to the shading language.

    (23) What API should we use to specify early depth tests?

      RESOLVED: Use a layout qualifier in a fragment shader rather than
      having a separate program parameter or other piece of GL state.

    (24) For formatted loads where the format doesn't include some component,
         what values are filled in? (0,0,0,1)? (0,0,0,0)?

      RESOLVED: Prefer (0,0,0,1) to match other APIs.

    (25) How does the combined-image-and-fragment-output limit interact with
         separate shader objects? For example, an application may want to
         share a single image unit between two shader stages and not have it
         count twice against the limit.

      RESOLVED: The known implementations of this extension do not have this
      issue, so we chose not to include any spec language. Perhaps a
      Begin-time error could be specified in the future if this limit is
      exceeded.

    (26) What sort of qualifiers should we provide relevant to memory
         referenced by image variables?

      RESOLVED: We will support the qualifiers "coherent", "volatile",
      "restrict", and "const" to be used in image variable declarations.

      "coherent" is used to ensure that memory accesses from different shader
      invocations are cached coherently (i.e., one invocation will be able to
      observe writes from another when the other invocation's writes
      complete). This coherence may mean that "coherent"-qualified image
      variables perform more slowly than otherwise equivalent unqualified
      variables.

      "volatile" behaves as in C, and may be needed if an algorithm requires
      reading image memory that may be written asynchronously by other shader
      invocations.

      "restrict" behaves as in the C99 standard, and can be used to indicate
      that no other image variable points to the same underlying data. This
      permits optimizations that would otherwise be impossible if the compiler
      has to assume that a pair of images might end up pointing to the same
      data.
      For example, in standard C/C++, code like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment because a[0] or a[1] might
      point at the same data as b[0]. With restrict, the compiler can assume
      that b[0] is not modified by any of the instructions and load it just
      once. The same considerations apply to accesses using imageLoad(),
      imageStore(), and imageAtomic*() builtins.

      "const" behaves as in C, and indicates that the image memory should be
      treated as read-only. Note that the use of "const" in image variable
      declarations is different from the normal "const" qualifier, as it
      treats the image data referenced by the variable as constant.

    (27) How should shaders be able to express qualifiers for image variables?

      RESOLVED: This extension borrows from C/C++ syntax rules where a
      qualifier may be specified before or after the type. For example,

        layout(rgba32f) const uniform image2D imageVariable;

      declares an image uniform whose image data are treated as read-only. We
      permit qualifiers to be provided either before or after the type name
      (image2D). The position of the qualifier is meaningful. Qualifiers
      before the type name apply to the data referenced by the variable.
      Qualifiers after the type name apply to the variable itself.

      The closest C/C++ equivalent would turn declarations like:

        layout(rgba32f) const uniform image2D firstImage;
        layout(rgba32f) uniform image2D const secondImage;

      into:

        const struct image2D_data * firstImage;
        struct image2D_data * const secondImage;

      where "image2D" is replaced with "struct image2D_data *". In this
      model, the former declares <firstImage> to be a pointer to constant
      image data.
      The latter declares <secondImage> to be a constant pointer
      to non-constant image data.

      For "coherent", "volatile", and "const", the qualifier should typically
      go before the image type. For "restrict", the qualifier must go after
      the image type, since "restrict" applies to the pointer, not the data
      being pointed to.

      Note that a qualifier could theoretically be specified before and after
      the type name, such as:

        const image2D const imageVariable;

      which would declare <imageVariable> to be constant and to reference
      constant image data. In this extension, declaring an image variable to
      be constant isn't meaningful, as such variables can never be used as
      l-values.

    (28) What is the meaning of "restrict" on a system that might run either
         multiple invocations of the same shader simultaneously, or multiple
         invocations of different shaders (vertex and fragment)
         simultaneously?

      RESOLVED: When an image variable is qualified with "restrict", the only
      guarantee is that no other image variable in the same shader invocation
      references the same underlying image data. There is no guarantee that
      the same image couldn't be referenced by another invocation of the same
      shader, or by an invocation of a different shader.

      The main function of "restrict" is to allow compilers to generate more
      efficient code for a single shader invocation than they could if they
      had to conservatively assume that accesses to other images could touch
      the same image data.

    (29) What is the purpose of the memoryBarrier() built-in function?

      RESOLVED: The memoryBarrier() function can be used to ensure that if
      another shader invocation or some other observer sees image memory
      being written by a shader, the accesses appear in a predictable order.
      For example, consider the following code:

        uniform imageBuffer buf1;
        uniform imageBuffer buf2;
        int offset1, offset2;
        vec4 data1, data2;
        imageStore(buf1, offset1, data1);
        imageStore(buf2, offset2, data2);

      This specification doesn't require that writes be committed to memory in
      the order specified in the shader. It is possible that another shader
      invocation or some other observer would see <data2> before seeing
      <data1>. If an algorithm involved multiple shader invocations with one
      possibly needing to wait on data written by another, observing <data2>
      in the second shader would not ensure that <data1> has been written.
      However, if memoryBarrier() were used, as in the following code, the
      second shader would have such a guarantee.

        imageStore(buf1, offset1, data1);
        memoryBarrier();
        imageStore(buf2, offset2, data2);

    (30) What happens if the texel identified by the coordinates given to an
         image load, store, or atomic built-in doesn't exist? (i.e.,
         coordinates are out of bounds)

      RESOLVED: Image loads return zero. Stores do not update image memory.
      Atomics do not update image memory and return zero. These same
      considerations apply if no texture is bound to an image unit, the
      texture is incomplete, and various other conditions. We do not ever
      apply wrap modes on image operations.

    (31) Why do we have a <format> parameter on BindImageTexture?

      RESOLVED: It allows some amount of bit-casting, to view a texture with
      one format using another format. In addition to any benefits from
      viewing textures with a different format, it also permits atomic
      operations on some multi-component textures by allowing them to be
      viewed using R32I or R32UI formats.
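
      As an illustration only, the following C sketch models what such a
      bit-cast view implies for a four-component, 8-bit-per-component texel
      seen as a single 32-bit word. The helper names pack_rgba8(),
      unpack_rgba8(), and add_word() are hypothetical, and the byte order
      (first component in the least significant byte, as GLSL's
      packUnorm4x8() does) is an assumption rather than something this
      issue specifies.

```c
#include <stdint.h>

/* Hypothetical helper: pack four 8-bit components into one 32-bit word.
 * Assumption: the first component lands in the least significant byte. */
static uint32_t pack_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t)r |
           ((uint32_t)g << 8) |
           ((uint32_t)b << 16) |
           ((uint32_t)a << 24);
}

/* Hypothetical helper: recover the four components from a packed word. */
static void unpack_rgba8(uint32_t word, uint8_t rgba[4])
{
    rgba[0] = (uint8_t)(word & 0xFFu);
    rgba[1] = (uint8_t)((word >> 8) & 0xFFu);
    rgba[2] = (uint8_t)((word >> 16) & 0xFFu);
    rgba[3] = (uint8_t)((word >> 24) & 0xFFu);
}

/* Word-wide add, standing in for an atomic add on the 32-bit view.
 * It is not atomic here; it only models the arithmetic on the word. */
static uint32_t add_word(uint32_t texel, uint32_t value)
{
    return texel + value;
}
```

      Under these assumptions, adding 1 to a texel packed from
      (0xFF, 0, 0, 0) carries out of the first component into the second,
      which is one way to see that an operation on the 32-bit view acts on
      the whole word rather than on individual components.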

      In the EXT_shader_image_load_store extension, there was an additional
      benefit to working around a more severe limitation on the set of formats
      supported for stores -- only formats like R8, R16, R32F, RG32F, RGBA32F
      are supported there. Other formats not supported there can be viewed as
      supported formats (e.g., RGBA8 could map to R32UI), with shader code
      doing any needed packing and unpacking.

    (32) Do we support image atomics on multi-component texture formats?

      RESOLVED: Only if the texture formats can be viewed as "R32I" or
      "R32UI" formats by using the <format> parameter of BindImageTexture.
      Atomics do not operate on a component-by-component basis in this
      extension.

    (33) What happens if early fragment testing is enabled, the early depth
         test passes, and a fragment shader that computes a new depth value is
         executed?

      RESOLVED: The depth value produced by the fragment shader has no effect
      if early depth and stencil tests are enabled. The depth value computed
      by a fragment shader is used only by the post-fragment shader stencil
      and depth tests, and those tests always have no effect when early
      fragment tests are enabled.

    (34) How do early fragment tests interact with occlusion queries?

      RESOLVED: When early fragment tests are enabled, sample counting for
      occlusion queries also happens prior to fragment shader execution.
      Enabling early fragment tests can change the overall sample count,
      because samples killed by alpha test and alpha to coverage will still be
      counted if early fragment tests are enabled.

    (35) If we provide support for multiple active program objects (e.g., one
         containing a vertex shader, another containing a fragment shader, as
         in EXT_separate_shader_objects), how will early fragment tests be
         handled?

      RESOLVED: The early fragment test enable should be taken from the
      active program object corresponding to the fragment shader stage.

    (36) When specifying a coordinate vector to specify a texel for a
         TEXTURE_1D_ARRAY target, what coordinate is used to specify the
         layer?

      RESOLVED: For GLSL functions, a two-component vector is specified and
      the second (y) component is used to select a layer.

    (37) How does the synchronization (or lack thereof) of shader accesses to
         buffer memory interact with accesses to mapped buffer memory?

      RESOLVED: Shader memory accesses are not automatically synchronized
      with MapBuffer. Mapping a buffer object will not guarantee that image
      stores or atomics issued by shaders triggered by rendering commands
      prior to the MapBuffer call are complete before returning a pointer to
      the application. This lack of synchronization is similar to what
      happens if you call MapBufferRange with MAP_UNSYNCHRONIZED_BIT set.
      However, if you call MemoryBarrier with BUFFER_UPDATE_BARRIER_BIT set
      prior to mapping the buffer object, the GL will manually synchronize,
      ensuring that all prior shader writes to a buffer are complete prior to
      any subsequent commands (including MapBuffer) accessing the buffer.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------------
    36    10/27/14  pbrown    Fix the "Name Strings" entry to include a "GL_"
                              prefix.

    35    09/11/14  pbrown    Add missing text for issue (9).

    34    06/10/14  Jon Leech Minor typo fixes from bug 7263.

    33    10/16/13  pbrown    Update issue (20) to clarify that any image
                              stores and atomics issued before a "discard" do
                              have an effect.
                              Update issue (22) to better
                              define the behavior of stores and atomics on
                              "helper" pixels and to suggest a workaround for
                              shaders that need to use values returned by
                              atomics (undefined for helper pixels) in flow
                              control constructs.

    32    03/06/12  pbrown    Fix the minimum values for GLSL built-ins
                              gl_Max{Fragment,Combined}ImageUniforms to 8
                              to match the minimums for the API specification
                              (bug 8673).

    31    01/18/12  Jon Leech State table fix for
                              IMAGE_FORMAT_COMPATIBILITY_TYPE (Bug 8430).

    30    08/04/11  pbrown    Remove imageAtomicIncWrap() and
                              imageAtomicDecWrap() functions from the ARB
                              extension (bug 7182). Rename the <index>
                              argument of BindImageTexture to <unit> (bug
                              7851). Fix typo in spec language describing
                              out-of-bounds indexing of image arrays.

    29    08/03/11  pbrown    Clarify that negative values of <level> and
                              <layer> will generate errors even in cases where
                              those parameters would ultimately have no effect
                              (bug 7850). Add a note recommending that
                              BindImageTexture not use the same protocol as
                              its "EXT" equivalent due to differences in error
                              behavior.

    28    07/27/11  pbrown    Document new implementation limits added in
                              version 26 as differences from the EXT (bug 7805).

    27    07/22/11  Jon Leech Remove unreachable error condition for negative
                              <index> (bug 7770).

    26    07/22/11  pbrown    Add implementation limits on the number of
                              image uniforms used by each shader stage, and on
                              the combined total of all stages, as well as
                              corresponding GLSL constants (bug 7805).

    25    06/20/11  johnk     Sync. with core specification: adds writeonly,
                              replaces "const" with "readonly", refers to all
                              these as "memory qualifiers", includes semantics
                              for calling functions. Minor non-functional
                              edits to make them match.

    24    06/19/11  pbrown    Assign values for new enumerants.

    23    06/18/11  pbrown    Clarify that image variables can not be stored
                              in uniform blocks.

    22    06/07/11  pbrown    Clarify that <layered> and <layer> are ignored
                              when used with non-layered textures. Clarify
                              that non-existent layer/face numbers make
                              accesses invalid for both non-layered and
                              layered bindings (bug 7721).

    21    06/06/11  pbrown    Add IMAGE_FORMAT_COMPATIBILITY_TYPE to state
                              tables, add descriptions for IMAGE_BINDING*
                              state table entries (bug 7689).

    20    06/06/11  pbrown    Clarify the language describing data type
                              conversions for image loads/stores to indicate
                              that we use the same general process as for
                              TexImage and GetTexImage commands. For
                              multisample textures used as images, TexImage
                              commands do not specify image data and
                              GetTexImage is not supported (bug 7249).

    19    02/14/11  pbrown    Remove the repeatability requirement (Appendix
                              A.1) when using shaders with side effects (bug
                              7026). Clean up spec language describing GLSL's
                              memoryBarrier() function, and add a dependency
                              on atomic counter extensions to indicate that
                              memoryBarrier() also applies to atomic counters
                              (bug 7237).

    18    02/13/11  Jon Leech Cleanup BindImageTexture language to match
                              4.2 core spec phrasing.

    17    01/20/11  pbrown    Clarify that the MAX* limits can be queried
                              by GetInteger64v (bug 7225). Add INVALID_VALUE
                              error for BindImageTexture if <level> or <layer>
                              is negative (bug 7226). Clarify language for
                              MemoryBarrier's BUFFER_UPDATE_BARRIER_BIT (bug
                              7228). Update the <format> parameter of
                              BindImageTexture to be an "enum" instead of
                              "int". The EXT used "int" to be compatible
                              with TexImage, which itself derived from the
                              deprecated "1", "2", "3", and "4" formats in
                              OpenGL 1.0 (bug 7183).

    16    01/20/11  pbrown    Add GLSL built-in constants for
                              implementation-dependent limits (bug 7234).

    15    01/18/11  Jon Leech Fix typos from Bug 7235.

    14    01/18/11  pbrown    Add interaction with NV_parameter_buffer_object
                              for ProgramBufferParametersNV (bug 7235).

    13    01/05/11  Jon Leech Fix typos from Bug 7202.

    12    12/17/10  johnk     Minor tweaks for grammar and consistency that
                              also apply to 1.5, that were generated while
                              incorporating into 4.2 core.

    11    12/14/10  pbrown    Add edits to invariance and synchronization
                              rules in Appendices A and D to account for
                              side effects from shader execution (bug 7026).

    10    12/14/10  pbrown    Clean up issues section from changes in
                              revisions 7-9.

    9     12/14/10  pbrown    Modify the layout qualifier behavior for image
                              variables to specify a full GL-style format
                              instead of component/bit counts (bug 6868),
                              with loaded data converted to a canonical
                              vector type according to the full format.
                              Limited the amount of format mismatching allowed
                              when binding textures allocated outside the GL
                              to image units. Removed the requirement for
                              layout qualifiers on image variables used
                              only for image stores.

    8     12/12/10  pbrown    Additional minor spec errata fixes (bug 6991).

    7     12/12/10  pbrown    Fix minor spec errata (bug 6870). Removed
                              interactions with NV_gpu_program5; this is
                              already covered by the EXT version of the spec.

    6     10/19/10  pdaniell  ARBify in preparation for OpenGL 4.2 core.

    5     09/17/10  pbrown    Clean up the spec language specifying the
                              mapping of coordinates to texels according to
                              the texture target. For 1D arrays, GLSL wants
                              the layer in the second component of a
                              two-component vector while NV_gpu_program5 wants
                              it in the third component of a four-component
                              vector. Also clarify that single-layer bindings
                              of an array or cube map texture use a target
                              appropriate to the bound layer.

    4     03/23/10  pbrown    Add interaction with EXT_separate_shader_objects.
                              Update issues section to include some issues
                              left behind in NV_gpu_shader5 when specs were
                              refactored.

    3     03/21/10  pbrown    Update spec overview, interactions, and issues
                              sections; miscellaneous minor clarifications.

    2     03/16/10  pbrown    Add a separate #extension line for this
                              extension; needed since the extension became
                              packaged separately from ARB_gpu_shader5. Added
                              C99-like "restrict" qualifier to indicate that
                              an image variable won't share underlying image
                              contents with any other variable. Added support
                              for "const" qualifiers on images to indicate
                              read-only image data. Added language describing
                              the significance of the position of image
                              variable qualifiers. Clarified rules on use of
                              image variables as function parameters; adding
                              qualifiers is OK, stripping them off is not.
                              Updated image layout qualifier section to
                              clarify that "size" layout qualifiers are
                              required on both uniform and function parameter
                              declarations. Added "const" qualifier on the
                              image argument in imageLoad() prototypes.
                              Updated extension names in dependency sections.
                              Add support for stores to the RGB10_A2 texture
                              format from OpenGL 3.3. Add several issues.

    1               jbolz     Internal revisions.