Name

    NV_shader_atomic_fp16_vector

Name Strings

    GL_NV_shader_atomic_fp16_vector

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

Contributors

    Pat Brown, NVIDIA
    Mathias Heyer, NVIDIA

Status

    Shipping

Version

    Last Modified Date:         February 4, 2015
    NVIDIA Revision:            3

Number

    OpenGL Extension #474
    OpenGL ES Extension #261

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification.

    This extension is written against version 4.30 of the OpenGL Shading
    Language Specification.

    This extension interacts with NV_shader_buffer_store and NV_gpu_shader5.

    This extension interacts with NV_gpu_program5, NV_shader_buffer_store,
    and NV_gpu_program5_mem_extended.

    This extension requires NV_gpu_shader5.

    This extension interacts with NV_shader_storage_buffer_object.

    This extension interacts with NV_compute_program5.

    This extension interacts with NV_image_formats.

    This extension interacts with OES_shader_image_atomic.

Overview

    This extension provides GLSL built-in functions and assembly opcodes
    allowing shaders to perform a limited set of atomic read-modify-write
    operations to buffer or texture memory with 16-bit floating point vector
    surface formats.

New Procedures and Functions

    None.

New Tokens

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_shader_atomic_fp16_vector : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_shader_atomic_fp16_vector           1

    Modify Section 8.11, Atomic Memory Functions (p. 163)

    Add before the table of functions:

    Some atomic memory operations are supported on two- and four-component
    vectors with 16-bit floating-point components.
    Add new functions to the table:

      // Computes a new value per-component using the specified operation.
      // Atomicity is only guaranteed on a per-component basis.
      f16vec2 atomicAdd(inout f16vec2 mem, f16vec2 data);
      f16vec4 atomicAdd(inout f16vec4 mem, f16vec4 data);
      f16vec2 atomicMin(inout f16vec2 mem, f16vec2 data);
      f16vec4 atomicMin(inout f16vec4 mem, f16vec4 data);
      f16vec2 atomicMax(inout f16vec2 mem, f16vec2 data);
      f16vec4 atomicMax(inout f16vec4 mem, f16vec4 data);
      f16vec2 atomicExchange(inout f16vec2 mem, f16vec2 data);
      f16vec4 atomicExchange(inout f16vec4 mem, f16vec4 data);

    Modify Section 8.12, Image Functions (p. 164)

    Add before the table of functions:

    Some atomic memory operations are supported on two- and four-component
    vectors with 16-bit floating-point components, for images with format
    qualifiers of <rg16f> and <rgba16f>.

    Add new functions to the table:

      // Computes a new value per-component using the specified operation.
      // Atomicity is only guaranteed on a per-component basis.
      f16vec2 imageAtomicAdd(IMAGE_PARAMS, f16vec2 data);
      f16vec4 imageAtomicAdd(IMAGE_PARAMS, f16vec4 data);
      f16vec2 imageAtomicMin(IMAGE_PARAMS, f16vec2 data);
      f16vec4 imageAtomicMin(IMAGE_PARAMS, f16vec4 data);
      f16vec2 imageAtomicMax(IMAGE_PARAMS, f16vec2 data);
      f16vec4 imageAtomicMax(IMAGE_PARAMS, f16vec4 data);
      f16vec2 imageAtomicExchange(IMAGE_PARAMS, f16vec2 data);
      f16vec4 imageAtomicExchange(IMAGE_PARAMS, f16vec4 data);

Dependencies on OES_shader_image_atomic

    If implemented in OpenGL ES and OES_shader_image_atomic is not supported,
    do not introduce additional imageAtomic* functions.

Dependencies on NV_image_formats

    If implemented in OpenGL ES and NV_image_formats is not supported, remove
    references to two-component images of format <rg16f>.
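
    The following compute shader is a minimal, non-normative sketch of how
    the GLSL built-in functions added to Sections 8.11 and 8.12 above might
    be used. The block name Stats, its members, the image accumTiles, the
    sampler source, and all binding points are hypothetical. The sketch also
    assumes that the implementation accepts f16vec4 members in a std430
    buffer block.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require                // f16vec2 / f16vec4 types
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 8, local_size_y = 8) in;

// Hypothetical resources; names and bindings are illustrative only.
layout(std430, binding = 0) buffer Stats {
    f16vec4 colorSum;   // running per-component sum
    f16vec4 colorMax;   // running per-component maximum
};
layout(rgba16f, binding = 0) uniform image2D accumTiles;
layout(binding = 0) uniform sampler2D source;

void main()
{
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    if (any(greaterThanEqual(p, textureSize(source, 0))))
        return;

    // Explicit conversion from vec4 to the 16-bit vector type.
    f16vec4 c = f16vec4(texelFetch(source, p, 0));

    // Buffer memory atomics: each component is updated atomically, but
    // the four components are not guaranteed to be updated as one unit.
    atomicAdd(colorSum, c);
    atomicMax(colorMax, c);

    // Image atomic on an rgba16f image: one texel per work group.
    imageAtomicAdd(accumTiles, ivec2(gl_WorkGroupID.xy), c);
}
```

    Because atomicity is only guaranteed per component, a concurrent reader
    may observe a vector in which some components already include a given
    update and others do not.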
Dependencies on NV_shader_buffer_store and NV_gpu_shader5

    If NV_shader_buffer_store and NV_gpu_shader5 are supported, the following
    functions should be added to the "Section 8.Y, Shader Memory Functions"
    language in the NV_shader_buffer_store specification:

      // Computes a new value per-component using the specified operation.
      // Atomicity is only guaranteed on a per-component basis.
      f16vec2 atomicAdd(f16vec2 *address, f16vec2 data);
      f16vec4 atomicAdd(f16vec4 *address, f16vec4 data);
      f16vec2 atomicMin(f16vec2 *address, f16vec2 data);
      f16vec4 atomicMin(f16vec4 *address, f16vec4 data);
      f16vec2 atomicMax(f16vec2 *address, f16vec2 data);
      f16vec4 atomicMax(f16vec4 *address, f16vec4 data);
      f16vec2 atomicExchange(f16vec2 *address, f16vec2 data);
      f16vec4 atomicExchange(f16vec4 *address, f16vec4 data);

Dependencies on NV_gpu_program5, NV_shader_buffer_store, and
NV_gpu_program5_mem_extended

    If NV_gpu_program5 is supported and "OPTION NV_shader_atomic_fp16_vector"
    is specified in an assembly program, "F16X2" and "F16X4" should be
    allowed as storage modifiers to the ATOM instruction for the atomic
    operations "ADD", "MIN", "MAX", and "EXCH". These operate on each of the
    two or four fp16 values independently. Atomicity is only guaranteed on a
    per-component basis.

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4
    extension, as extended by NV_gpu_program5:)

    + Floating-Point Vector Atomic Operations (NV_shader_atomic_fp16_vector)

    If a program specifies the "NV_shader_atomic_fp16_vector" option, it may
    use the "F16X2" and "F16X4" storage modifiers with the "ATOM" opcodes to
    perform atomic floating-point add, minimum, maximum, or exchange
    operations.
    (Add to the table in "Section 2.X.8.Z, ATOM" in NV_gpu_program5:)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32, U64,      compute a sum
                 F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on EXT_shader_image_load_store and NV_gpu_program5

    If EXT_shader_image_load_store and NV_gpu_program5 are supported and
    "OPTION NV_shader_atomic_fp16_vector" is specified in an assembly
    program, "F16X2" and "F16X4" should be allowed as storage modifiers to
    the ATOMIM instruction for the atomic operations "ADD", "MIN", "MAX", and
    "EXCH". These operate on each of the two or four fp16 values
    independently. Atomicity is only guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMIM" in the "Dependencies on
    NV_gpu_program5" portion of the EXT_shader_image_load_store
    specification)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32,           compute a sum
                 F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_compute_program5

    If NV_compute_program5 is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program,
    "F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMB
    instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH".
    These operate on each of the two or four fp16 values independently.
    Atomicity is only guaranteed on a per-component basis.
    (Add to the table in "Section 2.X.8.Z, ATOMB" in the "Dependencies on
    NV_gpu_program5" portion of the NV_shader_storage_buffer_object
    specification)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32, U64,      compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_shader_storage_buffer_object

    If NV_shader_storage_buffer_object is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program,
    "F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMS
    instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH".
    These operate on each of the two or four fp16 values independently.
    Atomicity is only guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMS" in the "Dependencies on
    NV_gpu_program5" portion of the NV_compute_program5 specification)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32, U64,      compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should we allow "partial" atomics to an f16vec2 or f16vec4, only
        modifying some of the components?

      RESOLVED:  No.  If an app really cares to do this, it could inject
      "special" values in those components that cause the atomic to have no
      effect for that component (e.g. add zero, max with -infinity, etc).
      This would work for atomicAdd, atomicMin, and atomicMax, but not for
      atomicExchange.

    (2) Are these vector atomics guaranteed to update all components of the
        vector atomically?

      RESOLVED:  No.
      The spec only guarantees that individual components of a vector be
      updated atomically.  The initial implementation of this extension will
      only atomically update pairs of components.  For many of the
      algorithms supported by this extension (computing component-wise sums,
      minimums, or maximums of multi-component vectors), it is not necessary
      to update all components in a vector as a single unit.

    (3) What support should we provide for four-component vectors?

      RESOLVED:  All of image, global, buffer, and shared memory atomic
      operations will fully support two- and four-component variants.  While
      one might emulate some four-component atomic operations using pairs of
      two-component operations, we choose to support four-component
      operations universally.  Supporting atomics on four-component vectors
      seems useful, as it supports computing sums, minimums, or maximums on
      RGBA color values and other data with more than two components.

Revision History

    Revision 2
    - Add OpenGL ES interactions

    Revision 1
    - Internal revisions.
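
Sample Code

    This non-normative sketch illustrates the workaround mentioned in the
    resolution of Issue (1): a shader that only wants to modify the first
    two components of a four-component vector passes values in the remaining
    components that leave them unchanged (zero for atomicAdd, -infinity for
    atomicMax). The block names Totals and Samples, their members, and the
    binding points are hypothetical, and the sketch assumes that the
    implementation accepts f16vec4 members in a std430 buffer block.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require                // f16vec4 type
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical resources; names and bindings are illustrative only.
layout(std430, binding = 0) buffer Totals {
    f16vec4 total;      // .xy updated here, .zw owned by other passes
    f16vec4 highest;    // .xy updated here, .zw owned by other passes
};
layout(std430, binding = 1) readonly buffer Samples {
    vec2 samples[];
};

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(samples.length()))
        return;

    vec2 s = samples[i];

    // Adding zero leaves the .zw components numerically unchanged.
    atomicAdd(total, f16vec4(s.x, s.y, 0.0, 0.0));

    // max(x, -infinity) == x for any non-NaN x, so .zw is unaffected.
    float negInf = uintBitsToFloat(0xFF800000u);
    atomicMax(highest, f16vec4(s.x, s.y, negInf, negInf));
}
```

    As the resolution of Issue (1) notes, no such neutral value exists for
    atomicExchange, so this approach cannot be used to exchange only some of
    the components.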