Name

    NV_shader_atomic_fp16_vector

Name Strings

    GL_NV_shader_atomic_fp16_vector

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

Contributors

    Pat Brown, NVIDIA
    Mathias Heyer, NVIDIA

Status

    Shipping

Version

    Last Modified Date:         February 4, 2015
    NVIDIA Revision:            3

Number

    OpenGL Extension #474
    OpenGL ES Extension #261

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification.

    This extension is written against version 4.30 of the OpenGL Shading
    Language Specification.

    This extension interacts with NV_shader_buffer_store and NV_gpu_shader5.

    This extension interacts with NV_gpu_program5, NV_shader_buffer_store, and
    NV_gpu_program5_mem_extended.

    This extension requires NV_gpu_shader5.

    This extension interacts with NV_shader_storage_buffer_object.

    This extension interacts with NV_compute_program5.

    This extension interacts with NV_image_formats.

    This extension interacts with OES_shader_image_atomic.

Overview

    This extension provides GLSL built-in functions and assembly opcodes
    allowing shaders to perform a limited set of atomic read-modify-write
    operations to buffer or texture memory with 16-bit floating point vector
    surface formats.

New Procedures and Functions

    None.

New Tokens

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_shader_atomic_fp16_vector : <behavior>

    where <behavior> is as specified in section 3.3.
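
    The following is a minimal, non-normative sketch of how the directive
    might appear in a compute shader. It uses the f16vec2 overload of
    atomicAdd that this extension adds to Section 8.11. The block name
    "Accum", the member "sums", the binding point, the array size, and the
    work group size are all illustrative assumptions and are not part of
    this specification.

```glsl
#version 430
// The f16vec2 type comes from NV_gpu_shader5, which this extension requires.
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical accumulation buffer (names and binding are assumptions).
layout(std430, binding = 0) buffer Accum {
    f16vec2 sums[16];
};

void main()
{
    // Many invocations map to the same element, so the adds contend.
    uint i = gl_GlobalInvocationID.x % 16u;

    // Each 16-bit component is added atomically. The two components are
    // not guaranteed to be updated together as a single unit.
    atomicAdd(sums[i], f16vec2(1.0, 0.5));
}
```

    Using "require" makes compilation fail on implementations that lack the
    extension; a shader that has a fallback path could use "enable" instead.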

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_shader_atomic_fp16_vector 1

    Modify Section 8.11, Atomic Memory Functions (p. 163)

    Add before the table of functions:

        Some atomic memory operations are supported on two- and four-component
        vectors with 16-bit floating-point components.

    Add new functions to the table:

        // Computes a new value per-component using the specified operation.
        // Atomicity is only guaranteed on a per-component basis.
        f16vec2 atomicAdd(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicAdd(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicMin(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicMin(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicMax(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicMax(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicExchange(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicExchange(inout f16vec4 mem, f16vec4 data);


    Modify Section 8.12, Image Functions (p. 164)

    Add before the table of functions:

        Some atomic memory operations are supported on two- and four-component
        vectors with 16-bit floating-point components, for images with format
        qualifiers of <rg16f> and <rgba16f>.

    Add new functions to the table:

        // Computes a new value per-component using the specified operation.
        // Atomicity is only guaranteed on a per-component basis.
        f16vec2 imageAtomicAdd(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicAdd(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicMin(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicMin(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicMax(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicMax(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicExchange(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicExchange(IMAGE_PARAMS, f16vec4 data);

Dependencies on OES_shader_image_atomic

    If implemented in OpenGL ES and OES_shader_image_atomic is not
    supported, do not introduce additional imageAtomic* functions.

Dependencies on NV_image_formats

    If implemented in OpenGL ES and NV_image_formats is not
    supported, remove references to two-component images of format
    <rg16f>.

Dependencies on NV_shader_buffer_store and NV_gpu_shader5

    If NV_shader_buffer_store and NV_gpu_shader5 are supported, the following
    functions should be added to the "Section 8.Y, Shader Memory Functions"
    language in the NV_shader_buffer_store specification:

        // Computes a new value per-component using the specified operation.
        // Atomicity is only guaranteed on a per-component basis.
        f16vec2 atomicAdd(f16vec2 *address, f16vec2 data);
        f16vec4 atomicAdd(f16vec4 *address, f16vec4 data);
        f16vec2 atomicMin(f16vec2 *address, f16vec2 data);
        f16vec4 atomicMin(f16vec4 *address, f16vec4 data);
        f16vec2 atomicMax(f16vec2 *address, f16vec2 data);
        f16vec4 atomicMax(f16vec4 *address, f16vec4 data);
        f16vec2 atomicExchange(f16vec2 *address, f16vec2 data);
        f16vec4 atomicExchange(f16vec4 *address, f16vec4 data);

Dependencies on NV_gpu_program5, NV_shader_buffer_store, and
NV_gpu_program5_mem_extended

    If NV_gpu_program5 is supported and "OPTION NV_shader_atomic_fp16_vector"
    is specified in an assembly program, "F16X2" and "F16X4" should be allowed
    as storage modifiers to the ATOM instruction for the atomic operations
    "ADD", "MIN", "MAX", and "EXCH". These operate on each of the two or four
    fp16 values independently. Atomicity is only guaranteed on a per-component
    basis.

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
    as extended by NV_gpu_program5:)

    + Floating-Point Vector Atomic Operations (NV_shader_atomic_fp16_vector)

    If a program specifies the "NV_shader_atomic_fp16_vector" option, it may
    use the "F16X2" and "F16X4" storage modifiers with the "ATOM" opcodes to
    perform atomic floating-point add, minimum, maximum, or exchange
    operations.

    (Add to the table in "Section 2.X.8.Z, ATOM" in NV_gpu_program5:)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32, U64,      compute a sum
                 F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on EXT_shader_image_load_store and NV_gpu_program5

    If EXT_shader_image_load_store and NV_gpu_program5 are supported and
    "OPTION NV_shader_atomic_fp16_vector" is specified in an assembly program,
    "F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMIM
    instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH".
    These operate on each of the two or four fp16 values independently.
    Atomicity is only guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMIM" in the "Dependencies on
    NV_gpu_program5" portion of the EXT_shader_image_load_store specification)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32,           compute a sum
                 F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_shader_storage_buffer_object

    If NV_shader_storage_buffer_object is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
    and "F16X4" should be allowed as storage modifiers to the ATOMB instruction
    for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
    each of the two or four fp16 values independently. Atomicity is only
    guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMB" in the "Dependencies on
    NV_gpu_program5" portion of the NV_shader_storage_buffer_object
    specification)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32, U64,      compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_compute_program5

    If NV_compute_program5 is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
    and "F16X4" should be allowed as storage modifiers to the ATOMS instruction
    for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
    each of the two or four fp16 values independently. Atomicity is only
    guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMS" in the "Dependencies on
    NV_gpu_program5" portion of the NV_compute_program5 specification)

      atomic     storage
      modifier   modifiers           operation
      --------   ------------------  --------------------------------------
       ADD       U32, S32, U64,      compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,           compute minimum
                 F16X2, F16X4
       MAX       U32, S32,           compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,      exchange memory with operand
                 F16X2, F16X4
       ...


Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should we allow "partial" atomics to a f16vec2 or f16vec4, only
        modifying some of the components?

      RESOLVED: No. An application that needs this behavior can supply
      "special" values in the components it wants left alone, chosen so that
      the atomic has no effect on those components (e.g., add zero, max with
      -infinity, etc.). This works for atomicAdd, atomicMin, and atomicMax,
      but not for atomicExchange.

    (2) Are these vector atomics guaranteed to update all components of the
        vector atomically?

      RESOLVED: No. The spec only guarantees that individual components of a
      vector be updated atomically. The initial implementation of this
      extension will only atomically update pairs of components.
      For many of the algorithms supported by this extension (computing
      component-wise sums, minimums, or maximums of multi-component vectors),
      it is not necessary to update all components in a vector as a single
      unit.

    (3) What support should we provide for four-component vectors?

      RESOLVED: All of image, global, buffer, and shared memory atomic
      operations will fully support two- and four-component variants. While one
      might emulate some four-component atomic operations using pairs of
      two-component operations, we choose to support four-component operations
      universally. Supporting atomics on four-component vectors seems useful,
      as it supports computing sums, minimums, or maximums on RGBA color values
      and other data with more than two components.

Revision History

    Revision 2
    - Add OpenGL ES interactions
    Revision 1
    - Internal revisions.
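
As a non-normative illustration of Issues (1) and (3), the sketch below
accumulates RGB contributions into an <rgba16f> image with the four-component
imageAtomicAdd, and passes zero in the alpha component so that the stored
alpha is left as it was. The image name "accumImage", the binding point, the
work group size, the texel mapping, and the contribution values are all
assumptions made for this example; it also assumes the bound image is large
enough for the computed coordinates.

```glsl
#version 430
// f16vec4 comes from NV_gpu_shader5, which this extension requires.
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 8, local_size_y = 8) in;

// Hypothetical RGBA16F accumulation image (name and binding are assumptions).
layout(rgba16f, binding = 0) uniform image2D accumImage;

void main()
{
    // Map each 4x4 block of invocations to one texel so that several
    // invocations update the same location concurrently.
    ivec2 texel = ivec2(gl_GlobalInvocationID.xy) / 4;

    // "Partial" update per Issue (1): the zero addend in the fourth
    // component is intended to leave the stored alpha unchanged.
    f16vec4 contribution = f16vec4(0.25, 0.5, 0.75, 0.0);

    // Each component is summed atomically; the four components are not
    // guaranteed to be updated together as one unit (Issue 2).
    imageAtomicAdd(accumImage, texel, contribution);
}
```

The same approach does not carry over to imageAtomicExchange, since an
exchange has no operand value that leaves a component untouched.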