Name

    NV_shader_atomic_fp16_vector

Name Strings

    GL_NV_shader_atomic_fp16_vector

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

Contributors

    Pat Brown, NVIDIA
    Mathias Heyer, NVIDIA

Status

    Shipping

Version

    Last Modified Date:         February 4, 2015
    NVIDIA Revision:            3

Number

    OpenGL Extension #474
    OpenGL ES Extension #261

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification.

    This extension is written against version 4.30 of the OpenGL Shading
    Language Specification.

    This extension interacts with NV_shader_buffer_store and NV_gpu_shader5.

    This extension interacts with NV_gpu_program5, NV_shader_buffer_store, and
    NV_gpu_program5_mem_extended.

    This extension requires NV_gpu_shader5.

    This extension interacts with NV_shader_storage_buffer_object.

    This extension interacts with NV_compute_program5.

    This extension interacts with NV_image_formats.

    This extension interacts with OES_shader_image_atomic.

Overview

    This extension provides GLSL built-in functions and assembly opcodes
    allowing shaders to perform a limited set of atomic read-modify-write
    operations to buffer or texture memory with 16-bit floating point vector
    surface formats.

New Procedures and Functions

    None.

New Tokens

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_shader_atomic_fp16_vector : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_shader_atomic_fp16_vector         1
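
    As an illustration only (not part of the specification language), a
    shader preamble might combine the directive and the #define as sketched
    below. The macro HAVE_FP16_VECTOR_ATOMICS is a hypothetical
    application-side name, and requiring GL_NV_gpu_shader5 reflects the
    dependency stated above for the f16vec2 and f16vec4 types.

```glsl
#version 430
// NV_gpu_shader5 supplies the f16vec2/f16vec4 types used by this extension.
#extension GL_NV_gpu_shader5 : require
// "enable" (rather than "require") lets the shader still compile, with a
// warning, on implementations that lack the fp16 vector atomics.
#extension GL_NV_shader_atomic_fp16_vector : enable

// Hypothetical application macro selecting a code path at compile time.
#ifdef GL_NV_shader_atomic_fp16_vector
#define HAVE_FP16_VECTOR_ATOMICS 1
#else
#define HAVE_FP16_VECTOR_ATOMICS 0
#endif
```

    The rest of the shader can then guard calls to the new built-ins with
    "#if HAVE_FP16_VECTOR_ATOMICS" and fall back to another accumulation
    scheme otherwise.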

    Modify Section 8.11, Atomic Memory Functions (p. 163)

    Add before the table of functions:

    Some atomic memory operations are supported on two- and four-component
    vectors with 16-bit floating-point components.

    Add new functions to the table:

        // Computes a new value per-component using the specified operation.
        // Atomicity is only guaranteed on a per-component basis.
        f16vec2 atomicAdd(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicAdd(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicMin(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicMin(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicMax(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicMax(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicExchange(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicExchange(inout f16vec4 mem, f16vec4 data);
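
    The functions above might be used from a compute shader roughly as
    follows. This is a minimal sketch, not specification language: the
    buffer block names, bindings, and the binning scheme are hypothetical,
    and it assumes the implementation accepts f16vec2 members in
    std430 shader storage blocks.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical buffers: full-precision values in, fp16 per-bin sums out.
layout(std430, binding = 0) readonly buffer SrcValues { vec2 srcValues[]; };
layout(std430, binding = 1) buffer Bins { f16vec2 bins[]; };

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(srcValues.length()) || bins.length() == 0)
        return;

    uint bin = i % uint(bins.length());

    // Each of the two components is added atomically on its own; the
    // return value is the data read from memory by the operation.
    f16vec2 previous = atomicAdd(bins[bin], f16vec2(srcValues[i]));
}
```

    Because many invocations target the same bin, a plain read-modify-write
    would lose updates; the atomic form avoids that for each component.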

    Modify Section 8.12, Image Functions (p. 164)

    Add before the table of functions:

    Some atomic memory operations are supported on two- and four-component
    vectors with 16-bit floating-point components, for images with format
    qualifiers of <rg16f> and <rgba16f>.

    Add new functions to the table:

        // Computes a new value per-component using the specified operation.
        // Atomicity is only guaranteed on a per-component basis.
        f16vec2 imageAtomicAdd(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicAdd(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicMin(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicMin(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicMax(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicMax(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicExchange(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicExchange(IMAGE_PARAMS, f16vec4 data);
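
    A fragment shader accumulating color into an <rgba16f> image could
    look like the sketch below. The names accumImage and fragColor and the
    binding point are assumptions made for illustration; for a 2D image,
    IMAGE_PARAMS expands to the image variable and an ivec2 coordinate.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

// Hypothetical accumulation target. The format qualifier must be rgba16f
// for the f16vec4 overloads (rg16f for the f16vec2 overloads).
layout(rgba16f, binding = 0) uniform image2D accumImage;

in vec4 fragColor;   // hypothetical input from the previous stage

void main()
{
    ivec2 texel = ivec2(gl_FragCoord.xy);

    // Adds all four components to the texel; each component is updated
    // atomically, but the four updates are not guaranteed to be one unit.
    imageAtomicAdd(accumImage, texel, f16vec4(fragColor));
}
```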

Dependencies on OES_shader_image_atomic

    If implemented in OpenGL ES and OES_shader_image_atomic is not
    supported, do not introduce additional imageAtomic* functions.

Dependencies on NV_image_formats

    If implemented in OpenGL ES and NV_image_formats is not
    supported, remove references to two-component images of format
    <rg16f>.

Dependencies on NV_shader_buffer_store and NV_gpu_shader5

    If NV_shader_buffer_store and NV_gpu_shader5 are supported, the following
    functions should be added to the "Section 8.Y, Shader Memory Functions"
    language in the NV_shader_buffer_store specification:

      // Computes a new value per-component using the specified operation.
      // Atomicity is only guaranteed on a per-component basis.
      f16vec2 atomicAdd(f16vec2 *address, f16vec2 data);
      f16vec4 atomicAdd(f16vec4 *address, f16vec4 data);
      f16vec2 atomicMin(f16vec2 *address, f16vec2 data);
      f16vec4 atomicMin(f16vec4 *address, f16vec4 data);
      f16vec2 atomicMax(f16vec2 *address, f16vec2 data);
      f16vec4 atomicMax(f16vec4 *address, f16vec4 data);
      f16vec2 atomicExchange(f16vec2 *address, f16vec2 data);
      f16vec4 atomicExchange(f16vec4 *address, f16vec4 data);

Dependencies on NV_gpu_program5, NV_shader_buffer_store, and
NV_gpu_program5_mem_extended

    If NV_gpu_program5 is supported and "OPTION NV_shader_atomic_fp16_vector"
    is specified in an assembly program, "F16X2" and "F16X4" should be allowed
    as storage modifiers to the ATOM instruction for the atomic operations
    "ADD", "MIN", "MAX", and "EXCH". These operate on each of the two or four
    fp16 values independently. Atomicity is only guaranteed on a per-component
    basis.

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
    as extended by NV_gpu_program5:)

      + Floating-Point Vector Atomic Operations (NV_shader_atomic_fp16_vector)

      If a program specifies the "NV_shader_atomic_fp16_vector" option, it may
      use the "F16X2" and "F16X4" storage modifiers with the "ATOM" opcode to
      perform atomic floating-point add, minimum, maximum, or exchange
      operations.

    (Add to the table in "Section 2.X.8.Z, ATOM" in NV_gpu_program5:)

      atomic     storage
      modifier   modifiers            operation
      --------   ------------------   --------------------------------------
       ADD       U32, S32, U64,       compute a sum
                 F16X2, F16X4
       MIN       U32, S32,            compute minimum
                 F16X2, F16X4
       MAX       U32, S32,            compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,       exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on EXT_shader_image_load_store and NV_gpu_program5

    If EXT_shader_image_load_store and NV_gpu_program5 are supported and
    "OPTION NV_shader_atomic_fp16_vector" is specified in an assembly program,
    "F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMIM
    instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH".
    These operate on each of the two or four fp16 values independently.
    Atomicity is only guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMIM" in the "Dependencies on
    NV_gpu_program5" portion of the EXT_shader_image_load_store specification:)

      atomic     storage
      modifier   modifiers        operation
      --------   --------------   --------------------------------------
       ADD       U32, S32,        compute a sum
                 F16X2, F16X4
       MIN       U32, S32,        compute minimum
                 F16X2, F16X4
       MAX       U32, S32,        compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,   exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_shader_storage_buffer_object

    If NV_shader_storage_buffer_object is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
    and "F16X4" should be allowed as storage modifiers to the ATOMB instruction
    for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
    each of the two or four fp16 values independently. Atomicity is only
    guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMB" in the "Dependencies on
    NV_gpu_program5" portion of the NV_shader_storage_buffer_object
    specification:)

      atomic     storage
      modifier   modifiers          operation
      --------   -----------------  --------------------------------------
       ADD       U32, S32, U64,     compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,          compute minimum
                 F16X2, F16X4
       MAX       U32, S32,          compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,     exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_compute_program5

    If NV_compute_program5 is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
    and "F16X4" should be allowed as storage modifiers to the ATOMS instruction
    for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
    each of the two or four fp16 values independently. Atomicity is only
    guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMS" in the "Dependencies on
    NV_gpu_program5" portion of the NV_compute_program5 specification:)

      atomic     storage
      modifier   modifiers          operation
      --------   -----------------  --------------------------------------
       ADD       U32, S32, U64,     compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,          compute minimum
                 F16X2, F16X4
       MAX       U32, S32,          compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,     exchange memory with operand
                 F16X2, F16X4
       ...

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should we allow "partial" atomics to an f16vec2 or f16vec4, only
    modifying some of the components?

    RESOLVED: No. An application that needs this can pass "special" values
    in the components it wants to leave alone, chosen so that the atomic has
    no effect on them (e.g. add zero, min with +infinity, max with
    -infinity).  This works for atomicAdd, atomicMin, and atomicMax, but not
    for atomicExchange.
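
    The workaround described in issue (1) might be written as in the
    following sketch. The buffer names, bindings, and the uniform "weight"
    are hypothetical, and the sketch assumes that converting a
    single-precision -infinity to 16-bit floating point yields -infinity.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical buffers holding per-element running sums and maxima.
layout(std430, binding = 0) buffer Sums   { f16vec2 sums[]; };
layout(std430, binding = 1) buffer Maxima { f16vec2 maxima[]; };

uniform float weight;   // hypothetical value contributed by each invocation

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(sums.length()) || i >= uint(maxima.length()))
        return;

    // Update only .x of the sum: adding zero leaves .y numerically unchanged.
    atomicAdd(sums[i], f16vec2(vec2(weight, 0.0)));

    // Update only .y of the maximum: max with -infinity leaves .x unchanged.
    float negInf = uintBitsToFloat(0xFF800000u);   // IEEE 754 -infinity
    atomicMax(maxima[i], f16vec2(vec2(negInf, weight)));
}
```

    No such neutral operand exists for atomicExchange, since the exchange
    always overwrites every component with the supplied data.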

    (2) Are these vector atomics guaranteed to update all components of the
    vector atomically?

    RESOLVED:  No.  The spec only guarantees that individual components of a
    vector be updated atomically.  The initial implementation of this
    extension will only atomically update pairs of components.  For many of
    the algorithms supported by this extension (computing component-wise sums,
    minimums, or maximums of multi-component vectors), it is not necessary to
    update all components in a vector as a single unit.

    (3) What support should we provide for four-component vectors?

    RESOLVED:  All of image, global, buffer, and shared memory atomic
    operations will fully support two- and four-component variants.  While one
    might emulate some four-component atomic operations using pairs of
    two-component operations, we choose to support four-component operations
    universally.  Supporting atomics on four-component vectors seems useful,
    as it supports computing sums, minimums, or maximums on RGBA color values
    and other data with more than two components.
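
    To make the comparison in issue (3) concrete, the sketch below shows
    the native four-component form next to an emulation built from two
    two-component operations. The buffer layouts, bindings, and the uniform
    "contribution" are hypothetical and exist only for illustration.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical layouts: RGBA data stored as f16vec4, and the same kind of
// data stored as pairs (RG at index 2*i, BA at index 2*i + 1).
layout(std430, binding = 0) buffer ColorsRGBA  { f16vec4 colors[]; };
layout(std430, binding = 1) buffer ColorsPairs { f16vec2 halves[]; };

uniform vec4 contribution;   // hypothetical value to accumulate

void main()
{
    uint i = gl_GlobalInvocationID.x;
    f16vec4 c = f16vec4(contribution);

    // Native four-component form provided by this extension.
    if (i < uint(colors.length()))
        atomicAdd(colors[i], c);

    // Emulation using two two-component operations.
    if (2u * i + 1u < uint(halves.length())) {
        atomicAdd(halves[2u * i],      c.xy);
        atomicAdd(halves[2u * i + 1u], c.zw);
    }
}
```

    In both forms only per-component atomicity is guaranteed, so the
    four-component overload is a convenience rather than a stronger
    guarantee.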

Revision History

    Revision 2
    - Add OpenGL ES interactions
    Revision 1
    - Internal revisions.