Name

    NV_gpu_program5_mem_extended

Name Strings

    GL_NV_gpu_program5_mem_extended

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         October 30, 2012
    NVIDIA Revision:            1

Number

    OpenGL Extension #434

Dependencies

    NV_gpu_program5 is required.

    This extension is written against the NV_gpu_program5 extension
    specification, which itself is written against the NV_gpu_program4 and
    OpenGL 2.0 Specifications.

    This extension interacts trivially with EXT_shader_image_load_store,
    NV_shader_storage_buffer_object, and NV_compute_program5.

Overview

    This extension provides a new set of storage modifiers that can be used by
    NV_gpu_program5 assembly program instructions loading from or storing to
    various forms of GPU memory.  In particular, we provide support for loads
    and stores using the storage modifiers:

        .F16X2  .F16X4  .F16    (for 16-bit floating-point scalars/vectors)
        .S8X2   .S8X4           (for 8-bit signed integer vectors)
        .S16X2  .S16X4          (for 16-bit signed integer vectors)
        .U8X2   .U8X4           (for 8-bit unsigned integer vectors)
        .U16X2  .U16X4          (for 16-bit unsigned integer vectors)

    These modifiers are allowed for the following load/store instructions:

        LDC             Load from constant buffer

        LOAD            Global load
        STORE           Global store

        LOADIM          Image load (via EXT_shader_image_load_store)
        STOREIM         Image store (via EXT_shader_image_load_store)

        LDB             Load from storage buffer (via
                          NV_shader_storage_buffer_object)
        STB             Store to storage buffer (via
                          NV_shader_storage_buffer_object)

        LDS             Load from shared memory (via NV_compute_program5)
        STS             Store to shared memory (via NV_compute_program5)

    For assembly programs prior to this extension, it was necessary to access
    memory using packed types and then unpack with additional shader
    instructions.
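
    To illustrate the extra work described above, the following C sketch
    models what a program had to do without the new modifiers: fetch one
    32-bit word and split it into four 8-bit components with shifts and
    masks.  The helper names are hypothetical, and the sketch assumes that
    component 0 occupies the least significant byte of the word
    (little-endian layout); neither is taken from this specification.

```c
#include <stdint.h>

/* Hypothetical helper: unpack four unsigned 8-bit components from one
 * 32-bit word, as a program would have to do after a 32-bit load.
 * Assumes component 0 is the least significant byte. */
static void unpack_u8x4(uint32_t packed, uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (packed >> (8 * i)) & 0xFFu;
    }
}

/* Signed variant: each 8-bit field is sign-extended to 32 bits. */
static void unpack_s8x4(uint32_t packed, int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t field = (int32_t)((packed >> (8 * i)) & 0xFFu);
        out[i] = (field ^ 0x80) - 0x80;   /* portable sign extension */
    }
}
```

    With the "U8X4" or "S8X4" storage modifier, the load itself delivers the
    four components, so no separate unpacking step of this kind is needed.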

    Similar capabilities have already been provided in the OpenGL Shading
    Language (GLSL) via the NV_gpu_shader5 extension, using the extended data
    types provided there (e.g., "float16_t", "u8vec4", "i16vec2").

New Procedures and Functions

    None.

New Tokens

    None.

Additions to Chapter 2 of the OpenGL 2.0 Specification (OpenGL Operation)

    (All modifications are relative to Section 2.X, GPU Programs, from the
     NV_gpu_program4 specification.)

    Modify Section 2.X.2, Program Grammar

    (add after the long list of grammar rules) If a program specifies the
    NV_gpu_program5_mem_extended program option, the following rules are added
    to the NV_gpu_program5 base program grammar:

    <opModifier>            ::= "F16X2"
                              | "F16X4"
                              | "S8X2"
                              | "S8X4"
                              | "S16X2"
                              | "S16X4"
                              | "U8X2"
                              | "U8X4"
                              | "U16X2"
                              | "U16X4"

    (Note:  This extension also provides new capabilities for the "F16"
     modifier.  Since it was already supported in NV_gpu_program5, it isn't
     being added to the grammar here.)


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (add to Table X.14 of the NV_gpu_program4 specification.)

      Modifier  Description
      --------  ---------------------------------------------------
      F16       Convert to or from one 16-bit floating-point value,
                or access one 16-bit floating-point value

      F16X2     Access two 16-bit floating-point values
      F16X4     Access four 16-bit floating-point values
      S8X2      Access two 8-bit signed integer values
      S8X4      Access four 8-bit signed integer values
      S16X2     Access two 16-bit signed integer values
      S16X4     Access four 16-bit signed integer values
      U8X2      Access two 8-bit unsigned integer values
      U8X4      Access four 8-bit unsigned integer values
      U16X2     Access two 16-bit unsigned integer values
      U16X4     Access four 16-bit unsigned integer values

    (modify discussion of storage modifiers for load and store operations,
     adding the entries added to the table above)

    For load and store operations, the "F32", "F32X2", "F32X4", "F64",
    "F64X2", "F64X4", "S8", "S8X2", "S8X4", "S16", "S16X2", "S16X4", "S32",
    "S32X2", "S32X4", "S64", "S64X2", "S64X4", "U8", "U8X2", "U8X4", "U16",
    "U16X2", "U16X4", "U32", "U32X2", "U32X4", "U64", "U64X2", "U64X4", "F16",
    "F16X2", and "F16X4" storage modifiers control how data are loaded from or
    stored to memory. ...


    Modify Section 2.X.4.5, Program Memory Access, from NV_gpu_program5

    (update pseudocode for BufferMemoryLoad)

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged. */

        case F16:
            result.x = ((float16_t *)address)[0];
            break;
        case F16X2:
            result.x = ((float16_t *)address)[0];
            result.y = ((float16_t *)address)[1];
            break;
        case F16X4:
            result.x = ((float16_t *)address)[0];
            result.y = ((float16_t *)address)[1];
            result.z = ((float16_t *)address)[2];
            result.w = ((float16_t *)address)[3];
            break;
        case S8X2:
            result.x = ((int8_t *)address)[0];
            result.y = ((int8_t *)address)[1];
            break;
        case S8X4:
            result.x = ((int8_t *)address)[0];
            result.y = ((int8_t *)address)[1];
            result.z = ((int8_t *)address)[2];
            result.w = ((int8_t *)address)[3];
            break;
        case S16X2:
            result.x = ((int16_t *)address)[0];
            result.y = ((int16_t *)address)[1];
            break;
        case S16X4:
            result.x = ((int16_t *)address)[0];
            result.y = ((int16_t *)address)[1];
            result.z = ((int16_t *)address)[2];
            result.w = ((int16_t *)address)[3];
            break;
        case U8X2:
            result.x = ((uint8_t *)address)[0];
            result.y = ((uint8_t *)address)[1];
            break;
        case U8X4:
            result.x = ((uint8_t *)address)[0];
            result.y = ((uint8_t *)address)[1];
            result.z = ((uint8_t *)address)[2];
            result.w = ((uint8_t *)address)[3];
            break;
        case U16X2:
            result.x = ((uint16_t *)address)[0];
            result.y = ((uint16_t *)address)[1];
            break;
        case U16X4:
            result.x = ((uint16_t *)address)[0];
            result.y = ((uint16_t *)address)[1];
            result.z = ((uint16_t *)address)[2];
            result.w = ((uint16_t *)address)[3];
            break;
        }
        return result;
      }
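
    The load pseudocode above can be exercised on a CPU with a small C model.
    The sketch below covers only four integer modifiers and uses a plain
    array of four 32-bit integers as the result; the enum, the function name,
    and the result representation are assumptions made for illustration and
    are not defined by this specification.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the storage modifiers, for illustration only. */
typedef enum { MOD_S8X4, MOD_U8X4, MOD_S16X2, MOD_U16X2 } modifier_t;

/* Integer-only model of a packed load.  Components not covered by the
 * modifier stay zero, mirroring the "{ 0, 0, 0, 0 }" initializer in the
 * pseudocode.  memcpy is used so the model has no alignment or aliasing
 * requirements of its own. */
static void load_packed_int(const unsigned char *address, modifier_t modifier,
                            int32_t result[4])
{
    memset(result, 0, 4 * sizeof(int32_t));
    switch (modifier) {
    case MOD_S8X4:
        for (int i = 0; i < 4; i++) {
            int8_t v;
            memcpy(&v, address + i, sizeof v);
            result[i] = v;                 /* sign-extended */
        }
        break;
    case MOD_U8X4:
        for (int i = 0; i < 4; i++) {
            result[i] = address[i];        /* zero-extended */
        }
        break;
    case MOD_S16X2:
        for (int i = 0; i < 2; i++) {
            int16_t v;
            memcpy(&v, address + 2 * i, sizeof v);
            result[i] = v;                 /* sign-extended */
        }
        break;
    case MOD_U16X2:
        for (int i = 0; i < 2; i++) {
            uint16_t v;
            memcpy(&v, address + 2 * i, sizeof v);
            result[i] = v;                 /* zero-extended */
        }
        break;
    }
}
```

    For example, loading the bytes 0xFF, 0x7F, 0x80, 0x01 with MOD_S8X4 in
    this model yields (-1, 127, -128, 1), while MOD_U8X4 yields
    (255, 127, 128, 1).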

    (update pseudocode for BufferMemoryStore)

      void BufferMemoryStore(char *address, operand_t_vec operand,
                             OpModifier modifier)
      {
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged. */

        case F16:
            ((float16_t *)address)[0] = operand.x;
            break;
        case F16X2:
            ((float16_t *)address)[0] = operand.x;
            ((float16_t *)address)[1] = operand.y;
            break;
        case F16X4:
            ((float16_t *)address)[0] = operand.x;
            ((float16_t *)address)[1] = operand.y;
            ((float16_t *)address)[2] = operand.z;
            ((float16_t *)address)[3] = operand.w;
            break;
        case S8X2:
            ((int8_t *)address)[0] = operand.x;
            ((int8_t *)address)[1] = operand.y;
            break;
        case S8X4:
            ((int8_t *)address)[0] = operand.x;
            ((int8_t *)address)[1] = operand.y;
            ((int8_t *)address)[2] = operand.z;
            ((int8_t *)address)[3] = operand.w;
            break;
        case S16X2:
            ((int16_t *)address)[0] = operand.x;
            ((int16_t *)address)[1] = operand.y;
            break;
        case S16X4:
            ((int16_t *)address)[0] = operand.x;
            ((int16_t *)address)[1] = operand.y;
            ((int16_t *)address)[2] = operand.z;
            ((int16_t *)address)[3] = operand.w;
            break;
        case U8X2:
            ((uint8_t *)address)[0] = operand.x;
            ((uint8_t *)address)[1] = operand.y;
            break;
        case U8X4:
            ((uint8_t *)address)[0] = operand.x;
            ((uint8_t *)address)[1] = operand.y;
            ((uint8_t *)address)[2] = operand.z;
            ((uint8_t *)address)[3] = operand.w;
            break;
        case U16X2:
            ((uint16_t *)address)[0] = operand.x;
            ((uint16_t *)address)[1] = operand.y;
            break;
        case U16X4:
            ((uint16_t *)address)[0] = operand.x;
            ((uint16_t *)address)[1] = operand.y;
            ((uint16_t *)address)[2] = operand.z;
            ((uint16_t *)address)[3] = operand.w;
            break;
        }
      }
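
    A matching CPU-side model of the store pseudocode, again only as a
    sketch: the two functions below model the "U8X2" and "U16X4" cases.  The
    function names and the use of 32-bit unsigned operand components are
    assumptions.  Out-of-range operands are reduced here by ordinary C
    unsigned conversion; the text above does not state what an
    implementation does in that case.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of STORE.U8X2: writes only the x and y components,
 * touching exactly two bytes of memory. */
static void store_u8x2(unsigned char *address, const uint32_t operand[4])
{
    address[0] = (unsigned char)operand[0];
    address[1] = (unsigned char)operand[1];
}

/* Hypothetical model of STORE.U16X4: writes four 16-bit values, touching
 * exactly eight bytes of memory. */
static void store_u16x4(unsigned char *address, const uint32_t operand[4])
{
    for (int i = 0; i < 4; i++) {
        uint16_t v = (uint16_t)operand[i];
        memcpy(address + 2 * i, &v, sizeof v);
    }
}
```

    Note that a two-component store leaves the bytes after the second
    component untouched, which is what makes these modifiers useful for
    updating tightly packed buffers in place.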

    (modify paragraph to indicate the alignment requirement for new storage
    modifiers) The address used for global memory loads or stores, or the
    offset used for constant buffer loads, must be aligned to the fetch size
    corresponding to the storage opcode modifier.  For S8 and U8, the offset
    has no alignment requirements.  For F16, S8X2, S16, U8X2, and U16, the
    offset must be a multiple of two basic machine units.  For F32, S32, U32,
    F16X2, S16X2, U16X2, S8X4, and U8X4, the offset must be a multiple of
    four.  For F32X2, F64, S32X2, S64, U32X2, U64, F16X4, S16X4, and U16X4,
    the offset must be a multiple of eight.  ...  If an offset is not
    correctly aligned, the values returned by a buffer memory load will be
    undefined, and the effects of a buffer memory store will also be
    undefined.
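
    The alignment rule can be checked on the application side before data is
    laid out in a buffer.  The C sketch below derives the fetch size of each
    modifier covered by this extension as component size times component
    count, which reproduces the multiples of two, four, and eight listed
    above.  The table, type, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one storage modifier. */
typedef struct {
    const char *name;
    unsigned component_bytes;   /* size of one component in bytes */
    unsigned components;        /* number of components accessed  */
} modifier_info;

/* Modifiers usable with the NV_gpu_program5_mem_extended option. */
static const modifier_info extended_modifiers[] = {
    { "F16",   2, 1 }, { "F16X2", 2, 2 }, { "F16X4", 2, 4 },
    { "S8X2",  1, 2 }, { "S8X4",  1, 4 },
    { "S16X2", 2, 2 }, { "S16X4", 2, 4 },
    { "U8X2",  1, 2 }, { "U8X4",  1, 4 },
    { "U16X2", 2, 2 }, { "U16X4", 2, 4 },
};

/* Returns the fetch size in bytes, or 0 for a modifier not in the table. */
static unsigned fetch_size(const char *modifier)
{
    size_t count = sizeof extended_modifiers / sizeof extended_modifiers[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(extended_modifiers[i].name, modifier) == 0) {
            return extended_modifiers[i].component_bytes *
                   extended_modifiers[i].components;
        }
    }
    return 0;
}

/* True if the offset is a multiple of the modifier's fetch size. */
static bool is_aligned(unsigned long offset, const char *modifier)
{
    unsigned size = fetch_size(modifier);
    return size != 0 && (offset % size) == 0;
}
```

    For instance, is_aligned(4, "S16X4") is false in this model because a
    four-component 16-bit access has a fetch size of eight bytes, whereas
    is_aligned(4, "U8X4") is true.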


    Modify Section 2.X.6, Program Options

    + Extended Memory Format Support (NV_gpu_program5_mem_extended)

    If a program specifies the "NV_gpu_program5_mem_extended" option, it may
    use the "F16", "F16X2", "F16X4", "S8X2", "S8X4", "S16X2", "S16X4", "U8X2",
    "U8X4", "U16X2", and "U16X4" storage modifiers on instructions loading
    values from memory or storing values to memory (LDC, LOAD, STORE, LOADIM,
    STOREIM, LDB, STB, LDS, STS).


Additions to Chapter 3 of the OpenGL 2.0 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 2.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 2.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 2.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 2.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Dependencies on EXT_shader_image_load_store, NV_shader_storage_buffer_object,
and NV_compute_program5

    If EXT_shader_image_load_store is not supported, references to the LOADIM
    and STOREIM opcodes should be removed.

    If NV_shader_storage_buffer_object is not supported, references to the LDB
    and STB opcodes should be removed.

    If NV_compute_program5 is not supported, references to the LDS and STS
    opcodes should be removed.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should this extension have its own extension string entry, or should
        its existence be inferred from the NV_gpu_program5 extension or some
        other extension?

      RESOLVED:  Provide a separate extension string entry, since this
      functionality was added after NV_gpu_program5 was published and may not
      be available on older drivers supporting NV_gpu_program5.
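
      Because support cannot be inferred from NV_gpu_program5, an application
      would typically test for the name string itself.  The C sketch below
      shows one way to do a whole-token match against a space-separated
      extension list such as the one returned by glGetString(GL_EXTENSIONS).
      The function name is hypothetical, and the list is passed in as a plain
      string so the sketch does not depend on a GL context.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true only if `name` appears as a complete,
 * space-delimited token in `extension_list`.  A plain substring search is
 * not enough, since "GL_NV_gpu_program5" is a prefix of
 * "GL_NV_gpu_program5_mem_extended". */
static bool has_extension(const char *extension_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = extension_list;

    if (len == 0) {
        return false;
    }
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == extension_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += len;   /* keep searching after this partial match */
    }
    return false;
}
```

      A program would then emit the "NV_gpu_program5_mem_extended" option
      only when has_extension(list, "GL_NV_gpu_program5_mem_extended")
      returns true.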

Revision History

    Revision 1, October 30, 2012 (pbrown):  Initial revision.