Name

    NV_gpu_program5_mem_extended

Name Strings

    GL_NV_gpu_program5_mem_extended

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         October 30, 2012
    NVIDIA Revision:            1

Number

    OpenGL Extension #434

Dependencies

    NV_gpu_program5 is required.

    This extension is written against the NV_gpu_program5 extension
    specification, which itself is written against the NV_gpu_program4 and
    OpenGL 2.0 Specifications.

    This extension interacts trivially with EXT_shader_image_load_store,
    NV_shader_storage_buffer_object, and NV_compute_program5.

Overview

    This extension provides a new set of storage modifiers that can be used by
    NV_gpu_program5 assembly program instructions loading from or storing to
    various forms of GPU memory. In particular, we provide support for loads
    and stores using the storage modifiers:

      .F16X2  .F16X4  .F16    (for 16-bit floating-point scalars/vectors)
      .S8X2   .S8X4           (for 8-bit signed integer vectors)
      .S16X2  .S16X4          (for 16-bit signed integer vectors)
      .U8X2   .U8X4           (for 8-bit unsigned integer vectors)
      .U16X2  .U16X4          (for 16-bit unsigned integer vectors)

    These modifiers are allowed for the following load/store instructions:

      LDC       Load from constant buffer

      LOAD      Global load
      STORE     Global store

      LOADIM    Image load (via EXT_shader_image_load_store)
      STOREIM   Image store (via EXT_shader_image_load_store)

      LDB       Load from storage buffer (via
                NV_shader_storage_buffer_object)
      STB       Store to storage buffer (via
                NV_shader_storage_buffer_object)

      LDS       Load from shared memory (via NV_compute_program5)
      STS       Store to shared memory (via NV_compute_program5)

    For assembly programs prior to this extension, it was necessary to access
    memory using packed types and then unpack with additional shader
    instructions.
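
    As a non-normative illustration of the difference described above, the
    following C sketch contrasts the two access patterns for four unsigned
    bytes. It is not part of this extension: the type "uvec4" and all
    function names are hypothetical, and the packed fetch is assumed to
    read a little-endian 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: one 32-bit value per component. */
typedef struct { uint32_t x, y, z, w; } uvec4;

/* Models a 32-bit fetch of a little-endian word (as with a U32 load). */
static uint32_t fetch_u32(const unsigned char *address)
{
    assert(address != NULL);
    return (uint32_t)address[0]
         | ((uint32_t)address[1] << 8)
         | ((uint32_t)address[2] << 16)
         | ((uint32_t)address[3] << 24);
}

/* Before: one packed fetch, then explicit unpacking with shifts and masks. */
static uvec4 load_packed_then_unpack(const unsigned char *address)
{
    uint32_t packed = fetch_u32(address);
    uvec4 r;
    r.x = packed & 0xFFu;
    r.y = (packed >> 8) & 0xFFu;
    r.z = (packed >> 16) & 0xFFu;
    r.w = (packed >> 24) & 0xFFu;
    return r;
}

/* After: each byte goes straight to its own component, in the manner of
 * the U8X4 case of the BufferMemoryLoad pseudocode. */
static uvec4 load_u8x4(const unsigned char *address)
{
    assert(address != NULL);
    uvec4 r = { address[0], address[1], address[2], address[3] };
    return r;
}
```

    Under these assumptions both functions return the same components for
    the same four bytes; the second simply needs no unpack step.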

    Similar capabilities have already been provided in the OpenGL Shading
    Language (GLSL) via the NV_gpu_shader5 extension, using the extended data
    types provided there (e.g., "float16_t", "u8vec4", "i16vec2").

New Procedures and Functions

    None.

New Tokens

    None.

Additions to Chapter 2 of the OpenGL 2.0 Specification (OpenGL Operation)

    (All modifications are relative to Section 2.X, GPU Programs, from the
    NV_gpu_program4 specification.)

    Modify Section 2.X.2, Program Grammar

    (add after the long list of grammar rules) If a program specifies the
    NV_gpu_program5_mem_extended program option, the following rules are added
    to the NV_gpu_program5 base program grammar:

      <opModifier>            ::= "F16X2"
                                | "F16X4"
                                | "S8X2"
                                | "S8X4"
                                | "S16X2"
                                | "S16X4"
                                | "U8X2"
                                | "U8X4"
                                | "U16X2"
                                | "U16X4"

    (Note: This extension also provides new capabilities for the "F16"
    modifier. Since it was already supported in NV_gpu_program5, it isn't
    being added to the grammar here.)


    Modify Section 2.X.4.1, Program Instruction Modifiers

    (add to Table X.14 of the NV_gpu_program4 specification.)

      Modifier  Description
      --------  ---------------------------------------------------
      F16       Convert to or from one 16-bit floating-point value,
                or access one 16-bit floating-point value

      F16X2     Access two 16-bit floating-point values
      F16X4     Access four 16-bit floating-point values
      S8X2      Access two 8-bit signed integer values
      S8X4      Access four 8-bit signed integer values
      S16X2     Access two 16-bit signed integer values
      S16X4     Access four 16-bit signed integer values
      U8X2      Access two 8-bit unsigned integer values
      U8X4      Access four 8-bit unsigned integer values
      U16X2     Access two 16-bit unsigned integer values
      U16X4     Access four 16-bit unsigned integer values

    (modify discussion of storage modifiers for load and store operations,
    adding the entries added to the table above)

    For load and store operations, the "F32", "F32X2", "F32X4", "F64",
    "F64X2", "F64X4", "S8", "S8X2", "S8X4", "S16", "S16X2", "S16X4", "S32",
    "S32X2", "S32X4", "S64", "S64X2", "S64X4", "U8", "U8X2", "U8X4", "U16",
    "U16X2", "U16X4", "U32", "U32X2", "U32X4", "U64", "U64X2", "U64X4", "F16",
    "F16X2", and "F16X4" storage modifiers control how data are loaded from or
    stored to memory. ...


    Modify Section 2.X.4.5, Program Memory Access, from NV_gpu_program5

    (update pseudocode for BufferMemoryLoad)

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged.
        */

        case F16:
            result.x = ((float16_t *)address)[0];
            break;
        case F16X2:
            result.x = ((float16_t *)address)[0];
            result.y = ((float16_t *)address)[1];
            break;
        case F16X4:
            result.x = ((float16_t *)address)[0];
            result.y = ((float16_t *)address)[1];
            result.z = ((float16_t *)address)[2];
            result.w = ((float16_t *)address)[3];
            break;
        case S8X2:
            result.x = ((int8_t *)address)[0];
            result.y = ((int8_t *)address)[1];
            break;
        case S8X4:
            result.x = ((int8_t *)address)[0];
            result.y = ((int8_t *)address)[1];
            result.z = ((int8_t *)address)[2];
            result.w = ((int8_t *)address)[3];
            break;
        case S16X2:
            result.x = ((int16_t *)address)[0];
            result.y = ((int16_t *)address)[1];
            break;
        case S16X4:
            result.x = ((int16_t *)address)[0];
            result.y = ((int16_t *)address)[1];
            result.z = ((int16_t *)address)[2];
            result.w = ((int16_t *)address)[3];
            break;
        case U8X2:
            result.x = ((uint8_t *)address)[0];
            result.y = ((uint8_t *)address)[1];
            break;
        case U8X4:
            result.x = ((uint8_t *)address)[0];
            result.y = ((uint8_t *)address)[1];
            result.z = ((uint8_t *)address)[2];
            result.w = ((uint8_t *)address)[3];
            break;
        case U16X2:
            result.x = ((uint16_t *)address)[0];
            result.y = ((uint16_t *)address)[1];
            break;
        case U16X4:
            result.x = ((uint16_t *)address)[0];
            result.y = ((uint16_t *)address)[1];
            result.z = ((uint16_t *)address)[2];
            result.w = ((uint16_t *)address)[3];
            break;
        }
        return result;
      }

    (update pseudocode for BufferMemoryStore)

      void BufferMemoryStore(char *address, operand_t_vec operand,
                             OpModifier modifier)
      {
        switch (modifier) {

        /* Existing cases and code from NV_gpu_program5 unchanged.
        */

        case F16:
            ((float16_t *)address)[0] = operand.x;
            break;
        case F16X2:
            ((float16_t *)address)[0] = operand.x;
            ((float16_t *)address)[1] = operand.y;
            break;
        case F16X4:
            ((float16_t *)address)[0] = operand.x;
            ((float16_t *)address)[1] = operand.y;
            ((float16_t *)address)[2] = operand.z;
            ((float16_t *)address)[3] = operand.w;
            break;
        case S8X2:
            ((int8_t *)address)[0] = operand.x;
            ((int8_t *)address)[1] = operand.y;
            break;
        case S8X4:
            ((int8_t *)address)[0] = operand.x;
            ((int8_t *)address)[1] = operand.y;
            ((int8_t *)address)[2] = operand.z;
            ((int8_t *)address)[3] = operand.w;
            break;
        case S16X2:
            ((int16_t *)address)[0] = operand.x;
            ((int16_t *)address)[1] = operand.y;
            break;
        case S16X4:
            ((int16_t *)address)[0] = operand.x;
            ((int16_t *)address)[1] = operand.y;
            ((int16_t *)address)[2] = operand.z;
            ((int16_t *)address)[3] = operand.w;
            break;
        case U8X2:
            ((uint8_t *)address)[0] = operand.x;
            ((uint8_t *)address)[1] = operand.y;
            break;
        case U8X4:
            ((uint8_t *)address)[0] = operand.x;
            ((uint8_t *)address)[1] = operand.y;
            ((uint8_t *)address)[2] = operand.z;
            ((uint8_t *)address)[3] = operand.w;
            break;
        case U16X2:
            ((uint16_t *)address)[0] = operand.x;
            ((uint16_t *)address)[1] = operand.y;
            break;
        case U16X4:
            ((uint16_t *)address)[0] = operand.x;
            ((uint16_t *)address)[1] = operand.y;
            ((uint16_t *)address)[2] = operand.z;
            ((uint16_t *)address)[3] = operand.w;
            break;
        }
      }

    (modify paragraph to indicate the alignment requirement for new storage
    modifiers) The address used for global memory loads or stores or offset
    used for constant buffer loads must be aligned to the fetch size
    corresponding to the storage opcode modifier. For S8 and U8, the offset
    has no alignment requirements. For F16, S8X2, S16, U8X2, and U16, the
    offset must be a multiple of two basic machine units. For F32, S32, U32,
    F16X2, S16X2, U16X2, S8X4, and U8X4, the offset must be a multiple of
    four.
    For F32X2, F64, S32X2, S64, U32X2, U64, F16X4, S16X4, and U16X4, the
    offset must be a multiple of eight. ... If an offset is not correctly
    aligned, the values returned by a buffer memory load will be undefined,
    and the effects of a buffer memory store will also be undefined.


    Modify Section 2.X.6, Program Options

    + Extended Memory Format Support (NV_gpu_program5_mem_extended)

    If a program specifies the "NV_gpu_program5_mem_extended" option, it may
    use the "F16", "F16X2", "F16X4", "S8X2", "S8X4", "S16X2", "S16X4", "U8X2",
    "U8X4", "U16X2", and "U16X4" storage modifiers on instructions loading
    values from memory or storing values to memory (LDC, LOAD, STORE, LOADIM,
    STOREIM, LDB, STB, LDS, STS).


Additions to Chapter 3 of the OpenGL 2.0 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 2.0 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 2.0 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 2.0 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 2.0 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Dependencies on EXT_shader_image_load_store, NV_shader_storage_buffer_object,
and NV_compute_program5

    If EXT_shader_image_load_store is not supported, references to the LOADIM
    and STOREIM opcodes should be removed.

    If NV_shader_storage_buffer_object is not supported, references to the LDB
    and STB opcodes should be removed.

    If NV_compute_program5 is not supported, references to the LDS and STS
    opcodes should be removed.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.
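
    As a non-normative illustration of the alignment rule in the modified
    Section 2.X.4.5, the following C sketch maps each storage modifier
    named in that paragraph to its required alignment in basic machine
    units and checks an offset against it. The enum and function names are
    hypothetical. F16X4 is placed in the eight-unit group on the
    assumption that its fetch size is four 16-bit values; modifiers
    covered only by the elided part of the paragraph are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum covering the modifiers named in the alignment text. */
typedef enum {
    MOD_S8, MOD_U8,
    MOD_F16, MOD_S8X2, MOD_S16, MOD_U8X2, MOD_U16,
    MOD_F32, MOD_S32, MOD_U32, MOD_F16X2, MOD_S16X2, MOD_U16X2,
    MOD_S8X4, MOD_U8X4,
    MOD_F32X2, MOD_F64, MOD_S32X2, MOD_S64, MOD_U32X2, MOD_U64,
    MOD_S16X4, MOD_U16X4,
    MOD_F16X4   /* assumed: four 16-bit values, so an 8-unit fetch */
} StorageModifier;

/* Returns the required alignment, or 0 for a value outside the enum. */
static size_t required_alignment(StorageModifier m)
{
    switch (m) {
    case MOD_S8: case MOD_U8:
        return 1;   /* no alignment requirement */
    case MOD_F16: case MOD_S8X2: case MOD_S16: case MOD_U8X2: case MOD_U16:
        return 2;
    case MOD_F32: case MOD_S32: case MOD_U32: case MOD_F16X2:
    case MOD_S16X2: case MOD_U16X2: case MOD_S8X4: case MOD_U8X4:
        return 4;
    case MOD_F32X2: case MOD_F64: case MOD_S32X2: case MOD_S64:
    case MOD_U32X2: case MOD_U64: case MOD_S16X4: case MOD_U16X4:
    case MOD_F16X4:
        return 8;
    }
    return 0;
}

/* Returns nonzero if the offset satisfies the modifier's alignment. */
static int offset_is_aligned(size_t offset, StorageModifier m)
{
    size_t alignment = required_alignment(m);
    assert(alignment != 0);   /* every listed modifier maps to 1, 2, 4, or 8 */
    return offset % alignment == 0;
}
```

    For example, an offset of 6 would be acceptable for an F16 access but
    not for a U16X2 access, which needs a multiple of four.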

Issues

    (1) Should this extension have its own extension string entry, or should
        its existence be inferred from the NV_gpu_program5 extension or some
        other extension?

      RESOLVED: Provide a separate extension string entry, since this
      functionality was added after NV_gpu_program5 was published and may not
      be available on older drivers supporting NV_gpu_program5.

Revision History

    Revision 1, October 30, 2012 (pbrown): Initial revision.
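
    Because issue (1) resolves that this extension has its own extension
    string entry, an application would look for that entry rather than
    infer support from NV_gpu_program5. The following C sketch is one
    non-normative way to do so. The helper name "has_extension" is
    hypothetical, and the input is assumed to be a space-separated list
    such as the one returned by glGetString(GL_EXTENSIONS). Matching whole
    tokens matters here because "GL_NV_gpu_program5" is a prefix of
    "GL_NV_gpu_program5_mem_extended".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-delimited token in
 * `extensions`, and 0 otherwise. */
static int has_extension(const char *extensions, const char *name)
{
    assert(extensions != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is bounded by spaces or string ends. */
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += 1;   /* keep scanning after a partial match */
    }
    return 0;
}
```

    A program would specify the NV_gpu_program5_mem_extended option only
    when this check succeeds for "GL_NV_gpu_program5_mem_extended".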