Name

    NVX_gpu_multicast2

Name Strings

    GL_NVX_gpu_multicast2

Contact

    Joshua Schnarr, NVIDIA Corporation (jschnarr 'at' nvidia.com)
    Ingo Esser, NVIDIA Corporation (iesser 'at' nvidia.com)

Contributors

    Robert Menzel, NVIDIA
    Ralf Biermann, NVIDIA

Status

    Complete.

Version

    Last Modified Date: July 23, 2019
    Author Revision: 8

Number

    OpenGL Extension #543

Dependencies

    This extension is written against the OpenGL 4.6 specification
    (Compatibility Profile), dated October 24, 2016.

    This extension requires NV_gpu_multicast.

    This extension requires EXT_device_group.

    This extension requires NV_viewport_array.

    This extension requires NV_clip_space_w_scaling.

    This extension requires NVX_progress_fence.


Overview

    This extension provides additional mechanisms that influence multicast rendering, that is,
    simultaneous rendering to multiple GPUs.

New Procedures and Functions

    uint AsyncCopyImageSubDataNVX(
        sizei waitSemaphoreCount, const uint *waitSemaphoreArray, const uint64 *waitValueArray,
        uint srcGpu, GLbitfield dstGpuMask,
        uint srcName, GLenum srcTarget, int srcLevel, int srcX, int srcY, int srcZ,
        uint dstName, GLenum dstTarget, int dstLevel, int dstX, int dstY, int dstZ,
        sizei srcWidth, sizei srcHeight, sizei srcDepth,
        sizei signalSemaphoreCount, const uint *signalSemaphoreArray, const uint64 *signalValueArray);

    sync AsyncCopyBufferSubDataNVX(
        sizei waitSemaphoreCount, const uint *waitSemaphoreArray, const uint64 *fenceValueArray,
        uint readGpu, GLbitfield writeGpuMask,
        uint readBuffer, uint writeBuffer,
        GLintptr readOffset, GLintptr writeOffset, sizeiptr size,
        sizei signalSemaphoreCount, const uint *signalSemaphoreArray, const uint64 *signalValueArray);

    void UploadGpuMaskNVX(bitfield mask);

    void MulticastViewportArrayvNVX(uint gpu, uint first, sizei count, const float *v);

    void MulticastScissorArrayvNVX(uint gpu, uint first, sizei count, const int *v);

    void MulticastViewportPositionWScaleNVX(uint gpu, uint index, float xcoeff, float ycoeff);


New Tokens

    Accepted by the <pname> parameter of GetIntegerv and GetInteger64v:

        UPLOAD_GPU_MASK_NVX                0x954A

Additions to Chapter 20 (Multicast Rendering) added to the OpenGL 4.5 (Compatibility Profile)
Specification by NV_gpu_multicast

    Additions to Section 20.1 (Controlling Individual GPUs)

    Texture data uploads using the functions TexImage1D, TexImage2D, TexImage3D,
    TexSubImage1D, TexSubImage2D and TexSubImage3D are restricted to a specific set of GPUs with

        void UploadGpuMaskNVX(bitfield mask);

    This command also restricts buffer object data uploads using the functions BufferStorage,
    NamedBufferStorage, BufferSubData and NamedBufferSubData to the specified set of GPUs.

    Furthermore, this command restricts buffer object clears using the functions ClearBufferData,
    ClearNamedBufferData, ClearBufferSubData and ClearNamedBufferSubData to the same set of GPUs.

    The following errors apply to UploadGpuMaskNVX:

    INVALID_VALUE is generated
     * if <mask> is zero,
     * if <mask> is greater than or equal to 2^n, where n is the value of MULTICAST_GPUS_NV.

    If the command does not generate an error, UPLOAD_GPU_MASK_NVX is set to <mask>.

    The default value of UPLOAD_GPU_MASK_NVX is (2^n)-1.

    If a function restricted by UploadGpuMaskNVX operates on textures or buffer objects
    with GPU-shared storage type (as opposed to per-GPU storage), UPLOAD_GPU_MASK_NVX is ignored.
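
    The mask rules above can be sketched in plain C without a GL context. This is an
    illustration only: the helper names default_upload_mask and upload_mask_is_valid are
    hypothetical and not part of this extension, and the sketch assumes that the value of
    MULTICAST_GPUS_NV fits in a 32-bit mask (1 <= n <= 32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the extension): the default value of
 * UPLOAD_GPU_MASK_NVX, (2^n)-1, for n multicast GPUs.
 * Assumption: 1 <= gpu_count <= 32. */
uint32_t default_upload_mask(unsigned gpu_count)
{
    assert(gpu_count >= 1u && gpu_count <= 32u);
    /* Shifting a 32-bit value by 32 is undefined in C, so handle n == 32 apart. */
    return (gpu_count == 32u) ? UINT32_MAX
                              : ((UINT32_C(1) << gpu_count) - 1u);
}

/* Hypothetical helper: true if <mask> would pass the two INVALID_VALUE
 * checks listed for UploadGpuMaskNVX. */
bool upload_mask_is_valid(uint32_t mask, unsigned gpu_count)
{
    if (mask == 0u)
        return false;                 /* <mask> is zero */
    /* mask >= 2^n is the same as mask > (2^n)-1. */
    return mask <= default_upload_mask(gpu_count);
}
```

    With two multicast GPUs, the masks 1, 2 and 3 pass these checks, while 0 and 4 would
    generate INVALID_VALUE.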

    Modify Section 20.2 (Multi-GPU Buffer Storage)

    Append the following paragraphs:

    To initiate a copy of buffer data without waiting for it to complete, use the following command:

        void AsyncCopyBufferSubDataNVX(
            sizei waitSemaphoreCount, const uint *waitSemaphoreArray, const uint64 *fenceValueArray,
            uint readGpu, GLbitfield writeGpuMask,
            uint readBuffer, uint writeBuffer,
            GLintptr readOffset, GLintptr writeOffset, sizeiptr size,
            sizei signalSemaphoreCount, const uint *signalSemaphoreArray, const uint64 *signalValueArray);

    This command behaves equivalently to MulticastCopyBufferSubDataNV, except that it may be
    performed concurrently with commands submitted in the future.
    Fence semaphore objects created with CreateProgressFenceNVX are used to synchronize one or
    more copies.
    The <waitSemaphoreArray> parameter points to an array of <waitSemaphoreCount> semaphore objects.
    The copy will wait for all fence semaphores in <waitSemaphoreArray> to reach or exceed
    their corresponding fence values in <fenceValueArray> before starting the transfer.
    A signal operation for each of the <signalSemaphoreCount> semaphores in <signalSemaphoreArray> is written
    after the copy with the corresponding fence value in <signalValueArray>.
    To wait for the copy to complete, use WaitSemaphoreui64NVX or ClientWaitSemaphoreui64NVX to wait
    for the semaphores in <signalSemaphoreArray> to be signalled with the fence values in <signalValueArray>.
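
    The wait and signal rules above can be modelled in plain C. This is a sketch of the
    ordering conditions only, not of the driver or of any GL entry point: the function
    names copy_may_start and signal_after_copy are hypothetical, and each semaphore is
    represented by its current 64-bit fence value rather than by its object name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model only (hypothetical name): the copy may start once every waited
 * semaphore has reached or exceeded its corresponding wait value.
 * current[i] stands in for the present value of the i-th wait semaphore. */
bool copy_may_start(const uint64_t *current,
                    const uint64_t *wait_values,
                    size_t count)
{
    assert(count == 0 || (current != NULL && wait_values != NULL));
    for (size_t i = 0; i < count; ++i) {
        if (current[i] < wait_values[i])
            return false;   /* this semaphore has not reached its value yet */
    }
    return true;            /* an empty wait list never blocks the copy */
}

/* Model only (hypothetical name): after the copy, each signal semaphore
 * is written with its corresponding signal value. */
void signal_after_copy(uint64_t *semaphores,
                       const uint64_t *signal_values,
                       size_t count)
{
    assert(count == 0 || (semaphores != NULL && signal_values != NULL));
    for (size_t i = 0; i < count; ++i)
        semaphores[i] = signal_values[i];
}
```

    In this model, a consumer that needs the copied data waits until the signal semaphores
    hold the values passed in <signalValueArray>, which is the role WaitSemaphoreui64NVX
    and ClientWaitSemaphoreui64NVX play for the real command.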

    Modify Section 20.3.1 (Copying Image Data Between GPUs)

    Insert the following paragraphs above the line starting "To copy pixel values":

    To initiate a copy of texel data without waiting for it to complete, use the following command:

        void AsyncCopyImageSubDataNVX(
            sizei waitSemaphoreCount, const uint *waitSemaphoreArray, const uint64 *waitValueArray,
            uint srcGpu, GLbitfield dstGpuMask,
            uint srcName, GLenum srcTarget, int srcLevel, int srcX, int srcY, int srcZ,
            uint dstName, GLenum dstTarget, int dstLevel, int dstX, int dstY, int dstZ,
            sizei srcWidth, sizei srcHeight, sizei srcDepth,
            sizei signalSemaphoreCount, const uint *signalSemaphoreArray, const uint64 *signalValueArray);

    This command behaves equivalently to MulticastCopyImageSubDataNV, except that it may be
    performed concurrently with commands submitted in the future.
    Fence semaphore objects created with CreateProgressFenceNVX are used to synchronize one or
    more copies.
    The <waitSemaphoreArray> parameter points to an array of <waitSemaphoreCount> semaphore objects.
    The copy will wait for all fence semaphores in <waitSemaphoreArray> to reach or exceed
    their corresponding fence values in <waitValueArray> before starting the transfer.
    A signal operation for each of the <signalSemaphoreCount> semaphores in <signalSemaphoreArray> is written
    after the copy with the corresponding fence value in <signalValueArray>.
    To wait for the copy to complete, use WaitSemaphoreui64NVX or ClientWaitSemaphoreui64NVX to wait
    for the semaphores in <signalSemaphoreArray> to be signalled with the fence values in <signalValueArray>.

Additions to Chapter 13 (Fixed-Function Vertex Post-Processing) of the OpenGL 4.5
(Compatibility Profile) Specification

    Modify Section 13.6 (Coordinate Transformations)

    Viewport transformation parameters for multiple viewports are specified using

        void MulticastViewportArrayvNVX(uint gpu, uint first, sizei count, const float *v);

    where the array of viewport parameters can be set separately for each multicast GPU.

    A set of scissor rectangles that are each applied to the corresponding viewport is specified
    using

        void MulticastScissorArrayvNVX(uint gpu, uint first, sizei count, const int *v);

    where the rectangle parameters can be set separately for each multicast GPU.


    If VIEWPORT_POSITION_W_SCALE_NV is enabled, the w coordinates for each
    primitive sent to a given viewport will be scaled as a function of
    its x and y coordinates using the following equation:

        w' = xcoeff * x + ycoeff * y + w;

    The coefficients for "x" and "y" used in the above equation depend on the
    viewport index and can be set separately for each multicast GPU by the command

        void MulticastViewportPositionWScaleNVX(uint gpu, uint index, float xcoeff, float ycoeff);

    An INVALID_VALUE error is generated if <gpu> is greater than or equal to MULTICAST_GPUS_NV.

Additions to the OpenGL Shading Language Specification, Version 4.50

    Including the following line in a shader makes it possible to enumerate multicast GPUs
    by using the shader built-in variable gl_DeviceIndex:

        #extension GL_EXT_device_group : enable

    Each multicast GPU sees a unique device index in the gl_DeviceIndex variable.

Errors

    Relaxation of INVALID_ENUM errors
    ---------------------------------
    GetIntegerv and GetInteger64v now accept new tokens as
    described in the "New Tokens" section.

New State

    Additions to Table 23.6 Buffer Object State
                                                   Initial
    Get Value                   Type   Get Command Value Description             Sec. Attribute
    --------------------------  ------ ----------- ----- ----------------------- ---- ---------
    UPLOAD_GPU_MASK_NVX         Z+     GetIntegerv *     Mask of GPUs that       20.1 -
                                                         restricts buffer data
                                                         writes
    * See section 20.1


New Implementation Dependent State

    None.

Sample Code

    None.

Issues

    None.

Revision History

    Rev.  Date      Author     Changes
    ----  --------  ---------  -----------------------------------------------
    1     09/20/17  jschnarr   initial draft
    2     02/23/18  rbiermann  updated draft with new functions
    3     05/23/18  rbiermann  updated draft with new ViewportArray and AsyncCopy functions
    4     06/08/18  rbiermann  added NVX_progress_fence for synchronization objects
    5     08/15/18  rbiermann  updated draft with gl_DeviceIndex
    6     04/16/19  rbiermann  updated draft with UploadGpuMaskNVX
    7     07/19/19  rbiermann  updated draft with modifications of UploadGpuMaskNVX section
    8     07/23/19  rbiermann  updated draft with support of Clear(Named)Buffer(Sub)Data by UploadGpuMaskNVX