Name

    WGL_NV_gpu_affinity

Name Strings

    WGL_NV_gpu_affinity

Contact

    Barthold Lichtenbelt, NVIDIA (blichtenbelt 'at' nvidia.com)

Notice

    Copyright NVIDIA Corporation, 2005-2006.

Status

    Completed.

Version

    Last Modified Date: 11/08/2006
    Author revision: 11

Number

    355

Dependencies

    WGL_ARB_extensions_string is required.

    This extension interacts with WGL_ARB_make_current_read.

    This extension interacts with WGL_ARB_pbuffer.

    This extension interacts with GL_EXT_framebuffer_object.

Overview

    On systems with more than one GPU it is desirable to be able to
    select which GPU(s) in the system become the target for OpenGL
    rendering commands. This extension introduces the concept of a GPU
    affinity mask. OpenGL rendering commands are directed to the
    GPU(s) specified by the affinity mask. GPU affinity is immutable.
    Once set, it cannot be changed.

    This extension also introduces the concept of an affinity-DC. An
    affinity-DC is a device context with a GPU affinity mask embedded
    in it. This restricts the device context to only allow OpenGL
    commands to be sent to the GPU(s) in the affinity mask.

    Handles for the GPUs present in a system are enumerated with the
    command wglEnumGpusNV. An affinity-DC is created by calling
    wglCreateAffinityDCNV. This function takes a list of GPU handles,
    which make up the affinity mask. An affinity-DC can also be
    created indirectly: calling wglGetPbufferDCARB on a pBuffer handle
    that was itself created from an affinity-DC, by calling
    wglCreatePbufferARB, returns an affinity-DC.

    A context created from an affinity-DC will inherit the GPU
    affinity mask from the DC. Once inherited, it cannot be changed.
    Such a context is called an affinity-context. This restricts the
    affinity-context to only allow OpenGL commands to be sent to those
    GPU(s) in its affinity mask.
    Once created, this context can be used in two ways:

      1. Make the affinity-context current to an affinity-DC. This
         will only succeed if the context's affinity mask is the same
         as the affinity mask in the DC. There is no window
         associated with an affinity-DC, therefore this is a way to
         achieve off-screen rendering to an OpenGL context. This can
         either be rendering to a pBuffer, or to an application-created
         framebuffer object. In the former case, the affinity-mask of
         the pBuffer DC, which is obtained from a pBuffer handle,
         will be the same affinity-mask as the DC used to create the
         pBuffer handle. In the latter case, the default framebuffer
         object will be incomplete because there is no window-system
         created framebuffer. Therefore, the application will have to
         create and bind a framebuffer object as the target for
         rendering.
      2. Make the affinity-context current to a DC obtained from a
         window. Rendering only happens to the subrectangle(s) of
         the window that overlap the parts of the desktop that are
         displayed by the GPU(s) in the affinity mask of the context.

    Sharing OpenGL objects between affinity-contexts, by calling
    wglShareLists, will only succeed if the contexts have identical
    affinity masks.

    It is not possible to make a regular context (one without an
    affinity mask) current to an affinity-DC. Allowing this would give
    a context a way to inherit affinity information after creation,
    which would make the context affinity mutable and run counter to
    the premise of this extension.
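
    The enumeration step described in this overview can be sketched
    as a small helper that gathers GPU handles into the
    NULL-terminated list that wglCreateAffinityDCNV expects (the list
    format is specified later in this document). This is a minimal
    sketch, not part of the extension: collect_gpu_handles,
    EnumGpusFn, fake_gpus and fake_enum_gpus are hypothetical names,
    and the typedefs are stand-ins for the declarations normally
    provided by <windows.h> and the WGL extension header, so that the
    block compiles without a Windows toolchain or a driver.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations (assumption): a real program gets these from
   <windows.h> and the WGL extension header instead. */
typedef int BOOL;
typedef unsigned int UINT;
struct HGPUNV__ { int unused; };
typedef struct HGPUNV__ *HGPUNV;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Same parameter list as wglEnumGpusNV in this extension. */
typedef BOOL (*EnumGpusFn)(UINT iGpuIndex, HGPUNV *phGpu);

/* Hypothetical helper: enumerate GPU handles starting at index 0 and
   store them in list, which has room for capacity entries. The last
   used slot always holds the NULL terminator, so at most
   capacity - 1 handles are stored. Returns the number of handles. */
static UINT collect_gpu_handles(EnumGpusFn enum_gpus, HGPUNV *list,
                                UINT capacity)
{
    UINT count = 0;

    assert(enum_gpus != NULL && list != NULL && capacity >= 1);

    /* A FALSE return leaves list[count] unmodified, so the slot is
       overwritten with the terminator below. */
    while (count + 1 < capacity && enum_gpus(count, &list[count])) {
        count++;
    }
    list[count] = NULL;
    return count;
}

/* Fake enumerator reporting two GPUs, used only to exercise the
   helper without a driver. */
static struct HGPUNV__ fake_gpus[2];

static BOOL fake_enum_gpus(UINT iGpuIndex, HGPUNV *phGpu)
{
    if (iGpuIndex >= 2) {
        return FALSE;               /* *phGpu is left unmodified */
    }
    *phGpu = &fake_gpus[iGpuIndex];
    return TRUE;
}
```

    In a real program the first argument would be the wglEnumGpusNV
    entry point obtained through wglGetProcAddress, and the resulting
    list (or a shorter NULL-terminated subset of it) would be passed
    to wglCreateAffinityDCNV. The helper does not distinguish "end of
    list" from other failures; an application that needs to do so
    would call GetLastError after the loop.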

New Procedures, Functions and Structures:

    DECLARE_HANDLE(HGPUNV);

    typedef struct _GPU_DEVICE {
        DWORD  cb;
        CHAR   DeviceName[32];
        CHAR   DeviceString[128];
        DWORD  Flags;
        RECT   rcVirtualScreen;
    } GPU_DEVICE, *PGPU_DEVICE;

    BOOL wglEnumGpusNV(UINT iGpuIndex,
                       HGPUNV *phGpu);

    BOOL wglEnumGpuDevicesNV(HGPUNV hGpu,
                             UINT iDeviceIndex,
                             PGPU_DEVICE lpGpuDevice);

    HDC wglCreateAffinityDCNV(const HGPUNV *phGpuList);

    BOOL wglEnumGpusFromAffinityDCNV(HDC hAffinityDC,
                                     UINT iGpuIndex,
                                     HGPUNV *phGpu);

    BOOL wglDeleteDCNV(HDC hdc);

New Tokens

    New error codes set by wglShareLists, wglMakeCurrent and
    wglMakeContextCurrentARB:

        ERROR_INCOMPATIBLE_AFFINITY_MASKS_NV       0x20D0

    New error codes set by wglMakeCurrent and
    wglMakeContextCurrentARB:

        ERROR_MISSING_AFFINITY_MASK_NV             0x20D1

Additions to the WGL Specification

    GPU Affinity

    To query handles for all GPUs in a system call:

        BOOL wglEnumGpusNV(UINT iGpuIndex, HGPUNV *phGPU);

    <iGpuIndex> is an index value that specifies a GPU.

    <phGPU> upon return will contain a handle for GPU number
    <iGpuIndex>. The first GPU will be index 0.

    By looping over wglEnumGpusNV and incrementing <iGpuIndex>,
    starting at index 0, all GPU handles can be queried. If the
    function succeeds, the return value is TRUE. If the function
    fails, the return value is FALSE and <phGPU> will be unmodified.
    The function fails if <iGpuIndex> is greater than or equal to the
    number of GPUs supported by the system.

    To retrieve information about the display devices supported by a
    GPU call:

        BOOL wglEnumGpuDevicesNV(HGPUNV hGpu,
                                 UINT iDeviceIndex,
                                 PGPU_DEVICE lpGpuDevice);

    <hGpu> is a handle to the GPU to query.

    <iDeviceIndex> is an index value that specifies a display device,
    supported by <hGpu>, to query.
    The first display device will be index 0.

    <lpGpuDevice> is a pointer to a GPU_DEVICE structure which will
    receive information about the display device at index
    <iDeviceIndex>.

    By looping over the function wglEnumGpuDevicesNV and incrementing
    <iDeviceIndex>, starting at index 0, all display devices can be
    queried. If the function succeeds, the return value is TRUE. If
    the function fails, the return value is FALSE and <lpGpuDevice>
    will be unmodified. The function fails if <iDeviceIndex> is
    greater than or equal to the number of display devices supported
    by <hGpu>.

    The GPU_DEVICE structure has the following members:

        typedef struct _GPU_DEVICE {
            DWORD  cb;
            CHAR   DeviceName[32];
            CHAR   DeviceString[128];
            DWORD  Flags;
            RECT   rcVirtualScreen;
        } GPU_DEVICE, *PGPU_DEVICE;

    <cb> is the size of the GPU_DEVICE structure. Before calling
    wglEnumGpuDevicesNV, set <cb> to the size, in bytes, of
    GPU_DEVICE.

    <DeviceName> is a string identifying the display device name. This
    will be the same string as stored in the <DeviceName> field of the
    DISPLAY_DEVICE structure, which is filled in by
    EnumDisplayDevices.

    <DeviceString> is a string describing the GPU for this display
    device. It is the same string as stored in the <DeviceString>
    field in the DISPLAY_DEVICE structure that is filled in by
    EnumDisplayDevices when it describes a display adapter (and not a
    monitor).

    <Flags> indicates the state of the display device. It can be a
    combination of any of the following:

    DISPLAY_DEVICE_ATTACHED_TO_DESKTOP   If set, the device is part
                                         of the desktop.

    DISPLAY_DEVICE_PRIMARY_DEVICE        If set, the primary
                                         desktop is on this device.
                                         Only one device in the
                                         system can have this set.

    <rcVirtualScreen> specifies the display device rectangle, in
    virtual screen coordinates.
    The value of <rcVirtualScreen> is undefined if the device is not
    part of the desktop, i.e. DISPLAY_DEVICE_ATTACHED_TO_DESKTOP is
    not set in the <Flags> field.

    The function wglEnumGpuDevicesNV can fail for a variety of
    reasons. Call GetLastError to get extended error information.
    Possible errors are as follows:

        ERROR_INVALID_HANDLE       <hGpu> is not a valid GPU handle.

    A new type of DC, called an affinity-DC, can be used to direct
    OpenGL commands to a specific GPU or set of GPUs. An affinity-DC
    is a device context with a GPU affinity mask embedded in it. This
    restricts the device context to only allow OpenGL commands to be
    sent to the GPU(s) in the affinity mask. An affinity-DC can be
    created directly, using the new function wglCreateAffinityDCNV and
    also indirectly by calling wglCreatePbufferARB followed by
    wglGetPbufferDCARB. To create an affinity-DC directly call:

        HDC wglCreateAffinityDCNV(const HGPUNV *phGpuList);

    <phGpuList> is a NULL-terminated array of GPU handles to which the
    affinity-DC will be restricted. If an element in the list is not a
    GPU handle, as returned by wglEnumGpusNV, it is silently ignored.

    If successful, the function returns an affinity-DC. If it fails,
    NULL will be returned.

    To create an affinity-DC indirectly, first call
    wglCreatePbufferARB passing it an affinity-DC. Next, pass the
    handle returned by the call to wglCreatePbufferARB to
    wglGetPbufferDCARB to create an affinity-DC for the pBuffer. The
    DC returned by wglGetPbufferDCARB will have the same affinity mask
    as the DC used to create the pBuffer handle by calling
    wglCreatePbufferARB.

    An affinity-DC has no window associated with it, and therefore it
    has no default window-system-provided framebuffer. (Note: This is
    terminology borrowed from EXT_framebuffer_object).
    A context made current to an affinity-DC will only be able to
    render into an application-created framebuffer object, or a
    pBuffer. The default window-system-framebuffer object, when bound,
    will be incomplete. The EXT_framebuffer_object specification
    defines what 'incomplete' means exactly.

    A context created from an affinity-DC, by calling wglCreateContext
    and passing it an affinity-DC, is called an affinity-context. This
    context will inherit the affinity mask from the DC. This
    affinity-mask cannot be changed. The affinity mask restricts the
    affinity-context to only allow OpenGL commands to be sent to those
    GPU(s) in its affinity mask.

    The function wglCreateAffinityDCNV can fail for a variety of
    reasons. Call GetLastError to get extended error information.
    Possible errors are as follows:

        ERROR_NO_SYSTEM_RESOURCES  Insufficient resources exist to
                                   create the affinity-DC.

        ERROR_INVALID_DATA         <phGpuList> is empty or contains no
                                   valid GPU handles.

    An affinity-context can only be made current to an affinity-DC
    with the same affinity-mask, otherwise wglMakeCurrent and
    wglMakeContextCurrentARB will fail and return FALSE. In the case
    of wglMakeContextCurrentARB, the affinity masks of both the "read"
    and "draw" DCs need to match the affinity-mask of the context.

    If a context that has no affinity mask is made current to an
    affinity-DC, wglMakeCurrent and wglMakeContextCurrentARB will fail
    and return FALSE. In the case of wglMakeContextCurrentARB it will
    fail if either the "read" or "draw" DC is an affinity-DC.

    If an affinity-context is made current to a DC obtained from a
    window, by calling GetDC, then rendering will only happen to the
    subrectangle(s) of the window that overlap the parts of the
    desktop that are displayed by the GPU(s) in the affinity-mask of
    the context.
    Note that a DC obtained from a window does not have an affinity
    mask set.

    The following error codes are added to the description of
    wglMakeCurrent and wglMakeContextCurrentARB:

        ERROR_INCOMPATIBLE_AFFINITY_MASKS_NV  The device context(s) and
                                              rendering context have
                                              non-matching affinity
                                              masks.

        ERROR_MISSING_AFFINITY_MASK_NV        The rendering context
                                              does not have an
                                              affinity mask set.

    Sharing OpenGL objects between affinity-contexts, by calling
    wglShareLists, will only succeed if the contexts have identical
    affinity masks. The following error codes are added to the
    description of wglShareLists:

        ERROR_INCOMPATIBLE_AFFINITY_MASKS_NV  The contexts have
                                              non-matching affinity
                                              masks.

    To delete an affinity-DC call:

        BOOL wglDeleteDCNV(HDC hdc);

    <hdc> is a handle of an affinity-DC to delete.

    If the function succeeds, TRUE is returned. If the function fails,
    FALSE is returned. Call GetLastError to get extended error
    information. Possible errors are as follows:

        ERROR_INVALID_HANDLE       <hdc> is not a handle of an
                                   affinity-DC.

    To retrieve a list of GPU handles that make up the affinity-mask
    of an affinity-DC, call:

        BOOL wglEnumGpusFromAffinityDCNV(HDC hAffinityDC,
                                         UINT iGpuIndex,
                                         HGPUNV *phGpu);

    <hAffinityDC> is a handle of the affinity-DC to query.

    <iGpuIndex> is an index value of the GPU handle in the affinity
    mask of <hAffinityDC> to query.

    <phGpu> upon return will contain a handle for GPU number
    <iGpuIndex>. The first GPU will be at index 0.

    By looping over wglEnumGpusFromAffinityDCNV and incrementing
    <iGpuIndex>, starting at index 0, all GPU handles associated with
    the DC can be queried. If the function succeeds, the return value
    is TRUE. If the function fails, the return value is FALSE and
    <phGpu> will be unmodified.
    The function fails if <iGpuIndex> is greater than or equal to the
    number of GPUs associated with <hAffinityDC>.

    Call GetLastError to get extended error information. Possible
    errors are as follows:

        ERROR_INVALID_HANDLE       <hAffinityDC> is not a handle of an
                                   affinity-DC.

Interactions with WGL_ARB_make_current_read

    If the make current read extension is not supported, all language
    referring to wglMakeContextCurrentARB is deleted.

Interactions with WGL_ARB_pbuffer

    If the pbuffer extension is not supported, all language referring
    to pBuffers, wglGetPbufferDCARB and wglCreatePbufferARB is
    deleted.

Interactions with GL_EXT_framebuffer_object

    If the framebuffer object extension is not supported, all language
    referring to framebuffer objects is deleted.

Usage examples

    // Example 1 - Normal window creation, DC setup and
    // context creation.

    PIXELFORMATDESCRIPTOR pfd;
    int                   pf;
    HDC                   hDC;
    HGLRC                 hRC;
    HWND                  hWnd;

    hWnd = CreateWindow(...);
    hDC  = GetDC(hWnd);

    memset(&pfd, 0, sizeof(pfd));
    pfd.nSize      = sizeof(pfd);
    pfd.nVersion   = 1;
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;

    // Note, for ease of code reading no error checking is done.
    pf = ChoosePixelFormat(hDC, &pfd);
    SetPixelFormat(hDC, pf, &pfd);
    DescribePixelFormat(hDC, pf, sizeof(PIXELFORMATDESCRIPTOR),
                        &pfd);

    hRC = wglCreateContext(hDC);
    wglMakeCurrent(hDC, hRC);


    // Example 2 - Offscreen rendering to one GPU using a FBO
    // It is assumed that a context already has been created (and
    // possibly destroyed) and was used to query the proc addresses
    // of the WGL affinity related entrypoints.

    #define MAX_GPU 4

    PIXELFORMATDESCRIPTOR pfd;
    int                   pf, gpuIndex = 0;
    HGPUNV                hGPU[MAX_GPU];
    HGPUNV                GpuMask[MAX_GPU];
    HDC                   affDC;
    HGLRC                 affRC;

    // Get a list of the first MAX_GPU GPUs in the system
    while ((gpuIndex < MAX_GPU) && wglEnumGpusNV(gpuIndex,
                                                 &hGPU[gpuIndex])) {
        gpuIndex++;
    }

    // Create an affinity-DC associated with the first GPU
    GpuMask[0] = hGPU[0];
    GpuMask[1] = NULL;

    affDC = wglCreateAffinityDCNV(GpuMask);

    // Set a pixelformat on the affinity-DC.
    // pfd is assumed to have been filled in as in Example 1.
    pf = ChoosePixelFormat(affDC, &pfd);
    SetPixelFormat(affDC, pf, &pfd);
    DescribePixelFormat(affDC, pf, sizeof(PIXELFORMATDESCRIPTOR),
                        &pfd);

    affRC = wglCreateContext(affDC);
    wglMakeCurrent(affDC, affRC);

    // Make a previously created FBO current so we have something
    // to render into. Since there's no window, the default system
    // created FBO is incomplete.
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fb);

    <Now draw>

    // Example 3 - Offscreen rendering to one GPU using a pBuffer
    // It is assumed that a context already has been created (and
    // possibly destroyed) and was used to query the proc addresses
    // of the WGL affinity and pbuffer related entrypoints.

    #define MAX_GPU 4

    int    gpuIndex = 0;
    HGPUNV hGPU[MAX_GPU];
    HGPUNV GpuMask[MAX_GPU];
    HDC    affDC, pBufferAffDC;
    HGLRC  affRC;

    // Get a list of the first MAX_GPU GPUs in the system
    while ((gpuIndex < MAX_GPU) && wglEnumGpusNV(gpuIndex,
                                                 &hGPU[gpuIndex])) {
        gpuIndex++;
    }

    // Create an affinity-DC associated with the first GPU
    GpuMask[0] = hGPU[0];
    GpuMask[1] = NULL;

    affDC = wglCreateAffinityDCNV(GpuMask);

    // Setup desired pixelformat attributes for the pbuffer
    // including WGL_DRAW_TO_PBUFFER_ARB.
    HPBUFFERARB  handle;
    int          width = 512, height = 512, format = 0;
    unsigned int nformats;

    int attribList[] =
    {
        WGL_RED_BITS_ARB,               8,
        WGL_GREEN_BITS_ARB,             8,
        WGL_BLUE_BITS_ARB,              8,
        WGL_ALPHA_BITS_ARB,             8,
        WGL_STENCIL_BITS_ARB,           0,
        WGL_DEPTH_BITS_ARB,             0,
        WGL_DRAW_TO_PBUFFER_ARB,        TRUE,
        0,
    };

    wglChoosePixelFormatARB(affDC, attribList, NULL, 1,
                            &format, &nformats);

    handle = wglCreatePbufferARB(affDC, format, width, height, NULL);

    // pBufferAffDC will have the same affinity-mask as affDC.
    pBufferAffDC = wglGetPbufferDCARB(handle);

    // affRC will inherit the affinity-mask from pBufferAffDC.
    affRC = wglCreateContext(pBufferAffDC);
    wglMakeCurrent(pBufferAffDC, affRC);

    <Now draw into the pBuffer>

Issues

    1) Do we really need an affinity-DC, or can we do with just an
    affinity context?

    DISCUSSION: If affinity is not part of a DC, a new function will
    need to be defined to create an affinity-context or set an
    affinity-mask for an existing context. Passing NULL as a HDC to
    wglMakeCurrent will then be one way to create an off-screen
    rendering context, where rendering will have to go to a FBO. If
    the HDC passed to wglMakeCurrent is one for a pBuffer, the
    affinity-mask in the affinity-context dictates where rendering is
    directed to. This might mean pBuffer resources will have to move,
    or alternatively be duplicated across all GPUs in a system. That
    is counter to the whole idea of this extension. Thus an
    affinity-DC is definitely needed for a pBuffer.

    Thus the question reduces to, do we need an affinity-DC in order
    to facilitate off-screen rendering to a FBO? Having an affinity-DC
    has the following advantages:

    a) It is consistent with making current to a pBuffer or window,
       which does need a DC.
    b) Passing NULL as a HDC to wglMakeCurrent might be filtered out
       by the MS layer on future OSes.
    c) The driver implementation might benefit from knowing at DC
       creation time what the affinity-mask is, rather than at
       wglMakeCurrent time.

    RESOLUTION: Yes.

    2) Should the GPU affinity concept also apply to D3D and/or GDI
    commands?

    DISCUSSION: It could be especially desirable to apply the
    affinity concept to D3D. However, D3D is sufficiently different
    that this extension doesn't directly apply.

    RESOLUTION: That falls outside this extension.

    3) Should setting a pixelformat on an affinity-DC be required?

    DISCUSSION: Setting a pixelformat on an affinity-DC is not
    strictly necessary if the application does off-screen rendering to
    a FBO. However, the Microsoft layer of wglMakeCurrent requires
    that the pixelformats of the DC and RC passed to it match. This
    becomes an issue when making an affinity-context current to a DC
    obtained from a window. The DC has a pixelformat set by the
    application, and therefore the affinity-context needs to have the
    same pixelformat. This means the affinity-DC that the
    affinity-context is created from needs to have the same
    pixelformat set.

    RESOLUTION: YES. Setting a pixelformat on an affinity-DC is
    required.

    4) Is it allowed to make an affinity-context current to an
    affinity-DC where the mask of the context spans more GPUs than the
    mask in the DC?

    5) Is it allowed to make an affinity-context current to an
    affinity-DC where the mask of the context spans fewer GPUs than
    the mask in the DC?

    DISCUSSION: Issues 4 and 5 are lumped together in this discussion.
    For example, is this scenario something we want to support: An
    application wants to share objects across two contexts and have
    these two contexts each render to a different GPU.
    It can do this by creating two affinity-DCs. One has an affinity
    mask for the first GPU, the other for the second GPU. It also
    creates two affinity-contexts that both have an affinity-mask that
    spans both GPUs. Making one context current to the first
    affinity-DC will lock the context to the GPU in the mask of that
    affinity-DC. Making the other context current to the second
    affinity-DC will lock that context to the second GPU. This is
    effectively what issue 4) is asking. The simplest solution is to
    disallow these cases, and that is how the spec is currently
    written.

    RESOLUTION: NO, we will not allow this to keep the spec simple. If
    necessary, these restrictions can always be lifted later.

    6) What should an application do if the enum functions that return
    BOOL fail for a reason other than having reached the end of the
    list? For example, if they fail because they run out of memory?

    RESOLUTION: An application will have to call GetLastError to find
    out the reason for the failure.

    7) The "Enum" API commands in this extension assume that the list
    of things being enumerated does not change dynamically. Is that
    reasonable?

    DISCUSSION: Display devices, and possibly GPUs in the future, can
    be changed dynamically and/or hotplugged. Thus yes, this is a
    potential issue. Existing OS functionality like EnumDisplayDevices
    and even wglMakeCurrent will suffer from this too. In the latter
    case, the application could make a context current to a device
    that was removed from the system. A possible solution would be
    some sort of notification mechanism to the application, possibly
    combined with being able to snapshot state first, then enumerate
    that snapshot. That snapshot of state might immediately become
    invalid, but at least the enumeration will walk a consistent list.

    RESOLUTION: This is a wider issue than just this specification,
    and not currently addressed.

    8) How do I transfer data efficiently between two
    affinity-contexts?

    DISCUSSION: An application may want to render in one context, and
    transfer the result of that rendering to another context. These
    two contexts can be on different GPUs. If they are, how does the
    application efficiently transfer this data? Currently OpenGL
    provides two mechanisms, neither of which is ideal:

    1) The application can do a ReadPixels followed by a DrawPixels /
    TexImage call. This involves transfer through host memory, which
    can be slow.

    2) The application can share objects between the two contexts
    using wglShareLists(). This will work, but is counter to the
    premise of this extension where each GPU has its own set of
    resources, not shared with another GPU.

    RESOLUTION: This is a hole which needs to be addressed separately.

Revision history

    None