<!-- markdownlint-disable MD041 -->
<!-- Copyright 2015-2019 LunarG, Inc. -->
[![Khronos Vulkan][1]][2]

[1]: https://vulkan.lunarg.com/img/Vulkan_100px_Dec16.png "https://www.khronos.org/vulkan/"
[2]: https://www.khronos.org/vulkan/

# GPU-Assisted Validation

[![Creative Commons][3]][4]

[3]: https://i.creativecommons.org/l/by-nd/4.0/88x31.png "Creative Commons License"
[4]: https://creativecommons.org/licenses/by-nd/4.0/

GPU-Assisted Validation is implemented in the SPIR-V Tools optimizer and the `VK_LAYER_KHRONOS_validation` layer (or in the
soon-to-be-deprecated `VK_LAYER_LUNARG_core_validation` layer). This document covers the design of the layer portion of the
implementation.

## Basic Operation

The basic operation of GPU-Assisted Validation consists of instrumenting shader code to perform run-time checking in shaders and
reporting any error conditions to the layer.
The layer then reports the errors to the user via the same reporting mechanisms used by the rest of the validation system.

The layer instruments the shaders by passing the shader's SPIR-V bytecode to the SPIR-V optimizer component and
instructing the optimizer to perform an instrumentation pass that adds the instructions needed for the run-time checking.
The layer then passes the resulting modified SPIR-V bytecode to the driver as part of the process of creating a ShaderModule.

The layer also allocates a buffer that describes the length of all descriptor arrays and the write state of each element of each array.
It only does this if the `VK_EXT_descriptor_indexing` extension is enabled.

As the shader is executed, the instrumented shader code performs the run-time checks.
If a check detects an error condition, the instrumentation code writes an error record into the GPU's device memory.
This record is small, on the order of a dozen 32-bit words.
Since multiple shader stages and multiple invocations of a shader can all detect errors, the instrumentation code
writes error records into consecutive memory locations as long as there is space available in the pre-allocated block of device memory.

The layer inspects this device memory block after completion of a queue submission.
If the GPU had written an error record to this memory block,
the layer analyzes this error record and constructs a validation error message
which is then reported in the same manner as other validation messages.
If the shader was compiled with debug information (source code and SPIR-V instruction mapping to source code lines), the layer
also provides the line of shader source code that provoked the error as part of the validation error message.

## GPU-Assisted Validation Checks

The initial release (Jan 2019) of GPU-Assisted Validation includes checking for out-of-bounds descriptor array indexing
for image/texel descriptor types.

The second release (Apr 2019) adds validation for out-of-bounds descriptor array indexing and use of unwritten descriptors when the
`VK_EXT_descriptor_indexing` extension is enabled. Also added (June 2019) was validation for buffer descriptors.

### Out-of-Bounds (OOB) Descriptor Array Indexing

Checking for correct indexing of descriptor arrays is sometimes referred to as "bind-less validation".
It is called "bind-less" because a binding in a descriptor set may contain an array of like descriptors.
And unless there is a constant or compile-time indication of which descriptor in the array is selected,
the descriptor binding status is considered to be ambiguous, leaving the actual binding to be determined at run-time.

As an example, a fragment shader program may use a variable to index an array of combined image samplers.
Such a line might look like:

```glsl
uFragColor = light * texture(tex[tex_ind], texcoord.xy);
```

The array of combined image samplers is `tex` and has 6 samplers in the array.
The complete validation error message issued when `tex_ind` indexes past the array is:

```terminal
ERROR : VALIDATION - Message Id Number: 0 | Message Id Name: UNASSIGNED-Image descriptor index out of bounds
        Index of 6 used to index descriptor array of length 6.  Command buffer (CubeDrawCommandBuf)(0xbc24b0).
        Pipeline (0x45). Shader Module (0x43). Shader Instruction Index = 108.  Stage = Fragment.
        Fragment coord (x,y) = (419.5, 254.5). Shader validation error occurred in file:
        /home/user/src/Vulkan-ValidationLayers/external/Vulkan-Tools/cube/cube.frag at line 45.
45:    uFragColor = light * texture(tex[tex_ind], texcoord.xy);
```

The `VK_EXT_descriptor_indexing` extension allows a shader to declare a descriptor array without specifying its size:

```glsl
layout(set = 0, binding = 1) uniform sampler2D tex[];
```

In this case, the layer needs to tell the optimization code how big the descriptor array is so the code can determine what is out of
bounds and what is not.

The extension also allows descriptor set bindings to be partially bound, meaning that as long as the shader doesn't use certain
array elements, those elements are not required to have been written.
The instrumentation code needs to know which elements of a descriptor array have been written, so that it can tell if one is used
that has not been written.

Note that currently, `VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT` validation is not working and all accesses are reported as valid.


## GPU-Assisted Validation Options

Here are the options related to activating GPU-Assisted Validation:

1. Enable GPU-Assisted Validation - GPU-Assisted Validation is off by default and must be enabled.

    GPU-Assisted Validation is disabled by default because the shader instrumentation may introduce significant
    shader performance degradation and additional resource consumption.
    GPU-Assisted Validation requires additional resources such as device memory and descriptors.
    It is desirable for the user to opt in to this feature because of these requirements.
    In addition, there are several limitations that may adversely affect application behavior,
    as described later in this document.

2. Reserve a Descriptor Set Binding Slot - Modifies the value of the `VkPhysicalDeviceLimits::maxBoundDescriptorSets`
   property to return a value one less than the actual device's value to "reserve" a descriptor set binding slot for use by GPU validation.

   This option is likely only of interest to applications that dynamically adjust their descriptor set bindings to account for
   the limits of the device.

### Enabling and Specifying Options with a Configuration File

The existing layer configuration file mechanism can be used to enable GPU-Assisted Validation.
This mechanism is described on the
[LunarXchange website](https://vulkan.lunarg.com/doc/sdk/latest/windows/layer_configuration.html),
in the "Layers Overview and Configuration" document.

To turn on GPU validation, add the following to your layer settings file, which is often
named `vk_layer_settings.txt`.

```code
khronos_validation.gpu_validation = all
```

To turn on GPU validation and request to reserve a binding slot:

```code
khronos_validation.gpu_validation = all,reserve_binding_slot
```

Note: When using the core_validation layer, the above settings should use `lunarg_core_validation` in place of
`khronos_validation`.

Some platforms do not support configuration of the validation layers with this configuration file.
Programs running on these platforms must then use the programmatic interface.

### Enabling and Specifying Options with the Programmatic Interface

The `VK_EXT_validation_features` extension can be used to enable GPU-Assisted Validation at CreateInstance time.

Here is sample code illustrating how to enable it:

```C
// Request GPU-Assisted Validation through VK_EXT_validation_features.
VkValidationFeatureEnableEXT enables[] = {VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT};
VkValidationFeaturesEXT features = {};
features.sType = VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT;
features.enabledValidationFeatureCount = 1;
features.pEnabledValidationFeatures = enables;

// Chain the features struct into the create info passed to vkCreateInstance.
VkInstanceCreateInfo info = {};
info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
info.pNext = &features;
```

Use the `VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_RESERVE_BINDING_SLOT_EXT` enum to reserve a binding slot.

## GPU-Assisted Validation Limitations

There are several limitations that may impede the operation of GPU-Assisted Validation:

### Vulkan 1.1

Vulkan 1.1 or later is required because the GPU instrumentation code uses SPIR-V 1.3 features,
and Vulkan 1.1 is needed to ensure that SPIR-V 1.3 is available.

### Descriptor Types

The current implementation works with image, texel, and buffer descriptor types.
A complete list appears later in this document.

### Descriptor Set Binding Limit

This is probably the most important limitation and is related to the
`VkPhysicalDeviceLimits::maxBoundDescriptorSets` device limit.

When applications use all the available descriptor set binding slots,
GPU-Assisted Validation cannot be performed because it needs a descriptor set to
locate the memory for writing the error report record.

This problem is most likely to occur on devices, often mobile, that support only the
minimum required value for `VkPhysicalDeviceLimits::maxBoundDescriptorSets`, which is 4.
Some applications may be written to use 4 slots since this is the highest value that
is guaranteed by the specification.
When such an application using 4 slots runs on a device with only 4 slots,
GPU-Assisted Validation cannot be performed.

In this implementation, this condition is detected and gracefully recovered from by
building the graphics pipeline with non-instrumented shaders instead of instrumented ones.
An error message is also displayed informing the user of the condition.

Applications don't have many options in this situation and it is anticipated that
changing the application to free a slot is difficult.

### Device Memory

GPU-Assisted Validation allocates device memory for the error report buffers and, if
descriptor indexing is enabled, for the input buffer of descriptor sizes and write state.
This can lead to a greater chance of memory exhaustion, especially in cases where
the application is trying to use all of the available memory.
The extra memory allocations are also not visible to the application, making it
impossible for the application to account for them.

Note that if descriptor indexing is enabled, the input buffer size will be equal to
`(1 + (number_of_sets * 2) + (binding_count * 2) + descriptor_count)` words of memory, where
`binding_count` is the binding number of the largest binding in the set.
This means that sparsely populated sets and sets with a very large binding will cause
the input buffer to be much larger than it could be with more densely packed binding numbers.
As a best practice, when using GPU-Assisted Validation with descriptor indexing enabled,
make sure descriptor bindings are densely packed.

If GPU-Assisted Validation device memory allocations fail, the device could become
unstable because some previously-built pipelines may contain instrumented shaders.
This is a condition that is nearly impossible to recover from, so the layer just
prints an error message and refrains from any further allocations or instrumentations.
There is still a reasonable chance of recovering from these conditions,
especially if the instrumentation does not write any error records.

### Descriptors

This is roughly the same problem as the device memory problem mentioned above,
but for descriptors.
Any failure to allocate a descriptor set means that the instrumented shader code
won't have a place to write error records, resulting in unpredictable device
behavior.

### Other Device Limits

This implementation uses additional resources that may count against the following limits,
and possibly others:

* `maxMemoryAllocationCount`
* `maxBoundDescriptorSets`
* `maxPerStageDescriptorStorageBuffers`
* `maxPerStageResources`
* `maxDescriptorSetStorageBuffers`
* `maxFragmentCombinedOutputResources`

The implementation does not take steps to avoid exceeding these limits
and does not update the tracking performed by other validation functions.

### A Note About the `VK_EXT_buffer_device_address` Extension

The recently introduced `VK_EXT_buffer_device_address` extension can be used
to implement GPU-Assisted Validation without some of the limitations described above.
This approach would use this extension to obtain a GPU device pointer to a storage
buffer and make it available to the shader via a specialization constant.
This technique removes the need to create descriptors, use a descriptor set slot,
modify pipeline layouts, etc., and would relax some of the limitations listed above.

This alternate implementation is under consideration.

## GPU-Assisted Validation Internal Design

This section may be of interest to readers who want to know how GPU-Assisted Validation is implemented.
It isn't necessarily required for using the feature.

### General

In general, the implementation does the following:

* For each draw, dispatch, and trace rays call, allocate a buffer with enough device memory to hold a single debug output record written by the
  instrumented shader code.
  If descriptor indexing is enabled, calculate the amount of memory needed to describe the descriptor array sizes and
  write states and allocate device memory and a buffer for input to the instrumented shader.
  The Vulkan Memory Allocator is used to handle this efficiently.

  There is probably little advantage in providing a larger output buffer in order to obtain more debug records.
  It is likely, especially for fragment shaders, that multiple errors occurring near each other have the same root cause.

  A block is allocated on a per-draw basis to make it possible to associate a shader debug error record with
  a draw within a command buffer.
  This is done partly to give the user more information in the error report, namely the command buffer handle/name and the draw within that command buffer.
  An alternative design allocates this block on a per-device or per-queue basis and should work.
  However, with that design it is not possible to identify the command buffer that causes the error if multiple command buffers
  are submitted at once.
* For each draw, dispatch, and trace rays call, allocate a descriptor set and update it to point to the block of device memory just allocated.
  If descriptor indexing is enabled, also update the descriptor set to point to the allocated input buffer.
  Fill the input buffer with the size and write state information for each descriptor array.
  There is a descriptor set manager to handle this efficiently.
  Also make an additional call down the chain to create a bind descriptor set command to bind our descriptor set at the desired index.
  This has the effect of binding the device memory block belonging to this draw so that the GPU instrumentation
  writes into this buffer when the draw is executed.
  The end result is that each draw call has its own buffer containing GPU instrumentation error
  records, if any occurred while executing that draw.
* Determine the descriptor set binding index that is eventually used to bind the descriptor set just allocated and updated.
  Usually, it is `VkPhysicalDeviceLimits::maxBoundDescriptorSets` minus one.
  For devices that have a very high or no limit on this bound, pick an index that isn't too high, but above most other device
  maxima such as 32.
* When creating a ShaderModule, pass the SPIR-V bytecode to the SPIR-V optimizer to perform the instrumentation pass.
  Pass the desired descriptor set binding index to the optimizer via a parameter so that the instrumented
  code knows which descriptor to use for writing error report data to the memory block.
  If descriptor indexing is enabled, turn on OOB and write state checking in the instrumentation pass.
  Use the instrumented bytecode to create the ShaderModule.
* For all pipeline layouts, add our descriptor set to the layout, at the binding index determined earlier.
  Fill any gaps with empty descriptor sets.

  If the incoming layout already has a descriptor set placed at our desired index, the layer must not add its
  descriptor set to the layout, because doing so would replace the one in the incoming layout.
  Instead, the layer leaves the layout alone and later replaces the instrumented shaders with
  non-instrumented ones when the pipeline layout is used to create a graphics pipeline.
  The layer issues an error message to report this condition.
* When creating a GraphicsPipeline, ComputePipeline, or RayTracingPipeline, check to see if the pipeline is using the debug binding index.
  If it is, replace the instrumented shaders in the pipeline with non-instrumented ones.
* Before calling QueueSubmit, if descriptor indexing is enabled, check to see if there were any unwritten descriptors that were declared
  update-after-bind.
  If there were, update the write state of those elements.
* After calling QueueSubmit, perform a wait on the queue to allow the queue to finish executing.
  Then map and examine the device memory block for each draw or trace rays command that was submitted.
  If any debug record is found, generate a validation error message for each record found.

The above describes only the high-level details of GPU-Assisted Validation operation.
More detail is found in the discussion of the individual hooked functions below.

### Initialization

When the validation layer loads, it examines the user options from both the layer settings file and the
`VK_EXT_validation_features` extension.
Note that it also processes the subsumed `VK_EXT_validation_flags` extension for simple backwards compatibility.
From these options, the layer sets instance-scope flags in the validation layer tracking data to indicate if
GPU-Assisted Validation has been requested, along with any other associated options.

### "Calling Down the Chain"

Much of the GPU-Assisted Validation implementation involves making "application level" Vulkan API
calls outside of the application's API usage to create resources and perform its required operations
inside of the validation layer.
These calls are not routed up through the top of the loader/layer/driver call stack via the loader.
Instead, they are simply dispatched via the containing layer's dispatch table.

These calls therefore don't pass through any validation checks that occur before the GPU validation checks are run.
This doesn't present any particular problem, but it does raise some issues:

* The additional API calls are not fully validated

  This implies that this additional code may never be checked for validation errors.
  To address this, the code can "just" be written carefully so that it is "valid" Vulkan,
  which is hard to do.

  Or, this code can be checked by loading a khronos validation layer with
  GPU validation enabled on top of "normal" standard validation in the
  layer stack, which effectively validates the API usage of this code.
  This sort of checking is performed by layer developers to check that the additional
  Vulkan usage is valid.

  This validation can be accomplished by:

  * Building the validation layer with a hack to force GPU-Assisted Validation to be enabled.
    The exposed mechanisms can't be used here because GPU-Assisted Validation should probably not be on in both layers.
  * Renaming this layer binary to something else like "khronos_validation2" to keep it apart from the
    "normal" khronos validation.
  * Creating a new JSON file with the new layer name.
  * Setting up the layer stack so that the "khronos_validation2" layer is on top of or before the actual khronos
    validation layer.
  * Running tests and checking for validation errors pointing to API usage in the "khronos_validation2" layer.

  This should only need to be done after making any major changes to the implementation.

  Another approach involves capturing an application trace with `vktrace` and then playing
  it back with `vkreplay`.

* The additional API calls are not state-tracked

  This means that things like device memory allocations and descriptor allocations are not
  tracked and do not show up in any of the bookkeeping performed by the validation layers.
  For example, any device memory allocation performed by GPU-Assisted Validation won't be
  counted towards the maximum number of allocations allowed by a device.
  This could lead to an early allocation failure that is not accompanied by a validation error.

  This shortcoming is left unaddressed in this implementation because it is anticipated that
  a later implementation of GPU-Assisted Validation using the `VK_EXT_buffer_device_address`
  extension will have less of a need to allocate these
  tracked resources, making this less of an issue.

### Code Structure and Relationship to the Core Validation Layer

The GPU-Assisted Validation code is largely contained in one
[file](https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/master/layers/gpu_validation.cpp), with "hooks" in
the other validation code that call functions in this file.
These hooks in the validation code look something like this:

```C
if (GetEnables(dev_data)->gpu_validation) {
    GpuPreCallRecordDestroyPipeline(dev_data, pipeline_state);
}
```

The GPU-Assisted Validation code is linked into the shared library for the khronos and core validation layers.

#### Review of Khronos Validation Code Structure

Each function for a Vulkan API command intercepted in the khronos validation layer is usually split up
into several decomposed functions in order to organize the implementation.
These functions take the form of:

* `PreCallValidate<foo>`: Perform validation steps before calling down the chain
* `PostCallValidate<foo>`: Perform validation steps after calling down the chain
* `PreCallRecord<foo>`: Perform state recording before calling down the chain
* `PostCallRecord<foo>`: Perform state recording after calling down the chain

The GPU-Assisted Validation functions follow this pattern not by hooking into the top-level validation API shim, but
by hooking one of these decomposed functions.

The design of each hooked function follows:

#### GpuPreCallRecordCreateDevice

* Modify the `VkPhysicalDeviceFeatures` to turn on two additional physical device features:
  * `fragmentStoresAndAtomics`
  * `vertexPipelineStoresAndAtomics`

#### GpuPostCallRecordCreateDevice

* Determine and record (save in device state) the desired descriptor set binding index.
* Initialize Vulkan Memory Allocator
  * Determine error record block size based on the maximum size of the error record and alignment limits of the device.
* Initialize descriptor set manager
* Make a descriptor set layout to describe our descriptor set
* Make a descriptor set layout to describe a "dummy" descriptor set that contains no descriptors
  * This is used to "pad" pipeline layouts to fill any gaps between the used bind indices and our bind index
* Record these objects in the per-device state

#### GpuPreCallRecordDestroyDevice

* Destroy descriptor set layouts created in CreateDevice
* Clean up descriptor set manager
* Clean up Vulkan Memory Allocator (VMA)
* Clean up device state

#### GpuAllocateValidationResources

* For each Draw, Dispatch, or TraceRays call:
  * Get a descriptor set from the descriptor set manager
  * Get an output buffer and associated memory from VMA
  * If descriptor indexing is enabled, get an input buffer and fill with descriptor array information
  * Update (write) the descriptor set with the memory info
  * Check to see if the layout for the pipeline just bound is using our selected bind index
  * If no conflict, add an additional command to the command buffer to bind our descriptor set at our selected index
* Record the above objects in the per-CB state

Note that the Draw and Dispatch calls include `vkCmdDraw`, `vkCmdDrawIndexed`, `vkCmdDrawIndirect`, `vkCmdDrawIndexedIndirect`, `vkCmdDispatch`, `vkCmdDispatchIndirect`, and `vkCmdTraceRaysNV`.

#### GpuPreCallRecordFreeCommandBuffers

* For each command buffer:
  * Destroy the VMA buffer(s), releasing the memory
  * Give the descriptor sets back to the descriptor set manager
  * Clean up CB state

#### GpuOverrideDispatchCreateShaderModule

This function is called from PreCallRecordCreateShaderModule.
This routine sets up the call to the SPIR-V optimizer to run the "BindlessCheckPass", replacing the original SPIR-V with the instrumented SPIR-V,
which is then used in the call down the chain to CreateShaderModule.

This function generates a "unique shader ID" that is passed to the SPIR-V optimizer,
which the instrumented code puts in the debug error record to identify the shader.
This ID is returned by this function so it can be recorded in the shader module at PostCallRecord time.
It would have been convenient to use the shader module handle returned from the driver as this shader ID.
But the shader needs to be instrumented before creating the shader module, so the handle is not yet available
to pass to the optimizer as this ID.
Therefore, the layer keeps a "counter" in per-device state that is incremented each time a shader is instrumented
to generate unique IDs.
This unique ID is given to the SPIR-V optimizer and is stored in the shader module state tracker after the shader module is created, which creates the necessary association between the ID and the shader module.

The process of instrumenting the SPIR-V also includes passing the selected descriptor set binding index
to the SPIR-V optimizer, which the instrumented
code uses to locate the memory block used to write the debug error record.
An instrumented shader is now "hard-wired" to write error records via the descriptor set at that binding
if it detects an error.
This implies that the instrumented shaders should only be allowed to run when the correct bindings are in place.
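
The unique shader ID bookkeeping described above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: the `DeviceState` and `ShaderModuleState` types, the function names, and the choice to start the counter at 0 are hypothetical and are not taken from the layer source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state; only the ID counter matters here. */
typedef struct {
    uint32_t next_shader_id; /* incremented each time a shader is instrumented */
} DeviceState;

/* Hypothetical shader module tracking record. */
typedef struct {
    uint64_t module_handle;    /* handle returned by the driver */
    uint32_t unique_shader_id; /* ID handed to the optimizer before creation */
} ShaderModuleState;

/* Before instrumentation: reserve an ID, since no module handle exists yet. */
uint32_t reserve_shader_id(DeviceState *dev) {
    assert(dev != NULL);
    return dev->next_shader_id++;
}

/* After the call down the chain: associate the ID with the new module handle. */
void record_shader_module(ShaderModuleState *state, uint64_t handle, uint32_t id) {
    assert(state != NULL);
    state->module_handle = handle;
    state->unique_shader_id = id;
}

/* Map an ID found in an error record back to its module; NULL if unknown. */
const ShaderModuleState *find_module_by_id(const ShaderModuleState *modules,
                                           size_t count, uint32_t id) {
    for (size_t i = 0; i < count; ++i) {
        if (modules[i].unique_shader_id == id) {
            return &modules[i];
        }
    }
    return NULL;
}
```

The two-step split mirrors the text: the ID has to exist before the driver returns a handle, so the association can only be recorded afterwards.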

The original SPIR-V bytecode is left stored in the shader module tracking data.
This is important because the layer may need to replace the instrumented shader with the original shader if, for example,
there is a binding index conflict.
The application cannot destroy the shader module until it has used the shader module to create the pipeline.
This ensures that the original SPIR-V bytecode is available if we need it to replace the instrumented shader.

#### GpuOverrideDispatchCreatePipelineLayout

This function is called from PreCallRecordCreatePipelineLayout.

* Check for a descriptor set binding index conflict.
  * If there is one, issue an error message and leave the pipeline layout unmodified
  * If no conflict, for each pipeline layout:
    * Create a new pipeline layout
    * Copy the original descriptor set layouts into the new pipeline layout
    * Pad the new pipeline layout with dummy descriptor set layouts up to but not including the last one
    * Add our descriptor set layout as the last one in the new pipeline layout
* Create the pipeline layouts by calling down the chain with the original or modified create info

#### GpuPreCallQueueSubmit

* For each primary and secondary command buffer in the submission:
  * Call a helper function to see if there are any update-after-bind descriptors whose write state may need to be updated
    and if so, map the input buffer and update the state.

#### GpuPostCallQueueSubmit

* Submit a command buffer containing a memory barrier to make GPU writes available to the host domain.
* Call QueueWaitIdle.
* For each primary and secondary command buffer in the submission:
  * Call a helper function to process the instrumentation debug buffers (described later)

#### GpuPreCallValidateCmdWaitEvents

* Report an error about a possible deadlock if CmdWaitEvents is recorded with `VK_PIPELINE_STAGE_HOST_BIT` set.

#### GpuPreCallRecordCreateGraphicsPipelines

* Examine the pipelines to see if any use the debug descriptor set binding index
* For those that do:
  * Create non-instrumented shader modules from the saved original SPIR-V
  * Modify the CreateInfo data to use these non-instrumented shaders.
    * This prevents instrumented shaders from using the application's descriptor set.

#### GpuPostCallRecordCreateGraphicsPipelines

* For every shader in the pipeline:
  * Destroy the shader module created in GpuPreCallRecordCreateGraphicsPipelines, if any
    * These are found in the CreateInfo used to create the pipeline and not in the shader_module
  * Create a shader tracking record that saves:
    * shader module handle
    * unique shader id
    * graphics pipeline handle
    * shader bytecode if it contains debug info

This tracker is used to attach the shader bytecode to the shader in case it is needed
later to get the shader source code debug info.

The current shader module tracker in the validation code stores the bytecode,
but this tracker has the same life cycle as the shader module itself.
It is possible for the application to destroy the shader module after
creating the graphics pipeline and before submitting work that uses the shader,
making the shader bytecode unavailable if needed for later analysis.
Therefore, the bytecode must be saved at this opportunity.

This tracker exists as long as the graphics pipeline exists,
so the graphics pipeline handle is also stored in this tracker so that it can
be looked up when the graphics pipeline is destroyed.
At that point, it is safe to free the bytecode since the pipeline is never used again.

#### GpuPreCallRecordDestroyPipeline

* Find the shader tracker(s) with the graphics pipeline handle and free the tracker, along with any bytecode it has stored in it.
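
The shader tracker life cycle described in the last two sections can be sketched as a small table that owns a private copy of the bytecode and releases it when the pipeline is destroyed.
This is a minimal sketch, not the layer's actual data structure: the fixed-size table, the type names, and the function names are all hypothetical, and the real layer would use its own containers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRACKERS 64 /* hypothetical fixed capacity, for illustration only */

typedef struct {
    int in_use;
    uint64_t shader_module;    /* shader module handle */
    uint32_t unique_shader_id; /* ID written into error records */
    uint64_t pipeline;         /* graphics pipeline handle */
    uint32_t *bytecode;        /* private copy; NULL if no debug info was saved */
    size_t word_count;
} ShaderTracker;

typedef struct {
    ShaderTracker slots[MAX_TRACKERS];
} TrackerTable;

/* Save a tracker at pipeline creation. The bytecode is copied so that it
 * outlives the application's shader module. Returns 0 on success, -1 on failure. */
int tracker_add(TrackerTable *t, uint64_t module, uint32_t id, uint64_t pipeline,
                const uint32_t *words, size_t word_count) {
    assert(t != NULL);
    for (size_t i = 0; i < MAX_TRACKERS; ++i) {
        if (t->slots[i].in_use) continue;
        uint32_t *copy = NULL;
        if (words != NULL && word_count > 0) {
            copy = malloc(word_count * sizeof *copy);
            if (copy == NULL) return -1;
            memcpy(copy, words, word_count * sizeof *copy);
        }
        t->slots[i] = (ShaderTracker){1, module, id, pipeline, copy,
                                      copy ? word_count : 0};
        return 0;
    }
    return -1; /* table full */
}

/* Look up a tracker by the shader ID found in an error record. */
const ShaderTracker *tracker_find(const TrackerTable *t, uint32_t id) {
    for (size_t i = 0; i < MAX_TRACKERS; ++i) {
        if (t->slots[i].in_use && t->slots[i].unique_shader_id == id) {
            return &t->slots[i];
        }
    }
    return NULL;
}

/* At pipeline destruction: free every tracker that refers to the pipeline.
 * Returns the number of trackers released. */
size_t tracker_remove_pipeline(TrackerTable *t, uint64_t pipeline) {
    size_t removed = 0;
    for (size_t i = 0; i < MAX_TRACKERS; ++i) {
        if (t->slots[i].in_use && t->slots[i].pipeline == pipeline) {
            free(t->slots[i].bytecode);
            memset(&t->slots[i], 0, sizeof t->slots[i]);
            ++removed;
        }
    }
    return removed;
}
```

Copying the words, rather than keeping a pointer into the shader module state, is what lets an error record be resolved to source lines even after the application has destroyed the shader module.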

### Shader Instrumentation Scope

The shader instrumentation process performed by the SPIR-V optimizer applies descriptor index bounds checking
to descriptors of the following types:

    VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
    VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE
    VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
    VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER
    VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER
    VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
    VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
    VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC
    VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC

Instrumentation is applied to the following SPIR-V operations:

    OpImageSampleImplicitLod
    OpImageSampleExplicitLod
    OpImageSampleDrefImplicitLod
    OpImageSampleDrefExplicitLod
    OpImageSampleProjImplicitLod
    OpImageSampleProjExplicitLod
    OpImageSampleProjDrefImplicitLod
    OpImageSampleProjDrefExplicitLod
    OpImageGather
    OpImageDrefGather
    OpImageQueryLod
    OpImageSparseSampleImplicitLod
    OpImageSparseSampleExplicitLod
    OpImageSparseSampleDrefImplicitLod
    OpImageSparseSampleDrefExplicitLod
    OpImageSparseSampleProjImplicitLod
    OpImageSparseSampleProjExplicitLod
    OpImageSparseSampleProjDrefImplicitLod
    OpImageSparseSampleProjDrefExplicitLod
    OpImageSparseGather
    OpImageSparseDrefGather
    OpImageFetch
    OpImageRead
    OpImageQueryFormat
    OpImageQueryOrder
    OpImageQuerySizeLod
    OpImageQuerySize
    OpImageQueryLevels
    OpImageQuerySamples
    OpImageSparseFetch
    OpImageSparseRead
    OpImageWrite

Also, OpLoad and OpStore with an AccessChain into a base of OpVariable with
either Uniform or StorageBuffer storage class and a type which is either a
struct decorated with Block, or a runtime or statically-sized array of such
a struct.


### Shader Instrumentation Error Record Format

The instrumented shader code generates "error records" in a specific format.

This description includes the support for future GPU-Assisted Validation features
such as checking for uninitialized descriptors in the partially-bound scenario.
These items are not used in the current implementation for descriptor array
bounds checking, but are provided here to complete the description of the
error record format.

The format of this buffer is as follows:

```C
struct DebugOutputBuffer_t
{
   uint DataWrittenLength;
   uint Data[];
}
```

`DataWrittenLength` is the number of uint32_t words that the shaders have attempted to write.
It should be initialized to 0.

The `Data` array holds the uint32_t words written by the shaders of the pipeline to record bindless validation errors.
All elements of `Data` should be initialized to 0.
Note that the `Data` array has runtime length.
The shader queries the length of the `Data` array to make sure that it does not write past the end of `Data`.
The shader only writes complete records.
The layer uses the length of `Data` to control the number of records written by the shaders.

The `DataWrittenLength` is atomically updated by the shaders so that shaders do not overwrite each other's data.
The shader takes the value returned by the atomic update as the starting offset for its record.
If that value plus the record length is greater than the length of `Data`, it does not write the record.

Given this protocol, the value in `DataWrittenLength` is not very meaningful if it is greater than the length of `Data`.
However, the format of the written records plus the fact that `Data` is initialized to 0 should be enough to determine
the records that were written.
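
The reservation protocol above can be modeled on the CPU with `std::atomic`. The sketch below is only an illustration of the idea and not the instrumentation code itself, which is SPIR-V generated by the optimizer. `DebugOutputBuffer` and `TryWriteRecord` are hypothetical names, and the sketch assumes the atomic update is a fetch-and-add that returns the previous counter value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side model of the output buffer; on the GPU this is device memory.
struct DebugOutputBuffer {
    std::atomic<std::uint32_t> data_written_length{0};  // words attempted so far
    std::vector<std::uint32_t> data;                     // zero-initialized record storage

    explicit DebugOutputBuffer(std::size_t words) : data(words, 0u) {}
};

// Tries to append one complete record. Returns true if it was written.
bool TryWriteRecord(DebugOutputBuffer& out, const std::vector<std::uint32_t>& record) {
    assert(!record.empty());
    const std::uint32_t size = static_cast<std::uint32_t>(record.size());

    // fetch_add returns the value before the add, which becomes this writer's
    // private starting offset. The counter grows even if the write is skipped.
    const std::uint32_t offset = out.data_written_length.fetch_add(size);

    // Only complete records are written; a record that would overflow is dropped.
    if (static_cast<std::uint64_t>(offset) + size > out.data.size()) {
        return false;
    }
    for (std::uint32_t i = 0; i < size; ++i) {
        out.data[offset + i] = record[i];
    }
    return true;
}
```

In this model the counter keeps growing after the buffer is full, which is why its final value can exceed the length of `Data` and cannot be used on its own to count the records that were actually written.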

### Record Format

The format of an output record is the following:

    Word 0: Record size
    Word 1: Shader ID
    Word 2: Instruction Index
    Word 3: Stage
    <Stage-Specific Words>
    <Validation-Specific Words>

The Record Size is the number of words in this record, including the Record Size word itself.

The Shader ID is a handle that was provided by the layer when the shader was instrumented.

The Instruction Index is the instruction within the original function at which the error occurred.
For bindless, this will be the instruction which consumes the descriptor in question,
or the instruction that consumes the OpSampledImage that consumes the descriptor.

The Stage is the integer value used in SPIR-V for each of the Execution Models:

| Stage         | Value |
|---------------|:-----:|
|Vertex         |0      |
|TessCtrl       |1      |
|TessEval       |2      |
|Geometry       |3      |
|Fragment       |4      |
|Compute        |5      |
|RayGenerationNV|5313   |
|IntersectionNV |5314   |
|AnyHitNV       |5315   |
|ClosestHitNV   |5316   |
|MissNV         |5317   |
|CallableNV     |5318   |

### Stage Specific Words

These are words that identify which "instance" of the shader the validation error occurred in.
Here are the words for each stage:

| Stage         | Word 0           | Word 1     | Word 2     |
|---------------|------------------|------------|------------|
|Vertex         |VertexID          |InstanceID  | unused     |
|Tess*          |InvocationID      |unused      | unused     |
|Geometry       |PrimitiveID       |InvocationID| unused     |
|Fragment       |FragCoord.x       |FragCoord.y | unused     |
|Compute        |GlobalInvocationID|unused      | unused     |
|RayGenerationNV|LaunchIdNV.x      |LaunchIdNV.y|LaunchIdNV.z|
|IntersectionNV |LaunchIdNV.x      |LaunchIdNV.y|LaunchIdNV.z|
|AnyHitNV       |LaunchIdNV.x      |LaunchIdNV.y|LaunchIdNV.z|
|ClosestHitNV   |LaunchIdNV.x      |LaunchIdNV.y|LaunchIdNV.z|
|MissNV         |LaunchIdNV.x      |LaunchIdNV.y|LaunchIdNV.z|
|CallableNV     |LaunchIdNV.x      |LaunchIdNV.y|LaunchIdNV.z|

"unused" means not relevant, but still present.

### Validation-Specific Words

These are words that are specific to the validation being done.
For bindless validation, they are variable.

The first word is the Error Code.

For the `*OutOfBounds` errors, two words will follow: Word0:DescriptorIndex, Word1:DescriptorArrayLength

For the `*Uninitialized` errors, one word will follow: Word0:DescriptorIndex

| Error                       | Code | Word 0         | Word 1                |
|-----------------------------|:----:|----------------|-----------------------|
|IndexOutOfBounds             |0     |Descriptor Index|Descriptor Array Length|
|DescriptorUninitialized      |1     |Descriptor Index|unused                 |

So the words written for an image descriptor bounds error in a fragment shader are:

    Word 0: Record size (10)
    Word 1: Shader ID
    Word 2: Instruction Index
    Word 3: Stage (4:Fragment)
    Word 4: FragCoord.x
    Word 5: FragCoord.y
    Word 6: unused
    Word 7: Error (0: IndexOutOfBounds)
    Word 8: DescriptorIndex
    Word 9: DescriptorArrayLength

If another error is encountered, that record is written starting at Word 10, provided the whole record fits in Data.
If the record would overflow Data, no words are written.
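
The record layout above suggests a simple way to walk the `Data` array on the host: read the record size, decode the fixed-position words, and advance by the size until a zero size or the end of the array is reached. The sketch below is a hedged illustration only. `DecodedRecord`, `DecodeRecords`, and the offset constants are hypothetical names, and the offsets assume four common words followed by three stage-specific words, as in the tables in this section. The authoritative offsets are the constants in SPIRV-Tools `instrument.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: 4 common words, 3 stage-specific words, then the
// validation-specific words (error code, descriptor index, array length).
constexpr std::size_t kCommonWords = 4;
constexpr std::size_t kStageWords = 3;
constexpr std::size_t kErrorOffset = kCommonWords + kStageWords;
constexpr std::size_t kMinRecordWords = kErrorOffset + 2;  // error code + descriptor index

struct DecodedRecord {
    std::uint32_t shader_id;
    std::uint32_t instruction_index;
    std::uint32_t stage;
    std::uint32_t stage_words[kStageWords];
    std::uint32_t error_code;
    std::uint32_t descriptor_index;
    std::uint32_t descriptor_array_length;  // 0 if the record does not carry it
};

// Decodes every complete record in `data` (the mapped Data array of `count` words).
std::vector<DecodedRecord> DecodeRecords(const std::uint32_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::vector<DecodedRecord> records;
    std::size_t pos = 0;
    while (pos < count) {
        const std::uint32_t size = data[pos];
        if (size == 0) break;  // zero-initialized memory: no more records
        if (size < kMinRecordWords || size > count - pos) break;  // malformed or truncated
        const std::uint32_t* w = data + pos;
        DecodedRecord r{};
        r.shader_id = w[1];
        r.instruction_index = w[2];
        r.stage = w[3];
        for (std::size_t i = 0; i < kStageWords; ++i) {
            r.stage_words[i] = w[kCommonWords + i];
        }
        r.error_code = w[kErrorOffset];
        r.descriptor_index = w[kErrorOffset + 1];
        r.descriptor_array_length = (size > kErrorOffset + 2) ? w[kErrorOffset + 2] : 0;
        records.push_back(r);
        pos += size;  // the record size includes the size word itself
    }
    return records;
}
```

Advancing by the size word rather than by a fixed stride keeps the loop independent of how many validation-specific words a particular error type carries.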

The validation layer can continue to read valid records until it sees a Record Length of 0 or the end of Data is reached.

#### Programmatic interface

The programmatic interface for the above informal description is codified in the
[SPIRV-Tools](https://github.com/KhronosGroup/SPIRV-Tools) repository in file
[`instrument.hpp`](https://github.com/KhronosGroup/SPIRV-Tools/blob/master/include/spirv-tools/instrument.hpp).
It consists largely of integer constant definitions for the codes and values mentioned above and
offsets into the record for locating each item.

## GPU-Assisted Validation Error Report

This is a fairly simple process of mapping the debug report buffer associated with
each draw in the command buffer that was just submitted and looking to see if the GPU instrumentation
code wrote anything.
Each draw in the command buffer should have a corresponding result buffer in the command buffer's list of result buffers.
The report generating code loops through the result buffers, maps each of them, checks for errors, and unmaps them.
The layer clears the buffer to zeros when it is allocated and after processing any
buffer that was written to.
The instrumented shader code expects these buffers to be cleared to zeros before it
writes to them.

The layer then prepares a "common" validation error message containing:

* command buffer handle - This is easily obtained because we are looping over the command
  buffers just submitted.
* draw number - The layer keeps track of how many draws it has processed for a given command buffer.
* pipeline handle - The shader tracker discussed earlier contains this handle.
* shader module handle - The "Shader ID" (Word 1 in the record) is used to look up
  the shader tracker, which is then used to obtain the shader module and pipeline handles.
* instruction index - This is the SPIR-V instruction index where the invalid array access occurred.
  It is not that useful by itself, since the user would have to use it to locate a SPIR-V instruction
  in a SPIR-V disassembly and somehow relate it back to the shader source code.
  But it could still be useful to some and it is easy to report.
  The user can build the shader with debug information to get source-level information.

For all objects, the layer also looks up the objects in the Debug Utils object name map in
case the application used that extension to name any objects.
If a name exists for that object, it is included in the error message.

The layer then adds on error message text obtained from decoding the stage-specific and
validation-specific data as described earlier.

This completes the error report when there is no source-level debug information in the shader.

### Source-Level Debug Information

This is one of the more complicated and code-heavy parts of the GPU-Assisted Validation feature
and all it really does is display source-level information when the shader is compiled
with debugging info (`-g` option in the case of `glslangValidator`).

The process breaks down into two steps:

#### OpLine Processing

The SPIR-V generator (e.g., glslangValidator) places an OpLine SPIR-V instruction in the
shader program ahead of code generated for each source code statement.
The OpLine instruction contains the filename id (for an OpString),
the source code line number and the source code column number.
It is possible to have two source code statements on the same line in the source file,
which explains the need for the column number.

The layer scans the SPIR-V looking for the last OpLine instruction that appears before the instruction
at the instruction index obtained from the debug report.
This OpLine then contains the correct filename id, line number, and column number of the
statement causing the error.
The filename itself is obtained by scanning the SPIR-V again for an OpString instruction that
matches the id from the OpLine.
This OpString contains the text string representing the filename.
This information is added to the validation error message.

For online compilation when there is no "file", only the line number information is reported.

#### OpSource Processing

The SPIR-V built with source-level debug info also contains OpSource instructions that
have a string containing the source code, delimited by newlines.
Due to possible pre-processing, the layer cannot simply use the source file line number
from the OpLine to index into this set of source code lines.

Instead, the correct source code line is found by first locating the "#line" directive in the
source that specifies a line number closest to and less than the source line number reported
by the OpLine located in the previous step.
The correct "#line" directive must also match its filename, if specified,
with the filename from the OpLine.

Then the difference between the "#line" line number and the OpLine line number is added
to the place where the "#line" was found to locate the actual line of source, which is
then added to the validation error message.

For example, if the OpLine line number is 15, and there is a "#line 10" on line 40
in the OpSource source, then line 45 in the OpSource contains the correct source line.

### Shader Instrumentation Input Record Format

Although the input buffer is a linear array of unsigned integers, conceptually there are arrays within the linear array.

Word 1 starts an array (denoted by sets_to_sizes) that is number_of_sets long, with an index that indicates the start of that set's entries in the sizes array.

After the sets_to_sizes array is the sizes array, which contains the array size (or 1 if the descriptor is not an array) of each descriptor in the set.
Bindings with no descriptor are filled in with zeros.

After the sizes array is the sets_to_bindings array, which for each descriptor set indexes into the bindings_to_written array. Word 0 contains the index that is the start of the sets_to_bindings array.

After the sets_to_bindings array is the bindings_to_written array, which for each binding in the set indexes to the start of that binding's entries in the written array.

Lastly comes the written array, which indicates whether a given binding / array element has been written.

Example:

```
Assume Descriptor Set 0 looks like:        And Descriptor Set 1 looks like:
  Binding                                    Binding
     0          Array[3]                       2          Array[4]
     1          Non Array                      3          Array[5]
     3          Array[2]

Here is what the input buffer should look like:

  Index of                sets_to_sizes                 sizes         sets_to_bindings                    bindings_to_written   written
  sets_to_bindings

  0 |11| sets_to_bindings 1 |3| set 0 sizes start at 3   3 |3| S0B0   11 |13| set 0 bindings start at 13  13 |21| S0B0          21 |1| S0B0I0 was written
         starts at 11     2 |7| set 1 sizes start at 7   4 |1| S0B1   12 |17| set 1 bindings start at 17  14 |24| S0B1          22 |1| S0B0I1 was written
                                                         5 |0| S0B2                                       15 |0 | S0B2          23 |1| S0B0I2 was written
                                                         6 |2| S0B3                                       16 |25| S0B3          24 |1| S0B1 was written
                                                         7 |0| S1B0                                       17 |0 | S1B0          25 |1| S0B3I0 was written
                                                         8 |0| S1B1                                       18 |0 | S1B1          26 |1| S0B3I1 was written
                                                         9 |4| S1B2                                       19 |27| S1B2          27 |0| S1B2I0 was not written
                                                        10 |5| S1B3                                       20 |31| S1B3          28 |1| S1B2I1 was written
                                                                                                                                29 |1| S1B2I2 was written
                                                                                                                                30 |1| S1B2I3 was written
                                                                                                                                31 |1| S1B3I0 was written
                                                                                                                                32 |1| S1B3I1 was written
                                                                                                                                33 |1| S1B3I2 was written
                                                                                                                                34 |1| S1B3I3 was written
                                                                                                                                35 |1| S1B3I4 was written
```

Alternately, you could describe the array size and write state data as:
(set = s, binding = b, index = i) is not initialized if

```
Input[ i + Input[ b + Input[ s + Input[0] ] ] ] == 0
```

and the array's size = Input[ Input[ s + 1 ] + b ]

## GPU-Assisted Validation Testing

Validation Layer Tests (VLTs) exist for GPU-Assisted Validation.
They cannot be run with the "mock ICD" in headless CI environments because they need to
actually execute shaders.
But they are still useful to run on real devices to check for regressions.

There is nothing otherwise remarkable or different about these tests.
They activate GPU-Assisted Validation via the programmatic
interface as described earlier.

The tests exercise the extraction of source code information when the shader
is built with debug info.