// Copyright 2021-2023 The Khronos Group, Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_shader_tile_image
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

`VK_EXT_shader_tile_image` is a device extension that explicitly enables access to on-chip pixel data. For GPUs supporting this extension, it is a replacement for many use-cases for subpasses, which are not available when the `VK_KHR_dynamic_rendering` extension is used.

== Problem Statement

Some implementations, in particular tile-based GPUs, want to allow applications to effectively exploit local, e.g. on-chip, memory.
A classic example would be optimizing G-buffer based deferred shading techniques where the G-buffer is produced and consumed on-chip.

Subpasses were designed to support such use-cases with an API mechanism that was portable across all implementations. In practice, that has led to some problems, including:

 * the high level abstraction is far removed from the mental model an application developer needs to have to be able to optimize for keeping data on-chip
 * the subpass design affects other parts of the API and is seen as a 'tax' on applications that do not target implementations that benefit from on-chip storage
 * developers wanting to optimize for a specific class of GPUs often need to make GPU specific optimization choices, so the abstraction does not add much

These problems motivated `VK_KHR_dynamic_rendering`, which offers an alternative API without subpasses. But keeping data on-chip is still an important optimization for a class of GPUs.

This proposal aims to provide the most essential functionality of subpasses, but in an explicit manner.
The abstractions in this proposal are a closer match to what the underlying GPU implementation does and should make it easier to communicate best practices and performance guarantees to developers.

== Solution Space

=== High-level choices

The solution space can be split along two axes: scope and abstraction level.

The abstraction level is a question of whether we want an API that is only targeted at tile-based GPUs or if we should have a higher-level API that would allow the feature to be supported on a wider range of GPUs.
The main argument for a higher abstraction level is application portability.
Arguments against additional abstractions include:

 * It would be hard for developers to reason about performance expectations, for the same reasons that it is hard to do this for subpasses
 * "Framebuffer fetch" and "programmable blend" semantics are naturally expressed as direct reads from color attachments, and adding abstractions just obfuscates what (some) GPU hardware is doing
 * GPUs that are not tile-based would not gain much from exposing this - at least not unless the scope is expanded - so the abstractions add little practical value

There are two choices broadly based on what the functionality is for, and which GPUs are able to support it:

1. An explicit API to allow certain tile-based GPUs to expose on-chip memory with fast raster order access.
 * Provides framebuffer fetch and Pixel Local Storage functionality and forms the basis for Tile Shader like functionality.
 * This is mainly targeted at GPUs which defer fragment shading into framebuffer tiles where each tile is typically processed just once.
 * This addresses use cases such as keeping G-buffer data on-chip.
 * No DRAM bandwidth paid for render targets which are cleared on load, consumed within the render pass, and content discarded at end of render pass.
 * Raster order access (coherent access) to framebuffer data from fragment shader is efficient or even "free" - depending on the GPU.
 * No descriptors needed for render target access.

2. A slightly higher level API to enable broad GPU support for framebuffer fetch like functionality within draw calls in dynamic render passes.
 * Provides framebuffer fetch like functionality.
 * This is intended to be supported by a wide range of GPUs. These GPUs generally have optimised support for framebuffer fetch within a render pass.
 * This addresses use cases such as programmable image composition, or programmable resolve.
 * Attachment data is not guaranteed to be on-chip within a render pass and may spill to DRAM. Implementations may opportunistically cache data in their cache hierarchy.
 * Raster order access to framebuffer data from fragment shader is not "free". Many implementations may prefer non-coherent access with explicit synchronization from applications.
 * Descriptors need to be bound for render target access (at least for some implementations).

This proposal targets the first choice.

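To make the bandwidth argument for the first choice concrete, the following is a back-of-the-envelope sketch, not a measurement. The helper `transient_attachment_dram_bytes` is hypothetical, and it counts only one store and one load per pixel, ignoring compression and caches. Under those assumptions, an illustrative 1920x1080 target with 16 bytes of G-buffer data per pixel would cost 66,355,200 bytes of DRAM traffic per frame if the data were not kept on-chip.

[source,c]
----
#include <stdint.h>

/* Hypothetical helper: DRAM traffic, in bytes, for attachment data that is
 * written once and read back once per pixel when it is NOT kept on-chip.
 * When the data stays in tile memory and is discarded at the end of the
 * render pass, this traffic is avoided. */
static uint64_t transient_attachment_dram_bytes(uint32_t width,
                                                uint32_t height,
                                                uint32_t bytes_per_pixel)
{
    const uint64_t pixels = (uint64_t)width * height;
    return pixels * bytes_per_pixel * 2u; /* one store plus one load */
}
----
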
The options for scope include:

 * "Framebuffer fetch" equivalent, i.e. enable access to the previously written pixel in the local framebuffer region
 * "Pixel local storage" equivalent, i.e. as above with the addition of pixel format reinterpretation
 * "Tile shader" equivalent, i.e. enable access to a region larger than 1x1 pixels

This proposal targets the first option, but adds building blocks to enable future enhancements.
The reasoning behind this choice is that:

 * It should be possible to support this extension on existing GPUs
 * Many use-cases that benefit from subpasses could be implemented with this functionality
 * Ease of integration; this option requires the least amount of changes to rendering engines
 * Time to market; several IHVs would like at least the subpass equivalent functionality to be implemented alongside `VK_KHR_dynamic_rendering`

=== Implementation choices

It is useful to provide tile image access for all attachment types.
But implementations may manage depth/stencil differently than color, which could add constraints.
We will therefore expose separate feature bits for color, depth, and stencil access.

Tile image variables currently have to 'alias' a color attachment location, and their format is implicitly specified to match the color attachment format.

== Proposal

=== Concept

image::{images}/tile_image.svg[align="center",title="Tile Image",opts="{imageopts}"]

Introduce the concept of a 'tile image'. When the extension is enabled, the framebuffer is logically divided into a grid of non-overlapping tiles called tile images.

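As a rough illustration of the non-overlapping grid, the sketch below maps a pixel to the tile that contains it. It is only a mental model: the tile dimensions are assumed values (this proposal does not expose them), and `TileCoord`, `tile_of_pixel`, and `tiles_along_axis` are hypothetical names. With 32x32 tiles, pixel (70, 33) falls in tile (2, 1), and a 1920x1080 framebuffer needs 60 by 34 tiles.

[source,c]
----
#include <stdint.h>

/* Hypothetical tile coordinate; the extension does not expose tile sizes. */
typedef struct TileCoord {
    uint32_t x;
    uint32_t y;
} TileCoord;

/* Map a framebuffer pixel to the single tile that contains it. tile_w and
 * tile_h are assumed, implementation-specific sizes and must be non-zero. */
static TileCoord tile_of_pixel(uint32_t px, uint32_t py,
                               uint32_t tile_w, uint32_t tile_h)
{
    TileCoord t = { px / tile_w, py / tile_h };
    return t;
}

/* Number of tiles needed to cover `extent` pixels along one axis. */
static uint32_t tiles_along_axis(uint32_t extent, uint32_t tile_size)
{
    return (extent + tile_size - 1u) / tile_size; /* round up */
}
----
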
=== API changes

Add a new feature struct `VkPhysicalDeviceShaderTileImageFeaturesEXT` containing:

 * shaderTileImageColorReadAccess
 * shaderTileImageDepthReadAccess
 * shaderTileImageStencilReadAccess

shaderTileImageColorReadAccess is mandatory if this extension is supported.

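An application would typically branch on these feature bits when choosing a rendering path. The sketch below shows one possible decision, assuming the bits have already been queried (for example through `vkGetPhysicalDeviceFeatures2` with `VkPhysicalDeviceShaderTileImageFeaturesEXT` chained in) and copied into a plain struct. `TileImageSupport`, `GBufferPath`, and `choose_gbuffer_path` are hypothetical application-side names, not part of the extension.

[source,c]
----
#include <stdbool.h>

/* Hypothetical application-side copy of the feature bits. In a real
 * application these would be filled from
 * VkPhysicalDeviceShaderTileImageFeaturesEXT. */
typedef struct TileImageSupport {
    bool extension_present;
    bool color_read;   /* shaderTileImageColorReadAccess */
    bool depth_read;   /* shaderTileImageDepthReadAccess */
    bool stencil_read; /* shaderTileImageStencilReadAccess (unused below) */
} TileImageSupport;

/* Hypothetical rendering paths an engine might choose between. */
typedef enum GBufferPath {
    GBUFFER_PATH_MULTI_PASS,       /* pass attachments through memory */
    GBUFFER_PATH_TILE_IMAGE_COLOR, /* color reads from tile images */
    GBUFFER_PATH_TILE_IMAGE_FULL   /* color and depth reads from tile images */
} GBufferPath;

static GBufferPath choose_gbuffer_path(const TileImageSupport *s)
{
    if (!s->extension_present || !s->color_read) {
        return GBUFFER_PATH_MULTI_PASS;
    }
    return s->depth_read ? GBUFFER_PATH_TILE_IMAGE_FULL
                         : GBUFFER_PATH_TILE_IMAGE_COLOR;
}
----
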
shaderTileImageColorReadAccess provides the ability to access current (rasterization order) color values from tile memory via tile images.
There is no support for the storage format to be redefined as part of this feature.
Output data is still written via Fragment Output variables.
Since the framebuffer format is not re-declared, fixed-function blending works as normal.

Existing shaders do not need to be modified to write to color attachments.

Reading color values using the functionality in this extension guarantees that the access is in rasterization order.
See the spec (Fragment Shader Tile Image Reads) for details on which sample reads qualify for coherent read access.

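The rasterization-order guarantee matters because a read-modify-write on the same pixel gives different results for different fragment orders. The following CPU-side model is only an illustration: `shade_pixel_in_order` and its blend formula are hypothetical and stand in for a fragment shader that calls `colorAttachmentReadEXT` and then writes its output. With fragment inputs `{1.0, 2.0}` the model returns 2.5, while `{2.0, 1.0}` returns 2.0.

[source,c]
----
#include <stddef.h>

/* CPU-side model of a single pixel. Each fragment reads the value left by
 * the previous fragment (the role a tile image read plays in a shader) and
 * then writes a new value. The blend formula is an arbitrary example. */
static float shade_pixel_in_order(const float *fragment_src, size_t count,
                                  float clear_value)
{
    float attachment = clear_value;
    for (size_t i = 0; i < count; ++i) {
        const float previous = attachment;              /* tile image read */
        attachment = 0.5f * previous + fragment_src[i]; /* fragment output */
    }
    return attachment;
}
----
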
shaderTileImageDepthReadAccess and shaderTileImageStencilReadAccess provide a similar ability to read the depth and stencil values of any sample location covered by the fragment.
Depth and stencil fetches use implicit tile images.
If no depth / stencil attachment is present then the values returned by fetches are undefined.
Early fragment tests are disallowed if depth or stencil fetch is used.

Depth/stencil reads have rasterization order and synchronization guarantees similar to those for color.

=== SPIR-V changes

This proposal leverages `OpTypeImage` and makes `TileImageDataEXT` another `Dim` similar to `SubpassData`.

Specifically:

 * `Dim` is extended with `TileImageDataEXT`.
 * `OpTypeImage` gets the additional constraint that if `Dim` is `TileImageDataEXT`:
 ** `Sampled` must be `2`
 ** `Image Format` must be `Unknown` as the format is implicitly specified by the color attachment
 *** (We could relax this in a further extension if we wanted to support format reinterpretation in the shader.)
 ** `Execution Model` must be `Fragment`
 ** `Arrayed` must be `0`
 ** Extend the use of `Location` such that it specifies the color attachment index
 * Add `OpColorAttachmentReadEXT`, which is similar to `OpImageRead` but helps disambiguate between color/depth/stencil.
 * Add `OpDepthAttachmentReadEXT` and `OpStencilAttachmentReadEXT` to read depth/stencil
 ** These take an optional `Sample` parameter for MSAA use-cases
 * Add a `TileImageEXT` Storage Class that is only supported for variables of `OpTypeImage` with `Dim` equal to `TileImageDataEXT`

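The operand constraints on `OpTypeImage` listed above can be sketched as a small checker. This is a simplified model for illustration: `ImageTypeDesc`, the enumerant names, and `tile_image_type_is_valid` are hypothetical stand-ins rather than definitions from the SPIR-V headers, and the `Execution Model` and `Location` rules are not modelled.

[source,c]
----
#include <stdbool.h>

/* Hypothetical, simplified mirror of the OpTypeImage operands used here.
 * The enumerant names are local stand-ins, not SPIR-V header definitions. */
typedef enum ImageDim {
    DIM_2D,
    DIM_SUBPASS_DATA,
    DIM_TILE_IMAGE_DATA_EXT
} ImageDim;

typedef enum ImageFormat {
    FORMAT_UNKNOWN,
    FORMAT_RGBA8
} ImageFormat;

typedef struct ImageTypeDesc {
    ImageDim dim;
    ImageFormat format;
    unsigned sampled; /* OpTypeImage 'Sampled' operand */
    unsigned arrayed; /* OpTypeImage 'Arrayed' operand */
} ImageTypeDesc;

/* Returns true if the type satisfies the extra TileImageDataEXT rules.
 * Types with any other Dim are not constrained by this check. */
static bool tile_image_type_is_valid(const ImageTypeDesc *t)
{
    if (t->dim != DIM_TILE_IMAGE_DATA_EXT) {
        return true;
    }
    return t->sampled == 2u && t->format == FORMAT_UNKNOWN && t->arrayed == 0u;
}
----
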
=== GLSL changes

Main changes:

 * New type: `attachmentEXT`
 * The `location` layout qualifier is used to specify the corresponding color attachment
 * New storage qualifier (supported only in fragment shaders): `tileImageEXT`
 * New functions: `colorAttachmentReadEXT`, `depthAttachmentReadEXT`, `stencilAttachmentReadEXT`

Mapping to SPIR-V:

 * `attachmentEXT` maps to `OpTypeImage` with `Dim` equal to `TileImageDataEXT`
 * `colorAttachmentReadEXT` maps to `OpColorAttachmentReadEXT`
 * `depthAttachmentReadEXT` maps to `OpDepthAttachmentReadEXT`
 * `stencilAttachmentReadEXT` maps to `OpStencilAttachmentReadEXT`

Function signatures:
[source,c]
----
// color
gvec4 colorAttachmentReadEXT(gattachment attachmentEXT);
gvec4 colorAttachmentReadEXT(gattachment attachmentEXT, int sample);

// depth
highp float depthAttachmentReadEXT();
highp float depthAttachmentReadEXT(int sample);

// stencil
lowp uint stencilAttachmentReadEXT();
lowp uint stencilAttachmentReadEXT(int sample);
----

=== HLSL Changes

== Examples

=== Color reads

[source,c]
----
// ------ Subpass Example --------
layout( set = 0, binding = 0, input_attachment_index = 0 ) uniform highp subpassInput color0;
layout( set = 0, binding = 1, input_attachment_index = 1 ) uniform highp subpassInput color1;

layout( location = 0 ) out vec4 fragColor;

void main()
{
    vec4 value = subpassLoad(color0) + subpassLoad(color1);
    fragColor = value;
}

// ----- Equivalent Tile Image approach ------

// NOTES:
// 'tileImageEXT' is a storage qualifier.
// 'attachmentEXT' is an opaque type; similar to subpassInput
// 'aliased' means that the variable shares _tile image_ with the corresponding attachment; there is no in-memory aliasing

layout( location = 0 /* aliased to color attachment 0 */ ) tileImageEXT highp attachmentEXT color0;
layout( location = 1 /* aliased to color attachment 1 */ ) tileImageEXT highp attachmentEXT color1;

layout( location = 0 ) out vec4 fragColor;

void main()
{
    vec4 value = colorAttachmentReadEXT(color0) + colorAttachmentReadEXT(color1);
    fragColor = value;
}
----

=== Depth reads

[source,c]
----
void main()
{
    // read sample 0: works for non-MSAA or MSAA targets
    highp float last_depth = depthAttachmentReadEXT();
}
----

== Alternate Proposals

The following proposals explore alternate ways to expose the functionality for reading color data from tile memory. Reading depth and stencil, and the API changes, are kept unchanged from the main proposal.

=== Proposal B: OpTypeTileImage

==== SPIR-V Changes

Add new type: `TileImage`. We have two options for defining `TileImage`:

. `TileImage` variables which are instanced per-pixel (or per-sample in case of multisampled framebuffers)
. `TileImage` defines a 2D array of pixels similar to an image but in tile memory.
.. Note: Defining this as a 2D array fits well for future `Tile Shaders` functionality where tile shader invocations on a tile can access any location within a TileImage on the tile.

Add new instruction: `OpTypeTileImage`. The instruction declares a `tile image`. `Tile image` is an opaque type. `OpTypeTileImage` has the following operands:

* `Image Format`: the image format. This must be set to `Unknown` as the format is implicitly specified by the color attachment.
** (We could relax this in a further extension if we wanted to support format reinterpretation in the shader.)
* `MS`: indicates whether the content is multisampled. 0 - single-sampled. 1 - multisampled.

`Tile image` variables must be decorated with `Location` which specifies the color attachment index.
`Execution Model` must be `Fragment`.

Add `OpTileImageRead`, `OpDepthTileImageRead`, `OpStencilTileImageRead` to read from color, depth, stencil tile images.
Add `Tile` storage class.

==== GLSL Changes

GLSL changes remain the same as in the main proposal except the mapping changes to `OpTypeTileImage` instead of `OpTypeImage`:

 * `tileImage` maps to `OpTypeTileImage`

=== Proposal C: Storage Class / PLS style

==== SPIR-V Changes

Introduce `TileImage` as a new storage class.

* Variables declared with `TileImage` must have `Location` decoration specified - this specifies the attachment index to alias to.
* If image format reinterpretation is to be supported then a new `Imageformat` decoration is specified.
* `TileImage` storage class variables are multisampled with the sample count of the framebuffer if multisampling is enabled.
* Reading of TileImage variables is done via `OpTileImageRead`.
** `OpTileImageRead` accepts a `sample` parameter for MSAA use cases.

* If aggregate types are to be supported in `TileImage` storage class, we would need the following:
** `Location` and `Imageformat` must only be applied to non-structure types (that is, scalars or vectors or arrays of scalars or arrays of vectors).

==== GLSL Changes

* New storage class `tileImage`.
* Add support for grouping `tileImage` variable declarations into an interface block.
* layout `location` must be specified for the variables.
* Add new builtin function `tileImageRead`, which accepts an optional parameter `sample`
* If reinterpretation of formats is supported (within the same draw call), then we need `tileImageIn` and `tileImageOut` (or make `tileImage` an auxiliary storage specifier, similar to `patch` so we could use `tileImage in` and `tileImage out`).

== Non-coherent access

Some implementations have a penalty for supporting raster order access to tile image data. To support this functionality on such implementations we would add the following changes to the base proposal:

=== API Changes

* A property bit `shaderTileImagePreferCoherentReadAccess` indicating whether the implementation prefers that coherent read accesses be used.

* Support for specifying the barriers - three broad options (see next section)

* Note: The gains from the tile image feature with raster order access enabled are expected to match the gains from subpasses.

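To illustrate what explicit synchronization would ask of an application, here is a minimal model that counts the barriers needed for a sequence of draws. It is a sketch under a stated assumption: a barrier is needed only between a draw that writes color and a later draw that reads it through a non-coherent tile image variable, and overlapping fragments within a single draw are not modelled. `DrawInfo` and `count_required_barriers` are hypothetical names.

[source,c]
----
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-draw summary of attachment usage. */
typedef struct DrawInfo {
    bool writes_color;
    bool reads_color_tile_image;
} DrawInfo;

/* Count the barriers an application would record between draws. With
 * coherent (raster order) reads no explicit barrier is needed. */
static size_t count_required_barriers(const DrawInfo *draws, size_t count,
                                      bool coherent_reads)
{
    size_t barriers = 0;
    bool pending_write = false; /* a write not yet covered by a barrier */

    if (coherent_reads) {
        return 0;
    }
    for (size_t i = 0; i < count; ++i) {
        if (draws[i].reads_color_tile_image && pending_write) {
            ++barriers; /* barrier recorded before draw i */
            pending_write = false;
        }
        if (draws[i].writes_color) {
            pending_write = true;
        }
    }
    return barriers;
}
----
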
=== Barrier Proposal A: MemoryBarrier via vkCmdPipelineBarrier2

`vkCmdPipelineBarrier2` would be allowed within dynamic render passes to specify a `VkMemoryBarrier2` with some restrictions. The enums `VK_ACCESS_2_COLOR_ATTACHMENT_READ_BIT` and `VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_READ_BIT` are reused for tile image read accesses.

This approach would allow synchronizing all color attachments, or the depth/stencil attachment, but does not support synchronizing individual color attachments.

Example synchronizing two draw calls, where the first writes to color attachments and the second reads via the tile image variables.

[source,c]
----
vkCmdDraw(...);

// Make color writes from the first draw visible to tile image reads
VkMemoryBarrier2 memoryBarrier = {
        ...
        .srcStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT,
        .srcAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT,
        .dstStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT,
        .dstAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_READ_BIT
};

VkDependencyInfo dependencyInfo = {
        ...
        .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
        .memoryBarrierCount = 1,
        .pMemoryBarriers = &memoryBarrier,
        ...
};

vkCmdPipelineBarrier2(commandBuffer, &dependencyInfo);

vkCmdDraw(...);
----

=== Barrier Proposal B: ImageMemoryBarrier via vkCmdPipelineBarrier2

`vkCmdPipelineBarrier2` would be allowed within dynamic render passes to specify a `VkImageMemoryBarrier2` with some restrictions. The enums `VK_ACCESS_2_COLOR_ATTACHMENT_READ_BIT` and `VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_READ_BIT` are reused to express tile image read accesses.

This approach would allow synchronizing individual color attachments, or the depth or stencil attachment.

Example synchronizing two draw calls, where the first writes to color attachments and the second reads via the tile image variables.

[source,c]
----
vkCmdDraw(...);

VkImageMemoryBarrier2 imageMemoryBarrier = {
        ...
        .srcStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT,
        .srcAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT,
        .dstStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT,
        .dstAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_READ_BIT,
        .oldLayout = ..., // layouts are not allowed to change
        .newLayout = ...,
        .image = ..., // image and subresource identifying the specific attachment
        .subresourceRange = ...
};

VkDependencyInfo dependencyInfo = {
        ...
        .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
        ...
        .imageMemoryBarrierCount = 1,
        .pImageMemoryBarriers = &imageMemoryBarrier,
        ...
};

vkCmdPipelineBarrier2(commandBuffer, &dependencyInfo);

vkCmdDraw(...);
----

=== Barrier Proposal C: New simple API for tile image barriers

New API entry point `vkCmdTileBarrierEXT(..)` where the app can specify which attachments to synchronize. This can be easily extended to tile shaders if an implementation desires explicit barriers - by specifying that all of tile memory needs to be synchronized and explicitly specifying tile-wide synchronization.

[source,c]
----
// New Vulkan function and types
void vkCmdTileBarrierEXT(
    VkCommandBuffer             commandBuffer,
    VkDependencyFlags           dependencyFlags,
    VkTileMemoryTypeFlagsEXT    tileMemoryMask);

typedef enum VkTileMemoryTypeFlagsBitsEXT {
    VK_TILE_IMAGE_COLOR_ATTACHMENTS_BIT = 0x00000001,
    VK_TILE_IMAGE_DEPTH_STENCIL_ATTACHMENT_BIT = 0x00000002,
} VkTileMemoryTypeFlagsBitsEXT;
----

Example synchronizing two draw calls, where the first writes to color attachments and the second reads via the tile image variables.

[source,c]
----
vkCmdDraw(...);

vkCmdTileBarrierEXT(commandBuffer,
    VK_DEPENDENCY_BY_REGION_BIT,
    VK_TILE_IMAGE_COLOR_ATTACHMENTS_BIT);

vkCmdDraw(...);
----

=== SPIR-V and GLSL changes

* Tile Image data variables can optionally be specified with a "noncoherent" layout qualifier in GLSL. For Depth and Stencil we could use a special fragment shader layout qualifier (similar to early_fragment_tests) to indicate depth and stencil access is "noncoherent".
* Three new Execution modes in SPIR-V to specify that color, depth or stencil reads via the functionality in this extension are non-coherent (that is, the reads are no longer guaranteed to be in raster order with respect to write operations from prior fragments).

== Issues

=== 1. RESOLVED: Should we allow early fragment tests?

Early fragment tests are disallowed if reading frag depth / stencil.

=== 2. RESOLVED: Should depth / stencil fetch be a separate extension?

Access to depth / stencil is defined differently than color, but we suggest keeping them together - with separate feature bits.

=== 3. RESOLVED: What should we name these variables? What should the extension be named?

Other APIs have similar but not identical concepts, so a unique name is useful.

We call these resources tile images.
On typical implementations supporting this extension, the framebuffer is divided into tiles and fragment processing is deferred such that each framebuffer tile is typically visited just once.
A tile image is a view of a framebuffer attachment, restricted to the tile being processed.

Note that fragment shaders can still only read color, depth, and stencil values from their fragment location and not the entire tile.

The extension is called VK_EXT_shader_tile_image.

=== 4. RESOLVED: Are there any non-obvious interactions with the suspend/resume functionality in `VK_KHR_dynamic_rendering`?

Not at present.
If we were to allow non-aliased tile image variables, then implementations would have to be able to guarantee that those variables never have to 'spill' from tile image.

=== 5. RESOLVED: Enable / Disable raster order access

Some implementations pay a performance cost to guarantee raster order access. We need to give them a way to disable raster order access and add support for barriers to explicitly perform synchronization.

Three proposals have been added to the Non-coherent access section in this document. The spec changes currently choose Barrier Proposal A: MemoryBarrier via vkCmdPipelineBarrier2.

Vulkan barriers have been difficult for developers to use, so Barrier Proposal C might offer a simpler API.

Consensus was to keep things consistent with existing barriers in Vulkan, so Barrier Proposal A was chosen.

=== 7. RESOLVED: Should this extension reuse OpTypeImage, or introduce a new type for declaring tile images?

OpTypeImage is reused with a special Dim for tile images, following what was done for subpass attachments.

An alternative would have been to make tile images their own type, and introduce an OpTypeTileImage type.
That would require less special-casing of OpTypeImage, but comes with higher initial burden in tooling.

=== 8. RESOLVED: Should Color, Depth, and Stencil reads use the same SPIR-V opcode?

No. The extension introduces separate opcodes.

Tile based GPUs which guarantee framebuffer residency in tile memory can offer efficient raster order access to color, depth, stencil data with relatively low overhead.
Some GPU implementations would have a significant performance penalty in raster order access if the implementation cannot determine from the SPIR-V shader whether a specific access is color, depth, or stencil.

This design choice is in line with other API extensions (GL framebuffer fetch and framebuffer fetch depth stencil) and other APIs where depth/stencil access is clearly disambiguated.

=== 9. RESOLVED: Should Depth and Stencil read opcodes consume an image operand specifying the attachment, or should it be implicit?

No operand is necessary, as depth and stencil each uniquely identify their attachment, unlike color.

The other options considered were:

 A. Allow depth and stencil tile images to be declared as variables. Tile images are defined to map to the color attachment specified via the `Location` decoration - some equivalent needs to be defined for depth and stencil. Pixel Local Storage like functionality of supporting format reinterpretation is only supported for color attachments, and hence must be disallowed for depth and stencil. There is very little benefit to declaring the depth and stencil variables given these restrictions.
 B. Depth and stencil tile images are exposed as built-in variables.

Given the design choice made for issue 8, the alternate options do not add any value.

=== 10. RESOLVED: Should this extension reuse the image Dim SubpassData or introduce a new Dim?

The extension introduces a new Dim.

This extension is intended to serve as foundation for further functionality - for example Pixel Local Storage like format reinterpretation, or to define the tile size and allow tile shaders to access any pixel within the tile.
In SPIR-V, input attachments use images with Dim of SubpassData. We use a new Dim so we can easily distinguish whether an image is an input attachment or a tile image.

=== 11. RESOLVED: Should this extension require applications to create and bind descriptors for tile images?

No.
Some GPUs internally require descriptors to be able to access framebuffer data. The input attachments in Vulkan subpasses help these GPU implementations.

Other GPUs do not require apps to bind such descriptors. The intent with this extension is to provide functionality roughly along the lines of GL_EXT_shader_framebuffer_fetch and GL_EXT_shader_pixel_local_storage, which do not require apps to manage and bind descriptors.

=== 12. RESOLVED: What does 'undefined value' mean for tile image reads?

It simply means that the value has no well-defined meaning to an application. It does _not_ mean that the value is random nor that it could have been leaked from other contexts, processes, or memory other than the framebuffer attachments.

== Further Functionality

=== Fragment Shading Rate interactions

With `VK_KHR_fragment_shading_rate`, multi-pixel fragments read some implementation-defined pixel from the input attachments. We could define stronger requirements in this extension.

=== Allow non-aliased Tile Image variables and/or image format redeclaration

This would provide "Pixel local storage" equivalent functionality.

A possible approach for that would be to specify the format as a layout parameter - similar to image access:
[source,c]
----
layout(r11f_g11f_b10f) tile readonly highp tileImage normal;
----

=== Tile Image size query

If we were to allow non-aliased Tile Image variables, we would need to expose some limits on tile image size and tile dimensions so that applications can make performance trade-offs on tile size vs storage requirements.

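The trade-off between tile size and per-pixel storage can be sketched with simple arithmetic. Everything here is an assumption for illustration: no such limits are exposed today, and `max_square_tile_edge` is a hypothetical helper. With an assumed 65536-byte on-chip budget, 16 bytes per pixel allows a 64x64 tile, while 32 bytes per pixel allows only 45x45.

[source,c]
----
#include <stdint.h>

/* Hypothetical: largest square tile edge, in pixels, whose per-pixel storage
 * fits in a given on-chip budget. Both inputs are assumed values that the
 * extension does not expose. Returns 0 if bytes_per_pixel is 0. */
static uint32_t max_square_tile_edge(uint32_t tile_memory_bytes,
                                     uint32_t bytes_per_pixel)
{
    if (bytes_per_pixel == 0u) {
        return 0u;
    }
    const uint32_t max_pixels = tile_memory_bytes / bytes_per_pixel;
    uint32_t edge = 0u;
    /* Grow the edge while the next larger square still fits. */
    while ((uint64_t)(edge + 1u) * (edge + 1u) <= max_pixels) {
        ++edge;
    }
    return edge;
}
----
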
=== Memoryless attachments

We have lazily allocated images in Vulkan, but they do not guarantee that memory is not allocated.