// Copyright 2023-2024 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_ARM_render_pass_striped
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document describes a proposal for a new extension that allows the
processing of a render pass instance to be split into stripes.

## Problem Statement

It is common to do post-processing on the images produced by a render pass
instance. This is typically done using additional render pass instances or
compute passes on the same graphics and compute queue, and on the same
physical device.

In some cases, however, the post-processing can be done more efficiently
on specialized hardware that is not Vulkan-capable. The various memory
import and export extensions can support this use-case, but existing
synchronization requires that the Vulkan device has rendered the complete
output image before the external device can access it. In many cases,
latency would be reduced if the post-processing could start before the
complete image is rendered.

This proposal aims to support this latency reduction by splitting the
processing of a render pass instance into a set of stripes that can be
individually synchronized with an external device.

## Solution Space

The first option to consider is to just use the render area (in
`VkRenderingInfo` or `VkRenderPassBeginInfo`) to render each stripe
separately. This would be functionally correct, but requires that the
complete render pass is replicated for each stripe, which adds overhead
for both the application and the implementation.

We want to give the implementation visibility of the stripes such that some
work can be shared across stripes.

The stripes should be specified per render pass. Extending the render pass
structures with that information is straightforward.

There are more options for handling the synchronization.

The intent is to synchronize with an external (to Vulkan) consumer, so the
synchronization primitive must be exportable.

If we use semaphores, when should the semaphore objects be provided?
The options are:

  . Provide the semaphores along with the queue submission commands (e.g. on `vkQueueSubmit`) as normal.
  . Provide the semaphores along with the render pass information.

Providing the semaphores with the render pass would give the implementation
all the information in one place. But it is unusual to provide semaphores
during recording, and resubmitting command buffers that have semaphores
embedded in them adds complexity.

This proposal picks the first option and describes a mapping between an array
of semaphores and the render passes in the submitted command buffer.

This proposal does not allow stripes to be specified per subpass for two reasons:

  . It is not necessary, since the expected use-case is post-processing of the final image.
  . If different subpasses specified different `stripeArea` values, any subpass merging of those passes would likely have to be disabled.

The expectation in this proposal is that the stripes are only used for the
last subpass in a render pass instance.

## Proposal

### Features

```c
typedef struct VkPhysicalDeviceRenderPassStripedFeaturesARM
{
    VkStructureType sType;
    void *pNext;
    VkBool32 renderPassStriped;
} VkPhysicalDeviceRenderPassStripedFeaturesARM;
```

This feature indicates that striped rendering is supported.

### Properties

```c
typedef struct VkPhysicalDeviceRenderPassStripedPropertiesARM
{
    VkStructureType sType;
    void *pNext;
    VkExtent2D renderPassStripeGranularity;
    uint32_t maxRenderPassStripes;
} VkPhysicalDeviceRenderPassStripedPropertiesARM;
```

These properties indicate implementation-defined limits on the maximum number
of stripes that are supported and how fine-grained they can be. For a
tile-based GPU, the stripe granularity will typically be a multiple of the
tile size.
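
As a rough illustration of how an application might combine these two limits,
the sketch below picks a height for horizontal stripes that is a multiple of
the granularity while keeping the stripe count within the maximum. The helper
names `choose_stripe_height` and `stripe_count` are hypothetical and not part
of the extension; the inputs are assumed to come from
`renderPassStripeGranularity.height` and `maxRenderPassStripes`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest stripe height that is a multiple of
 * `granularity` and covers `render_height` rows in at most `max_stripes`
 * stripes. Assumes the sums below do not overflow uint32_t. */
uint32_t choose_stripe_height(uint32_t render_height,
                              uint32_t granularity,
                              uint32_t max_stripes)
{
    assert(granularity > 0 && max_stripes > 0);

    /* Rows each stripe must cover if the stripe count is at its maximum. */
    uint32_t rows = (render_height + max_stripes - 1) / max_stripes;

    /* Round up to a whole number of granularity units, at least one. */
    uint32_t units = (rows + granularity - 1) / granularity;
    if (units == 0) {
        units = 1;
    }
    return units * granularity;
}

/* Hypothetical helper: number of stripes of `stripe_height` rows needed to
 * cover `render_height` rows; the last stripe may be shorter. */
uint32_t stripe_count(uint32_t render_height, uint32_t stripe_height)
{
    assert(stripe_height > 0);
    return (render_height + stripe_height - 1) / stripe_height;
}
```

For example, with a render area 1080 rows high, a granularity of 64 rows and a
limit of 8 stripes, this sketch yields a stripe height of 192 and 6 stripes,
the last of which is shorter than the others.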

### Specifying stripes

Stripes can be specified per render pass instance.
The following structure can be added to the pNext chain of `VkRenderingInfoKHR`
or `VkRenderPassBeginInfo`:

```c
typedef struct VkRenderPassStripeBeginInfoARM
{
    VkStructureType sType;
    const void *pNext;
    uint32_t stripeInfoCount;
    const VkRenderPassStripeInfoARM *pStripeInfos;
} VkRenderPassStripeBeginInfoARM;
```

`stripeInfoCount` is the number of stripes and also the number of elements in
the `pStripeInfos` array.

The individual stripes are specified by the following structure:

```c
typedef struct VkRenderPassStripeInfoARM
{
    VkStructureType sType;
    const void *pNext;
    VkRect2D stripeArea;
} VkRenderPassStripeInfoARM;
```

`stripeArea` is the region covered by the stripe.
As a rule, the values of `stripeArea.offset.x` and `stripeArea.extent.width`
must be multiples of `renderPassStripeGranularity.width`. But it is difficult
for an application to guarantee that the dimensions of every image are
multiples of the stripe granularity. We therefore allow an exception to the
general rule for the stripe at the edge: `stripeArea.extent.width` does
not need to be a multiple of `renderPassStripeGranularity.width` as long as
the sum of `stripeArea.offset.x` and `stripeArea.extent.width` is equal to
the `renderArea.extent.width` of the render pass instance.
The same constraints apply to the values of `stripeArea.offset.y` and
`stripeArea.extent.height`.
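
The rule and its edge exception can be sketched as a small check. This is an
illustration under assumptions rather than the normative validation: the
`Offset2D`, `Extent2D` and `Rect2D` types are stand-ins for the Vulkan
structures so that the snippet compiles without the Vulkan headers, and
`axis_ok` and `stripe_area_ok` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins mirroring VkOffset2D, VkExtent2D and VkRect2D (assumption). */
typedef struct { int32_t x, y; } Offset2D;
typedef struct { uint32_t width, height; } Extent2D;
typedef struct { Offset2D offset; Extent2D extent; } Rect2D;

/* Checks one axis. Requires granularity > 0. */
bool axis_ok(int32_t offset, uint32_t extent,
             uint32_t granularity, uint32_t render_extent)
{
    /* The offset must always be a non-negative multiple of the granularity. */
    if (offset < 0 || (uint32_t)offset % granularity != 0) {
        return false;
    }
    /* General rule: the extent is a multiple of the granularity. */
    if (extent % granularity == 0) {
        return true;
    }
    /* Edge exception: the stripe ends exactly at the render area extent. */
    return (uint64_t)offset + extent == render_extent;
}

/* Applies the same check to width and height. */
bool stripe_area_ok(Rect2D stripe, Extent2D granularity, Extent2D render_extent)
{
    assert(granularity.width > 0 && granularity.height > 0);
    return axis_ok(stripe.offset.x, stripe.extent.width,
                   granularity.width, render_extent.width)
        && axis_ok(stripe.offset.y, stripe.extent.height,
                   granularity.height, render_extent.height);
}
```

With a 1920x1080 render area and a 64x64 granularity, a final stripe at
`offset.y` 1024 with a height of 56 passes this check through the edge
exception, while the same 56-row stripe at `offset.y` 64 does not.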

In order to synchronize with an external consumer, a semaphore needs to be
signaled when a stripe completes. These semaphores are specified on queue
submit by including the following structure in the pNext chain of
`VkSubmitInfo2->VkCommandBufferSubmitInfo`:

```c
typedef struct VkRenderPassStripeSubmitInfoARM
{
    VkStructureType sType;
    const void *pNext;
    uint32_t stripeSemaphoreInfoCount;
    const VkSemaphoreSubmitInfo* pStripeSemaphoreInfos;
} VkRenderPassStripeSubmitInfoARM;
```

`stripeSemaphoreInfoCount` is the number of elements in `pStripeSemaphoreInfos`.
The value of `stripeSemaphoreInfoCount` must be equal to the sum of the
`VkRenderPassStripeBeginInfoARM->stripeInfoCount` parameters that are recorded
in the `VkCommandBufferSubmitInfo->commandBuffer`.

`pStripeSemaphoreInfos` is a pointer to an array of `VkSemaphoreSubmitInfo`
structures describing the render pass stripe signal operations.
The elements of this array are mapped to striped render passes in submission
order and in stripe order within each render pass.
Each semaphore is signaled when the associated stripe is complete.

For example, if `VkCommandBufferSubmitInfo->commandBuffer` contains three
render passes, where the first has two stripes, the second is not striped,
and the third has three stripes, then `stripeSemaphoreInfoCount` must have
the value 5. The first two entries of `pStripeSemaphoreInfos` are associated
with the first render pass and the last three entries are associated with
the third render pass. The semaphore in the first entry of
`pStripeSemaphoreInfos` is signaled when the first stripe of the first
render pass is complete, etc.
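
The mapping in this example amounts to a running sum over the stripe counts of
the recorded render passes. The following sketch shows that bookkeeping on the
application side; `map_stripe_semaphores` is a hypothetical helper, and an
unstriped render pass is assumed to be represented by a stripe count of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for each render pass i (in submission order), stores
 * in first_index[i] the index of its first entry in pStripeSemaphoreInfos.
 * Stripe s of render pass i then maps to entry first_index[i] + s.
 * Unstriped render passes (count 0) consume no entries.
 * Returns the required stripeSemaphoreInfoCount. */
uint32_t map_stripe_semaphores(const uint32_t *stripe_counts,
                               size_t render_pass_count,
                               uint32_t *first_index)
{
    assert(stripe_counts != NULL && first_index != NULL);

    uint32_t next = 0;
    for (size_t i = 0; i < render_pass_count; ++i) {
        first_index[i] = next;
        next += stripe_counts[i];
    }
    return next;
}
```

For stripe counts of 2, 0 and 3, the helper returns 5 and places the first
render pass at entry 0 and the third render pass at entry 2, matching the
example above.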

## Issues

### PROPOSED: Naming. Do we use "stripe" or "slice"?

The specification uses "slice" to refer to slices of a 3D image.
This proposal wants to express partitions in 2D.
The video extensions, e.g., "VK_EXT_video_decode_h264", use "slice" to mean
something very similar to what this proposal is aiming to achieve.

We use "stripe" to disambiguate from the way slice is used to describe
partitions in 3D.

### PROPOSED: Could we use timeline semaphores for this?

Probably. But timeline semaphores are not yet available on all platforms.

Regular timeline semaphores are not shareable with foreign queues, so they
are not a good fit for the target use-cases.

External timeline semaphores require the native platform to support a
TIMELINE-type native fence. There are two ways to export these:

  . OPAQUE_FD: These are of limited use since they can only be shared by the same driver
  . Platform timeline fences: There is currently no way to export these from Vulkan

For now, we require binary semaphores.

### PROPOSED: How do stripes interact with multiview or layered rendering?

The main question is whether a stripe covers individual views (or layers of
the framebuffer) or all of them.

Each view could be post-processed separately. In that case, one may want
to start the post-processing for a stripe of one view before the next view is
processed, which would call for one semaphore per view.

This extension is specified to complete a stripe for all views (or layers)
before moving on to the next stripe. The main motivation for this choice is
to reduce the number of synchronization operations required.

### PROPOSED: Is striped rendering supported for all renderable formats?

Yes.
