// Copyright (c) 2020-2024 Huawei Technologies Co. Ltd.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_HUAWEI_cluster_culling_shader
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:


== Problem Statement

When drawing a scene with a massive amount of geometry, invisible geometry must be removed to avoid redundant drawing. A common approach is cluster culling. A cluster is a subset of a mesh whose triangles share as many vertices as possible; alternatively, a cluster can be an entire mesh. Clusters are precomputed and stored with the geometry so that they do not have to be computed at runtime.
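
To make the precomputation step concrete, here is a minimal C sketch that splits an index buffer into clusters of consecutive triangles and records an axis-aligned bounding box for each one. It deliberately simplifies: a production cluster builder would group triangles to maximize shared vertices, which this naive split does not attempt. The `Cluster` type and `build_clusters` function are hypothetical names used only for illustration.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical precomputed cluster record (illustrative, not part of the extension). */
typedef struct {
    uint32_t firstIndex;   /* offset of the cluster's first index in the index buffer */
    uint32_t indexCount;   /* number of indices, 3 per triangle */
    float    boundsMin[3]; /* axis-aligned bounding box of the referenced vertices */
    float    boundsMax[3];
} Cluster;

/* Naively split an index buffer into clusters of at most maxTriangles
 * consecutive triangles and compute each cluster's bounding box.
 * positions holds xyz triplets. Returns the number of clusters written. */
size_t build_clusters(const float *positions, const uint32_t *indices,
                      size_t indexCount, uint32_t maxTriangles,
                      Cluster *out, size_t outCapacity)
{
    size_t count = 0;
    size_t step = (size_t)maxTriangles * 3u;
    if (step == 0) return 0;
    for (size_t first = 0; first < indexCount && count < outCapacity; first += step) {
        size_t n = (indexCount - first < step) ? indexCount - first : step;
        Cluster *c = &out[count++];
        c->firstIndex = (uint32_t)first;
        c->indexCount = (uint32_t)n;
        for (int k = 0; k < 3; ++k) {
            c->boundsMin[k] = FLT_MAX;
            c->boundsMax[k] = -FLT_MAX;
        }
        /* Grow the box to enclose every vertex referenced by this cluster. */
        for (size_t i = 0; i < n; ++i) {
            const float *p = &positions[3u * indices[first + i]];
            for (int k = 0; k < 3; ++k) {
                if (p[k] < c->boundsMin[k]) c->boundsMin[k] = p[k];
                if (p[k] > c->boundsMax[k]) c->boundsMax[k] = p[k];
            }
        }
    }
    return count;
}
```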

The GPU has a large number of threads and strong parallel computing capabilities, so it is better suited than the CPU to culling many clusters. Many developers use compute shaders for GPU culling. Because a compute shader cannot generate output that connects directly to the existing rendering pipeline, culling and rendering must be separated into two passes: first, the culling pass processes the whole scene and updates the multi-draw indirect (MDI) commands; then the rendering pass draws using MDI.
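
The compute-shader culling pass described above can be sketched on the CPU as follows. The loop stands in for one shader invocation per cluster (on the GPU the append would use an atomic counter). Each cluster is tested against a set of frustum planes, every visible cluster produces one record with the same member layout as `VkDrawIndexedIndirectCommand`, and the returned count is what a later multi-draw indirect call would consume. All type and function names are hypothetical, and passing the cluster index through `firstInstance` is only one possible convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Same member layout as VkDrawIndexedIndirectCommand, redeclared locally so
 * the sketch compiles without the Vulkan headers. */
typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedIndirectCommand;

/* Hypothetical per-cluster data: index range plus an axis-aligned bounding box. */
typedef struct {
    uint32_t firstIndex;
    uint32_t indexCount;
    float    boundsMin[3];
    float    boundsMax[3];
} Cluster;

/* Plane a*x + b*y + c*z + d = 0; points giving a non-negative value are inside. */
typedef struct { float a, b, c, d; } Plane;

/* The box is fully outside if even its corner farthest along the plane
 * normal (the "positive vertex") lies on the negative side. */
static int aabb_outside_plane(const Cluster *cl, const Plane *p)
{
    float x = p->a >= 0.0f ? cl->boundsMax[0] : cl->boundsMin[0];
    float y = p->b >= 0.0f ? cl->boundsMax[1] : cl->boundsMin[1];
    float z = p->c >= 0.0f ? cl->boundsMax[2] : cl->boundsMin[2];
    return p->a * x + p->b * y + p->c * z + p->d < 0.0f;
}

/* Culling pass: append one indexed draw per visible cluster and return the draw count. */
uint32_t cull_clusters(const Cluster *clusters, uint32_t clusterCount,
                       const Plane *planes, uint32_t planeCount,
                       DrawIndexedIndirectCommand *cmds)
{
    uint32_t drawCount = 0;
    for (uint32_t i = 0; i < clusterCount; ++i) {
        int visible = 1;
        for (uint32_t j = 0; j < planeCount && visible; ++j)
            if (aabb_outside_plane(&clusters[i], &planes[j])) visible = 0;
        if (!visible) continue;
        DrawIndexedIndirectCommand *cmd = &cmds[drawCount++];
        cmd->indexCount    = clusters[i].indexCount;
        cmd->instanceCount = 1;
        cmd->firstIndex    = clusters[i].firstIndex;
        cmd->vertexOffset  = 0;
        cmd->firstInstance = i; /* cluster index, for per-cluster data lookup */
    }
    return drawCount;
}
```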

== Solution space
Provide a new extension that connects the output of a compute-like culling stage directly to the existing rendering pipeline. In addition, an appropriate shading rate can be configured when drawing each visible cluster. Developers who currently use compute shaders for culling can migrate to this extension easily and gain better performance.


== Proposal
=== Cluster culling shader
This extension allows applications to use a new programmable shader type, the Cluster Culling Shader (CCS), to perform geometry culling on the GPU. Unlike culling in a separate compute pass, this mechanism does not require a pipeline barrier between the culling work and the rest of the rendering pipeline.

This new shader type has an execution environment similar to that of compute shaders: a collection of shader invocations forms a workgroup and cooperates to perform cluster-based culling and level-of-detail selection. A shader invocation can emit a group of built-in output variables that is treated as a drawing command. This command drives the subsequent rendering pipeline to draw the geometry of a cluster, optionally with a specific shading rate; for example, the distance between a cluster and the viewpoint can be used to determine the cluster's shading rate. Together, these capabilities let the cluster culling shader reduce the rendering workload more effectively.
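
As a minimal sketch of the distance-based example, the function below picks a coarser shading rate for clusters farther from the viewpoint. It makes two assumptions that the proposal does not state: the rate uses the same bit encoding as SPIR-V's `PrimitiveShadingRateKHR` (the `ShadingRate` mask), and the two distance thresholds are arbitrary, hypothetical tuning parameters. The function name is also hypothetical.

```c
#include <stdint.h>

/* Assumed encoding: SPIR-V ShadingRate mask bits, as used by PrimitiveShadingRateKHR. */
enum {
    RATE_VERTICAL_2   = 0x1,
    RATE_VERTICAL_4   = 0x2,
    RATE_HORIZONTAL_2 = 0x4,
    RATE_HORIZONTAL_4 = 0x8
};

/* Hypothetical policy: 1x1 for near clusters, 2x2 at mid range, 4x4 beyond farDist.
 * Squared distances are compared so no square root is needed. */
uint32_t shading_rate_for_cluster(const float center[3], const float eye[3],
                                  float nearDist, float farDist)
{
    float dx = center[0] - eye[0];
    float dy = center[1] - eye[1];
    float dz = center[2] - eye[2];
    float d2 = dx * dx + dy * dy + dz * dz;
    if (d2 < nearDist * nearDist) return 0u;                                /* 1x1 */
    if (d2 < farDist * farDist)   return RATE_HORIZONTAL_2 | RATE_VERTICAL_2; /* 2x2 */
    return RATE_HORIZONTAL_4 | RATE_VERTICAL_4;                             /* 4x4 */
}
```

A cluster culling shader would perform the same computation per cluster and write the result to `gl_ClusterShadingRateHUAWEI` before emitting the cluster.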

Note that the per-cluster shading rate is subject to the following restrictions:

1. The CCS and the vertex shader cannot both output a shading rate.
2. The per-cluster shading rate output by the CCS is treated as the per-primitive shading rate in the combiner operation.
3. If the CCS does not output a per-cluster shading rate, the combiner operation rules remain unchanged.
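
The restrictions above can be modelled with a small C sketch of the two-stage fragment shading rate combiner, in which the per-cluster rate simply occupies the per-primitive slot. The `CombinerOp` enumeration mirrors the order of `VkFragmentShadingRateCombinerOpKHR`; representing rates as pixel sizes and clamping each axis to a single `maxSize` are simplifications, not the exact clamping rules of the Vulkan specification. All names are hypothetical.

```c
#include <stdint.h>

/* Local mirror of VkFragmentShadingRateCombinerOpKHR, in the same order. */
typedef enum {
    COMBINER_KEEP,
    COMBINER_REPLACE,
    COMBINER_MIN,
    COMBINER_MAX,
    COMBINER_MUL
} CombinerOp;

/* A fragment size in pixels, e.g. {2, 2} for 2x2 coarse shading. */
typedef struct { uint32_t w, h; } Rate;

static uint32_t min_u(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u(uint32_t a, uint32_t b) { return a > b ? a : b; }

static Rate combine(CombinerOp op, Rate a, Rate b, uint32_t maxSize)
{
    Rate r = a;
    switch (op) {
    case COMBINER_KEEP:    r = a; break;
    case COMBINER_REPLACE: r = b; break;
    case COMBINER_MIN:     r.w = min_u(a.w, b.w); r.h = min_u(a.h, b.h); break;
    case COMBINER_MAX:     r.w = max_u(a.w, b.w); r.h = max_u(a.h, b.h); break;
    case COMBINER_MUL:     r.w = a.w * b.w;       r.h = a.h * b.h;       break;
    }
    /* Simplified clamp to the largest supported size on each axis. */
    r.w = min_u(r.w, maxSize);
    r.h = min_u(r.h, maxSize);
    return r;
}

/* Restriction 2: the CCS rate takes the per-primitive slot of the first combiner.
 * Restriction 3: without a CCS rate, the vertex shader's primitive rate is used as before.
 * Restriction 1 means the two sources are never active at the same time. */
Rate final_shading_rate(Rate pipelineRate,
                        int hasClusterRate, Rate clusterRate, Rate vsPrimitiveRate,
                        Rate attachmentRate,
                        const CombinerOp combinerOps[2], uint32_t maxSize)
{
    Rate primitiveRate = hasClusterRate ? clusterRate : vsPrimitiveRate;
    Rate intermediate = combine(combinerOps[0], pipelineRate, primitiveRate, maxSize);
    return combine(combinerOps[1], intermediate, attachmentRate, maxSize);
}
```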

=== API changes
==== Shader stage and synchronization
Extending `VkShaderStageFlagBits`::
`VK_SHADER_STAGE_CLUSTER_CULLING_BIT_HUAWEI`
specifies the cluster culling shader stage.

Extending `VkPipelineStageFlagBits2`::
`VK_PIPELINE_STAGE_2_CLUSTER_CULLING_SHADER_BIT_HUAWEI`
specifies the cluster culling shader stage for synchronization.

==== New structures
Extending `VkPhysicalDeviceFeatures2`, `VkDeviceCreateInfo`::
`VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI`

Extending `VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI`::
`VkPhysicalDeviceClusterCullingShaderVrsFeaturesHUAWEI`

Extending `VkPhysicalDeviceProperties2`::
`VkPhysicalDeviceClusterCullingShaderPropertiesHUAWEI`

==== Draw commands
Cluster draw commands are recorded into a command buffer. When executed by a queue, they produce work that executes according to the bound cluster culling shader pipeline.

To record a cluster culling shader draw command:
```c
void vkCmdDrawClusterHUAWEI(
    VkCommandBuffer     commandBuffer,
    uint32_t            groupCountX,
    uint32_t            groupCountY,
    uint32_t            groupCountZ);
```
* `commandBuffer` is the command buffer into which the command will be recorded.
* `groupCountX` is the number of local workgroups to dispatch in the X dimension.
* `groupCountY` is the number of local workgroups to dispatch in the Y dimension.
* `groupCountZ` is the number of local workgroups to dispatch in the Z dimension.

When the command is executed, a global workgroup consisting of `groupCountX` × `groupCountY` × `groupCountZ` local workgroups is assembled.
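
A minimal sketch of that arithmetic: the number of local workgroups is the product of the three counts, and a workgroup coordinate can be flattened into a single index, for example to select which range of clusters a workgroup processes. The helper names and the X-major flattening order are illustrative assumptions, not part of the extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of local workgroups assembled by vkCmdDrawClusterHUAWEI(cb, x, y, z).
 * Returns false if the product does not fit in 64 bits. */
bool total_workgroups(uint32_t x, uint32_t y, uint32_t z, uint64_t *out)
{
    uint64_t xy = (uint64_t)x * y; /* at most (2^32 - 1)^2, always fits in 64 bits */
    if (z != 0 && xy > UINT64_MAX / z) return false;
    *out = xy * z;
    return true;
}

/* Flatten a workgroup coordinate into a single index in [0, X*Y*Z), X-major. */
uint64_t flat_workgroup_index(uint32_t idX, uint32_t idY, uint32_t idZ,
                              uint32_t countX, uint32_t countY)
{
    return (uint64_t)idX + (uint64_t)countX * ((uint64_t)idY + (uint64_t)countY * idZ);
}
```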


To record an indirect cluster culling shader draw command:
```c
void vkCmdDrawClusterIndirectHUAWEI(
    VkCommandBuffer     commandBuffer,
    VkBuffer            buffer,
    VkDeviceSize        offset);
```

* `commandBuffer` is the command buffer into which the command will be recorded.
* `buffer` is the buffer containing the dispatch parameters.
* `offset` is the byte offset into `buffer` where the parameters begin.

`vkCmdDrawClusterIndirectHUAWEI` behaves similarly to `vkCmdDrawClusterHUAWEI`, except that the parameters are read by the device from a buffer during execution.
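
The sketch below shows how a CPU-side copy of the indirect buffer might be filled and read back. It assumes, without confirmation from this proposal, that the parameters are three tightly packed `uint32_t` values laid out like `VkDispatchIndirectCommand`, and that `offset` must be a multiple of 4 as for `vkCmdDispatchIndirect`. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed parameter layout, mirroring VkDispatchIndirectCommand { x, y, z }. */
typedef struct {
    uint32_t x, y, z;
} DrawClusterIndirectParams;

/* Write the parameters into a CPU copy of the buffer at byteOffset.
 * Fails if the offset is not 4-byte aligned (assumed rule) or the record
 * would not fit in the buffer. */
bool write_cluster_indirect(uint8_t *buffer, size_t bufferSize, size_t byteOffset,
                            DrawClusterIndirectParams params)
{
    if (byteOffset % 4 != 0) return false;
    if (byteOffset > bufferSize || bufferSize - byteOffset < sizeof params) return false;
    memcpy(buffer + byteOffset, &params, sizeof params);
    return true;
}

/* What the device conceptually reads when the indirect command executes. */
DrawClusterIndirectParams read_cluster_indirect(const uint8_t *buffer, size_t byteOffset)
{
    DrawClusterIndirectParams p;
    memcpy(&p, buffer + byteOffset, sizeof p);
    return p;
}
```

Because the record lives in a buffer, an earlier GPU pass could write the workgroup count, so the count does not need to be known when the command buffer is recorded.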

==== Features
`VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI` - Structure describing the cluster culling shader features that can be supported by an implementation.

The `VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI` structure is defined as:
```c
typedef struct VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI {
    VkStructureType             sType;
    void*                       pNext;
    VkBool32                    clustercullingShader;
    VkBool32                    multiviewClusterCullingShader;
    VkBool32                    clusterShadingRate;
} VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI;
```

* `sType` is the type of this structure.
* `pNext` is `NULL` or a pointer to a structure extending this structure.
* `clustercullingShader` indicates whether the cluster culling shader stage is supported.
* `multiviewClusterCullingShader` indicates whether multiview can be used with the cluster culling shader.
* `clusterShadingRate` specifies whether the per-cluster shading rate is supported.

If the `VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI` structure is included in the `pNext` chain of the `VkPhysicalDeviceFeatures2` structure passed to `vkGetPhysicalDeviceFeatures2`, it is filled in to indicate whether each corresponding feature is supported.
`VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI` can also be used in the `pNext` chain of `VkDeviceCreateInfo` to selectively enable these features.


`VkPhysicalDeviceClusterCullingShaderVrsFeaturesHUAWEI` - Structure describing whether the cluster culling shader supports the per-cluster shading rate.

The `VkPhysicalDeviceClusterCullingShaderVrsFeaturesHUAWEI` structure is defined as:
```c
typedef struct VkPhysicalDeviceClusterCullingShaderVrsFeaturesHUAWEI {
    VkStructureType             sType;
    void*                       pNext;
    VkBool32                    clusterShadingRate;
} VkPhysicalDeviceClusterCullingShaderVrsFeaturesHUAWEI;
```

* `sType` is the type of this structure.
* `pNext` is `NULL` or a pointer to a structure extending this structure.
* `clusterShadingRate` specifies whether the per-cluster shading rate is supported.

To query whether the cluster culling shader supports the per-cluster shading rate, include a `VkPhysicalDeviceClusterCullingShaderVrsFeaturesHUAWEI` structure in the `pNext` chain of the `VkPhysicalDeviceClusterCullingShaderFeaturesHUAWEI` structure, which is in turn chained to the `VkPhysicalDeviceFeatures2` structure passed to `vkGetPhysicalDeviceFeatures2`.
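
The structure chain described above (`VkPhysicalDeviceFeatures2` pointing to the cluster culling features structure, which in turn points to the VRS features structure) can be sketched without the Vulkan headers by using abbreviated stand-in types. The `sType` values are placeholders rather than real `VkStructureType` enumerants, and `build_feature_chain` and `find_in_chain` are hypothetical helpers; the latter walks the chain the way code using `VkBaseOutStructure` typically does.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder sType values; the real ones are VkStructureType enumerants. */
enum { STYPE_FEATURES2 = 1, STYPE_CCS_FEATURES = 2, STYPE_CCS_VRS_FEATURES = 3 };

/* Every chained structure starts with sType and pNext, as in VkBaseOutStructure. */
typedef struct { int32_t sType; void *pNext; } BaseOut;

/* Abbreviated stand-ins: only the members needed for the sketch are declared. */
typedef struct { int32_t sType; void *pNext; /* features member omitted */ } Features2;
typedef struct { int32_t sType; void *pNext; uint32_t clustercullingShader; } CcsFeatures;
typedef struct { int32_t sType; void *pNext; uint32_t clusterShadingRate; } CcsVrsFeatures;

/* Link Features2 -> CCS features -> CCS VRS features. */
void build_feature_chain(Features2 *f2, CcsFeatures *ccs, CcsVrsFeatures *vrs)
{
    vrs->sType = STYPE_CCS_VRS_FEATURES; vrs->pNext = NULL;
    ccs->sType = STYPE_CCS_FEATURES;     ccs->pNext = vrs;
    f2->sType  = STYPE_FEATURES2;        f2->pNext  = ccs;
}

/* Return the first structure with the given sType in a pNext chain, or NULL. */
void *find_in_chain(void *head, int32_t sType)
{
    for (BaseOut *s = head; s != NULL; s = s->pNext)
        if (s->sType == sType) return s;
    return NULL;
}
```

In real code, `vkGetPhysicalDeviceFeatures2` would be called on the head of the chain once it is built, and the members of the chained structures read afterwards.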


==== Properties
`VkPhysicalDeviceClusterCullingShaderPropertiesHUAWEI` - Structure describing cluster culling shader properties.
```c
typedef struct VkPhysicalDeviceClusterCullingShaderPropertiesHUAWEI {
    VkStructureType             sType;
    void*                       pNext;
    uint32_t                    maxWorkGroupCount[3];
    uint32_t                    maxWorkGroupSize[3];
    uint32_t                    maxOutputClusterCount;
} VkPhysicalDeviceClusterCullingShaderPropertiesHUAWEI;
```

* `sType` is the type of this structure.
* `pNext` is `NULL` or a pointer to a structure extending this structure.
* `maxWorkGroupCount` is the maximum number of local workgroups that can be launched by a single command. These three values represent the maximum local workgroup count in the X, Y and Z dimensions, respectively. In the current implementation, the Y and Z values are both implicitly set to one. The `groupCountX` value of a `vkCmdDrawCluster*HUAWEI` command must be less than or equal to `maxWorkGroupCount[0]`.
* `maxWorkGroupSize` is the maximum size of a local workgroup. These three values represent the maximum local workgroup size in the X, Y and Z dimensions, respectively. The x, y and z sizes, as specified by the `LocalSize` or `LocalSizeId` execution mode or by the object decorated by the `WorkgroupSize` decoration in shader modules, must be less than or equal to the corresponding limit.
* `maxOutputClusterCount` is the maximum number of output clusters that a single workgroup may emit.

If the `VkPhysicalDeviceClusterCullingShaderPropertiesHUAWEI` structure is included in the `pNext` chain of the `VkPhysicalDeviceProperties2` structure passed to `vkGetPhysicalDeviceProperties2`, it is filled in with each corresponding implementation-dependent property.
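
A minimal sketch of the checks these limits imply, using a local copy of the queried values. The `CcsLimits` type and `ccs_launch_is_valid` function are hypothetical; a real application would populate the limits from `VkPhysicalDeviceClusterCullingShaderPropertiesHUAWEI`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the workgroup limits from
 * VkPhysicalDeviceClusterCullingShaderPropertiesHUAWEI. */
typedef struct {
    uint32_t maxWorkGroupCount[3];
    uint32_t maxWorkGroupSize[3];
} CcsLimits;

/* Check a vkCmdDrawClusterHUAWEI launch and the shader's local size against
 * the limits. Because maxWorkGroupCount[1] and [2] are 1 in the current
 * implementation, any groupCountY or groupCountZ above 1 fails the same test. */
bool ccs_launch_is_valid(const CcsLimits *lim,
                         uint32_t groupCountX, uint32_t groupCountY, uint32_t groupCountZ,
                         const uint32_t localSize[3])
{
    const uint32_t counts[3] = {groupCountX, groupCountY, groupCountZ};
    for (int i = 0; i < 3; ++i) {
        if (counts[i] > lim->maxWorkGroupCount[i]) return false;
        if (localSize[i] > lim->maxWorkGroupSize[i]) return false;
    }
    return true;
}
```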

=== SPIR-V changes
==== New capability

`ClusterCullingShadingHUAWEI`

==== Execution model
`ClusterCullingHUAWEI`

==== Built-in variables

The cluster culling shader has the following built-in output variables; together they form the drawing command described in the cluster culling shader overview.

* `IndexCountHUAWEI` is the number of vertices to draw (indexed mode).
* `VertexCountHUAWEI` is the number of vertices to draw (non-indexed mode).
* `InstanceCountHUAWEI` is the number of instances to draw.
* `FirstIndexHUAWEI` is the base index within the index buffer.
* `FirstVertexHUAWEI` is the index of the first vertex to draw.
* `VertexOffsetHUAWEI` is the value added to the vertex index before indexing into the vertex buffer.
* `FirstInstanceHUAWEI` is the instance ID of the first instance to draw.
* `ClusterIdHUAWEI` is the index of the cluster being rendered by this drawing command. The cluster culling shader passes this ID to the vertex shader so that it can fetch cluster-related information. When the cluster culling shader is enabled, `gl_DrawID` is replaced by `gl_ClusterIDHUAWEI` in the vertex shader.
* `ClusterShadingRateHUAWEI` is the shading rate of the cluster being rendered by this drawing command. If the `clusterShadingRate` feature is enabled, the cluster culling shader can write `ClusterShadingRateHUAWEI` to select coarse shading for the cluster.
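
The draw-parameter built-ins have the same meaning as the members of Vulkan's standard indirect draw structures, so a drawing command emitted in indexed mode can be pictured as a `VkDrawIndexedIndirectCommand` plus two extra values. The sketch below makes that correspondence explicit with locally declared, hypothetical types; it is an analogy for reading the list above, not a description of how an implementation stores the outputs.

```c
#include <stdint.h>

/* Values a CCS invocation writes in indexed mode (names follow the built-ins). */
typedef struct {
    uint32_t indexCount, instanceCount, firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance, clusterId, clusterShadingRate;
} PerClusterIndexed;

/* Same member layout as VkDrawIndexedIndirectCommand. */
typedef struct {
    uint32_t indexCount, instanceCount, firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedIndirectCommand;

/* The five draw-parameter built-ins map one-to-one onto the indirect command.
 * clusterId (read by the vertex shader) and clusterShadingRate (used by the
 * shading rate combiner) have no counterpart and are not copied. */
DrawIndexedIndirectCommand to_indexed_draw(const PerClusterIndexed *c)
{
    DrawIndexedIndirectCommand d;
    d.indexCount    = c->indexCount;
    d.instanceCount = c->instanceCount;
    d.firstIndex    = c->firstIndex;
    d.vertexOffset  = c->vertexOffset;
    d.firstInstance = c->firstInstance;
    return d;
}
```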

==== New instruction
* `OpDispatchClusterHUAWEI`

Any invocation of the cluster culling shader can execute this instruction more than once. Each execution emits the cluster culling shader built-in output variables described in section 3.3.3 to the subsequent rendering pipeline. Once a workgroup has finished, the GPU creates vertex shader warps according to these emitted outputs, and the vertex shader invocations shade the vertices.
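
The emission model can be sketched as a CPU simulation of one workgroup: each simulated invocation may emit several drawing commands, and the collected list would be handed to the vertex stage once the workgroup finishes. The proposal does not say what happens when more than `maxOutputClusterCount` clusters are emitted; this sketch assumes extra emissions are dropped. All names and the per-cluster vertex ranges are hypothetical, and the sequential loop does not imply any ordering guarantee between real invocations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one emitted non-indexed drawing command. */
typedef struct {
    uint32_t vertexCount, instanceCount, firstVertex, firstInstance, clusterId;
} ClusterDraw;

#define MAX_EMIT 64 /* storage capacity of this sketch */

typedef struct {
    ClusterDraw draws[MAX_EMIT];
    uint32_t    count;
    uint32_t    limit; /* stands in for maxOutputClusterCount */
} WorkgroupOutput;

/* Model of OpDispatchClusterHUAWEI: append the current output values.
 * Assumed overflow behaviour: the emission is dropped and false is returned. */
static bool emit_cluster(WorkgroupOutput *out, ClusterDraw d)
{
    if (out->count >= out->limit || out->count >= MAX_EMIT) return false;
    out->draws[out->count++] = d;
    return true;
}

/* Each simulated invocation owns clustersPerInvocation clusters and emits
 * one drawing command per visible cluster. Returns the number emitted. */
uint32_t run_workgroup(WorkgroupOutput *out, uint32_t invocations,
                       uint32_t clustersPerInvocation, const bool *visible,
                       uint32_t verticesPerCluster)
{
    out->count = 0;
    for (uint32_t inv = 0; inv < invocations; ++inv) {
        for (uint32_t k = 0; k < clustersPerInvocation; ++k) {
            uint32_t id = inv * clustersPerInvocation + k;
            if (!visible[id]) continue;
            ClusterDraw d = { verticesPerCluster, 1, id * verticesPerCluster, 0, id };
            (void)emit_cluster(out, d);
        }
    }
    return out->count;
}
```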

=== GLSL changes
New write-only output blocks are defined for the built-in output variables:
```glsl
// Type 1 (non-indexed mode):
out gl_PerClusterHUAWEI
{
    uint gl_VertexCountHUAWEI;
    uint gl_InstanceCountHUAWEI;
    uint gl_FirstVertexHUAWEI;
    uint gl_FirstInstanceHUAWEI;
    uint gl_ClusterIdHUAWEI;
    uint gl_ClusterShadingRateHUAWEI;
};
```

```glsl
// Type 2 (indexed mode):
out gl_PerClusterHUAWEI
{
    uint gl_IndexCountHUAWEI;
    uint gl_InstanceCountHUAWEI;
    uint gl_FirstIndexHUAWEI;
    int  gl_VertexOffsetHUAWEI;
    uint gl_FirstInstanceHUAWEI;
    uint gl_ClusterIdHUAWEI;
    uint gl_ClusterShadingRateHUAWEI;
};
```
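

A new built-in function is added:
```glsl
void dispatchClusterHUAWEI(void);
```

Calling it emits the current values of the `gl_PerClusterHUAWEI` outputs as a drawing command; it is the GLSL counterpart of the SPIR-V `OpDispatchClusterHUAWEI` instruction.
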