Name

    QCOM_shading_rate

Name Strings

    GL_QCOM_shading_rate

Contributors

    Jeff Leger
    Robert VanReenen

Contact

    Jeff Leger - jleger 'at' qti.qualcomm.com

Status

    Complete

Version

    Last Modified Date: April 22, 2020
    Revision: #2

Number

    OpenGL ES Extension #279

Dependencies

    OpenGL ES 2.0 is required. This extension is written against OpenGL ES 3.2.

    This extension interacts with OVR_Multiview.
    This extension interacts with QCOM_framebuffer_foveated and QCOM_texture_foveated.

    When this extension is advertised, the implementation must also advertise GLSL
    extension "GL_EXT_fragment_invocation_density" (documented separately), which
    provides new built-in variables that allow fragment shaders to determine the
    effective shading rate used for fragment invocations.

Overview

    By default, OpenGL runs a fragment shader once for each pixel covered by a
    primitive being rasterized. When using multisampling, the outputs of that
    fragment shader are broadcast to each covered sample of the fragment's
    pixel. When using multisampling, applications can optionally request that
    the fragment shader be run once per color sample (e.g., by using the "sample"
    qualifier on one or more active fragment shader inputs), or run a minimum
    number of times per pixel using SAMPLE_SHADING enable and the
    MinSampleShading frequency value.

    This extension allows applications to specify fragment shading rates of less
    than 1 invocation per pixel. Instead of invoking the fragment shader
    once for each covered pixel, the fragment shader can be run once for a
    group of adjacent pixels in the framebuffer. The outputs of that fragment
    shader invocation are broadcast to each covered sample for all of the pixels
    in the group. The initial version of this extension allows for groups of
    1, 2, 4, 8, and 16 pixels.
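
    To make the savings concrete, the following C sketch counts fragment shader
    invocations for a fully covered rectangle when each invocation shades a
    <W>x<H> group of pixels. The helper coarse_invocations is hypothetical. It
    assumes groups are aligned to a grid that starts at the rectangle's origin,
    which the extension does not specify.

```c
#include <assert.h>

/* Hypothetical helper: fragment shader invocations needed to shade a fully
 * covered width x height region when each invocation covers a w x h group of
 * pixels. Assumes groups are aligned to a grid starting at the region origin
 * (an assumption; the extension does not define group alignment). */
static unsigned long coarse_invocations(unsigned width, unsigned height,
                                        unsigned w, unsigned h)
{
    assert(w > 0 && h > 0);
    /* Round up so partially covered groups at the right and bottom edges
     * still receive one invocation each. */
    unsigned long cols = ((unsigned long)width + w - 1) / w;
    unsigned long rows = ((unsigned long)height + h - 1) / h;
    return cols * rows;
}
```

    Consider a fully covered 1920x1080 region. At 1x1 it needs 2073600
    invocations, at 2x2 it needs 518400, and at 4x4 it needs 129600. These are
    the 4x and 16x reductions implied by the group sizes.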

    This can be useful for effects like motion volumetric rendering
    where a portion of the scene is processed at full shading rate and a portion
    can be processed at a reduced shading rate, saving power and processing
    resources. The requested rate can vary from (finest and default) 1 fragment
    shader invocation per pixel to (coarsest) one fragment shader invocation for
    each 4x4 block of pixels. Implementations are given wide latitude to
    rasterize at the requested rate or any other rate that is less coarse.

New Tokens

    Accepted by the <pname> parameter of GetIntegerv, GetInteger64v,
    and GetFloatv:

        SHADING_RATE_QCOM                               0x96A4

    Accepted by the <cap> parameter of Enable, Disable, and IsEnabled:

        SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM         0x96A5

    Accepted by the <rate> parameter of ShadingRateQCOM:

        SHADING_RATE_1X1_PIXELS_QCOM                    0x96A6
        SHADING_RATE_1X2_PIXELS_QCOM                    0x96A7
        SHADING_RATE_2X1_PIXELS_QCOM                    0x96A8
        SHADING_RATE_2X2_PIXELS_QCOM                    0x96A9
        SHADING_RATE_4X2_PIXELS_QCOM                    0x96AC
        SHADING_RATE_4X4_PIXELS_QCOM                    0x96AE

New Procedures and Functions

    void ShadingRateQCOM(enum rate);

Modifications to the OpenGL ES 3.2 Specification

    Modify Section 8.14.1, Scale Factor and Level of Detail, p. 196

    (Modify the function approximating the scale factor (rho) to allow
    implementations to scale implicit derivatives based on the shading rate.
    The scaling occurs before the LOD bias and before LOD clamping.)
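
    The effect of the modified (mu, mv, mw) definitions in this section can be
    sketched in C as below. This is an illustration, not a description of any
    implementation. It handles a 2D texture and uses f = max(mu, mv), the lower
    bound that the OpenGL ES scale-factor approximation permits. The helper
    names are hypothetical. The base LOD is log2 of the scale factor, so
    doubling (sx, sy) raises the base LOD by one level before any LOD bias or
    clamping is applied.

```c
#include <assert.h>

/* Small helpers so the sketch needs no math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }
static double max_d(double a, double b) { return a > b ? a : b; }

/* Modified per-coordinate term, e.g. for u:
 *   mu = max(|du/dx| * sx, |du/dy| * sy)
 * where (sx, sy) is the effective shading rate (w', h'). */
static double scaled_term(double d_dx, double d_dy, double sx, double sy)
{
    assert(sx >= 1.0 && sy >= 1.0);
    return max_d(abs_d(d_dx) * sx, abs_d(d_dy) * sy);
}

/* Scale factor for a 2D texture, approximated as f = max(mu, mv).
 * The base LOD would be log2(f); LOD bias and clamping come afterwards. */
static double scale_factor_2d(double du_dx, double du_dy,
                              double dv_dx, double dv_dy,
                              double sx, double sy)
{
    double mu = scaled_term(du_dx, du_dy, sx, sy);
    double mv = scaled_term(dv_dx, dv_dy, sx, sy);
    return max_d(mu, mv);
}
```

    With unit texel derivatives, the scale factor is 1.0 at a 1x1 rate and 2.0
    at 2x2. At 4x2 it is 4.0, because the larger of sx and sy dominates.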

    Modify the definitions of (mu, mv, mw):

        mu = max( |du/dx|, |du/dy| )

        mv = max( |dv/dx|, |dv/dy| )

        mw = max( |dw/dx|, |dw/dy| )

    to:

        mu = max( |du/dx| * sx, |du/dy| * sy )

        mv = max( |dv/dx| * sx, |dv/dy| * sy )

        mw = max( |dw/dx| * sx, |dw/dy| * sy )

    where (sx, sy) is the _effective shading rate_ (w', h') specified in
    Section 13.X.2.

    Modify Section 13.4, Multisampling, p. 353

    (add to the end of the section)

    When SHADING_RATE_QCOM is set to a value other than SHADING_RATE_1X1_PIXELS_QCOM,
    rasterization will occur at the _effective shading rate_ (Section 13.X) and
    will result in fragments covering a <W>x<H> group of pixels.

    When multisample rasterization is enabled, the samples of the fragment will
    consist of the samples for each of the pixels in the group. The fragment
    center will be the center of this group of pixels. Each fragment will include
    a coverage value with (W x H x SAMPLES) bits. For example, if
    SHADING_RATE_QCOM is 2X2 and the currently bound framebuffer object has
    SAMPLES equal to 4 (4xMSAA), then the fragment will consist of 4 pixels and
    16 samples. Similarly, each fragment will have (W x H x SAMPLES) depth values
    and associated data.

    The contents of Section 13.4.1, Sample Shading, p. 355, are moved to the new
    Section 13.X.3, "Sample Shading".

    Add new section 13.X before Section 13.5, Points, p. 355

    Section 13.X, Shading Rate

    By default, each fragment processed by programmable fragment processing
    corresponds to a single pixel with a single (x,y) coordinate.
    When using multisampling, implementations are permitted to run separate
    fragment shader invocations for each sample, but often run only a single
    invocation for all samples of the fragment. We will refer to the density of
    fragment shader invocations as the _shading rate_. Applications can use the
    shading rate to increase the size of fragments to cover multiple pixels and
    reduce the amount of fragment shader work. Applications can also use the
    shading rate to explicitly control the minimum number of fragment shader
    invocations when multisampling.

    Section 13.X.1, Shading Rate Control

    The shading rate can be controlled with the command

        void ShadingRateQCOM(enum rate);

    <rate> specifies the value of SHADING_RATE_QCOM, and defines the
    _shading rate_. Valid values for <rate> are described in table X.1.

        Shading Rate                    Size
        ----------------------------    -----
        SHADING_RATE_1X1_PIXELS_QCOM    1x1
        SHADING_RATE_1X2_PIXELS_QCOM    1x2
        SHADING_RATE_2X1_PIXELS_QCOM    2x1
        SHADING_RATE_2X2_PIXELS_QCOM    2x2
        SHADING_RATE_4X2_PIXELS_QCOM    4x2
        SHADING_RATE_4X4_PIXELS_QCOM    4x4

        Table X.1: Shading rates accepted by ShadingRateQCOM. An entry of
        "<W>x<H>" in the "Size" column indicates that the shading rate requests
        fragments with a width and height (in pixels) of <W> and <H>,
        respectively.

    If the shading rate is specified with ShadingRateQCOM, it will apply to all
    draw buffers. If the shading rate has not been set, the shading rate will be
    SHADING_RATE_1X1_PIXELS_QCOM. In either case, the shading rate will be
    further adjusted as described in the following sections.

    Section 13.X.2, Effective Shading Rate

    The value of SHADING_RATE_QCOM, in combination with other GL state, is used
    to derive an adjusted rate or _effective shading rate_, as described in this
    section.
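
    The constraints in this section leave the selection policy to the
    implementation. The hypothetical C sketch below shows one policy consistent
    with them. It returns 1x1 when a fallback applies. Otherwise it picks the
    coarsest rate from table X.1 that meets three conditions: its area does not
    exceed the requested area, its sample count fits an assumed per-fragment
    limit, and it keeps the requested aspect ratio when
    SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM is enabled. The state fields are
    illustrative names only, and real drivers may choose differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A shading rate expressed as a group of w x h pixels. */
typedef struct { unsigned w, h; } rate_t;

/* Hypothetical summary of the "other GL state" this section mentions.
 * Field names are illustrative and are not part of the extension. */
typedef struct {
    bool default_framebuffer;       /* rendering to the default framebuffer    */
    bool sample_shading;            /* SAMPLE_SHADING is enabled               */
    bool shader_uses_sample_inputs; /* "sample" inputs, gl_SampleID, etc.      */
    bool preserve_aspect_ratio;     /* SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM */
    unsigned samples;               /* SAMPLES of the bound framebuffer        */
    unsigned max_samples_per_frag;  /* assumed per-fragment hardware limit     */
} state_t;

/* Table X.1 sizes, coarsest first; this sketch assumes all are supported. */
static const rate_t k_rates[] = {
    {4, 4}, {4, 2}, {2, 2}, {2, 1}, {1, 2}, {1, 1}
};

/* One possible policy: the coarsest listed rate that satisfies the
 * constraints of this section, falling back to 1x1. */
static rate_t effective_rate(rate_t req, const state_t *s)
{
    const rate_t one = {1, 1};
    unsigned samples = s->samples > 0 ? s->samples : 1;

    assert(req.w >= 1 && req.h >= 1);
    /* "sample"-qualified inputs require 1x1; the other two conditions are
     * optional fallbacks ("may be adjusted") that this policy always takes. */
    if (s->shader_uses_sample_inputs || s->default_framebuffer ||
        s->sample_shading)
        return one;

    for (size_t i = 0; i < sizeof k_rates / sizeof k_rates[0]; ++i) {
        rate_t r = k_rates[i];
        if (r.w * r.h > req.w * req.h)
            continue; /* must satisfy w' * h' <= w * h */
        if (r.w * r.h * samples > s->max_samples_per_frag)
            continue; /* limit the samples covered by one fragment */
        if (s->preserve_aspect_ratio && !(r.w == 1 && r.h == 1) &&
            r.w * req.h != r.h * req.w)
            continue; /* w'/h' must equal w/h (compared without division) */
        return r;
    }
    return one;
}
```

    Assume 4 samples and a limit of 16 samples per fragment. A 4x4 request then
    becomes 2x2, matching the example in this section. A 4x2 request becomes 2x2
    without aspect-ratio preservation. With preservation enabled it becomes 2x1,
    since 2x2 would change the aspect ratio.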

    Where possible, implementations should provide an _effective shading rate_
    equal to SHADING_RATE_QCOM. When this is not possible, an adjusted
    _effective shading rate_ may be used as described in this section. While
    there is no API for querying the _effective shading rate_, the value of this
    parameter exists, can be queried from the fragment shader built-in
    gl_FragSizeEXT, and is referred to in a number of places in the
    specification. Implementations may also adjust the shading rate for other
    reasons not listed here.

    Implementations derive the _effective shading rate_ in an
    implementation-dependent manner. When rendering to the default framebuffer,
    the rate may be adjusted to 1x1. When sample shading (Section 13.X.3,
    Sample Shading) is enabled, the rate may be adjusted to 1x1. When the
    fragment shader uses the GLSL built-in variables gl_SampleMaskIn[] or
    gl_SampleMask[], or uses variables declared with "centroid in", the rate may
    be adjusted to 1x1. When sample coverage or sample mask operations are
    enabled (Section 13.8.3, Multisample Fragment Operations), the rate may be
    adjusted to 1x1.

    The shading rate may be adjusted to limit the number of samples covered by
    a fragment. Consider an implementation that supports a maximum of 16 samples
    per fragment, with SHADING_RATE_QCOM set to 4X4 and a currently bound
    framebuffer object whose SAMPLES is 4 (4xMSAA). The number of samples per
    coarse fragment would then be 64. In such an example, an implementation may
    adjust the shading rate to a rate with 16 or fewer samples (e.g., 2x2).

    If the active fragment shader uses any inputs that are qualified with
    "sample" (unique values per sample), including the built-ins "gl_SampleID"
    and "gl_SamplePosition", or the built-in function "interpolateAtSample",
    the shader code is written to expect a separate shader invocation for each
    shaded sample.
    For such fragment shaders, the shading rate is adjusted to 1x1.

    If the <W>x<H> value of SHADING_RATE_QCOM is expressed as <w, h>, then the
    adjusted rate may be any <w', h'> as long as (w' * h') <= (w * h). If
    SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM is enabled, then the implementation
    further guarantees that (w'/h') equals (w/h), or that w' = 1 and h' = 1.

    Section 13.X.3, Sample Shading

    [[The contents of Section 13.4.1, Sample Shading, p. 355, are copied here.]]

    Modifications to Section 13.8.2, Scissor Test (p. 367)

    (add to the end of the section)

    When the _effective shading rate_ results in fragments covering more than
    one pixel, the scissor test is performed separately for each pixel in the
    fragment. If a pixel covered by a fragment fails the scissor test, that
    pixel is treated as though it was not covered by the primitive. If all
    pixels covered by a fragment are either not covered by the primitive being
    rasterized or fail the scissor test, the fragment is discarded.

    Modifications to Section 13.8.3, Multisample Fragment Operations (p. 368)

    (modify the last sentence of the first paragraph to indicate that sample
    mask operations are performed when a shading rate is used, even if
    multisampling is not enabled. Such a shading rate can produce fragments
    covering more than one pixel, where each pixel is considered a "sample".)

    Change the following sentence from:

        "If the value of SAMPLE_BUFFERS is not one, this step is skipped."

    to:

        "This step is skipped if SAMPLE_BUFFERS is not one, unless
        SHADING_RATE_QCOM is set to a value other than
        SHADING_RATE_1X1_PIXELS_QCOM."

    (add to the end of the section)

    When the _effective shading rate_ results in fragments covering more than
    one pixel, each fragment will generate a composite coverage mask that
    includes separate coverage bits for each sample in each pixel covered by the
    fragment.
    This composite coverage mask will be used by the GLSL built-in input
    variable gl_SampleMaskIn[] and updated according to the built-in output
    variable gl_SampleMask[]. The number of composite coverage mask bits in the
    built-in variables, and their mapping to a specific pixel and sample number
    within that pixel, are implementation-defined.

    Modify Section 14.1, Fragment Shader Variables (p. 370)

    (modify the sixth paragraph, p. 371, specifying that the "centroid" location
    for multi-pixel fragments is implementation-dependent and is allowed to be
    outside the primitive)

    After the following sentence:

        "When interpolating variables declared using "centroid in",
        the variable is sampled at a location within the pixel covered
        by the primitive generating the fragment."

    add the following sentence:

        "When the _effective shading rate_ results in fragments covering more
        than one pixel, variables declared using "centroid in" are sampled from
        an implementation-dependent location within any one of the covered
        pixels."

    Modify Section 15.1, Per-Fragment Operations (p. 378)

    (insert a new paragraph after the first paragraph of the section)

    When the _effective shading rate_ results in fragments covering multiple
    pixels, the operations described in this section are performed
    independently for each pixel covered by the fragment. The set of samples
    covered by each pixel is determined by extracting the portion of the
    fragment's composite coverage that applies to that pixel, as described in
    Section 13.8.3.
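
    The per-pixel extraction described above depends on how coverage bits map to
    pixels and samples, which this extension leaves implementation-defined. The
    C sketch below assumes a hypothetical pixel-major layout, in which bit
    (py * W + px) * SAMPLES + s covers sample s of pixel (px, py). It also
    assumes a mask of at most 64 bits (enough for 4x4 at 4xMSAA), and that
    pixels failing the scissor test have already had their bits cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits in a fragment's composite coverage mask: W x H x SAMPLES. */
static unsigned composite_mask_bits(unsigned w, unsigned h, unsigned samples)
{
    return w * h * samples;
}

/* Coverage of pixel (px, py) within a w-pixel-wide fragment, assuming the
 * hypothetical pixel-major layout: bit (py * w + px) * samples + s. */
static uint64_t pixel_coverage(uint64_t mask, unsigned w,
                               unsigned px, unsigned py, unsigned samples)
{
    unsigned shift = (py * w + px) * samples;
    assert(samples >= 1 && samples < 64 && shift + samples <= 64);
    return (mask >> shift) & ((UINT64_C(1) << samples) - 1);
}

/* A fragment is discarded only if no pixel in it retains any coverage
 * (pixels that failed the scissor test are assumed to be cleared already). */
static int fragment_discarded(uint64_t mask, unsigned w, unsigned h,
                              unsigned samples)
{
    assert(composite_mask_bits(w, h, samples) <= 64);
    for (unsigned py = 0; py < h; ++py)
        for (unsigned px = 0; px < w; ++px)
            if (pixel_coverage(mask, w, px, py, samples) != 0)
                return 0;
    return 1;
}
```

    For a 2x2 fragment at 4xMSAA, the mask has 16 bits. Under this layout, a
    mask of 0xF0F0 fully covers pixels (1,0) and (1,1) and leaves the other two
    uncovered, so the fragment is kept. A mask of zero means the fragment is
    discarded.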

Errors

    INVALID_ENUM is generated by ShadingRateQCOM if <rate> is not
    a valid shading rate from table X.1.

New State

Add to table 21.7, Rasterization

Get Value                                Type  Get Command  Initial Value                 Description      Sec
---------------------------------------  ----  -----------  ----------------------------  ---------------  ------
SHADING_RATE_QCOM                        E     GetIntegerv  SHADING_RATE_1X1_PIXELS_QCOM  shading rate     13.X.1
SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM  B     IsEnabled    FALSE                         maintain aspect  13.X.2

Interactions with OVR_Multiview

    If OVR_Multiview is supported, SHADING_RATE_QCOM applies to all views.

Interactions with QCOM_framebuffer_foveated and QCOM_texture_foveated

    QCOM_framebuffer_foveated and QCOM_texture_foveated specify a pixel
    density which is exposed as a fragment size via the fragment
    shader built-in gl_FragSizeEXT. This extension defines an effective
    shading rate which is also exposed as a fragment size via the
    same built-in. If either foveation extension is enabled in conjunction with
    this extension, then the value of gl_FragSizeEXT is the component-wise
    product of both fragment sizes.

Issues

    (1) Should the application-specified rate in ShadingRateQCOM() be a "hint"
        that can be ignored by the driver, or is the driver required to honor
        the requested rate?

        RESOLVED: The driver should honor the application-specified rate where
        possible, but is allowed to use an adjusted rate for
        implementation-dependent reasons. The specific rates supported in the
        hardware, and the specific conditions under which the rate needs to be
        adjusted, can differ across Adreno GPU families. This extension gives
        drivers the flexibility to expose this extension on early hardware that
        may have restrictions and oddities, while providing applications some
        (admittedly limited) control over the adjusted rate that will be
        selected.
        The actual rate is always exposed via the fragment shader built-in.

    (2) If the application-specified rate is only a hint, can developers expect
        that all the shading rates exposed by this extension are supported
        natively by the HW?

        RESOLVED: The initial version of this extension exposes token values for
        shading rates of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4. Most Adreno GPUs
        supporting this extension are expected to support all those rates,
        although some early HW may support fewer rates. Note that this extension
        does not include shading rates of 1x4, 4x1, or 2x4, because Adreno GPUs
        may never support those rates. Because a future version of this
        extension could support those rates, we have reserved the token values
        0x96AA, 0x96AB, and 0x96AD for them.

    (3) How does this feature work with per-sample shading?

        RESOLVED: When using per-sample shading, an application is expecting a
        fragment shader to run with a separate invocation per sample. The
        shading rate might allow for a "coarsening" that would break such
        shaders. Furthermore, some Adreno families may not support this
        combination. We've chosen not to explicitly disallow this combination,
        while giving implementations the flexibility to use an adjusted 1x1
        shading rate.

    (4) How do centroid-sampled variables work with fragments larger than one
        pixel?

        RESOLVED: For single-pixel fragments, attributes declared with
        "centroid" are sampled at an implementation-dependent location in the
        intersection of the area of the primitive being rasterized and the area
        of the pixel that corresponds to the fragment. With multi-pixel
        fragments, attributes declared with "centroid" are sampled from an
        implementation-dependent location within any of the covered pixels.
        This wide allowance for implementation-dependent behavior enables the
        extension to be exposed on early Adreno hardware.
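
    Issue (2) and the Errors section together define which <rate> values
    ShadingRateQCOM accepts. As a hedged illustration only, the C sketch below
    validates a token and maps it to its table X.1 size. The helper rate_size is
    hypothetical. Real applications would take the GL_-prefixed tokens from the
    Khronos extension headers rather than redefining them.

```c
#include <assert.h>
#include <stdbool.h>

/* Token values copied from the New Tokens section (without the GL_ prefix). */
#define SHADING_RATE_1X1_PIXELS_QCOM 0x96A6
#define SHADING_RATE_1X2_PIXELS_QCOM 0x96A7
#define SHADING_RATE_2X1_PIXELS_QCOM 0x96A8
#define SHADING_RATE_2X2_PIXELS_QCOM 0x96A9
#define SHADING_RATE_4X2_PIXELS_QCOM 0x96AC
#define SHADING_RATE_4X4_PIXELS_QCOM 0x96AE

/* Hypothetical helper: maps a <rate> token to its <W>x<H> size from
 * table X.1. Any other value, including the reserved 0x96AA, 0x96AB, and
 * 0x96AD, returns false; ShadingRateQCOM generates INVALID_ENUM for it. */
static bool rate_size(unsigned rate, unsigned *w, unsigned *h)
{
    assert(w && h);
    switch (rate) {
    case SHADING_RATE_1X1_PIXELS_QCOM: *w = 1; *h = 1; return true;
    case SHADING_RATE_1X2_PIXELS_QCOM: *w = 1; *h = 2; return true;
    case SHADING_RATE_2X1_PIXELS_QCOM: *w = 2; *h = 1; return true;
    case SHADING_RATE_2X2_PIXELS_QCOM: *w = 2; *h = 2; return true;
    case SHADING_RATE_4X2_PIXELS_QCOM: *w = 4; *h = 2; return true;
    case SHADING_RATE_4X4_PIXELS_QCOM: *w = 4; *h = 4; return true;
    default: return false; /* would be INVALID_ENUM */
    }
}
```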

    (5) How do the built-in variables gl_SampleMask[] and gl_SampleMaskIn[] work
        with fragments larger than one pixel?

        RESOLVED: For single-pixel fragments, gl_SampleMaskIn[] and
        gl_SampleMask[] specify the input and output coverage bits for a single
        pixel, where bit 'B' corresponds to SampleID 'B'. With this extension
        enabled, these built-ins would specify the coverage bits for all the
        samples in all the pixels covered by the fragment. In this extension,
        the exact behavior of gl_SampleMaskIn[] and gl_SampleMask[] is
        implementation-dependent. For some Adreno GPUs, use of these built-in
        variables will cause the driver to use an adjusted 1x1 shading rate.
        In other cases, the exact mapping of bits to samples/pixels is
        implementation-defined. This wide allowance for implementation-dependent
        behavior enables the extension to be exposed on early Adreno hardware.

    (6) Are there any restrictions on framebuffer formats used with this
        feature? For example, are EGLImages that may contain multi-plane YUV
        formats supported?

        RESOLVED: It is implementation-dependent whether a shading rate is
        supported for all formats or only for certain formats. Implementations
        are allowed to adjust the _effective shading rate_ based on the format.

    (7) Does the value of SHADING_RATE_QCOM affect the built-in variable
        gl_FragCoord?

        RESOLVED: Yes. When the shading rate results in fragments covering
        multiple pixels, gl_FragCoord will be the window-relative coordinates
        (x, y, z, 1/w) of the center of the fragment. For non-multisample cases
        this may not be at a pixel center. This may break shaders that assume
        pixel-center (0.5, 0.5) values for gl_FragCoord.

    (8) Does the shading rate affect the value of gl_SamplePosition or
        gl_NumSamples?

        RESOLVED: No, neither built-in is affected. If the shader uses
        gl_SamplePosition, the shader runs at sample rate, causing the shading
        rate to be ignored.
        gl_NumSamples is the number of samples in the framebuffer object, which
        is unaffected by the value of the shading rate.

    (9) Should the shading rate affect screen-space derivatives?

        RESOLVED: This extension scales the gradients between adjacent fragments
        by the effective shading rate (w', h'). The resulting increase in
        computed LOD aligns well with the reduced fragment shader invocations in
        most use cases; in other cases the shader author may want to bias the
        LOD to compensate. Shader built-in functions that return gradient values
        (dFdx, dFdy, and fwidth) are similarly scaled for the same reason.


Revision History

    Rev.  Date      Author    Changes
    ----  --------  --------  ----------------------------------------------
    1     03/17/20  jleger    Initial draft.
    2     04/22/20  jleger    Relaxed the <w', h'> guarantee from "w'<=w and
                              h'<=h" to "w'*h' <= w*h".