1Name
2
3    QCOM_shading_rate
4
5Name Strings
6
7    GL_QCOM_shading_rate
8
9Contributors
10
11    Jeff Leger
12    Robert VanReenen
13
14Contact
15
16    Jeff Leger - jleger 'at' qti.qualcomm.com
17
18Status
19
20    Complete
21
22Version
23
24    Last Modified Date: April 22, 2020
25    Revision: #2
26
27Number
28
29    OpenGL ES Extension #279
30
31Dependencies
32
33    OpenGL ES 2.0 is required.  This extension is written against OpenGL ES 3.2.
34
35    This extension interacts with OVR_Multiview.
36    This extension interacts with QCOM_framebuffer_foveated and QCOM_texture_foveated.
37
38    When this extension is advertised, the implementation must also advertise GLSL
39    extension "GL_EXT_fragment_invocation_density" (documented separately), which
40    provides new built-in variables that allow fragment shaders to determine the
41    effective shading rate used for fragment invocations.
42
43Overview
44
45    By default, OpenGL runs a fragment shader once for each pixel covered by a
46    primitive being rasterized.  When using multisampling, the outputs of that
47    fragment shader are broadcast to each covered sample of the fragment's
48    pixel.  When using multisampling, applications can optionally request that
49    the fragment shader be run once per color sample (e.g., by using the "sample"
50    qualifier on one or more active fragment shader inputs), or run a minimum
51    number of times per pixel using SAMPLE_SHADING enable and the
52    MinSampleShading frequency value.
53
54    This extension allows applications to specify fragment shading rates of less
55    than 1 invocation per pixel.  Instead of invoking the fragment shader
56    once for each covered pixel, the fragment shader can be run once for a
57    group of adjacent pixels in the framebuffer.  The outputs of that fragment
58    shader invocation are broadcast to each covered sample for all of the pixels
59    in the group.  The initial version of this extension allows for groups of
60    1, 2, 4, 8, and 16 pixels.
61
62    This can be useful for effects like motion volumetric rendering
63    where a portion of the scene is processed at full shading rate and a portion can
64    be processed at a reduced shading rate, saving power and processing resources.
65    The requested rate can vary from (finest and default) 1 fragment shader
66    invocation per pixel to (coarsest) one fragment shader invocation for each
67    4x4 block of pixels.  Implementations are given wide latitude to rasterize
68    at the requested rate or any other rate that is less coarse.
69
70New Tokens
71
72    Accepted by the <pname> parameter of GetIntegerv, GetInteger64v
73    and GetFloatv:
74
75         SHADING_RATE_QCOM                        0x96A4
76
77    Accepted by the <cap> parameter of Enable, Disable, IsEnabled:
78
79         SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM  0x96A5
80
81    Allowed in the <rate> parameter in ShadingRateQCOM:
82         SHADING_RATE_1X1_PIXELS_QCOM             0x96A6
83         SHADING_RATE_1X2_PIXELS_QCOM             0x96A7
84         SHADING_RATE_2X1_PIXELS_QCOM             0x96A8
85         SHADING_RATE_2X2_PIXELS_QCOM             0x96A9
86         SHADING_RATE_4X2_PIXELS_QCOM             0x96AC
87         SHADING_RATE_4X4_PIXELS_QCOM             0x96AE
88
89New Procedures and Functions
90
91    void ShadingRateQCOM(enum rate);
92
93Modifications to the OpenGL ES 3.2 Specification
94
95    Modify Section 8.14.1, Scale Factor and Level of Detail, p. 196
96
97    (Modify the function approximating Scale Factor (P), to allow implementations
98     to scale implicit derivatives based on the shading rate.  The scale occurs before
99     the LOD bias and before LOD clamping).
100
101     Modify the definitions of (mu, mv, mw):
102
103                    |   du       du    |
104          mu = max  |  -----  , -----  |
105                    |   dx       dy    |
106
107                    |   dv       dv    |
108          mv = max  |  -----  , -----  |
109                    |   dx       dy    |
110
111                    |   dw       dw    |
112          mw = max  |  -----  , -----  |
113                    |   dx       dy    |
114     to:
115                    |   du          du        |
116          mu = max  |  ---- * sx , ---- * sy  |
117                    |   dx          dy        |
118
119                    |   dv          dv        |
120          mv = max  |  ---- * sx , ---- * sy  |
121                    |   dx          dy        |
122
123                    |   dw          dw        |
124          mw = max  |  ---- * sx , ---- * sy  |
125                    |   dx          dy        |
126
127          where (sx, sy) refer to the _effective shading rate_ (w', h') specified in
128          section 13.X.2.
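The modified definitions above can be sketched in C as follows. This is only an illustration of the arithmetic, not part of the specification: the function name scaled_derivative and its parameters are hypothetical, and the vertical bars in the definitions are read as absolute values. Because the level of detail is derived from the base-2 logarithm of the scale factor, doubling both scaled derivatives (for example with an effective rate of 2x2) raises the computed LOD by one level before bias and clamping.

```c
#include <assert.h>

/* Absolute value and maximum, written out to avoid depending on libm. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }
static double max_d(double a, double b) { return a > b ? a : b; }

/*
 * Scaled derivative magnitude for one texture coordinate c (u, v, or w):
 *     m = max(|dc/dx| * sx, |dc/dy| * sy)
 * sx and sy are the effective shading rate (w', h') in pixels.
 * Hypothetical helper, not an API defined by this extension.
 */
static double scaled_derivative(double dc_dx, double dc_dy, int sx, int sy)
{
    assert(sx >= 1 && sy >= 1);
    return max_d(abs_d(dc_dx) * sx, abs_d(dc_dy) * sy);
}
```

With an effective rate of 1x1 the function reduces to the unmodified definition, so existing behavior is unchanged when the shading rate is not used.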
129
130    Modify Section 13.4, Multisampling, p. 353
131
132   (add to the end of the section)
133
134        When SHADING_RATE_QCOM is set to a value other than SHADING_RATE_1X1_PIXELS_QCOM,
135        the rasterization will occur at the _effective shading rate_ (Section 13.X) and
136        will result in fragments covering a <W>x<H> group of pixels.
137
138        When multisample rasterization is enabled, the samples of the fragment will consist
139        of the samples for each of the pixels in the group.  The fragment center will be
140        the center of this group of pixels.  Each fragment will include a coverage value
141        with (W x H x SAMPLES) bits.  For example, if GL_SHADING_RATE_QCOM is 2X2 and the
142        currently bound framebuffer object has SAMPLES equal to 4 (4xMSAA), then the fragment
143        will consist of 4 pixels and 16 samples.  Similarly, each fragment will have
144        (W * H * SAMPLES) depth values and associated data.
145
146    The contents of Section 13.4.1, Sample Shading, p. 355 are moved to the new Section 13.X.3, "Sample Shading".
147
148    Add new section 13.X before Section 13.5, Points, p. 355
149
150        Section 13.X, Shading Rate
151
152        By default, each fragment processed by programmable fragment processing
153        corresponds to a single pixel with a single (x,y) coordinate. When using
154        multisampling, implementations are permitted to run separate fragment shader
155        invocations for each sample, but often only run a single invocation for all
156        samples of the fragment.  We will refer to the density of fragment shader
157        invocations as the _shading rate_.
158        Applications can use the shading rate to increase the size of fragments to
159        cover multiple pixels and reduce the amount of fragment shader work.
160        Applications can also use the shading rate to explicitly control the minimum
161        number of fragment shader invocations when multisampling.
162
163        Section 13.X.1, Shading Rate Control
164
165        The shading rate can be controlled with the command
166
167           void ShadingRateQCOM(enum rate);
168
169        <rate> specifies the value of SHADING_RATE_QCOM, and defines the
170        _shading rate_.  Valid values for <rate> are described in
171        table X.1.
172
173            Shading Rate                   Size
174            ----------------------------   -----
175            SHADING_RATE_1X1_PIXELS_QCOM   1x1
176            SHADING_RATE_1X2_PIXELS_QCOM   1x2
177            SHADING_RATE_2X1_PIXELS_QCOM   2x1
178            SHADING_RATE_2X2_PIXELS_QCOM   2x2
179            SHADING_RATE_4X2_PIXELS_QCOM   4x2
180            SHADING_RATE_4X4_PIXELS_QCOM   4x4
181
182            Table X.1:  Shading rates accepted by ShadingRateQCOM.  An
183            entry of "<W>x<H>" in the "Size" column indicates that the shading
184            rate requests fragments with a width and height (in pixels) of <W>
185            and <H>, respectively.
186
187        If the shading rate is specified with ShadingRateQCOM, it will apply to all
188        draw buffers.  If the shading rate has not been set, the shading rate
189        will be SHADING_RATE_1X1_PIXELS_QCOM.  In either case, the shading rate will
190        be further adjusted as described in the following sections.
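Table X.1 can be expressed as a small lookup, sketched below in C. The token values are copied from the New Tokens section; the struct rate_size type and the shading_rate_size function are hypothetical names used only for illustration. The function returns false for any value outside the table, which is the case where ShadingRateQCOM generates INVALID_ENUM.

```c
#include <assert.h>
#include <stdbool.h>

/* Token values from the New Tokens section of this extension. */
enum {
    SHADING_RATE_1X1_PIXELS_QCOM = 0x96A6,
    SHADING_RATE_1X2_PIXELS_QCOM = 0x96A7,
    SHADING_RATE_2X1_PIXELS_QCOM = 0x96A8,
    SHADING_RATE_2X2_PIXELS_QCOM = 0x96A9,
    SHADING_RATE_4X2_PIXELS_QCOM = 0x96AC,
    SHADING_RATE_4X4_PIXELS_QCOM = 0x96AE
};

/* Requested fragment size in pixels (hypothetical helper type). */
struct rate_size { int width; int height; };

/* Looks up the Table X.1 size for <rate>.  Returns false, leaving *out
 * untouched, when <rate> is not a valid shading rate. */
static bool shading_rate_size(unsigned int rate, struct rate_size *out)
{
    assert(out != 0);
    switch (rate) {
    case SHADING_RATE_1X1_PIXELS_QCOM: out->width = 1; out->height = 1; return true;
    case SHADING_RATE_1X2_PIXELS_QCOM: out->width = 1; out->height = 2; return true;
    case SHADING_RATE_2X1_PIXELS_QCOM: out->width = 2; out->height = 1; return true;
    case SHADING_RATE_2X2_PIXELS_QCOM: out->width = 2; out->height = 2; return true;
    case SHADING_RATE_4X2_PIXELS_QCOM: out->width = 4; out->height = 2; return true;
    case SHADING_RATE_4X4_PIXELS_QCOM: out->width = 4; out->height = 4; return true;
    default: return false;
    }
}
```

Note that the token values are not contiguous: 0x96AA, 0x96AB, and 0x96AD are reserved and are rejected by this lookup.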
191
192        Section 13.X.2, Effective Shading Rate
193
194        The value of SHADING_RATE_QCOM, in combination with other GL state,
195        is used to derive an adjusted rate or _effective shading rate_, as
196        described in this section.
197
198        Where possible, implementations should provide an _effective shading rate_
199        equal to the value of SHADING_RATE_QCOM.  When this is not possible, an adjusted
200        _effective shading rate_ may be used as described in this section.  While
201        there is no API for querying the _effective shading rate_, the value of this
202        parameter exists, can be queried from the fragment shader built-in gl_FragSizeEXT,
203        and is referred to in a number of places in the specification.  Implementations
204        may also adjust the shading rate for other reasons not listed here.
205
206        Implementations derive the _effective shading rate_ in an implementation-dependent
207        manner.  When rendering to the default framebuffer, the rate may be adjusted
208        to 1x1.  When sample shading (section 13.X.3 Sample Shading) is enabled, the
209        rate may be adjusted to 1x1.  When the fragment shader uses GLSL built-in
210        variables gl_SampleMaskIn[] or gl_SampleMask[], or uses variables
211        declared with "centroid in", the rate may be adjusted to 1x1.  When sample coverage
212        or sample mask operations are enabled (Section 13.8.3 Multisample Fragment
213        Operations), the rate may be adjusted to 1x1.
214
215        The shading rate may be adjusted to limit the number of samples covered by a
216        fragment.  For example, if the implementation supports a maximum of 16 samples
217        per fragment and if GL_SHADING_RATE_QCOM is 4X4 and the currently bound
218        framebuffer object has SAMPLES equal to 4 (4xMSAA), then the number of samples
219        per coarse fragment would be 64.  In such an example, an implementation may
220        adjust the shading rate to a rate with 16 or fewer samples (e.g., 2x2).
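The sample-limit example above can be worked through in C. The selection policy shown here is an assumption made for illustration only: it picks the coarsest Table X.1 rate that covers no more pixels than the requested rate and stays within the sample limit. Real implementations derive the adjusted rate in an implementation-dependent manner. The names rate_size, k_rates, and limit_samples are hypothetical, and a single-sampled framebuffer is passed as samples = 1.

```c
#include <assert.h>
#include <stddef.h>

/* Fragment size in pixels (hypothetical helper type). */
struct rate_size { int width; int height; };

/* Table X.1 sizes, ordered from coarsest to finest. */
static const struct rate_size k_rates[] = {
    {4, 4}, {4, 2}, {2, 2}, {2, 1}, {1, 2}, {1, 1}
};

/* One possible adjustment policy: return the coarsest table rate whose
 * pixel count does not exceed the requested pixel count and whose
 * total sample count (W * H * samples) fits within the limit. */
static struct rate_size limit_samples(struct rate_size requested,
                                      int samples,
                                      int max_samples_per_fragment)
{
    struct rate_size finest = {1, 1};
    size_t i;

    assert(samples >= 1 && max_samples_per_fragment >= 1);
    for (i = 0; i < sizeof k_rates / sizeof k_rates[0]; ++i) {
        int pixels = k_rates[i].width * k_rates[i].height;
        if (pixels <= requested.width * requested.height &&
            pixels * samples <= max_samples_per_fragment)
            return k_rates[i];
    }
    return finest;  /* 1x1 is always an acceptable fallback */
}
```

For the case described above (4x4 requested, 4xMSAA, limit of 16 samples), 4x4 would need 64 samples and 4x2 would need 32, so this policy settles on 2x2 with exactly 16 samples.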
221
222        If the active fragment shader uses any inputs that are qualified with
223        "sample" (unique values per sample), including the built-ins "gl_SampleID"
224        and "gl_SamplePosition", or the built-in function "interpolateAtSample",
225        the shader code is written to expect a separate shader invocation for each
226        shaded sample.  For such fragment shaders, the shading rate is adjusted to
227        1x1.
228
229        If the <W>x<H> value of SHADING_RATE_QCOM is expressed as <w, h> then the
230        adjusted rate may be any <w', h'> as long as (w' * h') <= (w * h).  If
231        SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM is enabled, then the implementation further
232        guarantees that (w'/h') equals (w/h) or that w'=1 and h'=1.
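The constraint on the adjusted rate can be written as a predicate. The sketch below is illustrative only: the function name is_allowed_adjustment and its parameters are hypothetical, and the aspect-ratio comparison is cross-multiplied so that it stays in integer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the adjusted rate <w2, h2> is permitted for the
 * requested rate <w, h>.  preserve_aspect models the state of
 * SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM. */
static bool is_allowed_adjustment(int w, int h, int w2, int h2,
                                  bool preserve_aspect)
{
    assert(w >= 1 && h >= 1 && w2 >= 1 && h2 >= 1);

    /* The adjusted fragment may not cover more pixels than requested. */
    if (w2 * h2 > w * h)
        return false;
    if (!preserve_aspect)
        return true;

    /* With aspect preservation, only 1x1 or the same ratio is allowed.
     * w2/h2 == w/h is tested as w2*h == w*h2. */
    if (w2 == 1 && h2 == 1)
        return true;
    return w2 * h == w * h2;
}
```

For example, a request of 1x2 may be adjusted to 2x1 when aspect preservation is disabled, since both cover two pixels, but not when it is enabled. A request of 4x2 may be adjusted to 2x1 or 1x1 in either case.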
233
234        Section 13.X.3 Sample Shading
235
236        [[The contents of Section 13.4.1, Sample Shading, p. 355 are copied here]]
237
238    Modifications to Section 13.8.2, Scissor Test (p. 367)
239    (add to the end of the section)
240
241    When the _effective shading rate_ results in fragments covering more than one pixel,
242    the scissor tests are performed separately for each pixel in the fragment.
243    If a pixel covered by a fragment fails the scissor test, that pixel is
244    treated as though it was not covered by the primitive.  If all pixels covered
245    by a fragment are either not covered by the primitive being rasterized or fail
246    the scissor test, the fragment is discarded.
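The per-pixel scissor behavior can be modeled as below. This is a sketch under stated assumptions, not a description of any implementation: the fragment is identified by its lower-left pixel (x0, y0), primitive coverage is given as one bit per pixel with bit index (row * w + col), and the names scissor_box and scissor_fragment are hypothetical. The inside test follows the usual scissor rule, left <= x < left + width and bottom <= y < bottom + height.

```c
#include <assert.h>
#include <stdbool.h>

/* Scissor rectangle in window coordinates (hypothetical helper type). */
struct scissor_box { int left; int bottom; int width; int height; };

/* Applies the scissor test separately to each pixel of a w x h fragment.
 * <covered> holds one bit per pixel (assumed layout: row * w + col).
 * Returns the bits of pixels that are covered AND pass the scissor test;
 * a result of 0 means the whole fragment is discarded. */
static unsigned int scissor_fragment(int x0, int y0, int w, int h,
                                     unsigned int covered,
                                     struct scissor_box box)
{
    unsigned int surviving = 0;
    int row, col;

    assert(w >= 1 && h >= 1 && w * h <= 16);
    for (row = 0; row < h; ++row) {
        for (col = 0; col < w; ++col) {
            unsigned int bit = 1u << (row * w + col);
            int x = x0 + col;
            int y = y0 + row;
            bool inside = x >= box.left && x < box.left + box.width &&
                          y >= box.bottom && y < box.bottom + box.height;
            if ((covered & bit) != 0 && inside)
                surviving |= bit;
        }
    }
    return surviving;
}
```

With a scissor box one pixel wide and two pixels tall at the origin, a fully covered 2x2 fragment at (0, 0) keeps only its left column, while the same fragment at (2, 0) is discarded.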
247
248    Modifications to Section 13.8.3, Multisample Fragment Operations (p. 368)
249
250   (modify the last sentence of the first paragraph to indicate that sample mask
251    operations are performed when shading rate is used, even if multisampling is not
252    enabled, because shading rate can produce fragments covering more than one pixel,
253    where each pixel is considered a "sample")
254
255    Change the following sentence from:
256        "If the value of SAMPLE_BUFFERS is not one, this step is skipped."
257    to:
258        "This step is skipped if SAMPLE_BUFFERS is not one, unless SHADING_RATE_QCOM
259        is set to a value other than SHADING_RATE_1x1_PIXELS_QCOM."
260
261    (add to the end of the section)
262
263    When the _effective shading rate_ results in fragments covering more than one pixel,
264    each fragment will generate a composite coverage mask that includes separate
265    coverage bits for each sample in each pixel covered by the fragment.  This
266    composite coverage mask will be used by the GLSL built-in input variable
267    gl_SampleMaskIn[] and updated according to the built-in output variable
268    gl_SampleMask[].  The number of composite coverage mask bits in the built-in
269    variables and their mapping to a specific pixel and sample number
270    within that pixel is implementation-defined.
271
272    Modify Section 14.1, Fragment Shader Variables (p. 370)
273
274    (modify sixth paragraph, p. 371, specifying that the "centroid" location
275     for multi-pixel fragments is implementation-dependent, and is allowed to
276     be outside the primitive)
277
278    After the following sentence:
279        "When interpolating variables declared using "centroid in",
280         the variable is sampled at a location within the pixel covered
281         by the primitive generating the fragment."
282    Add the following sentence:
283        "When the _effective shading rate_ results in fragments covering more than one
284        pixel, variables declared using "centroid in" are sampled from an
285        implementation-dependent location within any one of the covered pixels."
286
287    Modify Section 15.1, Per-Fragment Operations (p. 378)
288
289    (insert a new paragraph after the first paragraph of the section)
290
291    When the _effective shading rate_ results in fragments covering multiple pixels,
292    the operations described in the section are performed independently for
293    each pixel covered by the fragment.  The set of samples covered by each pixel
294    is determined by extracting the portion of the fragment's composite coverage
295    that applies to that pixel, as described in section 13.8.3.
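Extracting the per-pixel portion of a composite coverage mask might look like the following. The bit layout is an assumption made purely for illustration: this extension leaves the mapping of mask bits to pixels and samples implementation-defined. Here the bits of pixel p are assumed to occupy positions [p * samples, (p + 1) * samples), and pixel_coverage is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the coverage bits belonging to one pixel of a multi-pixel
 * fragment, assuming a pixel-major layout of the composite mask.
 * The real layout is implementation-defined. */
static unsigned int pixel_coverage(uint64_t composite, int pixel_index,
                                   int samples)
{
    uint64_t mask;

    assert(samples >= 1 && samples <= 32);
    assert(pixel_index >= 0 && (pixel_index + 1) * samples <= 64);

    mask = ((uint64_t)1 << samples) - 1;
    return (unsigned int)((composite >> (pixel_index * samples)) & mask);
}
```

Under this layout, a 2x2 fragment with 4 samples per pixel uses 16 mask bits, and each pixel's per-fragment operations would see its own 4-bit slice.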
296
297Errors
298
299    INVALID_ENUM is generated by ShadingRateQCOM if <rate> is not
300    a valid shading rate from table X.1.
301
302New State
303
304Add to table 21.7, Rasterization
305
306Get Value                               Type  Get Command  Initial Value                     Description     Sec
307-------------------------------------   ----  -----------  --------------------------------  --------------  ------
308SHADING_RATE_QCOM                       E     GetIntegerv  SHADING_RATE_1X1_PIXELS_QCOM      shading rate    13.X.1
309SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM B     IsEnabled    FALSE                             maintain aspect 13.X.2
310
311Interactions with OVR_Multiview
312
313    If OVR_Multiview is supported, SHADING_RATE_QCOM applies to all views.
314
315Interactions with QCOM_framebuffer_foveated and QCOM_texture_foveated
316
317    QCOM_framebuffer_foveated and QCOM_texture_foveated specify a pixel
318    density which is exposed as a fragment size via the fragment
319    shader built-in gl_FragSizeEXT.  This extension defines an effective
320    shading rate which is also exposed as a fragment size via the
321    same built-in.  If either foveation extension is enabled in conjunction with
322    this extension, then the value of gl_FragSizeEXT is the component-wise product
323    of both fragment sizes.
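The component-wise product is simple enough to state directly. In the sketch below, frag_size and combined_frag_size are hypothetical names, and the two inputs stand for the fragment size produced by foveation and the effective shading rate (w', h') from this extension.

```c
#include <assert.h>

/* Fragment size in pixels, as a shader would observe it through
 * gl_FragSizeEXT (hypothetical helper type). */
struct frag_size { int width; int height; };

/* Combines the foveation fragment size with the effective shading rate
 * by multiplying the components independently. */
static struct frag_size combined_frag_size(struct frag_size foveation,
                                           struct frag_size shading_rate)
{
    struct frag_size result;

    assert(foveation.width >= 1 && foveation.height >= 1);
    assert(shading_rate.width >= 1 && shading_rate.height >= 1);
    result.width = foveation.width * shading_rate.width;
    result.height = foveation.height * shading_rate.height;
    return result;
}
```

For instance, a foveation fragment size of 2x2 combined with an effective shading rate of 2x1 would be reported as 4x2.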
324
325Issues
326
327  (1) Should the application-specified rate in ShadingRateQCOM() be a "hint"
328      that can be ignored by the driver, or is the driver required to honor
329      the requested rate?
330
331      RESOLVED: The driver should honor the application-specified rate where
332      possible, but is allowed to use an adjusted rate due to implementation-
333      dependent reasons.  The specific rates supported in the hardware and the
334      specific conditions when the rates need to be adjusted can differ across
335      different Adreno GPU families.  This extension gives drivers the flexibility to
336      expose this extension on early hardware that may have restrictions and oddities
337      while providing applications some (admittedly limited) control over the adjusted
338      rate that will be selected.  The actual rate is always exposed via the fragment
339      shader built-in.
340
341  (2) If the application-specified rate is only a hint, can developers expect that all the
342      shading rates exposed by this extension are supported natively by the HW?
343
344      RESOLVED: The initial version of this extension exposes token values for
345      shading rates of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4.  Most Adreno GPUs supporting
346      this extension are expected to support all those rates, although some early HW
347      may support fewer rates.  Note that this extension does not include shading
348      rates of 1x4, 4x1, nor 2x4 because Adreno GPUs may never support those rates.
349      Because a future version of this extension could support those rates,
350      we have reserved the token values (0x96AA, 0x96AB, and 0x96AD) for those rates.
351
352  (3) How does this feature work with per-sample shading?
353
354      RESOLVED:  When using per-sample shading, an application is expecting a
355      fragment shader to run with a separate invocation per sample.  The
356      shading rate might allow for a "coarsening" that would break such
357      shaders.  Furthermore, some Adreno families may not support this
358      combination.  We've chosen not to explicitly disallow this combination,
359      while giving implementations the flexibility to use an adjusted 1x1 sample
360      rate.
361
362  (4) How do centroid-sampled variables work with fragments larger than one
363      pixel?
364
365      RESOLVED:  For single-pixel fragments, attributes declared with
366      "centroid" are sampled at an implementation-dependent location in the
367      intersection of the area of the primitive being rasterized and the area
368      of the pixel that corresponds to the fragment.  With multi-pixel
369      fragments, attributes declared with "centroid" are sampled from an
370      implementation-dependent location within any of the covered pixels.
371      This wide allowance for implementation-dependent behavior
372      enables the extension to be exposed on early Adreno hardware.
373
374  (5) How do built-in variables gl_SampleMask[] and gl_SampleMaskIn[] work with
375      fragments larger than one pixel?
376
377      RESOLVED: For single-pixel fragments, gl_SampleMaskIn[] and gl_SampleMask[]
378      specify the input and output coverage bits for a single pixel, where bit 'B'
379      corresponds to SampleID 'B'.  With this extension enabled, these built-ins would
380      specify the coverage bits for all the samples in all the pixels covered by the
381      fragment.  In this extension, the exact behavior of gl_SampleMaskIn[] and
382      gl_SampleMask[] is implementation-dependent.  For some Adreno GPUs, use of these
383      built-in variables will cause the driver to use a 1x1 adjusted sample rate.
384      In other cases, the exact mapping of bits to samples/pixels is implementation-
385      defined.  This wide allowance for implementation-dependent behavior enables the
386      extension to be exposed on early Adreno hardware.
387
388  (6) Are there any restrictions on framebuffer formats used with this feature?
389      For example, are EglImages that may contain multi-plane YUV formats supported?
390
391      RESOLVED:  It is implementation-dependent whether shading rate is supported for
392      all formats, or only certain formats.  Implementations are allowed to adjust
393      the _effective shading rate_ based on the format.
394
395  (7) Does the value of SHADING_RATE_QCOM affect the built-in variable gl_FragCoord?
396
397      RESOLVED: Yes, when the shading rate results in fragments covering multiple pixels,
398      gl_FragCoord will be the window-relative coordinates (x,y,z,1/w) of the center of
399      the fragment.  For non-multisample cases this may not be at a pixel center.  This may
400      break shaders that assume pixel center (0.5, 0.5) values for gl_FragCoord.
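The effect on the fragment center can be illustrated with a few lines of C. This sketch assumes the fragment is described by its lower-left pixel (x0, y0) and its effective size in pixels; how an implementation aligns multi-pixel fragments to the pixel grid is not specified here. The names point2 and fragment_center are hypothetical.

```c
#include <assert.h>

/* Window-space position (hypothetical helper type). */
struct point2 { double x; double y; };

/* Center of a w x h pixel fragment whose lower-left pixel is (x0, y0).
 * For a 1x1 fragment this is the usual pixel center at +0.5. */
static struct point2 fragment_center(int x0, int y0, int w, int h)
{
    struct point2 center;

    assert(w >= 1 && h >= 1);
    center.x = x0 + w / 2.0;
    center.y = y0 + h / 2.0;
    return center;
}
```

A 1x1 fragment at pixel (0, 0) has its center at (0.5, 0.5), but a 2x2 fragment at the same origin has its center at (1.0, 1.0), which lies on a pixel corner. Shader code that assumes a fractional part of 0.5 in gl_FragCoord.xy would therefore need to account for the fragment size.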
401
402  (8) Does the shading rate affect the value of gl_SamplePosition or gl_NumSamples?
403
404      RESOLVED:  No, neither built-in is affected.  If the shader uses gl_SamplePosition, the
405      shader runs at sample-rate causing the shading rate to be ignored.  gl_NumSamples is
406      the number of samples in the framebuffer object, which is unaffected by the value of
407      shading rate.
408
409  (9) Should shading rate affect screen-space derivatives?
410
411      RESOLVED: This extension scales the gradients between adjacent fragments by
412      the effective shading rate (w', h').  The resulting increase in computed LOD
413      aligns well with the reduced fragment shader invocations in most use cases;
414      in other cases the shader author may want to bias the LOD to compensate.
415      Shader built-in instructions that return gradient values (dFdx, dFdy, and fwidth)
416      are similarly scaled for the same reason.
417
418
419Revision History
420
421    Rev.    Date    Author    Changes
422    ----  --------  --------  ----------------------------------------------
423     1    03/17/20  jleger    Initial draft.
424     2    04/22/20  jleger    Relaxed the <w', h'> guarantee from "w'<=w and
425                              h'<=h" to "w'*h' <= w*h".
426