extensions/QCOM/QCOM_shading_rate.txt

Name

    QCOM_shading_rate

Name Strings

    GL_QCOM_shading_rate

Contributors

    Jeff Leger
    Robert VanReenen

Contact

    Jeff Leger - jleger 'at' qti.qualcomm.com

Status

    Complete

Version

    Last Modified Date: April 22, 2020
    Revision: #2

Number

    OpenGL ES Extension #279

Dependencies

    OpenGL ES 2.0 is required.  This extension is written against OpenGL ES 3.2.

    This extension interacts with OVR_Multiview.
    This extension interacts with QCOM_framebuffer_foveated and QCOM_texture_foveated

    When this extension is advertised, the implementation must also advertise GLSL
    extension "GL_EXT_fragment_invocation_density" (documented separately), which
    provides new built-in variables that allow fragment shaders to determine the
    effective shading rate used for fragment invocations.

Overview

    By default, OpenGL runs a fragment shader once for each pixel covered by a
    primitive being rasterized.  When using multisampling, the outputs of that
    fragment shader are broadcast to each covered sample of the fragment's
    pixel.  When using multisampling, applications can optionally request that
    the fragment shader be run once per color sample (e.g., by using the "sample"
    qualifier on one or more active fragment shader inputs), or run a minimum
    number of times per pixel using SAMPLE_SHADING enable and the
    MinSampleShading frequency value.

    This extension allows applications to specify fragment shading rates of less
    than 1 invocation per pixel.  Instead of invoking the fragment shader
    once for each covered pixel, the fragment shader can be run once for a
    group of adjacent pixels in the framebuffer.  The outputs of that fragment
    shader invocation are broadcast to each covered samples for all of the pixels
    in the group.  The initial version of this extension allows for groups of
    1, 2, 4, 8, and 16 pixels.

    This can be useful for effects like motion volumetric rendering
    where a portion of scene is processed at full shading rate and a portion can
    be processed at a reduced shading rate, saving power and processing resources.
    The requested rate can vary from (finest and default) 1 fragment shader
    invocation per pixel to (coarsest) one fragment shader invocation for each
    4x4 block of pixels.  Implementations are given wide latitude to rasterize
    at the requested rate or any other rate that is less coarse.

New Tokens

    Accepted by the <pname> parameter of GetIntegerv, GetInterger64v
    and GetFloatv:

         SHADING_RATE_QCOM                        0x96A4

    Accepted by the <cap> parameter of Enable, Disable, IsEnabled:

         SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM  0x96A5

    Allowed in the <rate> parameter in ShadingRateQCOM:
         SHADING_RATE_1X1_PIXELS_QCOM             0x96A6
         SHADING_RATE_1X2_PIXELS_QCOM             0x96A7
         SHADING_RATE_2X1_PIXELS_QCOM             0x96A8
         SHADING_RATE_2X2_PIXELS_QCOM             0x96A9
         SHADING_RATE_4X2_PIXELS_QCOM             0x96AC
         SHADING_RATE_4X4_PIXELS_QCOM             0x96AE

New Procedures and Functions

    void ShadingRateQCOM(enum rate);

Modifications to the OpenGL ES 3.2 Specification

    Modify Section 8.14.1, Scale Factor and Level of Detail, p. 196

    (Modify the function approximating Scale Factor (P), to allow implementations
     to scale implicit derivatives based on the shading rate.  The scale occurs before
     the LOD bias and before LOD clamping).

     Modify the definitions of (mu, mv, mw):

                    |   du       du    |
          mu = max  |  -----  , -----  |
                    |   dx       dy    |

                    |   dv       dv    |
          mv = max  |  -----  , -----  |
                    |   dx       dy    |

                    |   dw       dw    |
          mw = max  |  -----  , -----  |
                    |   dx       dy    |
     to:
                    |   du          du        |
          mu = max  |  ---- * sx , ---- * sy  |
                    |   dx          dy        |

                    |   dv          dv        |
          mv = max  |  ---- * sx , ---- * sy  |
                    |   dx          dy        |

                    |   dw          dw        |
          mw = max  |  ---- * sx , ---- * sy  |
                    |   dx          dy        |

          where (sx, sy) refer to _effective shading rate_ (w', h') specified in
          section 13.X.2.

    Modify Section 13.4, Multisampling, p. 353

   (add to the end of the section)

        When SHADING_RATE_QCOM is set to a value other than SHADING_RATE_1x1_PIXELS_QCOM,
        the rasterization will occur at the _effective shading rate_ (Section 13.X) and
        will result in fragments covering a <W>x<H> group of pixels.

        When multisample rasterization is enabled, the samples of the fragment will consist
        of the samples for each of the pixels in the group.  The fragment center will be
        the center of this group of pixels.  Each fragment will include a coverage value
        with (W x H x SAMPLES) bits.  For example, if GL_SHADING_RATE_QCOM is is 2X2 and the
        currently bound framebuffer object has SAMPLES equal to 4 (4xMSAA), then the fragment
        will consist of 4 pixels and 16 samples.  Similarly, each fragment will have
        (W * H * SAMPLES) depth values and associated data.

    The contents of Section 13.4.1, Sample Shading, p. 355 is moved to the new Section 13.X.3, "Sample Shading".

    Add new section 13.X before Section 13.5, Points, p. 355

        Section 13.X, Shading Rate

        By default, each fragment processed by programmable fragment processing
        corresponds to a single pixel with a single (x,y) coordinate. When using
        multisampling, implementations are permitted to run separate fragment shader
        invocations for each sample, but often only run a single invocation for all
        samples of the fragment.  We will refer to the density of fragment shader
        invocations as the _shading rate_.
        Applications can use the shading rate to increase the size of fragments to
        cover multiple pixels and reduce the amount of fragment shader work.
        Applications can also use the shading rate to explicitly control the minimum
        number of fragment shader invocations when multisampling.

        Section 13.X.1, Shading Rate Control

        The shading rate can be controlled with the command

           void ShadingRateQCOM(enum rate);

        <rate> specifies the value of SHADING_RATE_QCOM, and defines the
        _shading rate_.  Valid values for <rate> are described in
        table X.1

            Shading Rate                   Size
            ----------------------------   -----
            SHADING_RATE_1X1_PIXELS_QCOM   1x1
            SHADING_RATE_1X2_PIXELS_QCOM   1x2
            SHADING_RATE_2X1_PIXELS_QCOM   2x1
            SHADING_RATE_2X2_PIXELS_QCOM   2x2
            SHADING_RATE_4X2_PIXELS_QCOM   4x2
            SHADING_RATE_4X4_PIXELS_QCOM   4x4

            Table X.1:  Shading rates accepted by ShadingRateQCOM.  An
            entry of "<W>x<H>" in the "Size" column indicates that the shading
            rate request for fragments with a width and height (in pixels) of <W>
            and <H>, respectively.

        If the shading rate is specified with ShadingRateCOM, it will apply to all
        draw buffers.  If the shading rate has not been set , the shading rate
        will be SHADING_RATE_1x1_PIXELS_QCOM.  In either case, the shading rate will
        be further adjusted as described in the following sections.

        Section 13.X.2, Effective Shading Rate

        The value of SHADING_RATE_QCOM, in combination with other GL state,
        is used to derive an adjusted rate or _effective shading rate_, as
        as described in this section.

        Where possible, implementations should provide an _effective shading rate_
        equal to the SHADING_RATE_QCOM.  When this is not possible, an adjusted
        _effective shading rate_ may be used as described in this section.  While
        there is no API for querying the _effective shading rate_, the value of this
        parameter exists, can be queried from the fragment shader built-in gl_FragSizeEXT,
        and is referred to in a number of places in the specification.  Implementations
        may also adjust the shading rate for other reasons not listed here.

        Implementations derive the _effective shading rate_ in an implementation-dependent
        manner.  When rendering to the default framebuffer, the rate may be adjusted
        to 1x1.  When sample shading (section 13.X.3 Sample Shading) is enabled, the
        rate may be adjusted to 1x1.  When the fragment shader uses GLSL built-in
        input variables gl_SampleMaskIn[], gl_SampleMask[], or uses variables
        declared with "centroid in", the rate may be adjusted to 1x1.  When sample coverage
        or sample mask operations are enabled (Section 13.8.3 Multisample Fragment
        Operations), the rate may be adjusted to 1x1.

        The shading rate may be adjusted to limit the number of samples covered by a
        fragment.  For example, if the implementation supports a maximum of 16 samples
        per fragment and if GL_SHADING_RATE_QCOM is 4X4 and the currently bound
        framebuffer object has SAMPLES equal to 4 (4xMSAA), then the number of samples
        per coarse fragment would be 64.  In such an example, an implementation may
        adjust the shading rate to a rate with 16 or fewer samples (e.g., 2x2).

        If the active fragment shader uses any inputs that are qualified with
        "sample" (unique values per sample), including the built-ins "gl_SampleID"
        and "gl_SamplePosition", or the built-in function "interpolateAtSample",
        the shader code is written to expect a separate shader invocation for each
        shaded sample.  For such fragment shaders, the shading rate is adjusted to
        1x1.

        If the <W>x<H> value of SHADING_RATE_QCOM is expressed as <w, h> then the
        adjusted rate may be any <w', h'> as long as (w' * h') <= (w * h).  If
        PRESERVE_SHADING_RATE_ASPECT_RATIO is TRUE, then the implementation further
        guarantees that (w'/h') equals (w/h) or that w'=1 and h'=1.

        Section 13.X.3 Sample Shading

        [[The contents from Section 13.4.1, Sample Shading, p. 355 is copied here]]

    Modifications to Section 13.8.2, Scissor Test (p. 367)
    (add to the end of the section)

    When the _effective shading rate_ results in fragments covering more than one pixel,
    the scissor tests are performed separately for each pixel in the fragment.
    If a pixel covered by a fragment fails the scissor test, that pixel is
    treated as though it was not covered by the primitive.  If all pixels covered
    by a fragment are either not covered by the primitive being rasterized or fail
    the scissor test, the fragment is discarded.

    Modifications to Section 13.8.3, Multisample Fragment Operations (p. 368)

   (modify the last sentence of the the first paragraph to indicate that sample mask
    operations are performed when shading rate is used, even if multisampling is not
    enabled which can produce fragments covering more than one pixel where each pixel
    is considered a "sample")

    Change the following sentence from:
        "If the value of SAMPLE_BUFFERS is not one, this step is skipped."
    to:
        "This step is skipped if SAMPLE_BUFFERS is not one, unless SHADING_RATE_QCOM
        is set to a value other than SHADING_RATE_1x1_PIXELS_QCOM."

    (add to the end of the section)

    When the _effective shading rate_ results in fragments covering more than one pixel,
    each fragment will generate a composite coverage mask that includes separate
    coverage bits for each sample in each pixel covered by the fragment.  This
    composite coverage mask will be used by the GLSL built-in input variable
    gl_SampleMaskIn[] and updated according to the built-in output variable
    gl_SampleMask[].  The number of composite coverage mask bits in the built-in
    variables and their mapping to a specific pixel and sample number
    within that pixel is implementation-defined.

    Modify Section 14.1, Fragment Shader Variables (p. 370)

    (modify sixth paragraph, p. 371, specifying that the "centroid" location
     for multi-pixel fragments is implementation-dependent, and is allowed to
     be outside the primitive)

    After the following sentence:
        "When interpolating variables declared using "centroid in",
         the variable is sampled at a location within the pixel covered
         by the primitive generating the fragment."
    Add the following sentence:
        "When the _effective shading rate_ results in fragments covering more than one
        pixel, variables declared using "centroid in" are sampled from an
        implementation-dependent location within any one of the covered pixels."

    Modify Section 15.1, Per-Fragment Operations (p. 378)

    (insert a new paragraph after the first paragraph of the section)

    When the _effective shading rate_ results in fragments covering multiple pixels,
    the operations described in the section are performed independently for
    each pixel covered by the fragment.  The set of samples covered by each pixel
    is determined by extracting the portion of the fragment's composite coverage
    that applies to that pixel, as described in section 13.8.3.

Errors

    INVALID_ENUM is generated by ShadingRateQCOM if <rate> is not
    a valid shading rate from table X.1

New State

Add to table 21.7, Rasterization

Get Value                               Type  Get Command  Initial Value                     Description     Sec
-------------------------------------   ----  -----------  --------------------------------  --------------  ------
SHADING_RATE_QCOM                       E     GetIntegerV  SHADING_RATE_1x1_PIXELS_BIT_QCOM  shading rate    13.X.1
PRESERVE_SHADING_RATE_ASPECT_RATIO_QCOM B     IsEnabled    FALSE                             maintain aspect 13.X.2

Interactions with OVR_Multiview

    If OVR_Multiview is supported, SHADING_RATE_QCOM applies to all views.

Interactions with QCOM_framebuffer_foveated and QCOM_texture_foveated

    QCOM_framebuffer_foveated and QCOM_texture_foveated specify a pixel
    density which is exposed as a fragment size via the fragment
    shader built-in gl_FragSizeEXT.  This extension defines an effective
    shading rate which is also exposed as a fragment size using the via the
    same built-in.  If either foveation extension is enabled in conjunction with
    this extension, then the value of gl_FragSizeEXT is the component-wise product
    of both fragment sizes.

Issues

  (1) Should the application-specified rate in ShadingRateCOM() be a "hint"
      that can be ignored by the driver, or is the driver reqired to honor
      the requested rate?

      RESOLVED: The driver should honor the application-specified rate where
      possible, but is allowed to use an adjusted rate due to implementation-
      depdendent reasons.  The specific rates supported in the hardware and the
      specific conditions when the rates needs to be adjusted can differ across
      different Adreno GPU families.  This extension gives drivers the flexibility to
      expose this extension on early hardware that may have restrictions and oddities
      while providing applications some (admittedly limited) control over the adjusted
      rate that will be selected.  The actual rate is always exposed via the fragment
      shader built-in.

  (2) If the application-specified rate is only a hint, can developers expect that all the
      shading rates exposed by this extension are supported natively by the HW?

      RESOLVED: The initial version of this extension exposes token values for
      shading rates of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4.  Most Adreno GPUs supporting
      this extension are expected to support all those rates, although some early HW
      may support fewer rates.  Note that this extension does not include shading
      rates of 1x4, 4x1, nor 2x4 because Adreno GPUs may never support those rates.
      Because a future version of this extension could support those rates,
      we have reserved the token values (0x96AA, 0x96AB, and 0x96AD) for those rates.

  (3) How does this feature work with per-sample shading?

      RESOLVED:  When using per-sample shading, an application is expecting a
      fragment shader to run with a separate invocation per sample.  The
      shading rate might allow for a "coarsening" that would break such
      shaders.  Furthermore, some Adreno families may not support this
      combination.  We've chosen not to explicitly disallow this combination,
      while giving implementions the flexibility to use an adjusted 1x1 sample
      rate.

  (4) How do centroid-sampled variables work with fragments larger than one
      pixel?

      RESOLVED:  For single-pixel fragments, attributes declared with
      "centroid" are sampled at an implementation-dependent location in the
      intersection of the area of the primitive being rasterized and the area
      of the pixel that corresponds to the fragment.  With multi-pixel
      fragments, attributes declared with "centroid" are sampled from an
      implementation-dependent location within any of the covered pixels.
      This wide allowance for implementation-dependent behavior
      enables the extension to be exposed on early Adreno hardware.

  (5) How do built-in variables gl_SampleMask[] and gl_SampleMaskIn[] work with
      fragments larger than one pixel?

      RESOLVED: For single-pixel fragments, gl_SampleMaskIn[] and gl_SampleMask[]
      specify the input and output coverage bits for a single pixel, where bit 'B'
      corresonds to SampleID 'B'.  With this extension enabled, these built-ins would
      specify the coverage bits for all the samples in all the pixels covered by the
      fragment.  In this extension, the exact behavior of gl_SampleMaskIn[] and
      gl_SampleMask[] is implementation-dependent.  For some Adreno GPUs, use of these
      built-in variables will cause the driver to use a 1x1 adjusted sample rate.
      In other cases, the exact mapping of bits to samples/pixels is implementation-
      defined.  This wide allowance for implementation-dependent behavior enables the
      extension to be exposed on early Adreno hardware.

  (6) Are there any restrictions on framebuffer formats used with this feature?
      For example, are EglImages that may contain multi-plane YUV formats supported?

      RESOLVED:  It is implementation-dependent whether shading rate is supported for
      all formats, or only certain formats.  Implementations are allowed to adjust
      the _effective sample rate_ based on the format.

  (7) Does the value of SHADING_RATE_QCOM affect the built in variable gl_Fragcoord?

      RESOLVED: Yes, when the shading rate results in fragments covering multiple pixels,
      gl_Fragcoord will be the window relative coordinates (x,y,z,1/w) of the center of
      the fragment.  For non multisample cases this may not be at a pixel center.  This may
      break shaders that assume pixel center (0.5, 0.5) values for fragcoord.

  (8) Does the shading rate affect the value of gl_SamplePosition or gl_NumSamples?

      RESOLVED:  No, neither built-in is affected.  If the shader usess gl_SamplePosition, the
      shader runs at sample-rate causing the shading rate to be ignored.  gl_NumSamples is
      is the number of samples in the framebuffer object which is unaffected by the value of
      shading rate.

  (9) Should shading rate affect screen-space derivatives?

      RESOLVED: This extension scales the gradients between ajacent fragments by
      the effecive shading rate (w', h').  The resulting increase in computed LOD
      aligns well with the reduced fragment shader invocations in most use cases;
      in other cases the shader author may want to bias the LOD to compensate.
      Shader built-in instructions that return gradient values (dFdx, dFdy, and fwidth)
      are similarly scaled for the same reason.


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  ----------------------------------------------
     1    03/17/20  jleger    Initial draft.
     2    04/22/20  jleger    Relaxed the <w', h'> guarantee from "w'<=w and
                              h'<=h" to "w’*h’ <= w*h".