// Copyright (c) 2018-2020 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}VK_NV_shader_image_footprint.adoc[]

=== Other Extension Metadata

*Last Modified Date*::
    2018-09-13
*IP Status*::
    No known IP claims.
*Interactions and External Dependencies*::
  - This extension provides API support for
    {GLSLregistry}/nv/GLSL_NV_shader_texture_footprint.txt[`GL_NV_shader_texture_footprint`]
*Contributors*::
  - Pat Brown, NVIDIA
  - Chris Lentini, NVIDIA
  - Daniel Koch, NVIDIA
  - Jeff Bolz, NVIDIA


=== Description

This extension adds Vulkan support for the
{spirv}/NV/SPV_NV_shader_image_footprint.html[`SPV_NV_shader_image_footprint`]
SPIR-V extension.
That SPIR-V extension provides a new instruction,
code:OpImageSampleFootprintNV, that allows shaders to determine the set of
texels that would be accessed by an equivalent filtered texture lookup.

Instead of returning a filtered texture value, the instruction returns a
structure that can be interpreted by shader code to determine the footprint
of a filtered texture lookup.
This structure includes integer values that identify a small neighborhood of
texels in the image being accessed and a bitfield that indicates which
texels in that neighborhood would be used.
The structure also includes a bitfield where each bit identifies whether any
texel in a small aligned block of texels would be fetched by the texture
lookup.
The size of each block is specified by an access _granularity_ provided by
the shader.
The minimum granularity supported by this extension is 2x2 (for 2D textures)
and 2x2x2 (for 3D textures); the maximum granularity is 256x256 (for 2D
textures) and 64x32x32 (for 3D textures).
Each footprint query returns the footprint from a single texture level.
When using minification filters that combine accesses from multiple mipmap
levels, shaders must perform separate queries for the two levels accessed
("`fine`" and "`coarse`").
The footprint query also returns a flag indicating whether the texture lookup
would access texels from only one mipmap level or from two neighboring
levels.

This extension should be useful for multi-pass rendering operations that do
an initial expensive rendering pass to produce a first image that is then
used as a texture for a second pass.
If the second pass ends up accessing only portions of the first image (e.g.,
due to visibility), the work spent rendering the non-accessed portion of the
first image was wasted.
With this feature, an application can limit this waste using an initial pass
over the geometry in the second image that performs a footprint query for
each visible pixel to determine the set of pixels that it needs from the
first image.
This pass would accumulate an aggregate footprint of all visible pixels into
a separate "`footprint image`" using shader atomics.
Then, when rendering the first image, the application can kill all shading
work for pixels not in this aggregate footprint.

This extension has a number of limitations.
The code:OpImageSampleFootprintNV instruction only supports two- and
three-dimensional textures.
Footprint evaluation only supports the CLAMP_TO_EDGE wrap mode; results are
undefined: for all other wrap modes.
Only a limited set of granularity values is supported, and that set does not
support separate coverage information for each texel in the original image.

When using SPIR-V generated from the OpenGL Shading Language, the new
instruction will be generated from code using the new
code:textureFootprint*NV built-in functions from the
`GL_NV_shader_texture_footprint` shading language extension.

include::{generated}/interfaces/VK_NV_shader_image_footprint.adoc[]

=== New SPIR-V Capability

  * <<spirvenv-capabilities-table-ImageFootprintNV, code:ImageFootprintNV>>

=== Issues

(1) The footprint returned by the SPIR-V instruction is a structure that
    includes an anchor, an offset, and a mask that represents an 8x8 or
    4x4x4 neighborhood of texel groups.
    But the bits of the mask are not stored in simple pitch order.
    Why is the footprint built this way?

*RESOLVED*: We expect that applications using this feature will want to use
a fixed granularity and accumulate coverage information from the returned
footprints into an aggregate "`footprint image`" that tracks the portions of
an image that would be needed by regular texture filtering.
If an application is using a two-dimensional image with 4x4 pixel
granularity, we expect that the footprint image will use 64-bit texels where
each bit in an 8x8 array of bits corresponds to coverage for a 4x4 block in
the original image.
Texel (0,0) in the footprint image would correspond to texels (0,0) through
(31,31) in the original image.
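
As an illustration of this layout, the following C++ sketch maps a pixel of the original image to its footprint image texel and bit, and tests coverage the way a first-pass shader might before doing expensive shading work.
It is a CPU-side model only, assuming 4x4 granularity, bit number `row * 8 + column` within each 32x32 region (the numbering used in this issue's diagram), and y increasing downward; `FootprintBit`, `footprintBitForPixel`, and `FootprintImage` are hypothetical names, not part of any API.

[source,c++]
----
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: 4x4 granularity, one 64-bit footprint texel per 32x32
// region, bit (row * 8 + column) covers 4x4 block (column, row) in that region.
struct FootprintBit {
    uint32_t texelX, texelY;  // footprint image texel coordinate
    uint32_t bit;             // bit index within that 64-bit texel
};

FootprintBit footprintBitForPixel(uint32_t px, uint32_t py) {
    const uint32_t blockX = px / 4, blockY = py / 4;  // 4x4 block containing the pixel
    return FootprintBit{blockX / 8, blockY / 8,        // 8x8 blocks per footprint texel
                        (blockY % 8) * 8 + (blockX % 8)};
}

struct FootprintImage {
    uint32_t width, height;        // size in footprint texels
    std::vector<uint64_t> texels;  // row-major, initially all zero (nothing covered)

    FootprintImage(uint32_t w, uint32_t h)
        : width(w), height(h), texels(std::size_t(w) * h, 0) {}

    // True if the 4x4 block containing original-image pixel (px, py) is covered.
    // No bounds checking in this sketch.
    bool isCovered(uint32_t px, uint32_t py) const {
        const FootprintBit fb = footprintBitForPixel(px, py);
        return ((texels[std::size_t(fb.texelY) * width + fb.texelX] >> fb.bit) & 1u) != 0;
    }
};
----

For example, pixel (31,31) maps to bit 63 of footprint texel (0,0), and pixel (32,4) maps to bit 8 of footprint texel (1,0).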

In the usual case, the footprint for a single access will be fully contained
in a 32x32 aligned region of the original texture, which corresponds to a
single 64-bit texel in the footprint image.
In that case, the implementation will return an anchor coordinate pointing
at the single footprint image texel, an offset vector of (0,0), and a mask
whose bits are aligned with the bits in the footprint texel.
For this case, the shader can simply atomically OR the mask bits into the
contents of the footprint texel to accumulate footprint coverage.
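
A host-side C++ analogue of this aligned case is sketched below, using `std::atomic<uint64_t>::fetch_or` in place of a shader atomic OR.
The split of the mask into two 32-bit halves mirrors the GLSL code later in this issue; `combineMask` and `accumulateAligned` are hypothetical names.

[source,c++]
----
#include <atomic>
#include <cstdint>

// Combine the two 32-bit halves of a returned mask (mask.x = bits 0-31,
// mask.y = bits 32-63) into one 64-bit value.
uint64_t combineMask(uint32_t maskX, uint32_t maskY) {
    return uint64_t(maskX) | (uint64_t(maskY) << 32);
}

// Aligned case (offset == (0,0)): the whole mask belongs to the anchor texel,
// so a single atomic OR accumulates this access's coverage.
void accumulateAligned(std::atomic<uint64_t>& anchorTexel, uint32_t maskX, uint32_t maskY) {
    anchorTexel.fetch_or(combineMask(maskX, maskY), std::memory_order_relaxed);
}
----

Relaxed ordering is sufficient in this sketch because each OR only sets bits, so the final texel value does not depend on the order of the updates.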

In the worst case, the footprint for a single access spans multiple 32x32
aligned regions and may require updates to four separate footprint image
texels.
In this case, the implementation will return an anchor coordinate pointing
at the lower right footprint image texel and an offset vector identifying
how many "`columns`" and "`rows`" of the returned 8x8 mask correspond to
footprint texels to the left of and above the anchor texel.
If the offset is (2,3), the 64 bits of the returned mask are arranged
spatially as follows, where each 4x4 block is assigned a bit number that
matches its bit number in the footprint image texels:

----
    +-------------------------+-------------------------+
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- 46 47 | 40 41 42 43 44 45 -- -- |
    | -- -- -- -- -- -- 54 55 | 48 49 50 51 52 53 -- -- |
    | -- -- -- -- -- -- 62 63 | 56 57 58 59 60 61 -- -- |
    +-------------------------+-------------------------+
    | -- -- -- -- -- -- 06 07 | 00 01 02 03 04 05 -- -- |
    | -- -- -- -- -- -- 14 15 | 08 09 10 11 12 13 -- -- |
    | -- -- -- -- -- -- 22 23 | 16 17 18 19 20 21 -- -- |
    | -- -- -- -- -- -- 30 31 | 24 25 26 27 28 29 -- -- |
    | -- -- -- -- -- -- 38 39 | 32 33 34 35 36 37 -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    +-------------------------+-------------------------+
----

To accumulate coverage for each of the four footprint image texels, a shader
can AND the returned mask with simple masks derived from the x and y offset
values and then atomically OR the updated mask bits into the contents of the
corresponding footprint texel.

[source,c++]
----
    // Combine the two 32-bit halves of the returned mask into one 64-bit value.
    uint64_t returnedMask = (uint64_t(footprint.mask.x) | (uint64_t(footprint.mask.y) << 32));
    // Columns [0, 8 - offset.x) of each 8-bit mask row belong to the anchor (right) texels.
    uint64_t rightMask    = ((0xFF >> footprint.offset.x) * 0x0101010101010101UL);
    // Rows [0, 8 - offset.y) of the mask belong to the anchor (bottom) texels.
    uint64_t bottomMask   = 0xFFFFFFFFFFFFFFFFUL >> (8 * footprint.offset.y);
    // One mask per footprint image texel touched by the access.
    uint64_t bottomRight  = returnedMask & bottomMask & rightMask;
    uint64_t bottomLeft   = returnedMask & bottomMask & (~rightMask);
    uint64_t topRight     = returnedMask & (~bottomMask) & rightMask;
    uint64_t topLeft      = returnedMask & (~bottomMask) & (~rightMask);
----
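
The mask arithmetic above can be checked against the diagram by porting it to host C++, as in this sketch (the `SplitMasks` struct and `splitFootprintMask` function are hypothetical).
It assumes offset components in the range [0,7], so every shift stays below 64 bits, and it uses the `ull` suffix because `unsigned long` may be only 32 bits wide on some C++ platforms, whereas `UL` in the shader code denotes a 64-bit literal in GLSL.

[source,c++]
----
#include <cstdint>

// Per-texel masks for a footprint that spans up to four footprint image texels.
struct SplitMasks {
    uint64_t bottomRight, bottomLeft, topRight, topLeft;
};

// offsetX/offsetY: number of mask columns/rows owned by the texels to the
// left of/above the anchor texel; assumed to be in the range [0, 7].
SplitMasks splitFootprintMask(uint64_t returnedMask, uint32_t offsetX, uint32_t offsetY) {
    // Low (8 - offsetX) bits of every byte (one byte per mask row).
    const uint64_t rightMask  = uint64_t(0xFFu >> offsetX) * 0x0101010101010101ull;
    // Low (8 - offsetY) bytes of the mask.
    const uint64_t bottomMask = ~uint64_t(0) >> (8 * offsetY);
    return SplitMasks{
        returnedMask & bottomMask & rightMask,    // anchor texel
        returnedMask & bottomMask & ~rightMask,   // texel to the left
        returnedMask & ~bottomMask & rightMask,   // texel above
        returnedMask & ~bottomMask & ~rightMask,  // texel above and to the left
    };
}
----

With the offset (2,3) from the diagram and a fully set mask, `bottomRight` is `0x0000003F3F3F3F3F` (bits 0-5 of rows 0-4) and `topLeft` is `0xC0C0C00000000000` (bits 46, 47, 54, 55, 62, and 63), matching the diagram.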

(2) What should an application do to ensure maximum performance when
    accumulating footprints into an aggregate footprint image?

*RESOLVED*: We expect that the most common usage of this feature will be to
accumulate aggregate footprint coverage, as described in the previous issue.
Even if you ignore the anisotropic filtering case where the implementation
may return a granularity larger than that requested by the caller, each
shader invocation will need to use atomic functions to update up to four
footprint image texels for each LOD accessed.
Having each active shader invocation perform multiple atomic operations can
be expensive, particularly when neighboring invocations will want to update
the same footprint image texels.

Techniques that can be used to reduce the number of atomic operations
performed when accumulating coverage include:

  * Have logic that detects returned footprints where all components of the
    returned offset vector are zero.
    In that case, the mask returned by the footprint function is guaranteed
    to be aligned with the footprint image texels and affects only a single
    footprint image texel.
  * Have fragment shaders communicate using built-in functions from the
    `VK_NV_shader_subgroup_partitioned` extension or other shader subgroup
    extensions.
    If you have multiple invocations in a subgroup that need to update the
    same texel (x,y) in the footprint image, compute an aggregate footprint
    mask across all invocations in the subgroup updating that texel and have
    a single invocation perform an atomic operation using that aggregate
    mask.
  * When the returned footprint spans multiple texels in the footprint
    image, each invocation needs to perform up to four atomic operations.
    In the previous issue, we had an example that computed separate masks
    for "`topLeft`", "`topRight`", "`bottomLeft`", and "`bottomRight`".
    When the invocations in a subgroup have good locality, it might be the
    case that the "`top left`" texel for some invocations is footprint
    image texel (10,10), while neighbors might have their "`top left`"
    texels at (11,10), (10,11), and (11,11).
    If you compute separate masks for even/odd x and y values instead of
    left/right or top/bottom, the "`odd/odd`" mask for all invocations in
    the subgroup holds coverage for footprint image texel (11,11), which can
    be updated by a single atomic operation for the entire subgroup.

=== Examples

TBD

=== Version History

  * Revision 2, 2018-09-13 (Pat Brown)
  ** Add issue (2) with performance tips.

  * Revision 1, 2018-08-12 (Pat Brown)
  ** Initial draft