// Copyright (c) 2018-2020 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}VK_NV_shader_image_footprint.txt[]

=== Other Extension Metadata

*Last Modified Date*::
    2018-09-13
*IP Status*::
    No known IP claims.
*Interactions and External Dependencies*::
  - This extension requires
    {spirv}/NV/SPV_NV_shader_image_footprint.html[`SPV_NV_shader_image_footprint`]
  - This extension provides API support for
    https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_texture_footprint.txt[`GL_NV_shader_texture_footprint`]
*Contributors*::
  - Pat Brown, NVIDIA
  - Chris Lentini, NVIDIA
  - Daniel Koch, NVIDIA
  - Jeff Bolz, NVIDIA

=== Description

This extension adds Vulkan support for the
{spirv}/NV/SPV_NV_shader_image_footprint.html[`SPV_NV_shader_image_footprint`]
SPIR-V extension.
That SPIR-V extension provides a new instruction
code:OpImageSampleFootprintNV allowing shaders to determine the set of
texels that would be accessed by an equivalent filtered texture lookup.

Instead of returning a filtered texture value, the instruction returns a
structure that can be interpreted by shader code to determine the footprint
of a filtered texture lookup.
This structure includes integer values that identify a small neighborhood of
texels in the image being accessed and a bitfield that indicates which
texels in that neighborhood would be used.
The structure also includes a bitfield where each bit identifies whether any
texel in a small aligned block of texels would be fetched by the texture
lookup.
The size of each block is specified by an access _granularity_ provided by
the shader.
The minimum granularity supported by this extension is 2x2 (for 2D textures)
or 2x2x2 (for 3D textures); the maximum granularity is 256x256 (for 2D
textures) or 64x32x32 (for 3D textures).
Each footprint query returns the footprint from a single texture level.
When using minification filters that combine accesses from multiple mipmap
levels, shaders must perform separate queries for the two levels accessed
("`fine`" and "`coarse`").
The footprint query also returns a flag indicating if the texture lookup
would access texels from only one mipmap level or from two neighboring
levels.

This extension should be useful for multi-pass rendering operations that do
an initial expensive rendering pass to produce a first image that is then
used as a texture for a second pass.
If the second pass ends up accessing only portions of the first image (e.g.,
due to visibility), the work spent rendering the non-accessed portion of the
first image is wasted.
With this feature, an application can limit this waste using an initial pass
over the geometry in the second image that performs a footprint query for
each visible pixel to determine the set of pixels that it needs from the
first image.
This pass would accumulate an aggregate footprint of all visible pixels into
a separate "`footprint image`" using shader atomics.
Then, when rendering the first image, the application can kill all shading
work for pixels not in this aggregate footprint.

This extension has a number of limitations.
The code:OpImageSampleFootprintNV instruction only supports two- and
three-dimensional textures.
Footprint evaluation only supports the CLAMP_TO_EDGE wrap mode; results are
undefined for all other wrap modes.
Only a limited set of granularity values is supported, and that set does not
support separate coverage information for each texel in the original image.

When using SPIR-V generated from the OpenGL Shading Language, the new
instruction will be generated from code using the new
code:textureFootprint*NV built-in functions from the
`GL_NV_shader_texture_footprint` shading language extension.

include::{generated}/interfaces/VK_NV_shader_image_footprint.txt[]

=== New SPIR-V Capability

  * <<spirvenv-capabilities-table-ImageFootprintNV,ImageFootprintNV>>

=== Issues

(1) The footprint returned by the SPIR-V instruction is a structure that
    includes an anchor, an offset, and a mask that represents an 8x8 or 4x4x4
    neighborhood of texel groups.
    But the bits of the mask are not stored in simple pitch order.
    Why is the footprint built this way?

*RESOLVED*: We expect that applications using this feature will want to use
a fixed granularity and accumulate coverage information from the returned
footprints into an aggregate "`footprint image`" that tracks the portions of
an image that would be needed by regular texture filtering.
If an application is using a two-dimensional image with 4x4 pixel
granularity, we expect that the footprint image will use 64-bit texels where
each bit in an 8x8 array of bits corresponds to coverage for a 4x4 block in
the original image.
Texel (0,0) in the footprint image would correspond to texels (0,0) through
(31,31) in the original image.

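As a rough illustration of this layout, the following C++ sketch maps a
texel of the original image to its footprint image texel and coverage bit.
The names `FootprintBit`, `locate`, and `isCovered` are hypothetical and not
part of the extension.
The sketch assumes a two-dimensional image, a fixed square granularity, and
the row-major bit numbering (bit = 8 * row + column) suggested by the
diagram later in this issue.

[source,c++]
----
#include <cassert>
#include <cstdint>

// Hypothetical result type: which footprint image texel, and which bit in it.
struct FootprintBit {
    uint32_t texelX, texelY;  // texel coordinate in the footprint image
    uint32_t bit;             // bit index (0..63) within that 64-bit texel
};

// Maps texel (x, y) of the original 2D image to its coverage bit.
// Assumes bits are numbered row-major inside each 8x8 array of blocks.
inline FootprintBit locate(uint32_t x, uint32_t y, uint32_t granularity) {
    assert(granularity != 0);
    const uint32_t blockX = x / granularity;  // block column in the image
    const uint32_t blockY = y / granularity;  // block row in the image
    return FootprintBit{blockX / 8, blockY / 8, (blockY % 8) * 8 + (blockX % 8)};
}

// True if the given bit is set in a 64-bit footprint image texel, i.e. the
// corresponding block of the original image is part of the footprint.
inline bool isCovered(uint64_t footprintTexel, uint32_t bit) {
    return ((footprintTexel >> bit) & 1u) != 0;
}
----

With a granularity of 4, `locate(31, 31, 4)` yields footprint image texel
(0,0) and bit 63, consistent with texels (0,0) through (31,31) sharing a
single footprint image texel.
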
In the usual case, the footprint for a single access will be fully contained
in a 32x32 aligned region of the original texture, which corresponds to a
single 64-bit texel in the footprint image.
In that case, the implementation will return an anchor coordinate pointing
at the single footprint image texel, an offset vector of (0,0), and a mask
whose bits are aligned with the bits in the footprint texel.
For this case, the shader can simply atomically OR the mask bits into the
contents of the footprint texel to accumulate footprint coverage.

In the worst case, the footprint for a single access spans multiple 32x32
aligned regions and may require updates to four separate footprint image
texels.
In this case, the implementation will return an anchor coordinate pointing
at the lower right footprint image texel and an offset identifying how
many "`columns`" and "`rows`" of the returned 8x8 mask correspond to
footprint texels to the left and above the anchor texel.
If the offset is (2,3), the 64 bits of the returned mask are arranged
spatially as follows, where each 4x4 block is assigned a bit number that
matches its bit number in the footprint image texels:

----
    +-------------------------+-------------------------+
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- 46 47 | 40 41 42 43 44 45 -- -- |
    | -- -- -- -- -- -- 54 55 | 48 49 50 51 52 53 -- -- |
    | -- -- -- -- -- -- 62 63 | 56 57 58 59 60 61 -- -- |
    +-------------------------+-------------------------+
    | -- -- -- -- -- -- 06 07 | 00 01 02 03 04 05 -- -- |
    | -- -- -- -- -- -- 14 15 | 08 09 10 11 12 13 -- -- |
    | -- -- -- -- -- -- 22 23 | 16 17 18 19 20 21 -- -- |
    | -- -- -- -- -- -- 30 31 | 24 25 26 27 28 29 -- -- |
    | -- -- -- -- -- -- 38 39 | 32 33 34 35 36 37 -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    +-------------------------+-------------------------+
----

To accumulate coverage for each of the four footprint image texels, a shader
can AND the returned mask with simple masks derived from the x and y offset
values and then atomically OR the updated mask bits into the contents of the
corresponding footprint texel.

[source,c++]
----
    // Combine the two 32-bit halves of the returned mask.
    uint64_t returnedMask = (uint64_t(footprint.mask.x) | (uint64_t(footprint.mask.y) << 32));
    // Bits in the columns of the right-hand texels and in the rows of the bottom texels.
    uint64_t rightMask    = ((0xFF >> footprint.offset.x) * 0x0101010101010101UL);
    uint64_t bottomMask   = 0xFFFFFFFFFFFFFFFFUL >> (8 * footprint.offset.y);
    // Split the returned mask into the bits for each of the four footprint image texels.
    uint64_t bottomRight  = returnedMask & bottomMask & rightMask;
    uint64_t bottomLeft   = returnedMask & bottomMask & (~rightMask);
    uint64_t topRight     = returnedMask & (~bottomMask) & rightMask;
    uint64_t topLeft      = returnedMask & (~bottomMask) & (~rightMask);
----

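The aligned and unaligned cases described in this issue can be mimicked on
the CPU as follows.
This is a minimal C++ sketch rather than shader code: `Footprint`,
`FootprintImage`, and `accumulate` are hypothetical names, `std::atomic`
stands in for shader image atomics, and the sketch assumes that y increases
downward, so that "`above`" the anchor means a smaller y coordinate.

[source,c++]
----
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for the values returned by a footprint query.
struct Footprint {
    uint32_t anchorX, anchorY;  // lower right footprint image texel
    uint32_t offsetX, offsetY;  // mask columns/rows owned by left/upper texels
    uint32_t maskLo, maskHi;    // low and high 32 bits of the 8x8 mask
};

// Hypothetical footprint image: one 64-bit word per texel, stored row-major.
struct FootprintImage {
    uint32_t width, height;
    std::vector<std::atomic<uint64_t>> texels;

    FootprintImage(uint32_t w, uint32_t h)
        : width(w), height(h), texels(std::size_t(w) * h) {}

    // Atomically ORs bits into texel (x, y); skips empty or out-of-range updates.
    void orInto(uint32_t x, uint32_t y, uint64_t bits) {
        if (bits != 0 && x < width && y < height)
            texels[std::size_t(y) * width + x].fetch_or(bits, std::memory_order_relaxed);
    }
};

void accumulate(FootprintImage& image, const Footprint& fp) {
    assert(fp.offsetX < 8 && fp.offsetY < 8);  // assumed range for an 8x8 mask
    const uint64_t mask = uint64_t(fp.maskLo) | (uint64_t(fp.maskHi) << 32);

    // Aligned case: the whole mask belongs to the anchor texel.
    if (fp.offsetX == 0 && fp.offsetY == 0) {
        image.orInto(fp.anchorX, fp.anchorY, mask);
        return;
    }

    // Unaligned case: split the mask across up to four neighboring texels.
    const uint64_t rightMask  = uint64_t(0xFFu >> fp.offsetX) * 0x0101010101010101ULL;
    const uint64_t bottomMask = ~0ULL >> (8 * fp.offsetY);
    image.orInto(fp.anchorX,     fp.anchorY,     mask &  bottomMask &  rightMask);
    image.orInto(fp.anchorX - 1, fp.anchorY,     mask &  bottomMask & ~rightMask);
    image.orInto(fp.anchorX,     fp.anchorY - 1, mask & ~bottomMask &  rightMask);
    image.orInto(fp.anchorX - 1, fp.anchorY - 1, mask & ~bottomMask & ~rightMask);
}
----

Skipping updates whose mask is zero avoids issuing atomics for neighboring
texels that the footprint does not actually touch.
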
(2) What should an application do to ensure maximum performance when
accumulating footprints into an aggregate footprint image?

*RESOLVED*: We expect that the most common usage of this feature will be to
accumulate aggregate footprint coverage, as described in the previous issue.
Even if you ignore the anisotropic filtering case where the implementation
may return a granularity larger than that requested by the caller, each
shader invocation will need to use atomic functions to update up to four
footprint image texels for each level of detail accessed.
Having each active shader invocation perform multiple atomic operations can
be expensive, particularly when neighboring invocations will want to update
the same footprint image texels.

Techniques that can be used to reduce the number of atomic operations
performed when accumulating coverage include:

  * Have logic that detects returned footprints where all components of the
    returned offset vector are zero.
    In that case, the mask returned by the footprint function is guaranteed
    to be aligned with the footprint image texels and affects only a single
    footprint image texel.
  * Have fragment shaders communicate using built-in functions from the
    `VK_NV_shader_subgroup_partitioned` extension or other shader subgroup
    extensions.
    If you have multiple invocations in a subgroup that need to update the
    same texel (x,y) in the footprint image, compute an aggregate footprint
    mask across all invocations in the subgroup updating that texel and have
    a single invocation perform an atomic operation using that aggregate
    mask.
  * When the returned footprint spans multiple texels in the footprint
    image, each invocation needs to perform four atomic operations.
    In the previous issue, we had an example that computed separate masks
    for "`topLeft`", "`topRight`", "`bottomLeft`", and "`bottomRight`".
    When the invocations in a subgroup have good locality, it might be the
    case that the "`top left`" for some invocations refers to footprint
    image texel (10,10), while neighbors might have their "`top left`"
    texels at (11,10), (10,11), and (11,11).
    If you compute separate masks for even/odd x and y values instead of
    left/right or top/bottom, the "`odd/odd`" mask for all invocations in
    the subgroup holds coverage for footprint image texel (11,11), which can
    be updated by a single atomic operation for the entire subgroup.

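The bookkeeping behind the last two techniques can be sketched on the CPU.
In this hedged C++ illustration, `Update`, `mergeByTexel`, and `paritySlot`
are hypothetical names; a real shader would combine masks with subgroup
operations rather than a `std::map`, and the input vector simply stands in
for the pending updates of the invocations in one subgroup.

[source,c++]
----
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One pending update: OR `bits` into footprint image texel (x, y).
struct Update {
    uint32_t x, y;
    uint64_t bits;
};

// Merges all updates that target the same footprint image texel, so that
// each distinct texel needs only one atomic OR for the whole group.
std::vector<Update> mergeByTexel(const std::vector<Update>& updates) {
    std::map<std::pair<uint32_t, uint32_t>, uint64_t> merged;
    for (const Update& u : updates) {
        if (u.bits != 0)
            merged[{u.x, u.y}] |= u.bits;
    }
    std::vector<Update> result;
    for (const auto& entry : merged)
        result.push_back({entry.first.first, entry.first.second, entry.second});
    return result;
}

// Classifies a footprint image texel by coordinate parity:
// 0 = even/even, 1 = odd/even, 2 = even/odd, 3 = odd/odd (x/y).
inline unsigned paritySlot(uint32_t x, uint32_t y) {
    return (x & 1u) | ((y & 1u) << 1);
}
----

For the example above, invocations whose "`top left`" texels are (10,10),
(11,10), (10,11), and (11,11) all place texel (11,11) in slot 3, so their
odd/odd masks can be combined and written with one atomic operation.
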
=== Examples

TBD

=== Version History

 * Revision 2, 2018-09-13 (Pat Brown)
   - Add issue (2) with performance tips.

 * Revision 1, 2018-08-12 (Pat Brown)
   - Initial draft
