Name

    QCOM_tiled_rendering

Name Strings

    GL_QCOM_tiled_rendering
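
    An application would normally confirm that this name string is advertised
    before using the extension.  The following is a minimal sketch, assuming
    the extension list is the space-separated string returned by
    glGetString(GL_EXTENSIONS); has_extension is a hypothetical helper and is
    not defined by this extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a complete
 * space-separated token in `extensions`, 0 otherwise.  A plain strstr() is
 * not enough, because one extension name can be a prefix of another. */
static int has_extension(const char *extensions, const char *name)
{
    size_t len;
    const char *p;

    if (extensions == NULL || name == NULL || *name == '\0')
        return 0;

    len = strlen(name);
    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len; /* skip this partial match and keep searching */
    }
    return 0;
}
```

    A caller would pass the extension string and "GL_QCOM_tiled_rendering",
    and fall back to normal rendering when the result is 0.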

Contributors

    Colin Sharp
    Jeff Leger

Contacts

    Chuck Smith, Qualcomm (chucks 'at' qualcomm.com)
    Maurice Ribble, Qualcomm (mribble 'at' qualcomm.com)

Notice

    Copyright Qualcomm 2009.

IP Status

    Qualcomm Proprietary.

Status

    Complete.

Version

    Last Modified Date: August 20, 2009
    Revision: #1.6

Number

    OpenGL ES Extension #70

Dependencies

    OpenGL ES 1.0 or higher is required.

    This extension interacts with QCOM_write_only_rendering.

    This extension is written based on the wording of the OpenGL ES 2.0
    specification.

Overview

    In the handheld graphics space, a typical challenge is achieving efficient
    rendering performance given the different characteristics of the various
    types of graphics memory.  Some types of memory ("slow" memory) are less
    expensive but have low bandwidth, higher latency, and/or higher power
    consumption, while other types ("fast" memory) are more expensive but have
    higher bandwidth, lower latency, and/or lower power consumption.  In many
    cases, it is more efficient for a graphics processing unit (GPU) to render
    directly to fast memory, but at most common display resolutions it is not
    practical for a device to contain enough fast memory to accommodate both the
    full color and depth/stencil buffers (the frame buffer).  In some devices,
    this problem can be addressed by providing both types of memory: a large
    amount of slow memory that is sufficient to store the entire frame buffer,
    and a small, dedicated amount of fast memory that allows the GPU to render
    with optimal performance.  The challenge lies in finding a way for the GPU
    to render to fast memory when it is not large enough to contain the actual
    frame buffer.

    One approach to solving this problem is to design the GPU and/or driver
    using a tiled rendering architecture.  With this approach the render target
    is subdivided into a number of individual tiles, which are sized to fit
    within the available amount of fast memory.  Under normal operation, the
    entire scene will be rendered to each individual tile using a multi-pass
    technique, in which primitives that lie entirely outside of the tile being
    rendered are trivially discarded.  After each tile has been rendered, its
    contents are saved out to the actual frame buffer in slow memory (a process
    referred to as the "resolve").  The resolve introduces significant overhead,
    both for the CPU and the GPU.  However, even with this additional overhead,
    rendering using this method is usually more efficient than rendering
    directly to slow memory.

    This extension allows the application to specify a rectangular tile
    rendering area and have full control over the resolves for that area.  The
    information given to the driver through this API can be used to perform
    various optimizations in the driver and hardware.  One example optimization
    is reducing the size or number of the resolves.  Another optimization might
    be to reduce the number of passes needed in the tiling approach mentioned
    above.  Even traditional rendering GPUs that don't use tiles may benefit
    from this extension, depending on their implementation of certain common
    GPU operations.

    One typical use case could involve an application only rendering to select
    portions of the render target using this technique (which shall be referred
    to as "application tiling"), leaving all other portions of the render target
    untouched.  Therefore, in order to preserve the contents of the untouched
    portions of the render target, the application must request an EGL (or other
    context management API) configuration with a non-destructive swap.  A
    destructive swap may only be used safely if the application renders to the
    entire area of the render target during each frame (otherwise the contents
    of the untouched portions of the frame buffer will be undefined).

    Additionally, care must be taken to avoid the cost of mixing rendering with
    and without application tiling within a single frame.  Rendering without
    application tiling ("normal" rendering) is most efficient when all of the
    rendering for the entire scene can be encompassed within a single resolve.
    If any portions of the scene are rendered prior to that resolve (such as via
    a prior resolve, or via application tiling), then that resolve becomes much
    more heavyweight.  When this occurs, prior to rendering each tile the fast
    memory must be populated with the existing contents of the frame buffer
    region corresponding to that tile.  This operation can double the cost of
    resolves, so it is recommended that applications avoid mixing application
    tiling and normal rendering within a single frame.  If both rendering
    methods must be used in the same frame, then the most efficient approach is
    to perform all normal rendering first, followed by rendering done with
    application tiling.  An implicit resolve will occur (if needed) at the start
    of application tiling, so any pending normal rendering operations will be
    flushed at the time application tiling is initiated.  This extension
    provides interfaces for the application to communicate to the driver whether
    or not rendering done with application tiling depends on the existing
    contents of the specified tile, and whether or not the rendered contents of
    the specified tile need to be preserved upon completion.  This mechanism can
    be used to obtain optimal performance, e.g. when the application knows that
    every pixel in a tile will be completely rendered or when the resulting
    contents of the depth/stencil buffers do not need to be preserved.

Issues

    (1)  How do Viewport and Scissor interact with this extension?

    RESOLVED:  They don't.  When application tiling is used, the viewport and
    scissor retain their existing values, relative to the render target, not the
    specified tile.  Therefore, all rendering commands issued between
    StartTilingQCOM and EndTilingQCOM will be subject to the same scissor, and
    will undergo the same viewport transformation, as normal rendering commands.
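
    One way to picture this resolution: because the scissor stays in render
    target coordinates and rendering is also restricted to the active tile,
    the pixels that can actually be written lie in the overlap of the two
    rectangles.  The sketch below illustrates that reading, assuming the
    scissor test is enabled; SketchRect and sketch_intersect are hypothetical
    names and do not describe how any driver implements it.

```c
/* Hypothetical rectangle type in render target (window) coordinates. */
typedef struct {
    int x, y;          /* lower left corner */
    int width, height; /* extent; zero means empty */
} SketchRect;

/* Returns the overlap of two rectangles, or an all-zero rectangle when
 * they do not overlap. */
static SketchRect sketch_intersect(SketchRect a, SketchRect b)
{
    SketchRect r = { 0, 0, 0, 0 };
    int x0 = (a.x > b.x) ? a.x : b.x;
    int y0 = (a.y > b.y) ? a.y : b.y;
    int ax1 = a.x + a.width,  bx1 = b.x + b.width;
    int ay1 = a.y + a.height, by1 = b.y + b.height;
    int x1 = (ax1 < bx1) ? ax1 : bx1;
    int y1 = (ay1 < by1) ? ay1 : by1;

    if (x1 > x0 && y1 > y0) {
        r.x = x0;
        r.y = y0;
        r.width = x1 - x0;
        r.height = y1 - y0;
    }
    return r;
}
```

    With a scissor of (0, 0, 200, 200) and a tile of (100, 100, 200, 200),
    only the 100 x 100 region at (100, 100) would be affected; the tile
    origin does not shift the scissor or the viewport.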

    (2)  How do Flush and Finish interact with this extension?

    RESOLVED:  When Flush or Finish is called while application tiling is
    active, the behavior will be as if EndTilingQCOM was called, except that the
    application tiling state will remain unchanged (meaning the active tile will
    not be reset).  This means that any pending rendering commands will be
    performed to the active tile, and application tiling will continue to be
    active for any following rendering commands.

    (3)  How does SwapBuffers interact with this extension?

    RESOLVED:  It doesn't.  If SwapBuffers is called while application tiling is
    active, the contents of the entire back buffer will be copied to the visible
    window, ignoring the active tile.  SwapBuffers will have no effect on the
    application tiling state.

    (4)  What happens if the render target is changed while application tiling
         is active?

    RESOLVED:  If the current render target is changed, either by binding a new
    framebuffer object or changing the write surface of the active framebuffer
    (either explicitly or by deleting the currently bound framebuffer or write
    surface), an implicit EndTilingQCOM will occur.  The active tile will be
    reset and application tiling will be deactivated.  This is necessary because
    the active tile may not be valid for the new render target.

    (5)  Should this extension provide a query mechanism for determining things
         such as tile offset, alignment, and size requirements so a developer
         can intelligently choose tile regions?

    RESOLVED:  No.  This information is very device-dependent and difficult to
    present in an easily understood manner.  Instead, this extension will let
    developers specify an arbitrary rectangular tile region and all these
    requirements, including subdividing the given tile into multiple tiles if
    necessary, will be handled by the driver and hardware.

    (6)  Should this extension allow multiple tiles?

    RESOLVED:  No.  While earlier versions of this extension allowed for this,
    after support for arbitrary tile sizes was added the benefit of multiple
    tiles became negligible.  Allowing multiple tiles complicated the API and
    made it much more difficult for traditional rendering and some tile-based
    rendering GPUs to support this extension.

    (7)  Should multiple render targets be supported?  They are not supported
         by either the OpenGL ES core specification or any existing OpenGL ES
         extensions.  Support could be added with some new bitmasks for the
         <preserveMask> parameter.  Should this be added now, or deferred for
         inclusion in any possible future MRT extension?

    RESOLVED:  Yes, support is added now.  It is not difficult to add and doing
    it now makes supporting MRTs in the future easier.

New Procedures and Functions

    void StartTilingQCOM(uint x, uint y, uint width, uint height,
                         bitfield preserveMask);

    void EndTilingQCOM(bitfield preserveMask);

New Tokens

    Accepted by the <preserveMask> parameter of StartTilingQCOM and
    EndTilingQCOM:

        GL_COLOR_BUFFER_BIT0_QCOM                     0x00000001
        GL_COLOR_BUFFER_BIT1_QCOM                     0x00000002
        GL_COLOR_BUFFER_BIT2_QCOM                     0x00000004
        GL_COLOR_BUFFER_BIT3_QCOM                     0x00000008
        GL_COLOR_BUFFER_BIT4_QCOM                     0x00000010
        GL_COLOR_BUFFER_BIT5_QCOM                     0x00000020
        GL_COLOR_BUFFER_BIT6_QCOM                     0x00000040
        GL_COLOR_BUFFER_BIT7_QCOM                     0x00000080
        GL_DEPTH_BUFFER_BIT0_QCOM                     0x00000100
        GL_DEPTH_BUFFER_BIT1_QCOM                     0x00000200
        GL_DEPTH_BUFFER_BIT2_QCOM                     0x00000400
        GL_DEPTH_BUFFER_BIT3_QCOM                     0x00000800
        GL_DEPTH_BUFFER_BIT4_QCOM                     0x00001000
        GL_DEPTH_BUFFER_BIT5_QCOM                     0x00002000
        GL_DEPTH_BUFFER_BIT6_QCOM                     0x00004000
        GL_DEPTH_BUFFER_BIT7_QCOM                     0x00008000
        GL_STENCIL_BUFFER_BIT0_QCOM                   0x00010000
        GL_STENCIL_BUFFER_BIT1_QCOM                   0x00020000
        GL_STENCIL_BUFFER_BIT2_QCOM                   0x00040000
        GL_STENCIL_BUFFER_BIT3_QCOM                   0x00080000
        GL_STENCIL_BUFFER_BIT4_QCOM                   0x00100000
        GL_STENCIL_BUFFER_BIT5_QCOM                   0x00200000
        GL_STENCIL_BUFFER_BIT6_QCOM                   0x00400000
        GL_STENCIL_BUFFER_BIT7_QCOM                   0x00800000
        GL_MULTISAMPLE_BUFFER_BIT0_QCOM               0x01000000
        GL_MULTISAMPLE_BUFFER_BIT1_QCOM               0x02000000
        GL_MULTISAMPLE_BUFFER_BIT2_QCOM               0x04000000
        GL_MULTISAMPLE_BUFFER_BIT3_QCOM               0x08000000
        GL_MULTISAMPLE_BUFFER_BIT4_QCOM               0x10000000
        GL_MULTISAMPLE_BUFFER_BIT5_QCOM               0x20000000
        GL_MULTISAMPLE_BUFFER_BIT6_QCOM               0x40000000
        GL_MULTISAMPLE_BUFFER_BIT7_QCOM               0x80000000
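
    The token values above follow a regular layout: each buffer type occupies
    one byte of the bitfield, and the BITn token is the BIT0 token shifted left
    by n.  The sketch below uses that pattern to build a <preserveMask> for a
    given render target index.  Only the four BIT0 values are taken from the
    table; the qcom_* helper names are hypothetical and are not part of this
    extension.

```c
#include <assert.h>

/* BIT0 token values copied from the table above. */
#ifndef GL_COLOR_BUFFER_BIT0_QCOM
#define GL_COLOR_BUFFER_BIT0_QCOM        0x00000001
#define GL_DEPTH_BUFFER_BIT0_QCOM        0x00000100
#define GL_STENCIL_BUFFER_BIT0_QCOM      0x00010000
#define GL_MULTISAMPLE_BUFFER_BIT0_QCOM  0x01000000
#endif

/* Hypothetical helpers: shift the BIT0 token by the render target index
 * (0-7) to obtain the matching BITn token.  The cast keeps the shift in
 * unsigned arithmetic so that bit 31 is well defined. */
static unsigned int qcom_color_bit(unsigned int rt)
{
    assert(rt < 8u);
    return (unsigned int)GL_COLOR_BUFFER_BIT0_QCOM << rt;
}

static unsigned int qcom_depth_bit(unsigned int rt)
{
    assert(rt < 8u);
    return (unsigned int)GL_DEPTH_BUFFER_BIT0_QCOM << rt;
}

static unsigned int qcom_stencil_bit(unsigned int rt)
{
    assert(rt < 8u);
    return (unsigned int)GL_STENCIL_BUFFER_BIT0_QCOM << rt;
}

static unsigned int qcom_multisample_bit(unsigned int rt)
{
    assert(rt < 8u);
    return (unsigned int)GL_MULTISAMPLE_BUFFER_BIT0_QCOM << rt;
}

/* Builds a preserve mask for one render target.  Each flag is nonzero to
 * include that buffer; all_samples adds the multisample modifier bit. */
static unsigned int qcom_preserve_mask(unsigned int rt, int color, int depth,
                                       int stencil, int all_samples)
{
    unsigned int mask = 0u;
    if (color)       mask |= qcom_color_bit(rt);
    if (depth)       mask |= qcom_depth_bit(rt);
    if (stencil)     mask |= qcom_stencil_bit(rt);
    if (all_samples) mask |= qcom_multisample_bit(rt);
    return mask;
}
```

    For example, qcom_preserve_mask(0, 1, 1, 0, 0) yields 0x00000101, which is
    GL_COLOR_BUFFER_BIT0_QCOM | GL_DEPTH_BUFFER_BIT0_QCOM.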

Additions to Chapter 2 of the OpenGL ES 2.0 Specification (OpenGL Operation)

    Add a new section "Rendering with Application Tiling" after section 2.13:

    "2.14 Rendering with Application Tiling

    The application may specify an arbitrary rectangular region (a 'tile') to
    which rendering commands should be restricted.

    The command

        void StartTilingQCOM(uint x, uint y, uint width, uint height,
                             bitfield preserveMask);

    specifies the tile described by <x>, <y>, <width>, <height>.  Until the next
    call to EndTilingQCOM, all rendering commands (including clears) will update
    only the region of the render target defined by the extents of this tile.
    The parameters <x> and <y> specify the screen-space origin of the tile, and
    <width> and <height> specify the screen-space width and height of the tile.
    The tile origin is located at the lower left corner of the tile.  If the
    tile is too large for the fast memory on the device then it will be
    internally subdivided into multiple tiles.  The parameter <preserveMask> is
    the bitwise OR of a number of values indicating which buffers need to be
    initialized with the existing contents of the frame buffer region
    corresponding to the specified tile prior to rendering, or the single value
    NONE.  The values allowed are COLOR_BUFFER_BIT*_QCOM,
    DEPTH_BUFFER_BIT*_QCOM, STENCIL_BUFFER_BIT*_QCOM, and
    MULTISAMPLE_BUFFER_BIT*_QCOM.  These indicate the color buffer, the depth
    buffer, the stencil buffer, and a multisample buffer modifier, respectively.
    The multisample bits differ from the others: they modify the meaning of the
    color, depth, and stencil bits when the active surface is a multisample
    surface.  If a multisample bit is set then the corresponding color, depth,
    and/or stencil bit will cause all the samples to be copied across the memory
    bus in devices that are using fast tiled memory, but if the multisample bit
    is not set then only a single resolved sample is copied across the bus.  In
    practice, not setting the multisample bit when rendering to a multisample
    buffer can greatly improve performance, but could cause small rendering
    artifacts in some multiple-pass rendering algorithms.  The numeric suffix
    (0-7) in each token name specifies which render target is being used.  If
    multiple render targets are not being used then 0 should be specified.  Any
    buffers specified in <preserveMask> that do not exist in the current
    rendering state will be silently ignored (similar to the behavior of Clear).
    If NONE is specified, then no buffers will be initialized.  For any buffers
    not initialized in this manner, the initial contents will be undefined.

    The values of <x>, <y>, <width> and <height> are silently clamped to the
    extents of the render target.
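
    The text does not spell out the exact clamping rule.  The sketch below
    shows one plausible reading, in which the origin is clamped into the
    render target first and the size is then reduced so the tile does not
    extend past the target's right and top edges.  SketchTile and
    sketch_clamp_tile are hypothetical names, and an implementation may clamp
    differently.

```c
#include <assert.h>

/* Hypothetical tile description in render target coordinates. */
typedef struct {
    unsigned int x, y, width, height;
} SketchTile;

/* Clamps a requested tile to a render target of target_w x target_h.
 * The subtraction is done after clamping the origin, so it cannot wrap. */
static SketchTile sketch_clamp_tile(unsigned int x, unsigned int y,
                                    unsigned int width, unsigned int height,
                                    unsigned int target_w,
                                    unsigned int target_h)
{
    SketchTile t;

    t.x = (x < target_w) ? x : target_w;
    t.y = (y < target_h) ? y : target_h;
    t.width  = (width  < target_w - t.x) ? width  : target_w - t.x;
    t.height = (height < target_h - t.y) ? height : target_h - t.y;

    /* The clamped tile always lies inside the render target. */
    assert(t.x + t.width <= target_w && t.y + t.height <= target_h);
    return t;
}
```

    Under this reading, a request of (100, 50, 400, 400) against a 320 x 240
    render target becomes (100, 50, 220, 190), and a tile whose origin lies
    outside the target collapses to zero width and height.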

    The command

        void EndTilingQCOM(bitfield preserveMask);

    notifies the driver that the application has completed all desired rendering
    to the tile specified by StartTilingQCOM.  This allows the driver to flush
    the contents of the specified tile to the corresponding region of the render
    target, and disables application tiling (resuming normal rendering).  The
    parameter <preserveMask> is specified using the same values as the
    equivalent argument of StartTilingQCOM, but indicates which buffers need to
    be preserved upon completion of all rendering commands issued with
    application tiling.  For any buffers not preserved in this manner, the
    resulting contents of the buffer regions corresponding to the active tile
    will be undefined."

GLX Protocol

    None.

Errors

    INVALID_OPERATION error is generated if StartTilingQCOM is called while
    WRITEONLY_RENDERING_QCOM is enabled or the current framebuffer is not
    framebuffer complete.

    INVALID_OPERATION error is generated if EndTilingQCOM is called without a
    corresponding call to StartTilingQCOM.

    INVALID_OPERATION error is generated if StartTilingQCOM is called after
    calling StartTilingQCOM without a corresponding call to EndTilingQCOM.

    INVALID_OPERATION error is generated if Enable(WRITEONLY_RENDERING_QCOM)
    is called between StartTilingQCOM and EndTilingQCOM.
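
    The four error conditions above amount to a small state machine around a
    single "tiling active" flag.  The sketch below models only that error
    logic in plain C, without calling OpenGL ES; the SketchState type and the
    sketch_* functions are hypothetical stand-ins.  It assumes the usual GL
    convention that a command which generates an error leaves state unchanged.

```c
#include <stdbool.h>

/* 0x0502 is the standard value of GL_INVALID_OPERATION. */
enum { SKETCH_NO_ERROR = 0, SKETCH_INVALID_OPERATION = 0x0502 };

/* Hypothetical model of the state the error rules depend on. */
typedef struct {
    bool tiling_active;        /* between StartTilingQCOM and EndTilingQCOM */
    bool write_only_enabled;   /* WRITEONLY_RENDERING_QCOM is enabled */
    bool framebuffer_complete; /* current framebuffer is complete */
} SketchState;

/* Models StartTilingQCOM: fails if write-only rendering is enabled, the
 * framebuffer is incomplete, or tiling has already been started. */
static int sketch_start_tiling(SketchState *s)
{
    if (s->write_only_enabled || !s->framebuffer_complete || s->tiling_active)
        return SKETCH_INVALID_OPERATION;
    s->tiling_active = true;
    return SKETCH_NO_ERROR;
}

/* Models EndTilingQCOM: fails without a matching start. */
static int sketch_end_tiling(SketchState *s)
{
    if (!s->tiling_active)
        return SKETCH_INVALID_OPERATION;
    s->tiling_active = false;
    return SKETCH_NO_ERROR;
}

/* Models Enable(WRITEONLY_RENDERING_QCOM): fails while tiling is active. */
static int sketch_enable_write_only(SketchState *s)
{
    if (s->tiling_active)
        return SKETCH_INVALID_OPERATION;
    s->write_only_enabled = true;
    return SKETCH_NO_ERROR;
}
```

    In a real application the corresponding check would be a call to
    glGetError after each command.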

New State

    None.

Sample Usage

    GLboolean renderTiledTriangle(GLuint x, GLuint y, GLuint width, GLuint height)
    {
        // set the active tile and initialize the color and depth buffers with
        // the existing contents
        glStartTilingQCOM(x, y, width, height,
                          GL_COLOR_BUFFER_BIT0_QCOM | GL_DEPTH_BUFFER_BIT0_QCOM);

        // draw the triangle
        glDrawArrays(GL_TRIANGLES, 0, 3);

        // finished with this tile -- preserve the color buffer
        glEndTilingQCOM(GL_COLOR_BUFFER_BIT0_QCOM);

        // return success
        return GL_TRUE;
    }

Revision History

    #09    08/20/2009    Chuck Smith     Cosmetic changes
    #08    08/19/2009    Maurice Ribble  Add support for multiple render targets
    #07    07/28/2009    Maurice Ribble  Clean up spec
                                         Remove multiple tile support
    #06    07/23/2009    Maurice Ribble  Updated overview to match latest spec
    #05    07/15/2009    Maurice Ribble  Changed from spec to subdivide tiles
                                         instead of returning out of memory
    #04    07/06/2009    Maurice Ribble  Update due to the AMD->Qualcomm move;
                                         general extension cleanup.
    #03    11/17/2008    Chuck Smith     Clarified the results of EndTilingQCOM
                                         for unpreserved buffers.
    #02    11/10/2008    Chuck Smith     Updates to clarify behavior; additions
                                         to the Issues section.
    #01    11/04/2008    Chuck Smith     First draft.