Name

    QCOM_tiled_rendering

Name Strings

    GL_QCOM_tiled_rendering

Contributors

    Colin Sharp
    Jeff Leger

Contacts

    Chuck Smith, Qualcomm (chucks 'at' qualcomm.com)
    Maurice Ribble, Qualcomm (mribble 'at' qualcomm.com)

Notice

    Copyright Qualcomm 2009.

IP Status

    Qualcomm Proprietary.

Status

    Complete.

Version

    Last Modified Date: August 20, 2009
    Revision: #1.6

Number

    OpenGL ES Extension #70

Dependencies

    OpenGL ES 1.0 or higher is required.

    This extension interacts with QCOM_write_only_rendering.

    This extension is written based on the wording of the OpenGL ES 2.0
    specification.

Overview

    In the handheld graphics space, a typical challenge is achieving efficient
    rendering performance given the different characteristics of the various
    types of graphics memory. Some types of memory ("slow" memory) are less
    expensive but have low bandwidth, higher latency, and/or higher power
    consumption, while other types ("fast" memory) are more expensive but have
    higher bandwidth, lower latency, and/or lower power consumption. In many
    cases, it is more efficient for a graphics processing unit (GPU) to render
    directly to fast memory, but at most common display resolutions it is not
    practical for a device to contain enough fast memory to accommodate both the
    full color and depth/stencil buffers (the frame buffer). In some devices,
    this problem can be addressed by providing both types of memory; a large
    amount of slow memory that is sufficient to store the entire frame buffer,
    and a small, dedicated amount of fast memory that allows the GPU to render
    with optimal performance. The challenge lies in finding a way for the GPU
    to render to fast memory when it is not large enough to contain the actual
    frame buffer.

    One approach to solving this problem is to design the GPU and/or driver
    using a tiled rendering architecture. With this approach the render target
    is subdivided into a number of individual tiles, which are sized to fit
    within the available amount of fast memory. Under normal operation, the
    entire scene will be rendered to each individual tile using a multi-pass
    technique, in which primitives that lie entirely outside of the tile being
    rendered are trivially discarded. After each tile has been rendered, its
    contents are saved out to the actual frame buffer in slow memory (a process
    referred to as the "resolve"). The resolve introduces significant overhead,
    both for the CPU and the GPU. However, even with this additional overhead,
    rendering using this method is usually more efficient than rendering
    directly to slow memory.

    This extension allows the application to specify a rectangular tile
    rendering area and have full control over the resolves for that area. The
    information given to the driver through this API can be used to perform
    various optimizations in the driver and hardware. One example optimization
    is being able to reduce the size or number of the resolves. Another
    optimization might be to reduce the number of passes needed in the tiling
    approach mentioned above. Even traditional rendering GPUs that don't use
    tiles may benefit from this extension depending on their implementation of
    certain common GPU operations.

    One typical use case could involve an application only rendering to select
    portions of the render target using this technique (which shall be referred
    to as "application tiling"), leaving all other portions of the render target
    untouched. Therefore, in order to preserve the contents of the untouched
    portions of the render target, the application must request an EGL (or other
    context management API) configuration with a non-destructive swap.
    A destructive swap may only be used safely if the application renders to the
    entire area of the render target during each frame (otherwise the contents
    of the untouched portions of the frame buffer will be undefined).

    Additionally, care must be taken to avoid the cost of mixing rendering with
    and without application tiling within a single frame. Rendering without
    application tiling ("normal" rendering) is most efficient when all of the
    rendering for the entire scene can be encompassed within a single resolve.
    If any portions of the scene are rendered prior to that resolve (such as via
    a prior resolve, or via application tiling), then that resolve becomes much
    more heavyweight. When this occurs, prior to rendering each tile the fast
    memory must be populated with the existing contents of the frame buffer
    region corresponding to that tile. This operation can double the cost of
    resolves, so it is recommended that applications avoid mixing application
    tiling and normal rendering within a single frame. If both rendering
    methods must be used in the same frame, then the most efficient approach is
    to perform all normal rendering first, followed by rendering done with
    application tiling. An implicit resolve will occur (if needed) at the start
    of application tiling, so any pending normal rendering operations will be
    flushed at the time application tiling is initiated. This extension
    provides interfaces for the application to communicate to the driver whether
    or not rendering done with application tiling depends on the existing
    contents of the specified tile, and whether or not the rendered contents of
    the specified tile need to be preserved upon completion. This mechanism can
    be used to obtain optimal performance, e.g.
    when the application knows that
    every pixel in a tile will be completely rendered or when the resulting
    contents of the depth/stencil buffers do not need to be preserved.

Issues

    (1) How do Viewport and Scissor interact with this extension?

    RESOLVED: They don't. When application tiling is used, the viewport and
    scissor retain their existing values, relative to the render target, not the
    specified tile. Therefore, all rendering commands issued between
    StartTilingQCOM and EndTilingQCOM will be subject to the same scissor, and
    will undergo the same viewport transformation, as normal rendering commands.

    (2) How do Flush and Finish interact with this extension?

    RESOLVED: When Flush or Finish is called while application tiling is
    active, the behavior will be as if EndTilingQCOM was called, except that the
    application tiling state will remain unchanged (meaning the active tile will
    not be reset). This means that any pending rendering commands will be
    performed to the active tile, and application tiling will continue to be
    active for any following rendering commands.

    (3) How does SwapBuffers interact with this extension?

    RESOLVED: It doesn't. If SwapBuffers is called while application tiling is
    active, the contents of the entire back buffer will be copied to the visible
    window, ignoring the active tile. SwapBuffers will have no effect on the
    application tiling state.

    (4) What happens if the render target is changed while application tiling
        is active?

    RESOLVED: If the current render target is changed, either by binding a new
    framebuffer object or changing the write surface of the active framebuffer
    (either explicitly or by deleting the currently bound framebuffer or write
    surface), an implicit EndTilingQCOM will occur. The active tile will be
    reset and application tiling will be deactivated.
    This is necessary because
    the active tile may not be valid for the new render target.

    (5) Should this extension provide a query mechanism for determining things
        such as tile offset, alignment, and size requirements so a developer
        can intelligently choose tile regions?

    RESOLVED: No. This information is very device-dependent and difficult to
    present in an easily understood manner. Instead, this extension will let
    developers specify an arbitrary rectangular tile region and all these
    requirements, including subdividing the given tile into multiple tiles if
    necessary, will be handled by the driver and hardware.

    (6) Should this extension allow multiple tiles?

    RESOLVED: No. While earlier versions of this extension allowed for this,
    after support for arbitrary tile sizes was added the benefit of multiple
    tiles became negligible. Allowing multiple tiles complicated the API and
    made it much more difficult for traditional rendering and some tile-based
    rendering GPUs to support this extension.

    (7) Should multiple render targets be supported? They are not supported
        by either the OpenGL ES core specification or any existing OpenGL ES
        extensions. Support could be added with some new bitmasks for the
        <preserveMask> parameter. Should this be added now, or deferred for
        inclusion in any possible future MRT extension?

    RESOLVED: Yes. It is not difficult to add now and doing it now makes
    supporting MRTs in the future easier.
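
    As a non-normative illustration of issue (7), the sketch below builds a
    <preserveMask> value for one render target index. The bit layout (color
    bits 0-7, depth bits 8-15, stencil bits 16-23, multisample bits 24-31)
    follows the values listed under New Tokens. The helper name
    tiling_preserve_mask and its boolean-style arguments are hypothetical and
    are not part of this extension.

```c
#include <assert.h>
#include <stdint.h>

/* Token values copied from the New Tokens section of this extension. */
#define GL_COLOR_BUFFER_BIT0_QCOM        0x00000001u
#define GL_DEPTH_BUFFER_BIT0_QCOM        0x00000100u
#define GL_STENCIL_BUFFER_BIT0_QCOM      0x00010000u
#define GL_MULTISAMPLE_BUFFER_BIT0_QCOM  0x01000000u

/*
 * Hypothetical helper (not defined by the extension): returns the bitwise OR
 * of the requested buffer bits for render target <target>. Each BITn token
 * is the corresponding BIT0 token shifted left by n.
 */
static uint32_t tiling_preserve_mask(unsigned target, int color, int depth,
                                     int stencil, int multisample)
{
    uint32_t mask = 0u;

    /* Only render targets 0-7 have bits assigned. */
    assert(target <= 7u);

    if (color)       mask |= GL_COLOR_BUFFER_BIT0_QCOM       << target;
    if (depth)       mask |= GL_DEPTH_BUFFER_BIT0_QCOM       << target;
    if (stencil)     mask |= GL_STENCIL_BUFFER_BIT0_QCOM     << target;
    if (multisample) mask |= GL_MULTISAMPLE_BUFFER_BIT0_QCOM << target;

    return mask;
}
```

    For example, tiling_preserve_mask(0, 1, 1, 0, 0) yields the same value as
    GL_COLOR_BUFFER_BIT0_QCOM | GL_DEPTH_BUFFER_BIT0_QCOM, and passing no
    buffers yields 0, which corresponds to NONE.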

New Procedures and Functions

    void StartTilingQCOM(uint x, uint y, uint width, uint height,
                         bitfield preserveMask);

    void EndTilingQCOM(bitfield preserveMask);

New Tokens

    Accepted by the <preserveMask> parameter of StartTilingQCOM and
    EndTilingQCOM:

        GL_COLOR_BUFFER_BIT0_QCOM                     0x00000001
        GL_COLOR_BUFFER_BIT1_QCOM                     0x00000002
        GL_COLOR_BUFFER_BIT2_QCOM                     0x00000004
        GL_COLOR_BUFFER_BIT3_QCOM                     0x00000008
        GL_COLOR_BUFFER_BIT4_QCOM                     0x00000010
        GL_COLOR_BUFFER_BIT5_QCOM                     0x00000020
        GL_COLOR_BUFFER_BIT6_QCOM                     0x00000040
        GL_COLOR_BUFFER_BIT7_QCOM                     0x00000080
        GL_DEPTH_BUFFER_BIT0_QCOM                     0x00000100
        GL_DEPTH_BUFFER_BIT1_QCOM                     0x00000200
        GL_DEPTH_BUFFER_BIT2_QCOM                     0x00000400
        GL_DEPTH_BUFFER_BIT3_QCOM                     0x00000800
        GL_DEPTH_BUFFER_BIT4_QCOM                     0x00001000
        GL_DEPTH_BUFFER_BIT5_QCOM                     0x00002000
        GL_DEPTH_BUFFER_BIT6_QCOM                     0x00004000
        GL_DEPTH_BUFFER_BIT7_QCOM                     0x00008000
        GL_STENCIL_BUFFER_BIT0_QCOM                   0x00010000
        GL_STENCIL_BUFFER_BIT1_QCOM                   0x00020000
        GL_STENCIL_BUFFER_BIT2_QCOM                   0x00040000
        GL_STENCIL_BUFFER_BIT3_QCOM                   0x00080000
        GL_STENCIL_BUFFER_BIT4_QCOM                   0x00100000
        GL_STENCIL_BUFFER_BIT5_QCOM                   0x00200000
        GL_STENCIL_BUFFER_BIT6_QCOM                   0x00400000
        GL_STENCIL_BUFFER_BIT7_QCOM                   0x00800000
        GL_MULTISAMPLE_BUFFER_BIT0_QCOM               0x01000000
        GL_MULTISAMPLE_BUFFER_BIT1_QCOM               0x02000000
        GL_MULTISAMPLE_BUFFER_BIT2_QCOM               0x04000000
        GL_MULTISAMPLE_BUFFER_BIT3_QCOM               0x08000000
        GL_MULTISAMPLE_BUFFER_BIT4_QCOM               0x10000000
        GL_MULTISAMPLE_BUFFER_BIT5_QCOM               0x20000000
        GL_MULTISAMPLE_BUFFER_BIT6_QCOM               0x40000000
        GL_MULTISAMPLE_BUFFER_BIT7_QCOM               0x80000000

Additions to Chapter 2 of the OpenGL ES 2.0 Specification (OpenGL Operation)

    Add a new section "Rendering with Application Tiling" after section 2.13:

    "2.14 Rendering with Application Tiling

    The application may specify an arbitrary rectangular region (a 'tile') to
    which rendering commands
    should be restricted.

    The command

        void StartTilingQCOM(uint x, uint y, uint width, uint height,
                             bitfield preserveMask);

    specifies the tile described by <x>, <y>, <width>, <height>. Until the next
    call to EndTilingQCOM, all rendering commands (including clears) will only
    update the contents of the render target defined by the extents of this
    tile. The parameters <x> and <y> specify the screen-space origin of the
    tile, and <width> and <height> specify the screen-space width and height of
    the tile. The tile origin is located at the lower left corner of the tile.
    If the size of the tile is too large for the fast memory on the device then
    it will be internally subdivided into multiple tiles. The parameter
    <preserveMask> is the bitwise OR of a number of values indicating which
    buffers need to be initialized with the existing contents of the frame
    buffer region corresponding to the specified tile prior to rendering, or the
    single value NONE. The values allowed are COLOR_BUFFER_BIT*_QCOM,
    DEPTH_BUFFER_BIT*_QCOM, STENCIL_BUFFER_BIT*_QCOM, and
    MULTISAMPLE_BUFFER_BIT*_QCOM. These indicate the color buffer, the depth
    buffer, the stencil buffer, and a multisample buffer modifier, respectively.
    The multisample bits are different since they modify the meaning of the
    color, depth, and stencil bits if the active surface is a multisample
    surface. If a multisample bit is set then the corresponding color, depth,
    and/or stencil bit will cause all the samples to be copied across the memory
    bus in devices that are using fast tiled memory, but if the multisample bit
    is not set then only a single resolved sample is copied across the bus. In
    practice, not setting the multisample bit when rendering to a multisample
    buffer can greatly improve performance, but could cause small rendering
    artifacts in some multiple-pass rendering algorithms.
    The numeric suffix (0-7) specifies which render target is being used. If
    multiple render targets are not being used then 0 should be specified. Any
    buffers specified in <preserveMask> that do not exist in the current
    rendering state will be silently ignored (similar to the behavior of
    Clear). If NONE is specified, then no buffers will be initialized. For any
    buffers not initialized in this manner, the initial contents will be
    undefined.

    The values of <x>, <y>, <width> and <height> are silently clamped to the
    extents of the render target.

    The command

        void EndTilingQCOM(bitfield preserveMask);

    notifies the driver that the application has completed all desired rendering
    to the tile specified by StartTilingQCOM. This allows the driver to flush
    the contents of the specified tile to the corresponding region of the render
    target, and disables application tiling (resuming normal rendering). The
    parameter <preserveMask> is specified using the same values as the
    equivalent argument of StartTilingQCOM, but indicates which buffers need to
    be preserved upon completion of all rendering commands issued with
    application tiling. For any buffers not preserved in this manner, the
    resulting contents of the buffer regions corresponding to the active tile
    will be undefined."

GLX Protocol

    None.
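
    As a non-normative illustration of the clamping rule stated for
    StartTilingQCOM in section 2.14, the sketch below shows one plausible way
    a tile rectangle could be clamped to the render target. The text above
    says only that the values are "silently clamped"; the exact arithmetic,
    the tile_rect type, and the clamp_tile function are assumptions made for
    this example.

```c
/* Hypothetical rectangle type used only for this illustration. */
typedef struct {
    unsigned x, y, width, height;
} tile_rect;

/*
 * Assumed clamping behavior: the origin is limited to the render target
 * extents, then the size is limited to the room remaining from that origin.
 * Subtraction is used instead of x + width to avoid unsigned overflow.
 */
static tile_rect clamp_tile(unsigned x, unsigned y,
                            unsigned width, unsigned height,
                            unsigned target_width, unsigned target_height)
{
    tile_rect r;
    unsigned room_w, room_h;

    r.x = (x < target_width)  ? x : target_width;
    r.y = (y < target_height) ? y : target_height;

    /* r.x <= target_width and r.y <= target_height, so no underflow here. */
    room_w = target_width  - r.x;
    room_h = target_height - r.y;

    r.width  = (width  < room_w) ? width  : room_w;
    r.height = (height < room_h) ? height : room_h;

    return r;
}
```

    Under this assumption, a 200x200 tile at (700, 400) on an 800x480 render
    target would be reduced to 100x80, and a tile whose origin lies outside
    the render target would be reduced to an empty rectangle.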

Errors

    INVALID_OPERATION error is generated if StartTilingQCOM is called while
    WRITEONLY_RENDERING_QCOM is enabled or the current framebuffer is not
    framebuffer complete.

    INVALID_OPERATION error is generated if EndTilingQCOM is called without a
    corresponding call to StartTilingQCOM.

    INVALID_OPERATION error is generated if StartTilingQCOM is called after
    calling StartTilingQCOM without a corresponding call to EndTilingQCOM.

    INVALID_OPERATION error is generated if Enable(WRITEONLY_RENDERING_QCOM)
    is called between StartTilingQCOM and EndTilingQCOM.

New State

    None.

Sample Usage

    GLboolean renderTiledTriangle(GLuint x, GLuint y, GLuint width, GLuint height)
    {
        // set the active tile and initialize the color and depth buffers with
        // the existing contents
        glStartTilingQCOM(x, y, width, height,
                          GL_COLOR_BUFFER_BIT0_QCOM | GL_DEPTH_BUFFER_BIT0_QCOM);

        // draw the triangle
        glDrawArrays(GL_TRIANGLES, 0, 3);

        // finished with this tile -- preserve the color buffer
        glEndTilingQCOM(GL_COLOR_BUFFER_BIT0_QCOM);

        // return success
        return GL_TRUE;
    }

Revision History

    #09    08/20/2009    Chuck Smith       Cosmetic changes
    #08    08/19/2009    Maurice Ribble    Add support for multiple render targets
    #07    07/28/2009    Maurice Ribble    Clean up spec
                                           Remove multiple tile support
    #06    07/23/2009    Maurice Ribble    Updated overview to match latest spec
    #05    07/15/2009    Maurice Ribble    Changed from spec to subdivide tiles
                                           instead of returning out of memory
    #04    07/06/2009    Maurice Ribble    Update due to the AMD->Qualcomm move;
                                           general extension cleanup.
    #03    11/17/2008    Chuck Smith       Clarified the results of EndTilingQCOM
                                           for unpreserved buffers.
    #02    11/10/2008    Chuck Smith       Updates to clarify behavior; additions
                                           to the Issues section.
    #01    11/04/2008    Chuck Smith       First draft.
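
    As a non-normative illustration of the pairing rules listed in the Errors
    section, the sketch below models them as a small state machine. It does
    not call any GL entry point; the tiling_state type, the tiling_error
    codes, and the model_* functions are hypothetical names introduced only
    for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NO_ERROR and INVALID_OPERATION. */
typedef enum {
    TILING_NO_ERROR,
    TILING_INVALID_OPERATION
} tiling_error;

/* Hypothetical model of the state the error rules depend on. */
typedef struct {
    bool tiling_active;        /* between StartTilingQCOM and EndTilingQCOM */
    bool write_only_enabled;   /* WRITEONLY_RENDERING_QCOM is enabled       */
    bool framebuffer_complete; /* current framebuffer is complete           */
} tiling_state;

/* Models StartTilingQCOM: rejected while write-only rendering is enabled,
 * while the framebuffer is incomplete, or while a tile is already active. */
static tiling_error model_start_tiling(tiling_state *s)
{
    if (s->write_only_enabled || !s->framebuffer_complete || s->tiling_active)
        return TILING_INVALID_OPERATION;
    s->tiling_active = true;
    return TILING_NO_ERROR;
}

/* Models EndTilingQCOM: rejected without a matching StartTilingQCOM. */
static tiling_error model_end_tiling(tiling_state *s)
{
    if (!s->tiling_active)
        return TILING_INVALID_OPERATION;
    s->tiling_active = false;
    return TILING_NO_ERROR;
}

/* Models Enable(WRITEONLY_RENDERING_QCOM): rejected while tiling is active. */
static tiling_error model_enable_write_only(tiling_state *s)
{
    if (s->tiling_active)
        return TILING_INVALID_OPERATION;
    s->write_only_enabled = true;
    return TILING_NO_ERROR;
}
```

    In this model a second model_start_tiling call without an intervening
    model_end_tiling returns TILING_INVALID_OPERATION and leaves the active
    tile state unchanged.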