Name

    QCOM_motion_estimation

Name Strings

    GL_QCOM_motion_estimation

Contributors

    Jonathan Wicks
    Sam Holmes
    Jeff Leger

Contacts

    Jeff Leger

Status

    Complete

Version

    Last Modified Date: March 19, 2020
    Revision: 1.0

Number

    OpenGL ES Extension #326

Dependencies

    Requires OpenGL ES 2.0

    This extension is written against the OpenGL ES 3.2 Specification.

    This extension interacts with OES_EGL_image_external.

Overview

    Motion estimation, also referred to as optical flow, is the process of
    producing motion vectors that convey the 2D transformation from a
    reference image to a target image.  There are various uses of motion
    estimation, such as frame extrapolation, compression, and object
    tracking.

    This extension adds support for motion estimation in OpenGL ES by adding
    functions which take the reference and target images and populate an
    output texture containing the corresponding motion vectors.

New Procedures and Functions

    void TexEstimateMotionQCOM(uint ref,
                               uint target,
                               uint output);

    void TexEstimateMotionRegionsQCOM(uint ref,
                                      uint target,
                                      uint output,
                                      uint mask);

New Tokens

    Accepted by the <pname> parameter of GetIntegerv, GetInteger64v, and
    GetFloatv:

        MOTION_ESTIMATION_SEARCH_BLOCK_X_QCOM            0x8C90
        MOTION_ESTIMATION_SEARCH_BLOCK_Y_QCOM            0x8C91

Additions to the OpenGL ES 3.2 Specification

    Add two new rows in Table 21.40 "Implementation Dependent Values":

    Get Value                              Type  Get Command  Minimum Value  Description          Sec
    ---------                              ----  -----------  -------------  -------------------  ----
    MOTION_ESTIMATION_SEARCH_BLOCK_X_QCOM  Z+    GetIntegerv  1              The block size in X  8.19
    MOTION_ESTIMATION_SEARCH_BLOCK_Y_QCOM  Z+    GetIntegerv  1              The block size in Y  8.19

Additions to Chapter 8 of the OpenGL ES 3.2 Specification

    The commands

        void TexEstimateMotionQCOM(uint ref, uint target, uint output);
        void TexEstimateMotionRegionsQCOM(uint ref, uint target, uint output,
                                          uint mask);

    are called to perform the motion estimation based on the contents of the
    two input textures, <ref> and <target>.  The results of the motion
    estimation are stored in the <output> texture.

    The <ref> and <target> textures must either be GL_R8 2D textures, or be
    backed by EGLImages where the underlying format contains a luminance
    plane.  The <ref> and <target> dimensions must be identical and must be
    an exact multiple of the search block size.  While <ref> and <target>
    can have multiple levels, the implementation only reads from the base
    level.

    The resulting motion vectors are stored in a 2D <output> texture of the
    format GL_RGBA16F, ready to be used by other application shaders and
    stages.  While <output> can have multiple levels, the implementation
    only writes to the base level.  The <output> dimensions must be set as
    follows so that it can hold one vector per search block:

        output.width  = ref.width  / MOTION_ESTIMATION_SEARCH_BLOCK_X_QCOM
        output.height = ref.height / MOTION_ESTIMATION_SEARCH_BLOCK_Y_QCOM

    Each texel in the <output> texture represents the estimated motion, in
    pixels, for the corresponding search block, from the <ref> texture to
    the <target> texture.  Implementations may generate sub-pixel motion
    vectors, in which case the returned vector components may have
    fractional values.  The motion vector X and Y components are provided
    in the R and G channels respectively.  The B and A components are
    currently undefined and are left for future expansion.  If no motion is
    detected for a block, or if the <mask> texture indicates that the block
    should be skipped, then the R and G channels will be set to zero,
    indicating no motion.

    The <mask> texture is used to control the region of interest, which can
    help to reduce the overall workload.  The <mask> texture dimensions must
    exactly match those of the <output> texture and its format must be
    GL_R8UI.  While <mask> can have multiple levels, the implementation only
    reads from the base level.  For any texel with a value of 0 in the
    <mask>, motion estimation will not be performed for the corresponding
    block.  Any non-zero texel value will produce a motion vector result in
    the <output> texture.  The <mask> only controls the vector basepoint, so
    it is possible for an unmasked block to produce a vector that lands in a
    masked block.
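    As a non-normative illustration of the calling sequence described above,
    the following sketch assumes the commands and tokens are exposed through
    the GL headers with the usual prefixes (glTexEstimateMotionQCOM,
    glTexEstimateMotionRegionsQCOM, GL_MOTION_ESTIMATION_SEARCH_BLOCK_X_QCOM,
    GL_MOTION_ESTIMATION_SEARCH_BLOCK_Y_QCOM).  The texture names and the
    1920x1088 input size are illustrative only; on a given implementation the
    input dimensions must be an exact multiple of the queried block sizes.

        /* Query the implementation-dependent search block size. */
        GLint blockX = 0, blockY = 0;
        glGetIntegerv(GL_MOTION_ESTIMATION_SEARCH_BLOCK_X_QCOM, &blockX);
        glGetIntegerv(GL_MOTION_ESTIMATION_SEARCH_BLOCK_Y_QCOM, &blockY);

        /* Example input size; must be a multiple of blockX x blockY. */
        const GLsizei width  = 1920;
        const GLsizei height = 1088;

        /* GL_R8 luma inputs (glTexStorage2D requires OpenGL ES 3.0+). */
        GLuint refTex, targetTex, outputTex, maskTex;
        glGenTextures(1, &refTex);
        glBindTexture(GL_TEXTURE_2D, refTex);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_R8, width, height);

        glGenTextures(1, &targetTex);
        glBindTexture(GL_TEXTURE_2D, targetTex);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_R8, width, height);

        /* GL_RGBA16F output holding one motion vector per search block. */
        glGenTextures(1, &outputTex);
        glBindTexture(GL_TEXTURE_2D, outputTex);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA16F,
                       width / blockX, height / blockY);

        /* ... upload or render luma data into refTex and targetTex ... */

        glTexEstimateMotionQCOM(refTex, targetTex, outputTex);

        /* Optional region-of-interest variant: a GL_R8UI mask with the same
         * dimensions as the output; a 0 texel skips the corresponding block,
         * any non-zero texel requests a motion vector for that block. */
        glGenTextures(1, &maskTex);
        glBindTexture(GL_TEXTURE_2D, maskTex);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_R8UI,
                       width / blockX, height / blockY);
        /* ... fill maskTex with per-block 0 / non-zero values ... */

        glTexEstimateMotionRegionsQCOM(refTex, targetTex, outputTex, maskTex);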
Errors

    INVALID_OPERATION is generated if any of the textures passed in are
    invalid.

    INVALID_OPERATION is generated if the texture types are not TEXTURE_2D
    or TEXTURE_EXTERNAL_OES.

    INVALID_OPERATION is generated if <ref> is not of the format GL_R8, or,
    when backed by an EGLImage, if the underlying internal format does not
    contain a luminance plane.

    INVALID_OPERATION is generated if <target> is not of the format GL_R8,
    or, when backed by an EGLImage, if the underlying internal format does
    not contain a luminance plane.

    INVALID_OPERATION is generated if the <ref> and <target> textures do not
    have identical dimensions.

    INVALID_OPERATION is generated if the <output> texture is not of the
    format GL_RGBA16F.

    INVALID_OPERATION is generated if the <mask> texture is not of the
    format GL_R8UI.

    INVALID_OPERATION is generated if the <output> or <mask> dimensions are
    not ref.[width/height] / MOTION_ESTIMATION_SEARCH_BLOCK_[X,Y]_QCOM.

Interactions with OES_EGL_image_external

    If OES_EGL_image_external is supported, then the <ref> and/or <target>
    parameters to TexEstimateMotionQCOM and TexEstimateMotionRegionsQCOM may
    be backed by an EGLImage.

Issues

    (1) What should the pixel data of the input textures <ref> and <target>
        contain?

        Resolved: Motion estimation tracks the brightness across the input
        textures.  To produce the best results, it is recommended that the
        texels in the <ref> and <target> textures represent some measure of
        the luminance/luma.  OpenGL ES does not currently expose a Y8 or
        Y-plane-only format, so GL_R8 can be used.  Alternatively, a texture
        backed by an EGLImage, which has an underlying format where
        luminance is contained in a separate plane, can also be used.  If
        starting with an RGBA8 texture, one way to convert it to GL_R8 would
        be to perform a copy and use code such as the following:

            fragColor = rgb_2_yuv(texture(tex, texcoord).rgb,
                                  itu_601_full_range).r;

    (2) Why use GL_RGBA16F instead of GL_RG16F for storing the motion vector
        output?

        Resolved: While only the R and G channels are currently used, it was
        decided to use a format with more channels for future expansion.  A
        floating point format was chosen to support implementations with
        sub-pixel precision, without enforcing any particular precision
        requirements other than what can be represented in a 16-bit floating
        point number.

    (3) Why is the motion estimation quality not defined?

        Resolved: The intention of this specification is to estimate the
        motion between the two input textures.  Implementations should aim
        to produce the highest quality estimations, but since the results
        are estimations, there are no prescribed steps for how the vectors
        must be generated.

Revision History

    Rev.  Date        Author          Changes
    ----  ----------  --------------  --------------------------------------
    1.0   03/19/2020  Jonathan Wicks  Initial public version