# Temporally Scalable Video Coding

<!--Kit: AVCodec Kit-->
<!--Subsystem: Multimedia-->
<!--Owner: @zhanghongran-->
<!--Designer: @dpy2650--->
<!--Tester: @cyakee-->
<!--Adviser: @zengyawen-->

## Basic Concepts

### Introduction to Temporally Scalable Video Coding

Scalable video coding is an extension of standard video coding. Two widely used examples are SVC (Scalable Video Coding, the scalable extension of the H.264 standard) and SHVC (Scalable High Efficiency Video Coding, the scalable extension of the H.265 standard).

Scalable video coding structures a bitstream hierarchically along three dimensions: spatial scalability, temporal scalability, and quality scalability.

Temporally scalable video coding encodes a video sequence into a set of layers that provide increasing temporal resolution (frame rate). The following figure shows the structure of a bitstream that contains four temporal layers and is constructed based on the reference relationship between frames.

When the channel condition is poor, frames can be dropped layer by layer in descending order (L3 -> L2 -> L1) to match the changing transmission and decoding capabilities.

The figure below shows the new bitstream structure after the frames at L3 are dropped. The bitstream can still be decoded normally, at half the original frame rate. Frames at other layers can be dropped in a similar way.

### Structure of a Temporally Scalable Bitstream

A bitstream is organized into one or more Groups of Pictures (GOPs). A GOP is a collection of consecutive pictures that can be decoded independently; its size is the distance between two I-frames (also called key frames).

A GOP can be further divided into one or more Temporal Groups of Pictures (TGOPs), and each TGOP is composed of a base layer (BL) and one or more associated enhancement layers (ELs). For example, frame 0 to frame 7 in the preceding four-layer temporally scalable bitstream form a TGOP.

- BL: the bottom layer (L0) in the GOP. In temporal scalability, this layer is encoded at the lowest frame rate.

- EL: the layers above the BL, namely L1, L2, and L3 in ascending order. The lowest EL encodes, based on the BL, frames at a higher frame rate; each higher EL encodes, based on the BL or a lower EL, frames at a higher frame rate still.

### How to Implement the Structure of a Temporally Scalable Bitstream

The temporally scalable bitstream structure is implemented by specifying reference frames, which are classified into the following types based on how long they reside in the Decoded Picture Buffer (DPB):

- Short-Term Reference (STR): a reference frame that resides in the DPB only briefly. STRs are managed First In First Out (FIFO): once the DPB is full, the oldest STR is removed.

- Long-Term Reference (LTR): a reference frame that can reside in the DPB for a long period of time. It stays in the DPB until it is replaced by another decoded picture with the same ID.

Although a cross-frame reference structure can be built with multiple STRs, their short lifetime in the DPB limits the reference span that temporal scalability can achieve. LTRs do not have this limitation and also cover the cross-frame scenarios that STRs support, so LTRs are the preferred way to implement a temporally scalable bitstream structure.
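To make the layering concrete, the following is a minimal sketch (not part of the AVCodec API) that derives a frame's temporal layer from its position in the dyadic four-layer structure shown above, and decides which frames a sender may drop. The TGOP size of 8 and the layer assignment are read off the first figure; adjust them for other structures.

```c++
#include <cstdint>

// Minimal sketch: derive a frame's temporal layer from its position in the
// four-layer structure above (TGOP size 8: L0 = {0, 8, ...}, L1 = {4, 12, ...},
// L2 = {2, 6, ...}, L3 = odd positions).
static int32_t TemporalLayer(uint32_t poc)
{
    uint32_t pos = poc % 8; // Position inside the TGOP.
    if (pos == 0) {
        return 0;           // BL (L0) frame.
    }
    if (pos % 4 == 0) {
        return 1;           // L1 frame.
    }
    if (pos % 2 == 0) {
        return 2;           // L2 frame.
    }
    return 3;               // L3 frame.
}

// Dropping every layer above maxKeptLayer keeps the stream decodable; for
// example, maxKeptLayer = 2 drops L3 and halves the frame rate, as in the
// second figure.
static bool ShouldDrop(uint32_t poc, int32_t maxKeptLayer)
{
    return TemporalLayer(poc) > maxKeptLayer;
}
```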
## When to Use

You are advised to use temporal scalability in the following scenarios:

- Real-time encoding and transmission scenarios with no or little buffering on the playback side, for example, video conferencing, live streaming, and collaborative office.

- Video encoding and recording scenarios that require video preview or multi-speed playback.

If your application does not need to adjust the temporal reference structure dynamically and the hierarchical structure is simple, you are advised to use [global temporal scalability](#global-temporal-scalability). Otherwise, enable [LTR](#ltr).

## Constraints

- The global temporal scalability and LTR features are mutually exclusive.

  The two features cannot both be enabled, because they share the same underlying implementation.

- To forcibly request an IDR frame while either feature is enabled, use the frame channel configuration (see the sketch after this list).

  A reference frame is valid only within its GOP. When an I-frame is refreshed, the DPB is cleared, and the reference frames are cleared with it. In other words, the position of the I-frame refresh has a great impact on the reference relationship.

  When temporal scalability is enabled, to request an I-frame on demand through **OH_MD_KEY_REQUEST_I_FRAME**, you must use the frame channel, whose effective time is deterministic, so that the encoder knows exactly where the I-frame refresh occurs and the reference relationship is not disrupted. Do not use **OH_VideoEncoder_SetParameter**, whose effective time is not deterministic. For details about the frame channel, see "Step 4: Call **OH_VideoEncoder_RegisterParameterCallback()** to register the frame-specific parameter callback function" in [Video Encoding in Surface Mode](video-encoding.md#surface-mode).

- The callback using **OH_AVBuffer** is supported; the callback using **OH_AVMemory** is not.

  Temporal scalability depends on frame-level features. Do not use **OH_AVMemory** with **OH_AVCodecAsyncCallback**. Instead, use **OH_AVBuffer** with **OH_AVCodecCallback**.

- Temporal scalability uses P-pictures, not B-pictures.

  A temporal hierarchy can be hierarchical-P or hierarchical-B. Currently, this feature supports only hierarchical-P.

- In the case of **UNIFORMLY_SCALED_REFERENCE**, the TGOP size can only be 2 or 4.
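As an illustration of the IDR constraint above, here is a minimal sketch of requesting an I-frame through the frame channel in surface mode. The `g_idrRequested` flag is a hypothetical application-side trigger (for example, set after loss feedback from the receiver); the callback must have been registered with **OH_VideoEncoder_RegisterParameterCallback()**, as described in [Video Encoding in Surface Mode](video-encoding.md#surface-mode).

```c++
#include <atomic>

// Hypothetical application flag set when a new IDR frame is needed.
static std::atomic<bool> g_idrRequested{false};

// Minimal sketch: request an I-frame through the frame channel so that the
// refresh position is deterministic and the reference relationship stays intact.
static void OnNeedInputParameter(OH_AVCodec *codec, uint32_t index, OH_AVFormat *parameter, void *userData)
{
    if (g_idrRequested.exchange(false)) {
        // The I-frame request takes effect exactly on this frame.
        OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_REQUEST_I_FRAME, 1);
    }
    // Notify the encoder that the frame parameter input is complete.
    OH_VideoEncoder_PushInputParameter(codec, index);
}
```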
## Global Temporal Scalability

### Available APIs

Global temporal scalability is suitable for encoding frames into a stable and simple temporal structure. Its initial configuration takes effect globally and cannot be modified dynamically. The configuration parameters are as follows:

| Parameter| Description |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY | Whether to enable the global temporal scalability feature.|
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE | TGOP size of the global temporal scalability feature.|
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE | TGOP reference mode of the global temporal scalability feature.|

- **OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY**: This parameter is set in the configuration phase. The feature can be enabled only when it is supported.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE**: This parameter is optional and specifies the distance between two I-frames. Customize the I-frame density based on your frame extraction requirements. The value range is [2, GopSize). If no value is passed in, the default value is used.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE**: This parameter is optional and determines the reference mode of non-I-frames. The value can be **ADJACENT_REFERENCE**, **JUMP_REFERENCE**, or **UNIFORMLY_SCALED_REFERENCE**. **ADJACENT_REFERENCE** provides better compression, whereas **JUMP_REFERENCE** is more flexible for dropping frames. **UNIFORMLY_SCALED_REFERENCE** keeps the remaining frames more evenly distributed when frames are dropped. If no value is passed in, the default value is used.

  > **NOTE**
  >
  > In the case of **UNIFORMLY_SCALED_REFERENCE**, the TGOP size can only be 2 or 4.

Example 1: ADJACENT_REFERENCE in TGOP=4

Example 2: JUMP_REFERENCE in TGOP=4

Example 3: UNIFORMLY_SCALED_REFERENCE in TGOP=4
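To make the difference between the modes concrete, the following is a minimal sketch of a sender-side drop check for TGOP=4. The reference tables are illustrative assumptions modeled on the figures above (the adjacent mode chains each frame to its predecessor; the jump mode points non-base frames back toward the TGOP base); verify them against your encoder's actual output before relying on them.

```c++
#include <array>

// Illustrative reference tables for a TGOP of 4: index = position in the TGOP,
// value = position of the referenced frame, -1 = I-frame or base frame.
// These are assumptions drawn from the figures above, not guaranteed patterns.
constexpr std::array<int, 4> kAdjacentRef = {-1, 0, 1, 2}; // Chained references.
constexpr std::array<int, 4> kJumpRef     = {-1, 0, 0, 2}; // References back toward the base.

// A frame can be dropped only if no retained frame references it, directly or
// transitively. keep[i] marks the frames the sender intends to keep.
static bool DropIsSafe(const std::array<int, 4> &ref, const std::array<bool, 4> &keep, int pos)
{
    for (int i = 0; i < 4; i++) {
        if (!keep[i]) {
            continue;
        }
        // Walk the kept frame's reference chain; if it passes through pos,
        // dropping pos would break decoding of frame i.
        for (int r = ref[i]; r >= 0; r = ref[r]) {
            if (r == pos) {
                return false;
            }
        }
    }
    return true;
}
```

With `kAdjacentRef`, dropping position 1 while keeping position 2 is rejected (the chain breaks); with `kJumpRef`, position 1 can be dropped freely, which is the flexibility the mode descriptions above refer to.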
### How to Develop

You can learn about the basic encoding process in [Video Encoding](video-encoding.md). This section describes only the differences from the basic video encoding process.

1. When creating an encoder instance, check whether the video encoder supports the global temporal scalability feature.

   ```c++
   // 1.1 Obtain the video encoder capability instance. The following uses H.264 as an example.
   OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
   // 1.2 Check whether the global temporal scalability feature is supported.
   bool isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_TEMPORAL_SCALABILITY);
   ```

   If the feature is supported, it can be enabled.

   ```c++
   // Create a hardware encoder instance.
   OH_AVCodec *videoEnc = OH_VideoEncoder_CreateByMime(OH_AVCODEC_MIMETYPE_VIDEO_AVC);
   ```

2. In the configuration phase, configure the parameters related to the global temporal scalability feature.

   ```c++
   constexpr int32_t TGOP_SIZE = 3;
   // 2.1 Create a temporary AVFormat used for configuration.
   auto format = std::shared_ptr<OH_AVFormat>(OH_AVFormat_Create(), OH_AVFormat_Destroy);
   if (format == nullptr) {
       // Handle exceptions.
   }
   // 2.2 Fill in the key-value pair of the parameter used to enable the feature.
   OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY, 1);
   // 2.3 (Optional) Fill in the key-value pairs of the parameters that specify the TGOP size and reference mode.
   OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE, TGOP_SIZE);
   OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE, ADJACENT_REFERENCE);
   // 2.4 Configure the parameters.
   int32_t ret = OH_VideoEncoder_Configure(videoEnc, format.get());
   if (ret != AV_ERR_OK) {
       // Handle exceptions.
   }
   ```

3. (Optional) During output rotation, obtain the temporal layer information corresponding to the bitstream.

   You can derive the temporal layer information from the configured TGOP parameters and the number of frames encoded so far.

   The sample code is as follows:

   ```c++
   constexpr int32_t TGOP_SIZE = 3;
   uint32_t outPoc = 0;
   // Determine the frame's relative position in the TGOP from the number of valid frames seen in the output callback, and derive its layer from the configuration.
   static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
   {
       struct OH_AVCodecBufferAttr attr;
       OH_AVErrCode ret = OH_AVBuffer_GetBufferAttr(buffer, &attr);
       if (ret != AV_ERR_OK) {
           // Handle exceptions.
       }
       // Reset POC to 0 on an I-frame refresh.
       if (attr.flags & AVCODEC_BUFFER_FLAGS_SYNC_FRAME) {
           outPoc = 0;
       }
       // Skip the XPS output, which carries parameter sets rather than frame data.
       if (!(attr.flags & AVCODEC_BUFFER_FLAGS_CODEC_DATA)) {
           int32_t tGopInner = outPoc % TGOP_SIZE;
           if (tGopInner == 0) {
               // Base-layer frames cannot be dropped in subsequent transmission and decoding.
           } else {
               // Non-base-layer frames can be dropped in subsequent transmission and decoding.
           }
           outPoc++;
       }
   }
   ```

4. (Optional) During output rotation, use the obtained temporal layer information for adaptive transmission or decoding.

   Based on the temporally scalable bitstream and the layer information, select the required layers for transmission, or carry the information to the peer for adaptive decoding.

## LTR

### Available APIs

The LTR feature configures the reference relationship at the frame level. It is suitable for flexible and complex temporal hierarchies.

| Parameter| Description |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT | Number of LTR frames.|
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR | Whether to mark the current frame as an LTR frame.|
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR | POC of the LTR frame referenced by the current frame. |

- **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT**: This parameter is set in the configuration phase. Its value must not exceed the maximum number of LTR frames supported by the encoder. For details, see step 3 below.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR**: This parameter is set in the input rotation to mark the current frame as an LTR frame. Mark the BL frames and the EL frames that are referenced across frames.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR**: This parameter is set in the input rotation to specify the POC of the LTR frame that the current frame references.

For example, to implement the four-layer temporal hierarchy described in [Introduction to Temporally Scalable Video Coding](#introduction-to-temporally-scalable-video-coding), perform the following steps (a code sketch of this pattern follows the table below):

1. In the configuration phase, set **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT** to **5**.

2. In the input rotation of the running phase, configure the LTR parameters according to the following table, where **\** means that no configuration is required.

   | Configuration\POC| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
   | -------- |---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|
   | MARK_LTR | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
   | USE_LTR | \ | \ | 0 | \ | 0 | \ | 4 | \ | 0 | \ | 8 | \ | 8 | \ | 12 | \ | 8 |
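The per-frame values in the table follow a regular pattern. The following is a minimal sketch that reproduces it for this particular four-layer structure; it is illustrative only, and a different hierarchy needs its own mapping. `LtrParams` and `GetLtrParams` are hypothetical names.

```c++
#include <cstdint>

// Per-frame LTR decision, reproducing the table above.
struct LtrParams {
    bool markLtr;    // Whether to set OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR.
    bool useLtr;     // Whether to set OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR.
    uint32_t refPoc; // POC to pass to USE_LTR when useLtr is true.
};

// Four-layer structure with a TGOP of 8: L0/L1 frames (POC % 4 == 0) are marked
// as LTR; L2 frames reference the previous L0/L1 frame; L0/L1 frames reference
// the previous L0 frame; odd frames (L3) keep the default short-term reference
// and need no configuration.
static LtrParams GetLtrParams(uint32_t poc)
{
    LtrParams p = {false, false, 0};
    p.markLtr = (poc % 4 == 0);
    if (poc == 0 || poc % 2 == 1) {
        return p; // I-frame or L3 frame: no USE_LTR configuration.
    }
    if (poc % 4 == 2) {
        p.refPoc = poc - 2;                               // L2: previous L0/L1 frame.
    } else {
        p.refPoc = (poc % 8 == 0) ? poc - 8 : poc - poc % 8; // L0/L1: previous L0 frame.
    }
    p.useLtr = true;
    return p;
}
```

In step 2 of the development procedure below, these values are written into each frame's **OH_AVFormat** through the frame channel.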
### How to Develop

This section describes only the steps that differ from the basic encoding process. You can learn about the basic encoding process in [Video Encoding](video-encoding.md).

1. When creating an encoder instance, check whether the video encoder supports the LTR feature.

   ```c++
   constexpr int32_t NEEDED_LTR_COUNT = 5;
   bool isSupported = false;
   int32_t supportedLTRCount = 0;
   // 1.1 Obtain the encoder capability instance. The following uses H.264 as an example.
   OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
   // 1.2 Check whether the LTR feature is supported.
   isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
   // 1.3 Query the number of supported LTR frames.
   if (isSupported) {
       OH_AVFormat *properties = OH_AVCapability_GetFeatureProperties(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
       if (!OH_AVFormat_GetIntValue(properties, OH_FEATURE_PROPERTY_KEY_VIDEO_ENCODER_MAX_LTR_FRAME_COUNT, &supportedLTRCount)) {
           // Handle exceptions.
       }
       OH_AVFormat_Destroy(properties);
       // 1.4 Check whether the number of supported LTR frames meets the requirements of the structure.
       isSupported = supportedLTRCount >= NEEDED_LTR_COUNT;
   }
   ```

   If the LTR feature is supported and the number of LTR frames meets the requirements, the feature can be enabled.

2. Register the frame channel callback functions.

   The following is an example of the configuration in buffer input mode:

   ```c++
   // 2.1 Implement the OH_AVCodecOnNeedInputBuffer callback function.
   static void OnNeedInputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
   {
       // The index of the input frame buffer is sent to inIndexQueue.
       // The input frame data (specified by buffer) is sent to inBufferQueue.
       // Perform data processing, for example:
       // - Write the stream to encode.
       // - Notify the encoder of EOS.
       // - Write the frame parameters.
       auto format = std::shared_ptr<OH_AVFormat>(OH_AVBuffer_GetParameter(buffer), OH_AVFormat_Destroy);
       if (format == nullptr) {
           // Handle exceptions.
       }
       // Example: mark the current frame as an LTR frame and reference the LTR frame whose POC is 4.
       OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
       OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
       OH_AVBuffer_SetParameter(buffer, format.get());
       // Notify the encoder that the buffer input is complete.
       OH_VideoEncoder_PushInputBuffer(codec, index);
   }

   // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
   static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
   {
       // The index of the output frame buffer is sent to outIndexQueue.
       // The encoded frame data (specified by buffer) is sent to outBufferQueue.
       // Perform data processing, for example:
       // - Release the encoded frame.
       // - Record the POC and the LTR marking status of the frame.
   }

   // 2.3 Register the callback functions.
   OH_AVCodecCallback cb;
   cb.onNeedInputBuffer = OnNeedInputBuffer;
   cb.onNewOutputBuffer = OnNewOutputBuffer;
   OH_VideoEncoder_RegisterCallback(videoEnc, cb, nullptr);
   ```

   The following is an example of the configuration in surface input mode:

   ```c++
   // 2.1 Implement the OH_VideoEncoder_OnNeedInputParameter callback function.
   static void OnNeedInputParameter(OH_AVCodec *codec, uint32_t index, OH_AVFormat *parameter, void *userData)
   {
       // The index of the input frame buffer is sent to inIndexQueue.
       // The input frame parameter (specified by parameter) is sent to inFormatQueue.
       // Perform data processing, for example:
       // - Write the stream to encode.
       // - Notify the encoder of EOS.
       // - Write the frame parameters.
       // Example: mark the current frame as an LTR frame and reference the LTR frame whose POC is 4.
       OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
       OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
       // Notify the encoder that the frame input is complete.
       OH_VideoEncoder_PushInputParameter(codec, index);
   }

   // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
   static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
   {
       // The index of the output frame buffer is sent to outIndexQueue.
       // The encoded frame data (specified by buffer) is sent to outBufferQueue.
       // Perform data processing, for example:
       // - Release the encoded frame.
       // - Record the POC and the LTR marking status of the frame.
   }

   // 2.3 Register the callback functions.
   OH_AVCodecCallback cb;
   cb.onNewOutputBuffer = OnNewOutputBuffer;
   OH_VideoEncoder_RegisterCallback(videoEnc, cb, nullptr);
   // 2.4 Register the frame channel callback function.
   OH_VideoEncoder_OnNeedInputParameter inParaCb = OnNeedInputParameter;
   OH_VideoEncoder_RegisterParameterCallback(videoEnc, inParaCb, nullptr);
   ```
3. In the configuration phase, configure the maximum number of LTR frames.

   ```c++
   constexpr int32_t NEEDED_LTR_COUNT = 5;
   // 3.1 Create a temporary AVFormat used for configuration.
   auto format = std::shared_ptr<OH_AVFormat>(OH_AVFormat_Create(), OH_AVFormat_Destroy);
   if (format == nullptr) {
       // Handle exceptions.
   }
   // 3.2 Fill in the key-value pair of the parameter that specifies the number of LTR frames.
   OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT, NEEDED_LTR_COUNT);
   // 3.3 Configure the parameters.
   int32_t ret = OH_VideoEncoder_Configure(videoEnc, format.get());
   if (ret != AV_ERR_OK) {
       // Handle exceptions.
   }
   ```

4. (Optional) During output rotation, obtain the temporal layer information corresponding to the bitstream.

   This procedure is the same as that described for the global temporal scalability feature.

   Because the LTR parameters are configured in the input rotation, you can also record them there and match each encoded frame with its input parameters in the output rotation, as sketched below.

5. (Optional) During output rotation, use the obtained temporal layer information for adaptive transmission or decoding.

   This procedure is the same as that described for the global temporal scalability feature.
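As suggested in step 4, the per-frame LTR parameters can be recorded at input time and matched with encoded frames at output time. The following is a minimal sketch, assuming that input frames and output frames correspond one-to-one and in order, which holds for the hierarchical-P structures used here (no frame reordering). `FrameRecord` and the helper functions are hypothetical names.

```c++
#include <cstdint>
#include <mutex>
#include <queue>

// Per-frame record captured in the input rotation.
struct FrameRecord {
    uint32_t poc;
    bool markLtr;
    bool useLtr;
    uint32_t refPoc;
};

static std::mutex g_recordMutex;
static std::queue<FrameRecord> g_records;

// Call from the input callback after configuring the frame's LTR parameters.
static void RecordInputFrame(const FrameRecord &rec)
{
    std::lock_guard<std::mutex> lock(g_recordMutex);
    g_records.push(rec);
}

// Call from the output callback for each encoded frame. Skip buffers flagged
// AVCODEC_BUFFER_FLAGS_CODEC_DATA, which carry parameter sets rather than a
// frame; the caller must ensure the queue is not empty.
static FrameRecord MatchOutputFrame()
{
    std::lock_guard<std::mutex> lock(g_recordMutex);
    FrameRecord rec = g_records.front();
    g_records.pop();
    return rec;
}
```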