# Temporally Scalable Video Coding

<!--Kit: AVCodec Kit-->
<!--Subsystem: Multimedia-->
<!--Owner: @zhanghongran-->
<!--Designer: @dpy2650-->
<!--Tester: @cyakee-->
<!--Adviser: @zengyawen-->

## Basic Concepts

### Introduction to Temporally Scalable Video Coding

Scalable video coding is an extension of conventional video coding standards. Two widely used examples are SVC (Scalable Video Coding, an extension of the H.264 standard) and SHVC (Scalable High Efficiency Video Coding, an extension of the H.265 standard).

Scalable video coding structures a bitstream hierarchically along three dimensions: spatial scalability, temporal scalability, and quality scalability.

Temporally scalable video coding encodes a video sequence into a set of layers that provide increasing temporal resolution. The following figure shows the structure of a bitstream that contains four temporal layers and is constructed based on the reference relationship.

![Temporal scalability 4 layers](figures/temporal-scalability-4layers.png)

When the channel condition is poor, frames can be dropped layer by layer in descending order (L3 -> L2 -> L1) to match the available transmission and decoding capabilities.

The figure below shows the new bitstream structure when the frames at L3 are dropped. The bitstream can still be decoded normally, but at half the frame rate. Frames at other layers can be dropped in a similar way.

![Temporal scalability 4 layers L3 dropped](figures/temporal-scalability-4layers-L3-dropped.png)

### Structure of a Temporally Scalable Bitstream

A bitstream is organized into one or more Groups of Pictures (GOPs). A GOP is a collection of consecutive pictures that can be independently decoded, and its size is the distance between two I-frames (also called key frames).

A GOP can be further divided into one or more Temporal Groups of Pictures (TGOPs), and each TGOP is composed of a base layer (BL) and one or more associated enhancement layers (ELs). For example, frame 0 to frame 7 in the foregoing four-layer temporally scalable bitstream form a TGOP.

- BL: bottom layer (L0) in the GOP. In temporal scalability, this layer is encoded at the lowest frame rate.

- EL: layers above the BL, namely L1, L2, and L3 in ascending order. In temporal scalability, the lowest EL encodes, based on the BL, frames at a higher frame rate; each higher EL encodes, based on the BL or a lower EL, frames at a yet higher frame rate.

### How to Implement the Structure of a Temporally Scalable Bitstream

The temporally scalable bitstream structure is implemented by specifying reference frames, which are classified into the following types based on how long they reside in the Decoded Picture Buffer (DPB):

- Short-Term Reference (STR): a reference frame that resides in the DPB only briefly. STRs are managed in a First In First Out (FIFO) manner: once the DPB is full, the oldest STR is removed.

- Long-Term Reference (LTR): a reference frame that can reside in the DPB for a long period of time. It stays in the DPB until it is replaced by another decoded picture carrying the same LTR ID.

Although a cross-frame reference structure can be built with multiple STRs, the temporal span it supports is limited by the STRs' short lifetime in the DPB. LTRs do not have this limitation and also cover the cross-frame scenarios of STRs. Therefore, LTRs are the preferred way to implement the structure of a temporally scalable bitstream, as illustrated by the sketch below.
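
A minimal sketch of the two residency policies, independent of any codec API (all types and names here are illustrative):

```c++
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Illustrative DPB bookkeeping: STRs are evicted in FIFO order when the
// buffer is full, while an LTR stays until a new frame is marked with the
// same LTR ID.
struct Dpb {
    size_t maxStrCount = 2;
    std::deque<uint32_t> strPocs;        // Short-term references, oldest first.
    std::map<int32_t, uint32_t> ltrPocs; // LTR ID -> POC of the frame holding it.

    void AddStr(uint32_t poc)
    {
        if (strPocs.size() == maxStrCount) {
            strPocs.pop_front();         // FIFO: the oldest STR leaves first.
        }
        strPocs.push_back(poc);
    }

    void MarkLtr(int32_t ltrId, uint32_t poc)
    {
        ltrPocs[ltrId] = poc;            // Replaces any frame holding this ID.
    }
};
```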

## When to Use

You are advised to use temporal scalability in the following scenarios:

- Real-time encoding and transmission scenarios with no or low caching on the playback side, for example, video conferencing, live streaming, and collaborative office.

- Video encoding and recording scenarios that require video preview or multi-speed playback.

If your application does not need to adjust the temporal reference structure dynamically and the hierarchical structure is simple, you are advised to use [global temporal scalability](#global-temporal-scalability). Otherwise, enable [LTR](#ltr).

## Constraints

- The global temporal scalability and LTR features are mutually exclusive.

  The two features cannot both be enabled, because they share the same underlying implementation.

- When using the forcible IDR configuration together with either feature, use the frame channel configuration.

  A reference frame is valid only within its GOP. When an I-frame is refreshed, the DPB is cleared, and the reference frames along with it. In other words, the location of an I-frame refresh has a great impact on the reference relationship.

  When temporal scalability is enabled, to request an I-frame on demand through **OH_MD_KEY_REQUEST_I_FRAME**, you must use the frame channel, which has a determined effective time, so that the system knows the exact I-frame refresh location and the reference relationship is not disrupted. For details, see the configuration guide of the frame channel; a minimal sketch also follows below. Do not use **OH_VideoEncoder_SetParameter**, whose effective time is nondeterministic. For details, see "Step 4: Call **OH_VideoEncoder_RegisterParameterCallback()** to register the frame-specific parameter callback function" in [Video Encoding in Surface Mode](video-encoding.md#surface-mode).
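
  The following is a minimal sketch of such a request in surface mode; `needIdrRefresh` is a placeholder for your application's own trigger condition:

  ```c++
  // Inside the registered OH_VideoEncoder_OnNeedInputParameter callback, the
  // request takes effect exactly at this frame, so the I-frame refresh
  // location is deterministic. needIdrRefresh is an assumed application flag.
  static void OnNeedInputParameter(OH_AVCodec *codec, uint32_t index, OH_AVFormat *parameter, void *userData)
  {
      if (needIdrRefresh) {
          OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_REQUEST_I_FRAME, 1);
      }
      OH_VideoEncoder_PushInputParameter(codec, index);
  }
  ```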

- The callback using **OH_AVBuffer** is supported, but the callback using **OH_AVMemory** is not.

  Temporal scalability depends on frame-level features. Do not use **OH_AVMemory** with **OH_AVCodecAsyncCallback**. Instead, use **OH_AVBuffer** with **OH_AVCodecCallback**.

- Temporal scalability employs P-pictures, but not B-pictures.

  A temporal hierarchy can be hierarchical-P or hierarchical-B. Currently, this feature supports only hierarchical-P.

- In the case of **UNIFORMLY_SCALED_REFERENCE**, the TGOP size can only be 2 or 4.

## Global Temporal Scalability

### Available APIs

Global temporal scalability is suitable for encoding frames into a stable and simple temporal structure. Its initial configuration takes effect globally and cannot be modified dynamically. The configuration parameters are as follows:

| Parameter | Description |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY | Enabled status of the global temporal scalability feature. |
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE | TGOP size of the global temporal scalability feature. |
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE | TGOP reference mode of the global temporal scalability feature. |

- **OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY**: This parameter is set in the configuration phase. The feature can be enabled only when the encoder supports it.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE**: This parameter is optional and specifies the distance between two I-frames. Customize the I-frame density based on your frame extraction requirements. The value range is [2, GopSize). If no value is passed in, the default value is used.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE**: This parameter is optional and determines the reference mode of non-I-frames. The value can be **ADJACENT_REFERENCE**, **JUMP_REFERENCE**, or **UNIFORMLY_SCALED_REFERENCE**. **ADJACENT_REFERENCE** provides better compression performance, whereas **JUMP_REFERENCE** is more flexible for dropping frames. **UNIFORMLY_SCALED_REFERENCE** distributes the stream more evenly when frames are dropped. If no value is passed in, the default value is used.

    > **NOTE**
    >
    > In the case of **UNIFORMLY_SCALED_REFERENCE**, the TGOP size can only be 2 or 4.

Example 1: ADJACENT_REFERENCE in TGOP=4

![Temporal gop 4 adjacent reference](figures/temporal-scalability-tgop4-adjacent.png)

Example 2: JUMP_REFERENCE in TGOP=4

![TGOP4 jump reference](figures/temporal-scalability-tgop4-jump.png)

Example 3: UNIFORMLY_SCALED_REFERENCE in TGOP=4

![TGOP4 uniformly scaled reference](figures/temporal-scalability-tgop4-uniformly.png)

### How to Develop

You can learn the basic encoding process in [Video Encoding](video-encoding.md). This section describes only the differences from the basic encoding process.

1. When creating an encoder instance, check whether the video encoder supports the global temporal scalability feature.

    ```c++
    // 1.1 Obtain the video encoder capability instance. The following uses H.264 as an example.
    OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
    // 1.2 Check whether the global temporal scalability feature is supported.
    bool isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_TEMPORAL_SCALABILITY);
    ```

    If the feature is supported, it can be enabled.

    ```c++
    // Create a hardware encoder instance.
    OH_AVCodec *videoEnc = OH_VideoEncoder_CreateByMime(OH_AVCODEC_MIMETYPE_VIDEO_AVC);
    ```

2. In the configuration phase, configure the parameters related to the global temporal scalability feature.

    ```c++
    constexpr int32_t TGOP_SIZE = 3;
    // 2.1 Create a temporary AVFormat used for configuration.
    auto format = std::shared_ptr<OH_AVFormat>(OH_AVFormat_Create(), OH_AVFormat_Destroy);
    if (format == nullptr) {
        // Handle exceptions.
    }
    // 2.2 Fill in the key-value pair of the parameter used to enable the feature.
    OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY, 1);
    // 2.3 (Optional) Fill in the key-value pairs of the parameters that specify the TGOP size and reference mode.
    OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE, TGOP_SIZE);
    OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE, ADJACENT_REFERENCE);
    // 2.4 Configure the parameters.
    int32_t ret = OH_VideoEncoder_Configure(videoEnc, format.get());
    if (ret != AV_ERR_OK) {
        // Handle exceptions.
    }
    ```

3. (Optional) During output rotation, obtain the temporal layer information corresponding to the bitstream.

    You can determine the temporal layer of each frame based on the configured TGOP parameters and the number of frames already encoded.

    The sample code is as follows:

    ```c++
    constexpr int32_t TGOP_SIZE = 3;
    uint32_t outPoc = 0;
    // Obtain the relative position in the TGOP based on the number of valid frames in the output callback, and determine the layer based on the configuration.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        struct OH_AVCodecBufferAttr attr;
        OH_AVErrCode ret = OH_AVBuffer_GetBufferAttr(buffer, &attr);
        if (ret != AV_ERR_OK) {
            // Handle exceptions.
        }
        // Reset POC to 0 after an I-frame refresh.
        if (attr.flags & AVCODEC_BUFFER_FLAGS_SYNC_FRAME) {
            outPoc = 0;
        }
        // Count only encoded frames; skip the XPS (codec configuration data) output.
        // flags is a bitmask, so test the bit instead of comparing for equality.
        if (!(attr.flags & AVCODEC_BUFFER_FLAGS_CODEC_DATA)) {
            int32_t tGopInner = outPoc % TGOP_SIZE;
            if (tGopInner == 0) {
                // Base-layer frames cannot be dropped in subsequent transmission and decoding processes.
            } else {
                // Other frames can be dropped in subsequent transmission and decoding processes.
            }
            outPoc++;
        }
    }
    ```

4. (Optional) During output rotation, use the temporal layer information obtained for adaptive transmission or decoding.

    Based on the temporally scalable bitstream and layer information, select a required layer for transmission, or carry the information to the peer for adaptive decoding. A sketch of a layer-based drop decision follows.
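
    The following is a minimal sketch, not part of the AVCodec API, assuming the **JUMP_REFERENCE** mode, in which every non-I-frame in a TGOP references the first frame of that TGOP and can therefore be dropped independently:

    ```c++
    // Illustrative only: under JUMP_REFERENCE, a frame whose position within
    // the TGOP is nonzero is an enhancement-layer frame and may be dropped.
    bool IsDroppable(uint32_t poc, uint32_t tGopSize)
    {
        return (poc % tGopSize) != 0;
    }
    ```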

## LTR

### Available APIs

The LTR feature provides frame-level configuration of the reference relationship. It is suitable for flexible and complex temporal hierarchies.

| Parameter | Description |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT | Number of LTR frames. |
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR | Whether the current frame is marked as an LTR frame. |
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR | POC number of the LTR frame referenced by the current frame. |

- **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT**: This parameter is set in the configuration phase. Its value must not be greater than the maximum number of supported LTR frames. For details, see step 3 below.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR**: This parameter marks the current frame as an LTR frame. Frames at the BL, and at any EL that needs to be referenced across frames, are marked.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR**: This parameter specifies the POC number of the LTR frame referenced by the current frame.

For example, to implement the four-layer temporal hierarchy described in [Introduction to Temporally Scalable Video Coding](#introduction-to-temporally-scalable-video-coding), perform the following steps:

1. In the configuration phase, set **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT** to **5**.

2. In the input rotation of the running phase, configure the LTR parameters according to the following table, where **\** means that no configuration is required. A sketch that reproduces the table follows it.

    | Configuration\POC | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
    | -------- |---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|
    | MARK_LTR | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0  | 0  | 1  | 0  | 0  | 0  | 1  |
    | USE_LTR  | \ | \ | 0 | \ | 0 | \ | 4 | \ | 0 | \ | 8  | \  | 8  | \  | 12 | \  | 8  |
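
    The per-frame values above follow a fixed pattern. The following sketch, which is illustrative and not part of the AVCodec API, computes them from the POC (TGOP of 8, four layers):

    ```c++
    // Illustrative helper that reproduces the MARK_LTR/USE_LTR table above.
    struct LtrParams {
        bool markLtr;      // Value for OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR.
        int32_t useLtrPoc; // Value for OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR; -1 = not configured.
    };

    LtrParams GetLtrParams(uint32_t poc)
    {
        LtrParams p = { (poc % 4) == 0, -1 };            // L0 and L1 frames are marked as LTR.
        if (poc % 8 == 0 && poc > 0) {
            p.useLtrPoc = static_cast<int32_t>(poc - 8); // L0 references the previous L0 frame.
        } else if (poc % 8 == 4) {
            p.useLtrPoc = static_cast<int32_t>(poc - 4); // L1 references the preceding L0 frame.
        } else if (poc % 4 == 2) {
            p.useLtrPoc = static_cast<int32_t>(poc - 2); // L2 references the preceding LTR frame.
        }                                                // L3 (odd POC) keeps the default short-term reference.
        return p;
    }
    ```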

### How to Develop

This section describes only the steps that differ from the basic encoding process. You can learn the basic encoding process in [Video Encoding](video-encoding.md).

1. When creating an encoder instance, check whether the video encoder supports the LTR feature.

    ```c++
    constexpr int32_t NEEDED_LTR_COUNT = 5;
    bool isSupported = false;
    int32_t supportedLTRCount = 0;
    // 1.1 Obtain the encoder capability instance. The following uses H.264 as an example.
    OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
    // 1.2 Check whether the LTR feature is supported.
    isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
    // 1.3 Determine the number of supported LTR frames.
    if (isSupported) {
        OH_AVFormat *properties = OH_AVCapability_GetFeatureProperties(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
        if (!OH_AVFormat_GetIntValue(properties, OH_FEATURE_PROPERTY_KEY_VIDEO_ENCODER_MAX_LTR_FRAME_COUNT, &supportedLTRCount)) {
            // Handle exceptions.
        }
        OH_AVFormat_Destroy(properties);
        // 1.4 Check whether the number of supported LTR frames meets the structure requirements.
        isSupported = supportedLTRCount >= NEEDED_LTR_COUNT;
    }
    ```

    If the LTR feature is supported and the number of LTR frames meets the requirements, the feature can be enabled.

2. Register the frame channel callback functions.

    The following is an example of the configuration in buffer input mode:

    ```c++
    // 2.1 Implement the OH_AVCodecOnNeedInputBuffer callback function.
    static void OnNeedInputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the input frame buffer is sent to InIndexQueue.
        // The input frame data (specified by buffer) is sent to InBufferQueue.
        // Perform data processing. For details, see:
        // - Write the stream to encode.
        // - Notify the encoder of EOS.
        // - Write the frame parameter.
        auto format = std::shared_ptr<OH_AVFormat>(OH_AVBuffer_GetParameter(buffer), OH_AVFormat_Destroy);
        if (format == nullptr) {
            // Handle exceptions.
        }
        OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
        OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
        OH_AVBuffer_SetParameter(buffer, format.get());
        // Notify the encoder that the buffer input is complete.
        OH_VideoEncoder_PushInputBuffer(codec, index);
    }

    // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the output frame buffer is sent to outIndexQueue.
        // The encoded frame data (specified by buffer) is sent to outBufferQueue.
        // Perform data processing. For details, see:
        // - Release the encoded frame.
        // - Record POC and the enabled status of LTR.
    }

    // 2.3 Register the callback functions.
    OH_AVCodecCallback cb;
    cb.onNeedInputBuffer = OnNeedInputBuffer;
    cb.onNewOutputBuffer = OnNewOutputBuffer;
    OH_VideoEncoder_RegisterCallback(videoEnc, cb, nullptr);
    ```

    The following is an example of the configuration in surface input mode:

    ```c++
    // 2.1 Implement the OH_VideoEncoder_OnNeedInputParameter callback function.
    static void OnNeedInputParameter(OH_AVCodec *codec, uint32_t index, OH_AVFormat *parameter, void *userData)
    {
        // The index of the input frame buffer is sent to InIndexQueue.
        // The input frame data (specified by avformat) is sent to InFormatQueue.
        // Perform data processing. For details, see:
        // - Write the stream to encode.
        // - Notify the encoder of EOS.
        // - Write the frame parameter.
        OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
        OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
        // Notify the encoder that the frame input is complete.
        OH_VideoEncoder_PushInputParameter(codec, index);
    }

    // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the output frame buffer is sent to outIndexQueue.
        // The encoded frame data (specified by buffer) is sent to outBufferQueue.
        // Perform data processing. For details, see:
        // - Release the encoded frame.
        // - Record POC and the enabled status of LTR.
    }

    // 2.3 Register the callback functions.
    OH_AVCodecCallback cb;
    cb.onNewOutputBuffer = OnNewOutputBuffer;
    OH_VideoEncoder_RegisterCallback(videoEnc, cb, nullptr);
    // 2.4 Register the frame channel callback functions.
    OH_VideoEncoder_OnNeedInputParameter inParaCb = OnNeedInputParameter;
    OH_VideoEncoder_RegisterParameterCallback(videoEnc, inParaCb, nullptr);
    ```

3. In the configuration phase, configure the maximum number of LTR frames.

    ```c++
    constexpr int32_t NEEDED_LTR_COUNT = 5;
    // 3.1 Create a temporary AVFormat used for configuration.
    auto format = std::shared_ptr<OH_AVFormat>(OH_AVFormat_Create(), OH_AVFormat_Destroy);
    if (format == nullptr) {
        // Handle exceptions.
    }
    // 3.2 Fill in the key-value pair of the parameter that specifies the number of LTR frames.
    OH_AVFormat_SetIntValue(format.get(), OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT, NEEDED_LTR_COUNT);
    // 3.3 Configure the parameters.
    int32_t ret = OH_VideoEncoder_Configure(videoEnc, format.get());
    if (ret != AV_ERR_OK) {
        // Handle exceptions.
    }
    ```

4. (Optional) During output rotation, obtain the temporal layer information corresponding to the bitstream.

    This procedure is the same as that described for the global temporal scalability feature.

    The LTR parameters are configured in the input rotation. You can record them during the input rotation and then match each output frame with its input parameters during the output rotation, as in the sketch below.
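
    A minimal sketch, assuming the application keys the recorded parameters by presentation timestamp (all names here are illustrative):

    ```c++
    #include <cstdint>
    #include <map>
    #include <mutex>

    struct FrameLtrInfo {
        bool markLtr;      // Value passed for MARK_LTR.
        int32_t useLtrPoc; // Value passed for USE_LTR; -1 if not configured.
    };

    static std::mutex g_ltrMutex;
    static std::map<int64_t, FrameLtrInfo> g_ltrByPts; // pts -> per-frame LTR parameters.

    // Called in the input rotation, right after the per-frame parameters are set.
    static void RecordLtrInfo(int64_t pts, bool markLtr, int32_t useLtrPoc)
    {
        std::lock_guard<std::mutex> lock(g_ltrMutex);
        g_ltrByPts[pts] = { markLtr, useLtrPoc };
    }

    // Called in the output rotation with attr.pts of the encoded frame.
    static FrameLtrInfo TakeLtrInfo(int64_t pts)
    {
        std::lock_guard<std::mutex> lock(g_ltrMutex);
        FrameLtrInfo info = g_ltrByPts[pts];
        g_ltrByPts.erase(pts);
        return info;
    }
    ```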

5. (Optional) During output rotation, use the temporal layer information obtained for adaptive transmission or decoding.

    This procedure is the same as that described for the global temporal scalability feature.