Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, improving notably ARM, AVX-512 and PowerPC
 - AVX2 optimizations for 8-tap and new variants for 6-tap
 - AVX-512 optimizations for 8-tap and new variants for 6-tap
 - Improve entropy decoding on ARM64
 - New ARM64 optimizations for convolutions based on DotProd extension
 - New ARM64 optimizations for convolutions based on i8mm extension
 - New ARM64 optimizations for subpel and prep filters for i8mm
 - Misc improvements on existing ARM64 optimizations, notably for put/prep
 - New PowerPC9 optimizations for loopfilter
 - Support for macOS kperf API for benchmarking


Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, improving notably ARM and RISC-V speed

- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- Msac optimizations


Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: LoongArch
- LoongArch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)


Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- New API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function


Changes for 1.2.1 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.1 is a small release of dav1d, adding more SIMD and fixes

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc buildsystems, CI and headers fixes


Changes for 1.2.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.0 is a small release of dav1d, adding more SIMD and fixes

- Improvements on attachments of props and T.35 entries on output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx


Changes for 1.1.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs, and adding SIMD

- New function dav1d_get_frame_delay to query the decoder frame delay
- Numerous fixes for strict conformity to the specs and samples
- NEON and AVX-512 misc fixes and improvements
- Partial AVX2 12bpc transform implementations
- AVX-512 high bit-depth cdef_filter, loopfilter, itx
- NEON z1/z3 optimization for 8bpc
- SSSE3 z1 optimization for 8bpc

 "From VideoLAN with love"


Changes for 1.0.0 'Peregrine Falcon':
-------------------------------------

1.0.0 is a major release of dav1d, adding important features and bug fixes.

It notably changes the way threading works, in an important way, by adding
automatic thread management.

It also adds support for AVX-512 acceleration, and adds speedups to existing x86
code (from SSE2 to AVX2).

1.0.0 adds a new grain API to ease acceleration on the GPU, and adds an API call
to get information about which frame failed to decode, in error cases.

Finally, 1.0.0 fixes numerous small bugs that were reported since the beginning
of the project to have a proper release.

                                     .''.
         .''.      .        *''*    :_\/_:     .
        :_\/_:   _\(/_  .:.*_\/_*   : /\ :  .'.:.'.
    .''.: /\ :   ./)\   ':'* /\ * :  '..'.  -=:o:=-
   :_\/_:'.:::.    ' *''*    * '.\'/.' _\(/_'.':'.'
   : /\ : :::::     *_\/_*     -= o =-  /)\    '  *
    '..'  ':::'     * /\ *     .'/.\'.   '
        *            *..*         :
          *                       :
          *         1.0.0



Changes for 0.9.2 'Golden Eagle':
---------------------------------

0.9.2 is a small update of dav1d on the 0.9.x branch:
 - x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
 - x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
 - x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
 - ARM NEON optimizations for FilmGrain Gen_grain functions
 - Optimizations for splat_mv in SSE2/AVX2 and NEON
 - x86: SGR improvements for SSSE3 CPUs
 - x86: AVX2 optimizations for cfl_ac


Changes for 0.9.1 'Golden Eagle':
---------------------------------

0.9.1 is a middle-size revision of dav1d, adding notably 10b acceleration for SSSE3:
 - 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
   prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
   sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
 - Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12b on ARM32
 - Fixes for filmgrain on ARM
 - itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
 - Misc improvements on SSE2, SSE4


Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, adding notably 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian Hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking, pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add a xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian Hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian Hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve the performance by using a picture buffer pool;
   the improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
--------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue on some specific Haswell CPU on ipred_z AVX2 functions
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
--------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef, mc_blend_v and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - The initial PPC SIMD code, cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
   The impact is important on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32-bit and 64-bit
 - More AVX2 assembly, reaching almost completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bitdepths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64-bit processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors