Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, notably improving ARM, AVX-512 and PowerPC
 - AVX2 optimizations for 8-tap and new variants for 6-tap
 - AVX-512 optimizations for 8-tap and new variants for 6-tap
 - Improve entropy decoding on ARM64
 - New ARM64 optimizations for convolutions based on the DotProd extension
 - New ARM64 optimizations for convolutions based on the i8mm extension
 - New ARM64 optimizations for subpel and prep filters for i8mm
 - Misc improvements on existing ARM64 optimizations, notably for put/prep
 - New PowerPC9 optimizations for loopfilter
 - Support for the macOS kperf API for benchmarking


Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, notably improving ARM and RISC-V speed

- Optimizations for 6-tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- MSAC optimizations


Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: LoongArch
- LoongArch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)


Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- New API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for the pal_idx_finish function


Changes for 1.2.1 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.1 is a small release of dav1d, adding more SIMD and fixes

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix an issue with desynced luma/chroma planes in Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc build system, CI and header fixes


Changes for 1.2.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.0 is a small release of dav1d, adding more SIMD and fixes

- Improvements to the attachment of props and T.35 entries to output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx


Changes for 1.1.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs and adding SIMD

- New function dav1d_get_frame_delay to query the decoder frame delay
- Numerous fixes for strict conformance to the specs and samples
- NEON and AVX-512 misc fixes and improvements
- Partial AVX2 12bpc transform implementations
- AVX-512 high bit-depth cdef_filter, loopfilter, itx
- NEON z1/z3 optimization for 8bpc
- SSSE3 z1 optimization for 8bpc

    "From VideoLAN with love"


Changes for 1.0.0 'Peregrine Falcon':
-------------------------------------

1.0.0 is a major release of dav1d, adding important features and bug fixes.

It significantly changes the way threading works by adding automatic thread
management.

It also adds support for AVX-512 acceleration and speeds up existing x86
code (from SSE2 to AVX2).

1.0.0 adds a new film grain API to ease acceleration on the GPU, and an API call
to get information about which frame failed to decode, in error cases.

Finally, to make this a proper release, 1.0.0 fixes numerous small bugs that
have been reported since the beginning of the project.

 .''.
 .''. . *''* :_\/_: .
 :_\/_: _\(/_ .:.*_\/_* : /\ : .'.:.'.
 .''.: /\ : ./)\ ':'* /\ * : '..'. -=:o:=-
 :_\/_:'.:::. ' *''* * '.\'/.' _\(/_'.':'.'
 : /\ : ::::: *_\/_* -= o =- /)\ ' *
 '..' ':::' * /\ * .'/.\'. '
 * *..* :
 * :
 * 1.0.0


Changes for 0.9.2 'Golden Eagle':
---------------------------------

0.9.2 is a small update of dav1d on the 0.9.x branch:
 - x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
 - x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
 - x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
 - ARM NEON optimizations for FilmGrain Gen_grain functions
 - Optimizations for splat_mv in SSE2/AVX2 and NEON
 - x86: SGR improvements for SSSE3 CPUs
 - x86: AVX2 optimizations for cfl_ac


Changes for 0.9.1 'Golden Eagle':
---------------------------------

0.9.1 is a middle-size revision of dav1d, notably adding 10b acceleration for SSSE3:
 - 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
   prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
   sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
 - Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12b on ARM32
 - Fixes for film grain on ARM
 - itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
 - Misc improvements on SSE2, SSE4


Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, notably adding 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian Hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for cdef_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking and pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add an xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian Hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian Hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve performance by using a picture buffer pool;
   the improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
--------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm (up to 28% speedup) and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue with the ipred_z AVX2 functions on some specific Haswell CPUs
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
--------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
   - mc_avg, mc_w_avg, mc_mask
   - mc_put/mc_prep 8tap/bilin
   - mc_warp_8x8
   - mc_w_mask
   - mc_blend
   - wiener
   - SGR
   - loopfilter
   - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef, mc_blend_v and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve speed by reducing L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speed and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU-T T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - Initial PPC SIMD code, for cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Fixes for minor crashes, misc improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32-bit and 64-bit
 - More AVX2 assembly, reaching almost completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for undecodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bit depths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64-bit processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors