# 2.x series change log

This page summarizes the major functional and performance changes in each
release of the 2.x series.

All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running astcenc using 6 threads.

<!-- ---------------------------------------------------------------------- -->
## 2.5

**Status:** Released, March 2021

The 2.5 release is the last major release in the 2.x series. After this release
a `2.x` branch will provide stable long-term support, and the `main` branch
will switch to focusing on more radical changes for the 3.x series.

Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with earlier 2.x
releases. Please update and rebuild your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Feature:** The `ISA_INVARIANCE` build option is no longer supported, as
    there is no longer any performance benefit from the variant paths. All
    builds are now using the equivalent of the `ISA_INVARIANCE=ON` setting, and
    all builds (except Armv7) are now believed to be invariant across operating
    systems, compilers, CPU architectures, and SIMD instruction sets.
  * **Feature:** Armv8 32-bit builds with NEON are now supported, with
    out-of-the-box support for Arm Linux soft-float and hard-float ABIs. There
    are no pre-built binaries for these targets; support is included for
    library users targeting older 32-bit Android and iOS devices.
  * **Feature:** A compressor mode for encoding HDR textures that have been
    encoded into LDR RGBM wrapper format is now supported. Note that this
    encoding has some strong recommendations for how the RGBM encoding is
    implemented to avoid block artifacts in the compressed image.
* **Core API:**
  * **API Change:** The core API has been changed to be a pure C API, making it
    easier to wrap the codec in a stable shared library ABI. Some entry points
    that used to accept references now expect pointers.
  * **API Change:** The decompression functionality in the core API has been
    changed to allow use of multiple threads. The design pattern matches the
    compression functionality, requiring the caller to create the threads,
    synchronize them between images, and to call the new
    `astcenc_decompress_reset()` function between images.
  * **API Feature:** Defines to support exporting public API entry point
    symbols from a shared object are provided, but not exposed off-the-shelf by
    the CMake provided by the project.
  * **API Feature:** New `astcenc_get_block_info()` function added to the core
    API to allow users to perform high level analysis of compressed data. This
    API is not implemented in decompressor-only builds.
  * **API Feature:** Codec configuration structure has been extended to expose
    the new RGBM compression mode. See the API header for details.

<!-- ---------------------------------------------------------------------- -->
## 2.4

**Status:** Released, February 2021

The 2.4 release is the fifth release in the 2.x series. It is primarily a bug
fix release for HDR image handling, which impacts all earlier 2.x series
releases.

* **General:**
  * **Feature:** When using the `-a` option, or the equivalent config option
    for the API, any 2D blocks that are entirely zero alpha after the alpha
    filter radius is taken into account are replaced by transparent black
    constant color blocks. This is an RDO-like technique to improve compression
    ratios of any additional application packaging compression that is applied.
* **Command Line:**
  * **Bug fix:** The command line wrapper now correctly loads HDR images that
    have a non-square aspect ratio.
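
As an illustration of the zero-alpha block replacement described in the 2.4
**General** notes above, the following minimal Python sketch finds the blocks
that could be swapped for transparent black constant color blocks. It is not
the codec's implementation: the function name `find_zero_alpha_blocks`, the
square filter footprint, and the list-of-rows image layout are all assumptions
made for this example.

```python
def find_zero_alpha_blocks(alpha, block_w, block_h, radius):
    """Return (block_x, block_y) coordinates of blocks whose footprint,
    grown by `radius` texels on every side, contains only zero alpha.

    `alpha` is a list of rows of alpha values (hypothetical layout).
    """
    height = len(alpha)
    width = len(alpha[0]) if height else 0
    zero_blocks = []

    for by in range(0, height, block_h):
        for bx in range(0, width, block_w):
            # Grow the block by the filter radius, clamped to the image edge.
            y0 = max(by - radius, 0)
            y1 = min(by + block_h + radius, height)
            x0 = max(bx - radius, 0)
            x1 = min(bx + block_w + radius, width)

            # The block qualifies only if every texel in the grown region
            # has zero alpha.
            if all(alpha[y][x] == 0
                   for y in range(y0, y1)
                   for x in range(x0, x1)):
                zero_blocks.append((bx // block_w, by // block_h))

    return zero_blocks


# Usage: an 8x4 image holding two 4x4 blocks, with one opaque texel in
# the right-hand block.
image = [[0] * 8 for _ in range(4)]
image[1][6] = 255
print(find_zero_alpha_blocks(image, 4, 4, 1))
```

Growing the region by the radius matters because a neighboring texel with
non-zero alpha can still influence the block through the alpha filter, so a
block that is zero only inside its own footprint is left alone.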

<!-- ---------------------------------------------------------------------- -->
## 2.3

**Status:** Released, January 2021

The 2.3 release is the fourth release in the 2.x series. It includes a number
of performance improvements and new features.

Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with 2.2. Please
recompile your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Decompressor-only builds of the codec are supported again.
    While this is primarily a feature for library users who want to shrink
    binary size, a variant command line tool `astcdec` can be built by
    specifying `DECOMPRESSOR=ON` on the CMake configure command line.
  * **Feature:** Diagnostic builds of the codec can now be built. These builds
    generate a JSON file containing a trace of the compressor execution.
    Diagnostic builds are only suitable for codec development; they are slower
    and JSON generation cannot be disabled. Build by setting `DIAGNOSTICS=ON`
    on the CMake configure command line.
  * **Feature:** Code compatibility improved with older versions of GCC; the
    earliest compiler now tested is GCC 7.5 (was GCC 9.3).
  * **Feature:** Code compatibility improved with newer versions of LLVM; the
    latest compiler now tested is Clang 12.0 (was Clang 9.0).
  * **Feature:** Code compatibility improved with the Visual Studio 2019 LLVM
    toolset (`clang-cl`). Using the LLVM toolset gives 25% performance
    improvements and is recommended.
* **Command Line:**
  * **Feature:** Quality level now accepts either a preset (`-fast`, etc.) or a
    float value between 0 and 100, allowing more control over the compression
    quality vs performance trade-off. The presets are not evenly spaced in the
    float range; they have been spaced to give the best distribution of points
    between the fast and thorough presets.
    * `-fastest`: 0.0
    * `-fast`: 10.0
    * `-medium`: 60.0
    * `-thorough`: 98.0
    * `-exhaustive`: 100.0
* **Core API:**
  * **API Change:** Quality level preset enum replaced with a float value
    between 0 (`-fastest`) and 100 (`-exhaustive`). See above for more info.

### Performance

This release includes a number of optimizations to improve performance.

* New compressor algorithm for handling encoding candidates and refinement.
* Vectorized implementation of `compute_error_of_weight_set()`.
* Unrolled implementation of `encode_ise()`.
* Many other small improvements!

The most significant change is the change to the compressor path, which now
uses an adaptive approach to candidate trials and block refinement.

In earlier releases the quality level determined the number of encoding
candidates and the number of iterative refinement passes that were used for
each major encoding trial. This was a fixed behavior; the compressor always
tried the full N candidates and M refinement iterations specified by the
quality level for each encoding trial.

The new approach implements two optimizations for this:

* Compression will complete when a block candidate hits the specified target
  quality, after its M refinement iterations have been applied. Later block
  candidates are simply abandoned.
* Block candidates will predict how much refinement can improve them, and
  abandon refinement if they are unlikely to improve upon the best known
  encoding already in-hand.

This pair of optimizations provides significant performance improvement to the
high quality modes which use the most block candidates and refinement
iterations. A minor loss of image quality is expected, as the blocks we no
longer test or refine may have been better coding choices.
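
The two early-out rules above can be sketched in a few lines of Python. This
is a toy model under stated assumptions, not the codec's algorithm: candidates
are reduced to a single error value, and `search_block`, `refine`, and
`predict_best_case` are hypothetical names introduced for this example. The
real compressor uses its own error metrics and prediction heuristics.

```python
def search_block(candidates, refine, predict_best_case,
                 target_error, max_refinements):
    """Toy adaptive search over candidate errors (lower is better).

    Returns (best_index, best_error, refinements_run).
    """
    best_error = float("inf")
    best_index = None
    refinements_run = 0

    for index, error in enumerate(candidates):
        for step in range(max_refinements):
            remaining = max_refinements - step
            # Rule 2: stop refining if even the predicted best case cannot
            # beat the best encoding already in hand.
            if predict_best_case(error, remaining) >= best_error:
                break
            error = refine(error)
            refinements_run += 1

        if error < best_error:
            best_error, best_index = error, index

        # Rule 1: once a refined candidate meets the target quality,
        # abandon all later candidates.
        if best_error <= target_error:
            break

    return best_index, best_error, refinements_run


# Usage with an invented model: each refinement pass halves the error,
# and the prediction assumes every remaining pass does the same.
halve = lambda error: error * 0.5
best_case = lambda error, remaining: error * 0.5 ** remaining

# The second candidate is skipped without refinement: 4 passes, not 6.
print(search_block([8.0, 100.0, 4.0], halve, best_case, 0.5, 2))
# The first candidate hits the target, so the rest are never tried.
print(search_block([8.0, 100.0, 4.0], halve, best_case, 2.0, 2))
```

In this model the fixed behavior of earlier releases would always run
`len(candidates) * max_refinements` refinement passes; the sketch shows how
both rules reduce that count, at the cost of possibly missing a candidate that
would have refined better than predicted.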

**Absolute performance vs 2.2 release:**

![Absolute scores 2.3 vs 2.2](./ChangeLogImg/absolute-2.2-to-2.3.png)

**Relative performance vs 2.2 release:**

![Relative scores 2.3 vs 2.2](./ChangeLogImg/relative-2.2-to-2.3.png)

<!-- ---------------------------------------------------------------------- -->
## 2.2

**Status:** Released, January 2021

The 2.2 release is the third release in the 2.x series. It includes a number
of performance improvements and new features.

Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with 2.1. Please
recompile your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** New Arm aarch64 NEON accelerated vector library support.
  * **Improvement:** New CMake build system for all platforms.
  * **Improvement:** SSE4.2 feature profile changed to SSE4.1, which more
    accurately reflects the feature set used.
* **Binary releases:**
  * **Improvement:** Linux binaries changed to use Clang 9.0, which gives
    up to 15% performance improvement.
  * **Improvement:** Windows binaries are now code signed.
  * **Improvement:** macOS binaries for Apple silicon platforms now provided.
  * **Improvement:** macOS binaries are now code signed and notarized.
* **Command Line:**
  * **Feature:** New image preprocess `-pp-normalize` option added. This forces
    normal vectors to be unit length, which is useful when compressing source
    textures that use normal length to encode an NDF, which is incompatible
    with ASTC's two channel encoding.
  * **Feature:** New image preprocess `-pp-premultiply` option added. This
    scales RGB values by the alpha value. This can be useful to minimize
    cross-channel color bleed caused by GPU post-multiply filtering/blending.
  * **Improvement:** Command line tool cleanly traps and reports errors for
    corrupt input images rather than relying on standard library `assert()`
    calls in release builds.
* **Core API:**
  * **API Change:** Images using region-based metrics no longer need to include
    padding; all input images should be tightly packed and `dim_pad` is removed
    from the `astcenc_image` structure. This makes it easier to directly use
    images loaded from other libraries.
  * **API Change:** Image `data` is no longer a 3D array accessed using
    `data[z][y][x]` indexing, it's an array of 2D slices. This makes it easier
    to directly use images loaded from other libraries.
  * **API Change:** New `ASTCENC_FLG_SELF_DECOMPRESS_ONLY` flag added to the
    codec config. Using this flag enables additional optimizations that
    aggressively exploit implementation- and configuration-specific behavior
    to gain performance. When using this flag the codec can only reliably
    decompress images that were compressed in the same context session. Images
    produced via other means may fail to decompress correctly, even if they are
    otherwise valid ASTC files.

### Performance

There is one major set of optimizations in this release, related to the new
`ASTCENC_FLG_SELF_DECOMPRESS_ONLY` mode. These allow the compressor to only
create data tables it knows that it is going to use, based on its current set
of heuristics, rather than needing the full set the format allows.

The first benefit of these changes is reduced context creation time, which
drops by up to 250 ms on our test machine. This is a significant percentage of
the command line utility runtime for a small image when using a quick search
preset. Compressing the whole Kodak test suite using the command line utility
and the `-fastest` preset is ~30% faster with this release, which is mostly
due to faster startup.

The reduction in the data table size in this mode also improves the core codec
speed. Our test sets show an average of 12% improvement in the codec for
`-fastest` mode, and an average of 3% for `-medium` mode.

Key for performance charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Absolute performance vs 2.1 release:**

![Absolute scores 2.2 vs 2.1](./ChangeLogImg/absolute-2.1-to-2.2.png)

**Relative performance vs 2.1 release:**

![Relative scores 2.2 vs 2.1](./ChangeLogImg/relative-2.1-to-2.2.png)


<!-- ---------------------------------------------------------------------- -->
## 2.1

**Status:** Released, November 2020

The 2.1 release is the second release in the 2.x series. It includes a number
of performance optimizations and new features.

Reminder for users of the library interface - the API is not designed to be
stable across versions, and this release is not compatible with 2.0. Please
recompile your client-side code using the updated `astcenc.h` header.

### Features:

* **Command line:**
  * **Bug fix:** The meaning of the `-tH\cH\dH` and `-th\ch\dh` compression
    modes was inverted. They now match the documentation; use `-*H` for HDR
    RGBA, and `-*h` for HDR RGB with LDR alpha.
  * **Feature:** A new `-fastest` quality preset is now available. This is
    designed for fast "roughing out" of new content, and sacrifices significant
    image quality compared to `-fast`. We do not recommend its use for
    production builds.
  * **Feature:** A new `-candidatelimit` compression tuning option is now
    available. This is a power-user control to determine how many candidates
    are returned for each block mode encoding trial. This feature is used
    automatically by the search presets; see `-help` for details.
  * **Improvement:** The compression test modes (`-tl\ts\th\tH`) now emit a
    MTex/s performance metric, in addition to coding time.
* **Core API:**
  * **Feature:** A new quality preset `ASTCENC_PRE_FASTEST` is available. See
    `-fastest` above for details.
  * **Feature:** A new tuning option `tune_candidate_limit` is available in
    the config structure. See `-candidatelimit` above for details.
  * **Feature:** Image input/output can now use `ASTCENC_TYPE_F32` data types.
* **Stability:**
  * **Feature:** The SSE2, SSE4.2, and AVX2 variants now produce identical
    compressed output when run on the same CPU when compiled with the
    preprocessor define `ASTCENC_ISA_INVARIANCE=1`. For Make builds this can
    be set on the command line by setting `ISA_INV=1`. ISA invariance is off
    by default; it reduces performance by 1-3%.

### Performance

Key for performance charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Absolute performance vs 2.0 release:**

![Absolute scores 2.1 vs 2.0](./ChangeLogImg/absolute-2.0-to-2.1.png)

**Relative performance vs 2.0 release:**

![Relative scores 2.1 vs 2.0](./ChangeLogImg/relative-2.0-to-2.1.png)


<!-- ---------------------------------------------------------------------- -->
## 2.0

**Status:** Released, August 2020

The 2.0 release is the first release in the 2.x series. It includes a number
of major changes over the earlier 1.7 series, and is not command-line
compatible.

### Features:

* The core codec can be built as a library, exposed via a new codec API.
* The core codec supports accelerated SIMD paths for SSE2, SSE4.2, and AVX2.
* The command line syntax has a clearer mapping to Khronos feature profiles.

### Performance:

Key for performance charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Absolute performance vs 1.7 release:**

![Absolute scores 2.0 vs 1.7](./ChangeLogImg/absolute-1.7-to-2.0.png)

**Relative performance vs 1.7 release:**

![Relative scores 2.0 vs 1.7](./ChangeLogImg/relative-1.7-to-2.0.png)

- - -

_Copyright © 2020-2022, Arm Limited and contributors. All rights reserved._