# 3.x series change log

This page summarizes the major functional and performance changes in each
release of the 3.x series.

All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.

<!-- ---------------------------------------------------------------------- -->
## 3.7

**Status:** April 2022

The 3.7 release contains another round of performance optimizations, including
significant improvements to the command line front-end (faster PNG loader) and
the arm64 build of the codec (faster NEON implementation).

* **General:**
  * **Feature:** The command line tool PNG loader has been switched to use
    the Wuffs library, which is robust and significantly faster than the
    current stb_image implementation.
  * **Feature:** Support for non-invariant builds returns. Opt-in to slightly
    faster, but not bit-exact, builds by setting `-DNO_INVARIANCE=ON` for the
    CMake configuration. This improves performance by around 2%.
  * **Optimization:** Changed SIMD `select()` so that it matches the default
    NEON behavior (bitwise select), rather than the default x86-64 behavior
    (lane select on MSB). Specialization `select_msb()` added for the one case
    we want to select on a sign-bit, where NEON needs a different
    implementation. This provides a significant (>25%) performance uplift on
    NEON implementations.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.5 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.6

**Status:** April 2022

The 3.6 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions.
We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Data tables are now optimized for contexts without the
    `SELF_DECOMPRESS_ONLY` flag set. The flag therefore no longer improves
    compression performance, but still reduces context creation time and
    context data table memory footprint.
  * **Feature:** Image quality for 4x4 `-fastest` configuration has been
    improved.
  * **Optimization:** Decimation modes are reliably excluded from processing
    when they are only partially selected in the compressor configuration (e.g.
    if used for single plane, but not dual plane modes). This is a significant
    performance optimization for all quality levels.
  * **Optimization:** Fast-path block load function variant added for 2D LDR
    images with no swizzle. This is a moderate performance optimization for the
    fast and fastest quality levels.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.5 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.5

**Status:** March 2022

The 3.5 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Compressor configurations using `SELF_DECOMPRESS_ONLY` mode
    store compacted partition tables, which significantly improves both
    context create time and runtime performance.
  * **Feature:** Bilinear infill for decimated weight grids supports a new
    variant for half-decimated grids which are only decimated in one axis.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.4 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.4

**Status:** February 2022

The 3.4 release introduces another round of optimizations, removing a number
of power-user configuration options to simplify the core compressor data path.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** Many memory allocations have been moved off the stack into
    dynamically allocated working memory. This significantly reduces the peak
    stack usage, allowing the compressor to run in systems with 128KB stack
    limits.
  * **Feature:** Builds now support `-DBLOCK_MAX_TEXELS=<count>` to allow a
    compressor to support a subset of block sizes. This can reduce binary size
    and runtime memory footprint, and improve performance.
  * **Feature:** The `-v` and `-va` options to set a per-texel error weight
    function are no longer supported.
  * **Feature:** The `-b` option to set a per-texel error weight boost for
    block border texels is no longer supported.
  * **Feature:** The `-a` option to set a per-texel error weight based on texel
    alpha value is no longer supported as an error weighting tool, but is still
    supported for providing sprite-sheet RDO.
  * **Feature:** The `-mask` option to set an error metric for mask map
    textures is still supported, but is currently a no-op in the compressor.
  * **Feature:** The `-perceptual` option to set a perceptual error metric is
    still supported, but is currently a no-op in the compressor for mask map
    and normal map textures.
  * **Bug-fix:** Corrected decompression of error blocks in some cases, which
    now return the expected error color (magenta for LDR, NaN for HDR). Note
    that astcenc determines the error color to use based on the output image
    data type, not the decoder profile.
* **Binary releases:**
  * **Improvement:** Windows binaries changed to use ClangCL 12.0, which gives
    up to 10% performance improvement.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.3 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.3

**Status:** November 2021

The 3.3 release improves image quality for normal maps and two-component
textures. Normal maps are expected to compress 25% slower than the 3.2
release, although it should be noted that they are still faster to compress
in 3.3 than when using the 2.5 series. This release also fixes one reported
stability issue.

* **General:**
  * **Feature:** Normal map image quality has been improved.
  * **Feature:** Two-component image quality has been improved, provided
    that unused components are correctly zero-weighted using e.g. `-cw` on the
    command line.
  * **Bug-fix:** Improved stability when trying to compress complex blocks that
    could not beat even the starting quality threshold. These will now always
    compress into constant color blocks.

<!-- ---------------------------------------------------------------------- -->
## 3.2

**Status:** August 2021

The 3.2 release is a bugfix release; no significant image quality or
performance differences are expected.

* **General:**
  * **Bug-fix:** Improved stability when new contexts were created while other
    contexts were compressing or decompressing an image.
  * **Bug-fix:** Improved stability when decompressing blocks with invalid
    block encodings.

<!-- ---------------------------------------------------------------------- -->
## 3.1

**Status:** July 2021

The 3.1 release gives another performance boost, typically between 5 and 20%
faster than the 3.0 release, as well as further incremental improvements to
image quality. A number of build system improvements make astcenc easier and
faster to integrate into other projects as a library, including support for
building universal binaries on macOS. Full change list is shown below.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** RGB color data now supports `-perceptual` operation. The
    current implementation is simple, weighting color channel errors by their
    contribution to perceived luminance. This mimics the behavior of the human
    visual system, which is most sensitive to green, then red, then blue.
  * **Feature:** Codec supports a new low weight search mode, which is a
    simpler weight assignment for encodings with a low number of weights in the
    weight grid. The weight threshold can be overridden using the new
    `-lowweightmodelimit` command line option.
  * **Feature:** All platform builds now support building a native binary.
    Native binaries automatically select the SIMD level based on the default
    configuration of the compiler in use. Native binaries built on one machine
    may use different SIMD options than native binaries built on another.
  * **Feature:** macOS platform builds now support building universal binaries
    containing both `x86_64` and `arm64` target support.
  * **Feature:** Building the command line can be disabled when using as a
    library in another project. Set `-DCLI=OFF` during the CMake configure
    step.
  * **Feature:** A standalone minimal example of the core codec API usage has
    been added in the `./Utils/Example/` directory.
* **Core API:**
  * **Feature:** Config flag `ASTCENC_FLG_USE_PERCEPTUAL` works for color data.
  * **Feature:** Config option `tune_low_weight_count_limit` added.
  * **Feature:** New heuristic added which prunes dual weight plane searches if
    they are unlikely to help. This heuristic is not user controllable.
  * **Feature:** Image quality has been improved. In general we see significant
    improvements (up to 0.2dB) for high bitrate encodings (4x4, 5x4), and a
    smaller improvement (up to 0.1dB) for lower bitrate encodings.
  * **Bug-fix:** Arm "none" SIMD builds could be invariant with other builds.
    This fix has also been back-ported to the 2.x LTS branch.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.0 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.0

**Status:** June 2021

The 3.0 release is the first in a series of updates to the compressor that are
making more radical changes than we felt we could make with the 2.x series.
The primary goals of the 3.x series are to keep the image quality ~static or
better compared to the 2.5 release, but continue to improve performance.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** The code has been significantly cleaned up, with improved
    comments, API documentation, function naming, and variable naming.
* **Core API:**
  * **API Change:** The core APIs for `astcenc_compress_image()` and for
    `astcenc_decompress_image()` now accept swizzle structures by `const`
    pointer, instead of pass-by-value.
  * **API Change:** Calling the `astcenc_compress_reset()` and the
    `astcenc_decompress_reset()` functions between images is no longer required
    if the context was created for use by a single thread.
  * **Feature:** New heuristics have been added for controlling when to search
    beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and
    1 plane. The previous `tune_partition_early_out_limit` config option has
    been removed, and replaced with two new options
    `tune_2_partition_early_out_limit_factor` and
    `tune_3_partition_early_out_limit_factor`. See command line help for more
    detailed documentation.
  * **Feature:** New heuristics have been added for controlling when to use
    dual weight planes. The previous `tune_two_plane_early_out_limit` has been
    renamed to `tune_2_plane_early_out_limit_correlation`. See command line help
    for more detailed documentation.
  * **Feature:** Support for using dual weight planes has been restricted to
    single partition blocks; it rarely helps blocks with 2 or more partitions
    and takes considerable compression search time.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Absolute performance vs 2.5 release:**

**Relative performance vs 2.5 release:**

- - -

_Copyright © 2021-2022, Arm Limited and contributors. All rights reserved._