# 4.x series change log

This page summarizes the major functional and performance changes in each
release of the 4.x series.

All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.

<!-- ---------------------------------------------------------------------- -->
## 4.7.0

**Status:** January 2024

The 4.7.0 release is a major maintenance release, fixing rounding behavior in
the decompressor to match the Khronos specification. This fix includes the
addition of explicit support for optimizing for `decode_unorm8` rounding.

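As a rough illustration of the rounding rules involved, the sketch below
follows the LDR weight application steps as described in the Khronos ASTC
specification: 8-bit endpoints are expanded to 16 bits (bit replication for
linear data, a fixed `0x80` low byte for sRGB RGB components), interpolated
using a weight in the 0-64 range, and the top 8 bits are kept for
`decode_unorm8` output. The function names are illustrative assumptions and
are not taken from the `astcenc` source.

```cpp
#include <cstdint>

// Expand an 8-bit linear LDR endpoint to 16 bits by bit replication.
constexpr uint16_t expand_linear(uint8_t c)
{
    return static_cast<uint16_t>((c << 8) | c);
}

// Expand an 8-bit sRGB endpoint (R, G, or B component) to 16 bits.
constexpr uint16_t expand_srgb(uint8_t c)
{
    return static_cast<uint16_t>((c << 8) | 0x80);
}

// Interpolate two 16-bit endpoints using a weight in the range 0..64,
// rounding to nearest by adding 32 before the divide.
constexpr uint16_t interpolate(uint16_t c0, uint16_t c1, unsigned weight)
{
    return static_cast<uint16_t>((c0 * (64u - weight) + c1 * weight + 32u) / 64u);
}

// decode_unorm8 output keeps the top 8 bits of the 16-bit result.
constexpr uint8_t to_unorm8(uint16_t c)
{
    return static_cast<uint8_t>(c >> 8);
}
```

With 8-bit endpoints 0 and 1 and a weight of 32, the linear expansion gives an
8-bit result of 0 while the sRGB expansion gives 1, which shows how the choice
of expansion can change the least significant bit of the output.
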
Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Bug fix:** sRGB LDR decompression now uses the correct endpoint expansion
    method to create the 16-bit RGB endpoint colors, and removes the previous
    correction code from the interpolation function. This bug could result in
    LSB bit flips relative to the standard specification.
  * **Bug fix:** Decompressing to an 8-bit per component output image now matches
    the `decode_unorm8` extension rounding rules. This bug could result in
    LSB bit flips relative to the standard specification.
  * **Bug fix:** Code now avoids using `alignas()` in the reference C
    implementation, as the default `alignas(16)` is narrower than the
    native minimum alignment requirement on some CPUs.
  * **Feature:** Library configuration supports a new flag,
    `ASTCENC_FLG_USE_DECODE_UNORM8`. This flag indicates that the image will be
    used with the `decode_unorm8` decode mode. When set during compression
    this allows the compressor to use the correct rounding when determining the
    best encoding.
  * **Feature:** Command line tool supports a new option, `-decode_unorm8`.
    This option indicates that the image will be used with the `decode_unorm8`
    decode mode. This option will automatically be set for decompression
    (`-d*`) and trial (`-t*`) tool operation if the decompressed output image
    is stored to an 8-bit per component file format. This option must be set
    manually for compression (`-c*`) tool operation, as the desired decode mode
    cannot be reliably determined.
  * **Feature:** Library configuration allows a new optional progress
    reporting callback to be specified. This is called during compression to
    allow interactive tooling use cases to display incremental progress. The
    command line tool uses this feature to show compression progress unless
    `-silent` is used.

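The progress reporting pattern described in the last item can be sketched in
isolation. The example below is a self-contained mock and not the `astcenc`
API: `progress_fn`, `mock_compress`, and the percentage-based reporting are
assumptions made for illustration, so consult `astcenc.h` for the real
callback type and configuration field.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical callback type: receives progress as a percentage (0..100).
using progress_fn = void (*)(float);

// Hypothetical stand-in for a compressor that processes blocks in a loop and
// reports progress after each block, but only if a callback was supplied.
inline void mock_compress(unsigned block_count, progress_fn callback)
{
    for (unsigned i = 0; i < block_count; i++)
    {
        // ... compress block i here ...
        if (callback)
        {
            callback(100.0f * static_cast<float>(i + 1) / static_cast<float>(block_count));
        }
    }
}

// Records every reported value so that the behavior can be inspected.
inline std::vector<float>& progress_log()
{
    static std::vector<float> log;
    return log;
}

// Example callback: record the value and print it on a single console line.
inline void print_progress(float percent)
{
    progress_log().push_back(percent);
    std::printf("\rCompressing: %5.1f%%", percent);
}
```

Calling `mock_compress(4, print_progress)` invokes the callback four times,
while passing `nullptr` disables reporting, which is loosely comparable to
what the command line tool does when `-silent` is used.
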
<!-- ---------------------------------------------------------------------- -->
## 4.6.1

**Status:** November 2023

The 4.6.1 release is a minor maintenance release to fix a scaling bug on
large core count Windows systems.

* **General:**
  * **Optimization:** Windows builds of the `astcenc` command line tool can now
    use more than 64 cores on large core count systems. This change doubled
    command line performance for `-exhaustive` compression when testing on a
    96 core/192 thread system.
  * **Feature:** Windows Arm64 native builds of the `astcenc` command line tool
    are now included in the prebuilt release binaries.

<!-- ---------------------------------------------------------------------- -->
## 4.6.0

**Status:** November 2023

The 4.6.0 release retunes the compressor heuristics to give improvements to
performance for trivial losses to image quality. It also includes some minor
bug fixes and code quality improvements.

Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Bug-fix:** Fixed context allocation for contexts allocated with the
    `ASTCENC_FLG_DECOMPRESS_ONLY` flag.
  * **Bug-fix:** Reduced use of `reinterpret_cast` in the core codec to
    avoid strict aliasing violations.
  * **Optimization:** `-medium` search quality no longer tests 4 partition
    encodings for block sizes between 25 and 83 texels (inclusive). This
    improves performance for a tiny drop in image quality.
  * **Optimization:** `-thorough` and higher search qualities no longer test the
    mode0 first search for block sizes between 25 and 83 texels (inclusive).
    This improves performance for a tiny drop in image quality.
  * **Optimization:** `TUNE_MAX_PARTITIONING_CANDIDATES` reduced from 32 to 8
    to reduce the size of stack allocated data structures. This causes a tiny
    drop in image quality for the `-verythorough` and `-exhaustive` presets.

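The block size range quoted above is expressed in texels per block, that is,
the product of the block dimensions. A minimal sketch of such a range check is
shown below. The function names are hypothetical, and this is not how the
codec itself selects its heuristics.

```cpp
// Number of texels in a block with the given dimensions (dim_z is 1 for 2D).
constexpr unsigned block_texel_count(unsigned dim_x, unsigned dim_y, unsigned dim_z = 1)
{
    return dim_x * dim_y * dim_z;
}

// True if the block falls in the 25..83 texel range (inclusive).
constexpr bool in_retuned_range(unsigned dim_x, unsigned dim_y, unsigned dim_z = 1)
{
    return block_texel_count(dim_x, dim_y, dim_z) >= 25
        && block_texel_count(dim_x, dim_y, dim_z) <= 83;
}
```

For the standard 2D block sizes this range covers 5x5 (25 texels) up to 10x8
(80 texels), while 4x4, 5x4, and 10x10 or larger fall outside it.
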
<!-- ---------------------------------------------------------------------- -->
## 4.5.0

**Status:** June 2023

The 4.5.0 release is a maintenance release with small image quality
improvements, and a number of build system quality of life improvements.

* **General:**
  * **Bug-fix:** Improved handling of compiler arguments in CMake, including
    consistent use of MSVC-style command line arguments for ClangCL.
  * **Bug-fix:** Invariant Clang builds now use `-ffp-model=precise` with
    `-ffp-contract=off` which is needed to restore invariance due to recent
    changes in compiler defaults.
  * **Change:** macOS binary releases are now distributed as a single universal
    binary for all platforms.
  * **Change:** Windows binary releases are now compiled with VS2022.
  * **Change:** Invariant MSVC builds for VS2022 now use `/fp:precise` instead
    of `/fp:strict`, which is now possible because precise no longer implies
    contraction. This should improve performance for MSVC builds.
  * **Change:** Non-invariant Clang builds now use `-ffp-model=precise` with
    `-ffp-contract=on`. This should improve performance on older Clang
    versions which defaulted to no contraction.
  * **Change:** Non-invariant MSVC builds for VS2022 now use `/fp:precise`
    with `/fp:contract`. This should improve performance for MSVC builds.
  * **Change:** CMake config variables now use an `ASTCENC_` prefix to add a
    namespace and group options when the library is used in a larger project.
  * **Change:** CMake config `ASTCENC_UNIVERSAL_BUILD` for building macOS
    universal binaries has been improved to include the `x86_64h` slice for
    AVX2 builds. Universal builds are now on by default for macOS, and always
    include NEON (arm64), SSE4.1 (x86_64), and AVX2 (x86_64h) variants.
  * **Change:** CMake config `ASTCENC_NO_INVARIANCE` has been inverted to
    remove the negated option, and is now `ASTCENC_INVARIANCE` with a default
    of `ON`. Disabling this option can substantially improve performance, but
    images can differ across platforms and compilers.
  * **Optimization:** Color quantization and packing for LDR RGB and RGBA has
    been vectorized to improve performance.
  * **Change:** Color quantization for LDR RGB and RGBA endpoints will now try
    multiple quantization packing methods, and pick the one with the lowest
    endpoint encoding error. This gives a minor image quality improvement, for
    no significant performance impact when combined with the vectorization
    optimizations.

<!-- ---------------------------------------------------------------------- -->
## 4.4.0

**Status:** March 2023

The 4.4.0 release is a minor release with image quality improvements, a small
performance boost, and a few new quality-of-life features.

* **General:**
  * **Change:** Core library no longer checks availability of required
    instruction set extensions, such as SSE4.1 or AVX2. Checking compatibility
    is now the responsibility of the caller. See `astcenccli_entry.cpp` for
    an example of code performing this check.
  * **Change:** Core library can be built as a shared object by setting the
    `-DSHAREDLIB=ON` CMake option, resulting in e.g. `libastcenc-avx2-shared.so`.
    Note that the command line tool is always statically linked.
  * **Change:** Decompressed 3D images will now write one output file per
    slice, if the target format is a 2D image format.
  * **Change:** Command line errors print to stderr instead of stdout.
  * **Change:** Color encoding uses new quantization tables, that now factor
    in floating-point rounding if a distance tie is found when using the
    integer quant256 value. This improves image quality for 4x4 and 5x5 block
    sizes.
  * **Optimization:** Partition selection uses a simplified line calculation
    with a faster approximation. This improves performance for all block sizes.
  * **Bug-fix:** Fixed missing symbol error in decompressor-only builds.
  * **Bug-fix:** Fixed infinity handling in debug trace JSON files.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 4.3 release:**

![Relative scores 4.4 vs 4.3](./ChangeLogImg/relative-4.3-to-4.4.png)

<!-- ---------------------------------------------------------------------- -->
## 4.3.1

**Status:** January 2023

The 4.3.1 release is a minor maintenance release. No performance or image
quality changes are expected.

* **General:**
  * **Bug-fix:** Fixed typo in `-2/3/4partitioncandidatelimit` CLI options.
  * **Bug-fix:** Fixed handling for `-3/4partitionindexlimit` CLI options.
  * **Bug-fix:** Updated to `stb_image.h` v2.28, which includes multiple fixes
    and improvements for image loading.

<!-- ---------------------------------------------------------------------- -->
## 4.3.0

**Status:** January 2023

The 4.3.0 release is an optimization release. There are minor performance
and image quality improvements in this release.

Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Bug-fix:** Use lower case `windows.h` include for MinGW compatibility.
  * **Change:** The `-mask` command line option, `ASTCENC_FLG_MAP_MASK` in the
    library API, has been removed.
  * **Optimization:** Always skip blue-contraction for `QUANT_256` encodings.
    This gives a small image quality improvement for the 4x4 block size.
  * **Optimization:** Always skip RGBO vector calculation for LDR encodings.
  * **Optimization:** Defer color packing and scrambling to physical layer.
  * **Optimization:** Remove folded `decimation_info` lookup tables. This
    significantly reduces compressor memory footprint and improves context
    creation time. Impact increases with the active block size.
  * **Optimization:** Increased trial and refinement pruning by using stricter
    target errors when determining whether to skip iterations.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 4.2 release:**

![Relative scores 4.3 vs 4.2](./ChangeLogImg/relative-4.2-to-4.3.png)


<!-- ---------------------------------------------------------------------- -->
## 4.2.0

**Status:** November 2022

The 4.2.0 release is an optimization release. There are significant performance
improvements, minor image quality improvements, and library interface changes in
this release.

Reminder - the codec library API is not designed to be binary compatible across
versions. We always recommend rebuilding your client-side code using the updated
`astcenc.h` header.

* **General:**
  * **Bug-fix:** Compression for RGB and RGBA base+offset encodings no
    longer generates endpoints with the incorrect blue-contract behavior.
  * **Bug-fix:** Lowest channel correlation calculation now correctly ignores
    constant color channels for the purposes of filtering 2 plane encodings.
    On average this improves both performance and image quality.
  * **Bug-fix:** ISA compatibility now checked in `config_init()` as well as
    in `context_alloc()`.
  * **Change:** Removed the low-weight count optimization, as more recent
    changes had significantly reduced its performance benefit. Option removed
    from both command line and configuration structure.
  * **Feature:** The `-exhaustive` mode now runs full trials on more
    partitioning candidates and block candidates. This improves image quality
    by 0.1 to 0.25 dB, but slows down compression by 3x. The `-verythorough`
    and `-thorough` modes also test more candidates.
  * **Feature:** A new preset, `-verythorough`, has been introduced to provide
    a standard performance point between `-thorough` and the re-tuned
    `-exhaustive` mode. This new mode is faster and higher quality than the
    `-exhaustive` preset in the 4.1 release.
  * **Feature:** The compressor can now independently vary the number of
    partitionings considered for error estimation for 2/3/4 partitions. This
    allows heuristics to put more effort into 2 partitions, and less into
    3/4 partitions.
  * **Feature:** The compressor can now run trials on a variable number of
    candidate partitionings, allowing high quality modes to explore more of the
    search space at the expense of slower compression. The number of trials is
    independently configurable for 2/3/4 partition cases.
  * **Optimization:** Introduce early-out threshold for 2/3/4 partition
    searches based on the results after 1 of 2 trials. This significantly
    improves performance for `-medium` and `-thorough` searches, for a minor
    loss in image quality.
  * **Optimization:** Reduce early-out threshold for 3/4 partition searches
    based on 2/3 partition results. This significantly improves performance,
    especially for `-thorough` searches, for a minor loss in image quality.
  * **Optimization:** Use direct vector compare to create a SIMD mask instead
    of a scalar compare that is broadcast to a vector mask.
  * **Optimization:** Remove obsolete partition validity masks from the
    partition selection algorithm.
  * **Optimization:** Removed obsolete channel scaling from partition
    `avgs_and_dirs()` calculation.

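The early-out items above share a general shape: stop escalating to higher
partition counts once the latest search has not improved on the best result by
a sufficient margin. The sketch below shows that generic pattern only. The
types, the function, and the threshold semantics are all hypothetical, and the
actual thresholds and control flow used by the codec are not reproduced here.

```cpp
#include <vector>

// Hypothetical summary of the best encoding found so far.
struct search_result
{
    unsigned partition_count;
    float best_error;
};

// Hypothetical driver: errors[n - 1] stands in for "the best error found when
// searching with n partitions". The search escalates from 1 partition upwards
// and stops as soon as a result fails to beat the best error so far by the
// given factor (threshold >= 1.0).
inline search_result search_with_early_out(const std::vector<float>& errors, float threshold)
{
    search_result best { 1, errors.at(0) };
    for (unsigned count = 2; count <= errors.size(); count++)
    {
        const float error = errors[count - 1];
        const bool improved_enough = error * threshold < best.best_error;

        // Keep any improvement, even if it is too small to justify continuing.
        if (error < best.best_error)
        {
            best = { count, error };
        }

        if (!improved_enough)
        {
            break;
        }
    }
    return best;
}
```

A larger threshold prunes more aggressively, which trades a small amount of
image quality for faster compression, in the same spirit as the trade-off
described in the items above.
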
### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 4.0 and 4.1 release:**

![Relative scores 4.2 vs 4.0](./ChangeLogImg/relative-4.0-to-4.2.png)


<!-- ---------------------------------------------------------------------- -->
## 4.1.0

**Status:** August 2022

The 4.1.0 release is a maintenance release. There is no performance or image
quality change in this release.

* **General:**
  * **Change:** Command line decompressor no longer uses the legacy
    `GL_LUMINANCE` or `GL_LUMINANCE_ALPHA` format enums when writing KTX
    output files. Luminance textures now use the `GL_RED` format and
    luminance_alpha textures now use the `GL_RG` format.
  * **Change:** Command line tool gains a new `-dimage` option to generate
    diagnostic images showing aspects of the compression encoding. The output
    file name with its extension stripped is used as the stem of the diagnostic
    image file names.
  * **Bug-fix:** Library decompressor builds for SSE no longer use masked store
    `maskmovdqu` instructions, as they can generate faults on masked lanes.
  * **Bug-fix:** Command line decompressor now correctly uses sized type enums
    for the internal format when writing output KTX files.
  * **Bug-fix:** Command line compressor now correctly loads 16 and 32-bit per
    component input KTX files.
  * **Bug-fix:** Fixed GCC9 compiler warnings on Arm aarch64.

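The KTX format enum change in the first item amounts to a simple remapping of
legacy formats to their modern equivalents. The sketch below illustrates that
mapping using the standard OpenGL enum values. The namespace and function name
are hypothetical, and this is not the code used by the command line tool.

```cpp
#include <cstdint>

namespace gl_enum
{
    // Standard OpenGL format enum values.
    constexpr uint32_t RED             = 0x1903;
    constexpr uint32_t LUMINANCE       = 0x1909;
    constexpr uint32_t LUMINANCE_ALPHA = 0x190A;
    constexpr uint32_t RG              = 0x8227;
}

// Replace the legacy luminance formats with their modern equivalents, and
// pass all other formats through unchanged.
inline uint32_t remap_legacy_format(uint32_t format)
{
    switch (format)
    {
    case gl_enum::LUMINANCE:       return gl_enum::RED;
    case gl_enum::LUMINANCE_ALPHA: return gl_enum::RG;
    default:                       return format;
    }
}
```
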
<!-- ---------------------------------------------------------------------- -->
## 4.0.0

**Status:** July 2022

The 4.0.0 release introduces some major performance enhancements, and a number
of larger changes to the heuristics used in the codec to find a more effective
cost:quality trade off.

* **General:**
  * **Change:** The `-array` option for specifying the number of image planes
    for ASTC 3D volumetric block compression has been renamed to `-zdim`.
  * **Change:** The build root package directory is now `bin` instead of
    `astcenc`, allowing the CMake install step to write binaries into
    `/usr/local/bin` if the user wishes to do so.
  * **Feature:** A new `-ssw` option for specifying the shader sampling swizzle
    has been added as a convenience alternative to the `-cw` option. This is
    needed to correct error weighting during compression if not all components
    are read in the shader. For example, to extract and compress two components
    from an RGBA input image, weighting the two components equally when
    sampling through `.ra` in the shader, use `-esw ggga -ssw ra`. In this
    example `-ssw ra` is equivalent to the alternative `-cw 1 0 0 1` encoding.
  * **Feature:** The `-a` alpha weighting option has been re-enabled in the
    backend, and now again applies alpha scaling to the RGB error metrics when
    encoding. This is based on the maximum alpha in each block, not the
    individual texel alpha values used in the earlier implementation.
  * **Feature:** The command line tool now has `-repeats <count>` for testing,
    which will iterate around compression and decompression `count` times.
    Reported performance metrics also now separate compression and
    decompression scores.
  * **Feature:** The core codec is now warning clean up to /W4 for both MSVC
    `cl.exe` and `clangcl.exe` compilers.
  * **Feature:** The core codec now supports arm64 for both MSVC `cl.exe` and
    `clangcl.exe` compilers.
  * **Feature:** `NO_INVARIANCE` builds will enable the `-ffp-contract=fast`
    option for all targets when using Clang or GCC. In addition AVX2 targets
    will also set the `-mfma` option. This reduces image quality by up to 0.2 dB
    (normally much less), but improves performance by 5-20%.
  * **Optimization:** Angular endpoint min/max weight selection is restricted
    to weight `QUANT_11` or lower. Higher quantization levels assume default
    0-1 range, which is less accurate but much faster.
  * **Optimization:** Maximum weight quantization for later trials is selected
    based on the weight quantization of the best encoding from the 1 plane 1
    partition trial. This significantly reduces the search space for the later
    trials with more planes or partitions.
  * **Optimization:** Small data tables now use in-register SIMD permutes
    rather than gathers (AVX2) or unrolled scalar lookups (SSE/NEON). This can
    be a significant optimization for paths that are load unit limited.
  * **Optimization:** Decompressed image block writes in the decompressor now
    use a vectorized approach to writing each row of texels in the block,
    including the ability to exploit masked stores if the target supports them.
  * **Optimization:** Weight scrambling has been moved into the physical layer;
    the rest of the codec now uses linear order weights.
  * **Optimization:** Weight packing has been moved into the physical layer;
    the rest of the codec now uses unpacked weights in the 0-64 range.
  * **Optimization:** Consistently vectorize the creation of unquantized weight
    grids when they are needed.
  * **Optimization:** Remove redundant per-decimation mode copies of endpoint
    and weight structures, which were really read-only duplicates.
  * **Optimization:** Early-out the same endpoint mode color calculation if it
    cannot be applied.
  * **Optimization:** Numerous type size reductions applied to arrays to reduce
    both context working buffer size usage and stack usage.

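The stated equivalence between `-ssw ra` and `-cw 1 0 0 1` suggests a simple
mapping from a sampling swizzle to per-component weights. The sketch below
generalizes from that single example, which is an assumption: each component
named in the swizzle gets a weight of 1 and every other component gets 0. The
function name is hypothetical and this is not the tool's argument parser.

```cpp
#include <array>
#include <stdexcept>
#include <string>

// Convert a sampling swizzle such as "ra" into four RGBA component weights:
// 1.0 for every component named in the swizzle, and 0.0 for the rest.
inline std::array<float, 4> swizzle_to_weights(const std::string& swizzle)
{
    std::array<float, 4> weights { 0.0f, 0.0f, 0.0f, 0.0f };
    for (char c : swizzle)
    {
        switch (c)
        {
        case 'r': weights[0] = 1.0f; break;
        case 'g': weights[1] = 1.0f; break;
        case 'b': weights[2] = 1.0f; break;
        case 'a': weights[3] = 1.0f; break;
        default:
            throw std::invalid_argument("swizzle may only contain r, g, b, or a");
        }
    }
    return weights;
}
```

Under this assumption `swizzle_to_weights("ra")` yields the weights
`1 0 0 1`, matching the example given for the `-ssw` option.
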
### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.7 release:**

![Relative scores 4.0 vs 3.7](./ChangeLogImg/relative-3.7-to-4.0.png)


- - -

_Copyright © 2022-2024, Arm Limited and contributors. All rights reserved._