# 3.x series change log

This page summarizes the major functional and performance changes in each
release of the 3.x series.

All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.

<!-- ---------------------------------------------------------------------- -->
## 3.7

**Status:** April 2022

The 3.7 release contains another round of performance optimizations, including
significant improvements to the command line front-end (faster PNG loader) and
the arm64 build of the codec (faster NEON implementation).

* **General:**
  * **Feature:** The command line tool PNG loader has been switched to use
    the Wuffs library, which is robust and significantly faster than the
    previous stb_image implementation.
  * **Feature:** Support for non-invariant builds returns. Opt in to slightly
    faster, but not bit-exact, builds by setting `-DNO_INVARIANCE=ON` for the
    CMake configuration. This improves performance by around 2%.
  * **Optimization:** Changed SIMD `select()` so that it matches the default
    NEON behavior (bitwise select), rather than the default x86-64 behavior
    (lane select on MSB). Specialization `select_msb()` added for the one case
    we want to select on a sign-bit, where NEON needs a different
    implementation. This provides a significant (>25%) performance uplift on
    NEON implementations.
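The difference between the two select behaviors described above can be sketched on a single 32-bit lane. This is a scalar illustration only, not the codec's SIMD code. The function names `select_bitwise` and `select_msb_lane` are hypothetical stand-ins chosen for this example.

```cpp
#include <cstdint>

// Bitwise select: every result bit is taken from b where the matching mask
// bit is 1, and from a where it is 0. This is the NEON-style behavior.
inline uint32_t select_bitwise(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

// MSB lane select: the whole lane is taken from b if the top bit of the
// mask is set, otherwise from a. This is the x86-64-style behavior.
inline uint32_t select_msb_lane(uint32_t a, uint32_t b, uint32_t mask)
{
    return (mask & 0x80000000u) ? b : a;
}
```

For masks produced by comparisons (all ones or all zeros per lane) both functions return the same value. They only differ when the mask is an arbitrary value such as a float with its sign bit set, which is the case a dedicated MSB variant would cover.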

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.6 release:**

![Relative scores 3.7 vs 3.6](./ChangeLogImg/relative-3.6-to-3.7.png)

<!-- ---------------------------------------------------------------------- -->
## 3.6

**Status:** April 2022

The 3.6 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Data tables are now optimized for contexts without the
    `SELF_DECOMPRESS_ONLY` flag set. The flag therefore no longer improves
    compression performance, but still reduces context creation time and
    context data table memory footprint.
  * **Feature:** Image quality for 4x4 `-fastest` configuration has been
    improved.
  * **Optimization:** Decimation modes are reliably excluded from processing
    when they are only partially selected in the compressor configuration (e.g.
    if used for single plane, but not dual plane modes). This is a significant
    performance optimization for all quality levels.
  * **Optimization:** Fast-path block load function variant added for 2D LDR
    images with no swizzle. This is a moderate performance optimization for the
    fast and fastest quality levels.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.5 release:**

![Relative scores 3.6 vs 3.5](./ChangeLogImg/relative-3.5-to-3.6.png)

<!-- ---------------------------------------------------------------------- -->
## 3.5

**Status:** March 2022

The 3.5 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Compressor configurations using `SELF_DECOMPRESS_ONLY` mode
    store compacted partition tables, which significantly improves both
    context create time and runtime performance.
  * **Feature:** Bilinear infill for decimated weight grids supports a new
    variant for half-decimated grids which are only decimated in one axis.
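To illustrate why a grid decimated in only one axis admits a cheaper infill, here is a minimal sketch that assumes decimation along X only. Each texel then needs two stored weights from its own row instead of four from two rows. The helper `infill_x_only` is hypothetical, and it uses floating-point interpolation with aligned end points for readability; the codec's real weight infill and its fixed-point arithmetic may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: expand a weight grid that is decimated along X only.
// `stored` holds grid_w * height weights in row-major order (grid_w >= 1).
// Returns texels_w * height per-texel weights in row-major order.
std::vector<float> infill_x_only(const std::vector<float>& stored,
                                 std::size_t grid_w,
                                 std::size_t texels_w,
                                 std::size_t height)
{
    std::vector<float> out(texels_w * height);
    for (std::size_t y = 0; y < height; y++)
    {
        // No decimation in Y, so texel row y maps directly to stored row y
        const float* row = &stored[y * grid_w];
        for (std::size_t x = 0; x < texels_w; x++)
        {
            // Map the texel position onto the stored grid along X
            float pos = (texels_w > 1)
                      ? static_cast<float>(x) * (grid_w - 1) / (texels_w - 1)
                      : 0.0f;
            std::size_t i0 = static_cast<std::size_t>(pos);
            std::size_t i1 = (i0 + 1 < grid_w) ? i0 + 1 : i0;
            float frac = pos - static_cast<float>(i0);

            // Two-tap linear blend replaces the four-tap bilinear blend
            out[y * texels_w + x] = row[i0] * (1.0f - frac) + row[i1] * frac;
        }
    }
    return out;
}
```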

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.4 release:**

![Relative scores 3.5 vs 3.4](./ChangeLogImg/relative-3.4-to-3.5.png)


<!-- ---------------------------------------------------------------------- -->
## 3.4

**Status:** February 2022

The 3.4 release introduces another round of optimizations, removing a number
of power-user configuration options to simplify the core compressor data path.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** Many memory allocations have been moved off the stack into
    dynamically allocated working memory. This significantly reduces the peak
    stack usage, allowing the compressor to run on systems with 128 KB stack
    limits.
  * **Feature:** Builds now support `-DBLOCK_MAX_TEXELS=<count>` to allow a
    compressor to support a subset of block sizes. This can reduce binary size
    and runtime memory footprint, and improve performance.
  * **Feature:** The `-v` and `-va` options to set a per-texel error weight
    function are no longer supported.
  * **Feature:** The `-b` option to set a per-texel error weight boost for
    block border texels is no longer supported.
  * **Feature:** The `-a` option to set a per-texel error weight based on texel
    alpha value is no longer supported as an error weighting tool, but is still
    supported for providing sprite-sheet RDO.
  * **Feature:** The `-mask` option to set an error metric for mask map
    textures is still supported, but is currently a no-op in the compressor.
  * **Feature:** The `-perceptual` option to set a perceptual error metric is
    still supported, but is currently a no-op in the compressor for mask map
    and normal map textures.
  * **Bug-fix:** Corrected decompression of error blocks in some cases, so that
    the decompressor now returns the expected error color (magenta for LDR,
    NaN for HDR). Note that astcenc determines the error color to use based on
    the output image data type, not the decoder profile.
* **Binary releases:**
  * **Improvement:** Windows binaries changed to use ClangCL 12.0, which gives
    up to 10% performance improvement.
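The error color rule from the error-block bug-fix above can be sketched as follows. This is an illustration rather than the codec's implementation: the `OutputType` enum and the `error_color` function are hypothetical, and the sketch assumes that an 8-bit output type is treated as LDR while floating-point output types are treated as HDR.

```cpp
#include <array>
#include <limits>

// Hypothetical stand-in for the output image data type
enum class OutputType { U8, F16, F32 };

// Pick the RGBA error color from the output data type alone; the decoder
// profile is deliberately not an input to this decision.
std::array<float, 4> error_color(OutputType type)
{
    if (type == OutputType::U8)
    {
        // Assumed LDR case: opaque magenta
        return { 1.0f, 0.0f, 1.0f, 1.0f };
    }

    // Assumed HDR case: NaN in every component
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return { nan, nan, nan, nan };
}
```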

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.3 release:**

![Relative scores 3.4 vs 3.3](./ChangeLogImg/relative-3.3-to-3.4.png)


<!-- ---------------------------------------------------------------------- -->
## 3.3

**Status:** November 2021

The 3.3 release improves image quality for normal maps and two-component
textures. Normal maps are expected to compress 25% slower than the 3.2
release, although it should be noted that they are still faster to compress
in 3.3 than when using the 2.5 series. This release also fixes one reported
stability issue.

* **General:**
  * **Feature:** Normal map image quality has been improved.
  * **Feature:** Two-component image quality has been improved, provided
    that unused components are correctly zero-weighted using e.g. `-cw` on the
    command line.
  * **Bug-fix:** Improved stability when trying to compress complex blocks that
    could not beat even the starting quality threshold. These will now always
    compress into constant color blocks.

<!-- ---------------------------------------------------------------------- -->
## 3.2

**Status:** August 2021

The 3.2 release is a bugfix release; no significant image quality or
performance differences are expected.

* **General:**
  * **Bug-fix:** Improved stability when new contexts were created while other
    contexts were compressing or decompressing an image.
  * **Bug-fix:** Improved stability when decompressing blocks with invalid
    block encodings.

<!-- ---------------------------------------------------------------------- -->
## 3.1

**Status:** July 2021

The 3.1 release gives another performance boost, typically between 5 and 20%
faster than the 3.0 release, as well as further incremental improvements to
image quality. A number of build system improvements make astcenc easier and
faster to integrate into other projects as a library, including support for
building universal binaries on macOS. Full change list is shown below.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** RGB color data now supports `-perceptual` operation. The
    current implementation is simple, weighting color channel errors by their
    contribution to perceived luminance. This mimics the behavior of the human
    visual system, which is most sensitive to green, then red, then blue.
  * **Feature:** Codec supports a new low weight search mode, which is a
    simpler weight assignment for encodings with a low number of weights in the
    weight grid. The weight threshold can be overridden using the new
    `-lowweightmodelimit` command line option.
  * **Feature:** All platform builds now support building a native binary.
    Native binaries automatically select the SIMD level based on the default
    configuration of the compiler in use. Native binaries built on one machine
    may use different SIMD options than native binaries built on another.
  * **Feature:** macOS platform builds now support building universal binaries
    containing both `x86_64` and `arm64` target support.
  * **Feature:** Building the command line tool can be disabled when using
    astcenc as a library in another project. Set `-DCLI=OFF` during the CMake
    configure step.
  * **Feature:** A standalone minimal example of the core codec API usage has
    been added in the `./Utils/Example/` directory.
* **Core API:**
  * **Feature:** Config flag `ASTCENC_FLG_USE_PERCEPTUAL` works for color data.
  * **Feature:** Config option `tune_low_weight_count_limit` added.
  * **Feature:** New heuristic added which prunes dual weight plane searches if
    they are unlikely to help. This heuristic is not user controllable.
  * **Feature:** Image quality has been improved. In general we see significant
    improvements (up to 0.2dB) for high bitrate encodings (4x4, 5x4), and a
    smaller improvement (up to 0.1dB) for lower bitrate encodings.
  * **Bug-fix:** Fixed Arm "none" SIMD builds so that their output is invariant
    with other builds. This fix has also been back-ported to the 2.x LTS
    branch.
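The luminance-weighted error described for `-perceptual` RGB data above can be sketched as a weighted sum of squared channel differences. The weights used here are the Rec. 709 luma coefficients, which are an assumption chosen for illustration; the change log does not state which weights the codec uses. The `Rgb` type and the `luminance_weighted_error` function are likewise hypothetical.

```cpp
// Hypothetical RGB color with components in the range [0, 1]
struct Rgb
{
    float r;
    float g;
    float b;
};

// Squared error with each channel weighted by its assumed contribution to
// perceived luminance (Rec. 709 luma coefficients, not taken from astcenc).
float luminance_weighted_error(const Rgb& a, const Rgb& b)
{
    constexpr float weight_r = 0.2126f;
    constexpr float weight_g = 0.7152f;
    constexpr float weight_b = 0.0722f;

    float dr = a.r - b.r;
    float dg = a.g - b.g;
    float db = a.b - b.b;

    return weight_r * dr * dr + weight_g * dg * dg + weight_b * db * db;
}
```

With weights like these, an error of a given size costs the most in green, less in red, and the least in blue, matching the sensitivity ordering given in the feature description.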

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.0 release:**

![Relative scores 3.1 vs 3.0](./ChangeLogImg/relative-3.0-to-3.1.png)

<!-- ---------------------------------------------------------------------- -->
## 3.0

**Status:** June 2021

The 3.0 release is the first in a series of updates to the compressor that
make more radical changes than we felt we could make with the 2.x series.
The primary goals of the 3.x series are to keep the image quality ~static or
better compared to the 2.5 release, but continue to improve performance.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** The code has been significantly cleaned up, with improved
    comments, API documentation, function naming, and variable naming.
* **Core API:**
  * **API Change:** The core APIs for `astcenc_compress_image()` and for
    `astcenc_decompress_image()` now accept swizzle structures by `const`
    pointer, instead of pass-by-value.
  * **API Change:** Calling the `astcenc_compress_reset()` and the
    `astcenc_decompress_reset()` functions between images is no longer required
    if the context was created for use by a single thread.
  * **Feature:** New heuristics have been added for controlling when to search
    beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and
    1 plane. The previous `tune_partition_early_out_limit` config option has
    been removed, and replaced with two new options
    `tune_2_partition_early_out_limit_factor` and
    `tune_3_partition_early_out_limit_factor`. See command line help for more
    detailed documentation.
  * **Feature:** New heuristics have been added for controlling when to use
    dual weight planes. The previous `tune_two_plane_early_out_limit` has been
    renamed to `tune_2_plane_early_out_limit_correlation`. See command line
    help for more detailed documentation.
  * **Feature:** Support for using dual weight planes has been restricted to
    single partition blocks; it rarely helps blocks with 2 or more partitions
    and takes considerable compression search time.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Absolute performance vs 2.5 release:**

![Absolute scores 3.0 vs 2.5](./ChangeLogImg/absolute-2.5-to-3.0.png)

**Relative performance vs 2.5 release:**

![Relative scores 3.0 vs 2.5](./ChangeLogImg/relative-2.5-to-3.0.png)

- - -

_Copyright © 2021-2022, Arm Limited and contributors. All rights reserved._