<!--

Although you may be viewing an alternate representation, this document is
sourced in Markdown, a light-duty markup scheme, and is optimized for the
[kramdown](https://kramdown.gettalong.org/) transformer.

See the accompanying specs_generation.md. External link targets are referenced
at the end of this file.

-->

Specification for WebP Lossless Bitstream
=========================================

_Jyrki Alakuijala, Ph.D., Google, Inc., 2012-06-19_

Paragraphs marked as \[AMENDED\] were amended on 2014-09-16.

Paragraphs marked as \[AMENDED2\] were amended on 2022-05-13.

Paragraphs marked as \[AMENDED3\] were amended on 2022-11-21.

Abstract
--------

WebP lossless is an image format for lossless compression of ARGB images. The
lossless format stores and restores the pixel values exactly, including the
color values for pixels whose alpha value is 0. The format uses subresolution
images, recursively embedded into the format itself, for storing statistical
data about the images, such as the used entropy codes, spatial predictors, color
space conversion, and color table. LZ77, prefix coding, and a color cache are
used for compression of the bulk data. Decoding speeds faster than PNG have been
demonstrated, as well as 25% denser compression than can be achieved using
today's PNG format.


* TOC placeholder
{:toc}


1 Introduction
--------------

This document describes the compressed data representation of a WebP lossless
image. It is intended as a detailed reference for the WebP lossless encoder and
decoder implementation.

In this document, we extensively use C programming language syntax to describe
the bitstream, and assume the existence of a function for reading bits,
`ReadBits(n)`. The bytes are read in the natural order of the stream containing
them, and bits of each byte are read in least-significant-bit-first order. When
multiple bits are read at the same time, the integer is constructed from the
original data in the original order. The most significant bits of the returned
integer are also the most significant bits of the original data. Thus, the
statement

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(2);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

is equivalent to the two statements below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(1);
b |= ReadBits(1) << 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We assume that each color component (e.g., alpha, red, blue, and green) is
represented using an 8-bit byte. We define the corresponding type as uint8. A
whole ARGB pixel is represented by a type called uint32, an unsigned integer
consisting of 32 bits. In the code showing the behavior of the transformations,
the alpha value is codified in bits 31..24, red in bits 23..16, green in bits
15..8, and blue in bits 7..0, but implementations of the format are free to use
another representation internally.

Broadly, a WebP lossless image contains header data, transform information, and
actual image data. Headers contain the width and height of the image. A WebP
lossless image can go through four different types of transformation before
being entropy encoded. The transform information in the bitstream contains the
data required to apply the respective inverse transforms.


2 Nomenclature
--------------

ARGB
:   A pixel value consisting of alpha, red, green, and blue values.

ARGB image
:   A two-dimensional array containing ARGB pixels.

color cache
:   A small hash-addressed array to store recently used colors, to be able to
    recall them with shorter codes.

color indexing image
:   A one-dimensional image of colors that can be indexed using a small integer
    (up to 256 within WebP lossless).

color transform image
:   A two-dimensional subresolution image containing data about correlations of
    color components.

distance mapping
:   Changes LZ77 distances to have the smallest values for pixels in 2D
    proximity.

entropy image
:   A two-dimensional subresolution image indicating which entropy coding should
    be used in a respective square in the image, i.e., each pixel is a meta
    prefix code.

prefix code
:   A classic way to do entropy coding where a smaller number of bits are used
    for more frequent codes.

LZ77
:   Dictionary-based sliding window compression algorithm that either emits
    symbols or describes them as sequences of past symbols.

meta prefix code
:   A small integer (up to 16 bits) that indexes an element in the meta prefix
    table.

predictor image
:   A two-dimensional subresolution image indicating which spatial predictor is
    used for a particular square in the image.

prefix coding
:   A way to entropy code larger integers that codes a few bits of the integer
    using an entropy code and codifies the remaining bits raw. This allows for
    the descriptions of the entropy codes to remain relatively small even when
    the range of symbols is large.

scan-line order
:   A processing order of pixels, left-to-right, top-to-bottom, starting from
    the left-hand-top pixel, proceeding to the right. Once a row is completed,
    continue from the left-hand column of the next row.

3 RIFF Header
-------------

The beginning of the header has the RIFF container. This consists of the
following 21 bytes:

   1. String "RIFF".
   2. A little-endian, 32-bit value of the block length, the whole size
      of the block controlled by the RIFF header. Normally this equals
      the payload size (file size minus 8 bytes: 4 bytes for the 'RIFF'
      identifier and 4 bytes for storing the value itself).
   3. String "WEBP" (RIFF container name).
   4. String "VP8L" (chunk tag for lossless-encoded image data).
   5. A little-endian, 32-bit value of the number of bytes in the
      lossless stream.
   6. One-byte signature 0x2f.

The first 28 bits of the bitstream specify the width and height of the image.
Width and height are decoded as 14-bit integers as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int image_width = ReadBits(14) + 1;
int image_height = ReadBits(14) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 14-bit dynamics for image size limit the maximum size of a WebP lossless
image to 16384✕16384 pixels.

The alpha_is_used bit is a hint only and should not impact decoding. It should
be set to 0 when all alpha values are 255 in the picture, and 1 otherwise.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int alpha_is_used = ReadBits(1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The version_number is a 3-bit code that must be set to 0. Any other value
should be treated as an error. \[AMENDED\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int version_number = ReadBits(3);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4 Transformations
-----------------

Transformations are reversible manipulations of the image data that can reduce
the remaining symbolic entropy by modeling spatial and color correlations.
Transformations can make the final compression more dense.

An image can go through four types of transformation. A 1 bit indicates the
presence of a transform. Each transform is allowed to be used only once. The
transformations are used only for the main level ARGB image: the subresolution
images have no transforms, not even the 0 bit indicating the end-of-transforms.

Typically, an encoder would use these transforms to reduce the Shannon entropy
in the residual image. Also, the transform data can be decided based on entropy
minimization.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
while (ReadBits(1)) {  // Transform present.
  // Decode transform type.
  enum TransformType transform_type = ReadBits(2);
  // Decode transform data.
  ...
}

// Decode actual image data (Section 5).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a transform is present then the next two bits specify the transform type.
There are four types of transforms.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
enum TransformType {
  PREDICTOR_TRANSFORM             = 0,
  COLOR_TRANSFORM                 = 1,
  SUBTRACT_GREEN_TRANSFORM        = 2,
  COLOR_INDEXING_TRANSFORM        = 3,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform type is followed by the transform data. Transform data contains
the information required to apply the inverse transform and depends on the
transform type. Next we describe the transform data for different types.


### 4.1 Predictor Transform

The predictor transform can be used to reduce entropy by exploiting the fact
that neighboring pixels are often correlated. In the predictor transform, the
current pixel value is predicted from the pixels already decoded (in scan-line
order) and only the residual value (actual - predicted) is encoded. The
_prediction mode_ determines the type of prediction to use. We divide the image
into squares and all the pixels in a square use the same prediction mode.

The first 3 bits of prediction data define the block width and height in number
of bits. The number of block columns, `block_xsize`, is used in indexing
two-dimensionally.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = (1 << size_bits);
int block_height = (1 << size_bits);
#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))
int block_xsize = DIV_ROUND_UP(image_width, 1 << size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform data contains the prediction mode for each block of the image. All
the `block_width * block_height` pixels of a block use the same prediction mode.
The prediction modes are treated as pixels of an image and encoded using the
same techniques described in [Chapter 5](#image-data).

For a pixel _x, y_, one can compute the respective filter block address by:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int block_index = (y >> size_bits) * block_xsize +
                  (x >> size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are 14 different prediction modes. In each prediction mode, the current
pixel value is predicted from one or more neighboring pixels whose values are
already known.

We choose the neighboring pixels (TL, T, TR, and L) of the current pixel (P) as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
O    O    O    O    O    O    O    O    O    O    O
O    O    O    O    O    O    O    O    O    O    O
O    O    O    O    TL   T    TR   O    O    O    O
O    O    O    O    L    P    X    X    X    X    X
X    X    X    X    X    X    X    X    X    X    X
X    X    X    X    X    X    X    X    X    X    X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where TL means top-left, T top, TR top-right, and L left pixel. At the time of
predicting a value for P, all pixels O, TL, T, TR, and L have already been
processed, and pixel P and all pixels X are unknown.

Given the above neighboring pixels, the different prediction modes are defined
as follows.

| Mode   | Predicted value of each channel of the current pixel    |
| ------ | ------------------------------------------------------- |
|  0     | 0xff000000 (represents solid black color in ARGB)       |
|  1     | L                                                       |
|  2     | T                                                       |
|  3     | TR                                                      |
|  4     | TL                                                      |
|  5     | Average2(Average2(L, TR), T)                            |
|  6     | Average2(L, TL)                                         |
|  7     | Average2(L, T)                                          |
|  8     | Average2(TL, T)                                         |
|  9     | Average2(T, TR)                                         |
| 10     | Average2(Average2(L, TL), Average2(T, TR))              |
| 11     | Select(L, T, TL)                                        |
| 12     | ClampAddSubtractFull(L, T, TL)                          |
| 13     | ClampAddSubtractHalf(Average2(L, T), TL)                |


`Average2` is defined as follows for each ARGB component:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint8 Average2(uint8 a, uint8 b) {
  return (a + b) / 2;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Select predictor is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint32 Select(uint32 L, uint32 T, uint32 TL) {
  // L = left pixel, T = top pixel, TL = top left pixel.

  // ARGB component estimates for prediction.
  int pAlpha = ALPHA(L) + ALPHA(T) - ALPHA(TL);
  int pRed = RED(L) + RED(T) - RED(TL);
  int pGreen = GREEN(L) + GREEN(T) - GREEN(TL);
  int pBlue = BLUE(L) + BLUE(T) - BLUE(TL);

  // Manhattan distances to estimates for left and top pixels.
  int pL = abs(pAlpha - ALPHA(L)) + abs(pRed - RED(L)) +
           abs(pGreen - GREEN(L)) + abs(pBlue - BLUE(L));
  int pT = abs(pAlpha - ALPHA(T)) + abs(pRed - RED(T)) +
           abs(pGreen - GREEN(T)) + abs(pBlue - BLUE(T));

  // Return either left or top, the one closer to the prediction.
  if (pL < pT) {     // \[AMENDED\]
    return L;
  } else {
    return T;
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The functions `ClampAddSubtractFull` and `ClampAddSubtractHalf` are performed
for each ARGB component as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Clamp the input value between 0 and 255.
int Clamp(int a) {
  return (a < 0) ? 0 : (a > 255) ? 255 : a;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractFull(int a, int b, int c) {
  return Clamp(a + b - c);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractHalf(int a, int b) {
  return Clamp(a + (a - b) / 2);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are special handling rules for some border pixels. If there is a
prediction transform, regardless of the mode \[0..13\] for these pixels, the
predicted value for the left-topmost pixel of the image is 0xff000000, L-pixel
for all pixels on the top row, and T-pixel for all pixels on the leftmost
column.

\[AMENDED2\] Addressing the TR-pixel for pixels on the rightmost column is
exceptional. The pixels on the rightmost column are predicted by using the modes
\[0..13\] just like pixels not on the border, but the leftmost pixel on the same
row as the current pixel is instead used as the TR-pixel.


### 4.2 Color Transform

\[AMENDED2\]

The goal of the color transform is to decorrelate the R, G, and B values of each
pixel. The color transform keeps the green (G) value as it is, transforms red
(R) based on green, and transforms blue (B) based on green and then based on
red.

As is the case for the predictor transform, first the image is divided into
blocks and the same transform mode is used for all the pixels in a block. For
each block there are three types of color transform elements.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
  uint8 green_to_red;
  uint8 green_to_blue;
  uint8 red_to_blue;
} ColorTransformElement;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The actual color transformation is done by defining a color transform delta. The
color transform delta depends on the `ColorTransformElement`, which is the same
for all the pixels in a particular block. The delta is subtracted during the
color transform. The inverse color transform then is just adding those deltas.

The color transform function is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void ColorTransform(uint8 red, uint8 blue, uint8 green,
                    ColorTransformElement *trans,
                    uint8 *new_red, uint8 *new_blue) {
  // Transformed values of red and blue components
  int tmp_red = red;
  int tmp_blue = blue;

  // Applying the transform is just subtracting the transform deltas
  tmp_red  -= ColorTransformDelta(trans->green_to_red,  green);
  tmp_blue -= ColorTransformDelta(trans->green_to_blue, green);
  tmp_blue -= ColorTransformDelta(trans->red_to_blue, red);

  *new_red = tmp_red & 0xff;
  *new_blue = tmp_blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`ColorTransformDelta` is computed using a signed 8-bit integer representing a
3.5-fixed-point number, and a signed 8-bit RGB color channel (c) \[-128..127\]
and is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int8 ColorTransformDelta(int8 t, int8 c) {
  return (t * c) >> 5;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A conversion from the 8-bit unsigned representation (uint8) to the 8-bit signed
one (int8) is required before calling `ColorTransformDelta()`. It should be
performed using 8-bit two's complement (that is: uint8 range \[128..255\] is
mapped to the \[-128..-1\] range of its converted int8 value).

The multiplication is to be done using more precision (with at least 16-bit
dynamics). The sign extension property of the shift operation does not matter
here: only the lowest 8 bits are used from the result, and there the sign
extension shifting and unsigned shifting are consistent with each other.

Now we describe the contents of color transform data so that decoding can apply
the inverse color transform and recover the original red and blue values. The
first 3 bits of the color transform data contain the width and height of the
image block in number of bits, just like the predictor transform:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = 1 << size_bits;
int block_height = 1 << size_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The remaining part of the color transform data contains `ColorTransformElement`
instances corresponding to each block of the image. `ColorTransformElement`
instances are treated as pixels of an image and encoded using the methods
described in [Chapter 5](#image-data).

During decoding, `ColorTransformElement` instances of the blocks are decoded and
the inverse color transform is applied on the ARGB values of the pixels. As
mentioned earlier, that inverse color transform is just adding
`ColorTransformElement` values to the red and blue channels. \[AMENDED3\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void InverseTransform(uint8 red, uint8 green, uint8 blue,
                      ColorTransformElement *trans,
                      uint8 *new_red, uint8 *new_blue) {
  // Transformed values of red and blue components
  int tmp_red = red;
  int tmp_blue = blue;

  // Applying the inverse transform is just adding the
  // color transform deltas
  tmp_red  += ColorTransformDelta(trans->green_to_red, green);
  tmp_blue += ColorTransformDelta(trans->green_to_blue, green);
  tmp_blue +=
      ColorTransformDelta(trans->red_to_blue, tmp_red & 0xff);

  *new_red = tmp_red & 0xff;
  *new_blue = tmp_blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### 4.3 Subtract Green Transform

The subtract green transform subtracts green values from red and blue values of
each pixel. When this transform is present, the decoder needs to add the green
value to both red and blue. There is no data associated with this transform. The
decoder applies the inverse transform as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) {
  *red  = (*red  + green) & 0xff;
  *blue = (*blue + green) & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This transform is redundant as it can be modeled using the color transform, but
it is still often useful. Since it can extend the dynamics of the color
transform and there is no additional data here, the subtract green transform can
be coded using fewer bits than a full-blown color transform.


### 4.4 Color Indexing Transform

If there are not many unique pixel values, it may be more efficient to create a
color index array and replace the pixel values by the array's indices. The color
indexing transform achieves this. (In the context of WebP lossless, we
specifically do not call this a palette transform because a similar but more
dynamic concept exists in WebP lossless encoding: color cache.)

The color indexing transform checks for the number of unique ARGB values in the
image. If that number is below a threshold (256), it creates an array of those
ARGB values, which is then used to replace the pixel values with the
corresponding index: the green channel of the pixels is replaced with the
index, all alpha values are set to 255, and all red and blue values to 0.

The transform data contains the color table size and the entries in the color
table. The decoder reads the color indexing transform data as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// 8-bit value for color table size
int color_table_size = ReadBits(8) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The color table is stored using the image storage format itself. The color table
can be obtained by reading an image, without the RIFF header, image size, and
transforms, assuming a height of one pixel and a width of `color_table_size`.
The color table is always subtraction-coded to reduce image entropy. The deltas
of palette colors typically contain much less entropy than the colors
themselves, leading to significant savings for smaller images. In decoding,
every final color in the color table can be obtained by adding the previous
color's component values, for each ARGB component separately, and storing the
least significant 8 bits of the result.

The inverse transform for the image is simply replacing the pixel values (which
are indices to the color table) with the actual color table values. The indexing
is done based on the green component of the ARGB color.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Inverse transform
argb = color_table[GREEN(argb)];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the index is equal to or larger than `color_table_size`, the argb color value
should be set to 0x00000000 (transparent black). \[AMENDED\]

When the color table is small (equal to or less than 16 colors), several pixels
are bundled into a single pixel. The pixel bundling packs several (2, 4, or 8)
pixels into a single pixel, reducing the image width accordingly. Pixel bundling
allows for a more efficient joint distribution entropy coding of neighboring
pixels, and gives some arithmetic coding-like benefits to the entropy code, but
it can only be used when there are 16 or fewer unique values.

`color_table_size` specifies how many pixels are combined:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int width_bits;
if (color_table_size <= 2) {
  width_bits = 3;
} else if (color_table_size <= 4) {
  width_bits = 2;
} else if (color_table_size <= 16) {
  width_bits = 1;
} else {
  width_bits = 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`width_bits` has a value of 0, 1, 2 or 3. A value of 0 indicates no pixel
bundling is to be done for the image. A value of 1 indicates that two pixels are
combined, and each pixel has a range of \[0..15\]. A value of 2 indicates that
four pixels are combined, and each pixel has a range of \[0..3\]. A value of 3
indicates that eight pixels are combined and each pixel has a range of \[0..1\],
i.e., a binary value.

The values are packed into the green component as follows:

  * `width_bits` = 1: for every x value where x ≡ 0 (mod 2), a green
    value at x is positioned into the 4 least-significant bits of the
    green value at x / 2, and a green value at x + 1 is positioned into
    the 4 most-significant bits of the green value at x / 2.
  * `width_bits` = 2: for every x value where x ≡ 0 (mod 4), a green
    value at x is positioned into the 2 least-significant bits of the
    green value at x / 4, and green values at x + 1 to x + 3 are positioned
    in order to the more significant bits of the green value at x / 4.
  * `width_bits` = 3: for every x value where x ≡ 0 (mod 8), a green
    value at x is positioned into the least-significant bit of the green
    value at x / 8, and green values at x + 1 to x + 7 are positioned in
    order to the more significant bits of the green value at x / 8.


5 Image Data
------------

Image data is an array of pixel values in scan-line order.

### 5.1 Roles of Image Data

We use image data in five different roles:

  1. ARGB image: Stores the actual pixels of the image.
  1. Entropy image: Stores the
     [meta prefix codes](#decoding-of-meta-prefix-codes). The red and green
     components of a pixel define the meta prefix code used in a particular
     block of the ARGB image.
  1. Predictor image: Stores the metadata for the
     [Predictor Transform](#predictor-transform). The green component of a pixel
     defines which of the 14 predictors is used within a particular block of the
     ARGB image.
  1. Color transform image: Created from `ColorTransformElement` values
     (defined in [Color Transform](#color-transform)) for different blocks of
     the image. Each `ColorTransformElement` `'cte'` is treated as a pixel whose
     alpha component is `255`, red component is `cte.red_to_blue`, green
     component is `cte.green_to_blue`, and blue component is `cte.green_to_red`.
  1. Color indexing image: An array of size `color_table_size` (up to 256
     ARGB values) storing the metadata for the
     [Color Indexing Transform](#color-indexing-transform). This is stored as an
     image of width `color_table_size` and height `1`.

### 5.2 Encoding of Image Data

The encoding of image data is independent of its role.

The image is first divided into a set of fixed-size blocks (typically 16x16
blocks). Each of these blocks is modeled using its own entropy codes. Also,
several blocks may share the same entropy codes.

**Rationale:** Storing an entropy code incurs a cost. This cost can be minimized
if statistically similar blocks share an entropy code, thereby storing that code
only once. For example, an encoder can find similar blocks by clustering them
using their statistical properties, or by repeatedly joining a pair of randomly
selected clusters when it reduces the overall amount of bits needed to encode
the image.

Each pixel is encoded using one of the three possible methods:

  1. Prefix coded literal: each channel (green, red, blue, and alpha) is
     entropy-coded independently;
  2. LZ77 backward reference: a sequence of pixels is copied from elsewhere
     in the image; or
  3. Color cache code: using a short multiplicative hash code (color cache
     index) of a recently seen color.

The following subsections describe each of these in detail.

#### 5.2.1 Prefix Coded Literals

The pixel is stored as prefix coded values of green, red, blue, and alpha (in
that order). See [this section](#decoding-entropy-coded-image-data) for details.

#### 5.2.2 LZ77 Backward Reference

Backward references are tuples of _length_ and _distance code_:

  * Length indicates how many pixels in scan-line order are to be copied.
  * Distance code is a number indicating the position of a previously seen
    pixel, from which the pixels are to be copied. The exact mapping is
    described [below](#distance-mapping).

The length and distance values are stored using **LZ77 prefix coding**.

LZ77 prefix coding divides large integer values into two parts: the _prefix
code_ and the _extra bits_. The prefix code is stored using an entropy code,
while the extra bits are stored as they are (without an entropy code).

**Rationale**: This approach reduces the storage requirement for the entropy
code. Also, large values are usually rare, so extra bits would be used for very
few values in the image. Thus, this approach results in better compression
overall.

The following table denotes the prefix codes and extra bits used for storing
different ranges of values.

Note: The maximum backward reference length is limited to 4096. Hence, only the
first 24 prefix codes (with the respective extra bits) are meaningful for length
values. For distance values, however, all the 40 prefix codes are valid.

| Value range     | Prefix code | Extra bits |
| --------------- | ----------- | ---------- |
| 1               | 0           | 0          |
| 2               | 1           | 0          |
| 3               | 2           | 0          |
| 4               | 3           | 0          |
| 5..6            | 4           | 1          |
| 7..8            | 5           | 1          |
| 9..12           | 6           | 2          |
| 13..16          | 7           | 2          |
| ...             | ...         | ...        |
| 3073..4096      | 23          | 10         |
| ...             | ...         | ...        |
| 524289..786432  | 38          | 18         |
| 786433..1048576 | 39          | 18         |

The pseudocode to obtain a (length or distance) value from the prefix code is as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if (prefix_code < 4) {
  return prefix_code + 1;
}
int extra_bits = (prefix_code - 2) >> 1;
int offset = (2 + (prefix_code & 1)) << extra_bits;
return offset + ReadBits(extra_bits) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Distance Mapping:**
{:#distance-mapping}

As noted previously, a distance code is a number indicating the position of a
previously seen pixel, from which the pixels are to be copied. This subsection
defines the mapping between a distance code and the position of a previous
pixel.

Distance codes larger than 120 denote the pixel-distance in scan-line order,
offset by 120.

The smallest distance codes \[1..120\] are special, and are reserved for a close
neighborhood of the current pixel. This neighborhood consists of 120 pixels:

  * Pixels that are 1 to 7 rows above the current pixel, and are up to 8 columns
    to the left or up to 7 columns to the right of the current pixel. \[Total
    such pixels = `7 * (8 + 1 + 7) = 112`\].
  * Pixels that are in the same row as the current pixel, and are up to 8
    columns to the left of the current pixel. \[`8` such pixels\].

The mapping between distance code `i` and the neighboring pixel offset
`(xi, yi)` is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(0, 1),  (1, 0),  (1, 1),  (-1, 1), (0, 2),  (2, 0),  (1, 2),
(-1, 2), (2, 1),  (-2, 1), (2, 2),  (-2, 2), (0, 3),  (3, 0),
(1, 3),  (-1, 3), (3, 1),  (-3, 1), (2, 3),  (-2, 3), (3, 2),
(-3, 2), (0, 4),  (4, 0),  (1, 4),  (-1, 4), (4, 1),  (-4, 1),
(3, 3),  (-3, 3), (2, 4),  (-2, 4), (4, 2),  (-4, 2), (0, 5),
(3, 4),  (-3, 4), (4, 3),  (-4, 3), (5, 0),  (1, 5),  (-1, 5),
(5, 1),  (-5, 1), (2, 5),  (-2, 5), (5, 2),  (-5, 2), (4, 4),
(-4, 4), (3, 5),  (-3, 5), (5, 3),  (-5, 3), (0, 6),  (6, 0),
(1, 6),  (-1, 6), (6, 1),  (-6, 1), (2, 6),  (-2, 6), (6, 2),
(-6, 2), (4, 5),  (-4, 5), (5, 4),  (-5, 4), (3, 6),  (-3, 6),
(6, 3),  (-6, 3), (0, 7),  (7, 0),  (1, 7),  (-1, 7), (5, 5),
(-5, 5), (7, 1),  (-7, 1), (4, 6),  (-4, 6), (6, 4),  (-6, 4),
(2, 7),  (-2, 7), (7, 2),  (-7, 2), (3, 7),  (-3, 7), (7, 3),
(-7, 3), (5, 6),  (-5, 6), (6, 5),  (-6, 5), (8, 0),  (4, 7),
(-4, 7), (7, 4),  (-7, 4), (8, 1),  (8, 2),  (6, 6),  (-6, 6),
(8, 3),  (5, 7),  (-5, 7), (7, 5),  (-7, 5), (8, 4),  (6, 7),
(-6, 7), (7, 6),  (-7, 6), (8, 5),  (7, 7),  (-7, 7), (8, 6),
(8, 7)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

743
744For example, the distance code `1` indicates an offset of `(0, 1)` for the
745neighboring pixel, that is, the pixel above the current pixel (0 pixel
746difference in the X-direction and 1 pixel difference in the Y-direction).
747Similarly, the distance code `3` indicates the left-top pixel.
748
749The decoder can convert a distance code `i` to a scan-line order distance `dist`
750as follows:
751
752\[AMENDED3\]
753
754~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
755(xi, yi) = distance_map[i - 1]
756dist = xi + yi * xsize
757if (dist < 1) {
758  dist = 1
759}
760~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
761
762where `distance_map` is the mapping noted above and `xsize` is the width of the
763image in pixels.
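The conversion can be made concrete with the full 120-entry table. A minimal sketch (the function name is ours, not mandated by the format):

```c
#include <assert.h>

typedef struct { int x, y; } Offset;

/* The (xi, yi) table above, in distance-code order (codes 1..120). */
static const Offset kDistanceMap[120] = {
  {0,1},  {1,0},  {1,1},  {-1,1}, {0,2},  {2,0},  {1,2},
  {-1,2}, {2,1},  {-2,1}, {2,2},  {-2,2}, {0,3},  {3,0},
  {1,3},  {-1,3}, {3,1},  {-3,1}, {2,3},  {-2,3}, {3,2},
  {-3,2}, {0,4},  {4,0},  {1,4},  {-1,4}, {4,1},  {-4,1},
  {3,3},  {-3,3}, {2,4},  {-2,4}, {4,2},  {-4,2}, {0,5},
  {3,4},  {-3,4}, {4,3},  {-4,3}, {5,0},  {1,5},  {-1,5},
  {5,1},  {-5,1}, {2,5},  {-2,5}, {5,2},  {-5,2}, {4,4},
  {-4,4}, {3,5},  {-3,5}, {5,3},  {-5,3}, {0,6},  {6,0},
  {1,6},  {-1,6}, {6,1},  {-6,1}, {2,6},  {-2,6}, {6,2},
  {-6,2}, {4,5},  {-4,5}, {5,4},  {-5,4}, {3,6},  {-3,6},
  {6,3},  {-6,3}, {0,7},  {7,0},  {1,7},  {-1,7}, {5,5},
  {-5,5}, {7,1},  {-7,1}, {4,6},  {-4,6}, {6,4},  {-6,4},
  {2,7},  {-2,7}, {7,2},  {-7,2}, {3,7},  {-3,7}, {7,3},
  {-7,3}, {5,6},  {-5,6}, {6,5},  {-6,5}, {8,0},  {4,7},
  {-4,7}, {7,4},  {-7,4}, {8,1},  {8,2},  {6,6},  {-6,6},
  {8,3},  {5,7},  {-5,7}, {7,5},  {-7,5}, {8,4},  {6,7},
  {-6,7}, {7,6},  {-7,6}, {8,5},  {7,7},  {-7,7}, {8,6},
  {8,7}
};

/* Distance code -> scan-line order distance, per the conversion above. */
int DistanceCodeToDistance(int xsize, int code) {
  if (code > 120) return code - 120;
  int dist = kDistanceMap[code - 1].x + kDistanceMap[code - 1].y * xsize;
  return (dist < 1) ? 1 : dist;
}
```

For a 10-pixel-wide image, code `1` (the pixel above) yields a distance of 10, code `2` (the pixel to the left) yields 1, and codes above 120 simply subtract the offset.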


#### 5.2.3 Color Cache Coding
{:#color-cache-code}

The color cache stores a set of colors that have been recently used in the
image.

**Rationale:** This way, the recently used colors can sometimes be referred to
more efficiently than emitting them using the other two methods (described in
[5.2.1](#prefix-coded-literals) and [5.2.2](#lz77-backward-reference)).

Color cache codes are stored as follows. First, there is a 1-bit value that
indicates if the color cache is used. If this bit is 0, no color cache codes
exist, and they are not transmitted in the prefix code that decodes the green
symbols and the length prefix codes. However, if this bit is 1, the color cache
size is read next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int color_cache_code_bits = ReadBits(4);
int color_cache_size = 1 << color_cache_code_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`color_cache_code_bits` defines the size of the color cache as
`1 << color_cache_code_bits`. The range of allowed values for
`color_cache_code_bits` is \[1..11\]. Compliant decoders must indicate a
corrupted bitstream for other values.

A color cache is an array of size `color_cache_size`. Each entry stores one ARGB
color. Colors are looked up by indexing them by
`(0x1e35a7bd * color) >> (32 - color_cache_code_bits)`. Only one lookup is done
in a color cache; there is no conflict resolution.

At the beginning of decoding or encoding of an image, all entries of the color
cache are set to zero. The color cache code is converted to this color at
decoding time. The state of the color cache is maintained by inserting every
pixel, be it produced by backward referencing or as literals, into the cache in
the order they appear in the stream.
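The indexing and update rules above can be sketched as follows (the structure and function names are ours, not part of the format; `CACHE_BITS` plays the role of `color_cache_code_bits`):

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BITS 4  /* example value of color_cache_code_bits */

typedef struct {
  uint32_t colors[1 << CACHE_BITS];  /* all entries start at zero */
} ColorCache;

/* Hash an ARGB color to its cache slot, per the formula above. */
static int CacheIndex(uint32_t argb) {
  return (int)((0x1e35a7bdu * argb) >> (32 - CACHE_BITS));
}

/* Every decoded pixel is inserted; any previous occupant of the slot
 * is simply overwritten (no conflict resolution). */
static void CacheInsert(ColorCache* cc, uint32_t argb) {
  cc->colors[CacheIndex(argb)] = argb;
}

/* A color cache code is a plain index into the array. */
static uint32_t CacheLookup(const ColorCache* cc, int code) {
  return cc->colors[code];
}
```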


6 Entropy Code
--------------

### 6.1 Overview

Most of the data is coded using a [canonical prefix code][canonical_huff].
Hence, the codes are transmitted by sending the _prefix code lengths_, as
opposed to the actual _prefix codes_.

In particular, the format uses **spatially-variant prefix coding**. In other
words, different blocks of the image can potentially use different entropy
codes.

**Rationale**: Different areas of the image may have different characteristics.
So, allowing them to use different entropy codes provides more flexibility and
potentially better compression.
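Transmitting only the lengths works because canonical codes can be rebuilt deterministically: codes are assigned in order of increasing length, and in symbol order within each length. A DEFLATE-style sketch of that reconstruction (our illustration, not code from any WebP implementation):

```c
#include <assert.h>

/* Assign canonical codes from code lengths (lengths up to 15 bits).
 * Symbols with length 0 receive no code. */
void BuildCanonicalCodes(const int* lengths, int n, int* codes) {
  int bl_count[16] = {0};   /* how many codes of each length */
  for (int i = 0; i < n; ++i) bl_count[lengths[i]]++;
  bl_count[0] = 0;

  /* Smallest code for each length: previous base plus previous count,
   * shifted left by one. */
  int next_code[16] = {0};
  int code = 0;
  for (int bits = 1; bits < 16; ++bits) {
    code = (code + bl_count[bits - 1]) << 1;
    next_code[bits] = code;
  }

  /* Hand out codes in symbol order within each length. */
  for (int i = 0; i < n; ++i) {
    if (lengths[i] != 0) codes[i] = next_code[lengths[i]]++;
  }
}
```

For lengths `{2, 1, 3, 3}`, the single 1-bit symbol gets code `0`, the 2-bit symbol `10`, and the two 3-bit symbols `110` and `111`.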

### 6.2 Details

The encoded image data consists of several parts:

  1. Decoding and building the prefix codes \[AMENDED2\]
  1. Meta prefix codes
  1. Entropy-coded image data

#### 6.2.1 Decoding and Building the Prefix Codes

There are several steps in decoding the prefix codes.

**Decoding the Code Lengths:**
{:#decoding-the-code-lengths}

This section describes how to read the prefix code lengths from the bitstream.

The prefix code lengths can be coded in two ways. The method used is specified
by a 1-bit value.

  * If this bit is 1, it is a _simple code length code_, and
  * If this bit is 0, it is a _normal code length code_.

In both cases, there can be unused code lengths that are still part of the
stream. This may be inefficient, but it is allowed by the format.

**(i) Simple Code Length Code:**

\[AMENDED2\]

This variant is used in the special case when only 1 or 2 prefix symbols are in
the range \[0..255\] with code length `1`. All other prefix code lengths are
implicitly zeros.

The first bit indicates the number of symbols:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_symbols = ReadBits(1) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The symbol values follow. The first symbol is coded using 1 or 8 bits,
depending on the value of `is_first_8bits`. The range is \[0..1\] or \[0..255\],
respectively. The second symbol, if present, is always assumed to be in the
range \[0..255\] and coded using 8 bits.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int is_first_8bits = ReadBits(1);
symbol0 = ReadBits(1 + 7 * is_first_8bits);
code_lengths[symbol0] = 1;
if (num_symbols == 2) {
  symbol1 = ReadBits(8);
  code_lengths[symbol1] = 1;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Note:** Another special case is when _all_ prefix code lengths are _zeros_ (an
empty prefix code). For example, a prefix code for distance can be empty if
there are no backward references. Similarly, prefix codes for alpha, red, and
blue can be empty if all pixels within the same meta prefix code are produced
using the color cache. However, this case doesn't need special handling, as
empty prefix codes can be coded as those containing a single symbol `0`.

**(ii) Normal Code Length Code:**

The code lengths of the prefix code fit in 8 bits and are read as follows.
First, `num_code_lengths` specifies the number of code lengths.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_code_lengths = 4 + ReadBits(4);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If `num_code_lengths` is > 19, the bitstream is invalid. \[AMENDED3\]

The code lengths are themselves encoded using prefix codes: lower-level code
lengths, `code_length_code_lengths`, first have to be read. The rest of those
`code_length_code_lengths` (according to the order in `kCodeLengthCodeOrder`)
are zeros.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int kCodeLengthCodes = 19;
int kCodeLengthCodeOrder[kCodeLengthCodes] = {
  17, 18, 0, 1, 2, 3, 4, 5, 16, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
int code_length_code_lengths[kCodeLengthCodes] = { 0 };  // All zeros
for (i = 0; i < num_code_lengths; ++i) {
  code_length_code_lengths[kCodeLengthCodeOrder[i]] = ReadBits(3);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, if `ReadBits(1) == 0`, the maximum number of different read symbols is
`num_code_lengths`. Otherwise, it is defined as:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int length_nbits = 2 + 2 * ReadBits(3);
int max_symbol = 2 + ReadBits(length_nbits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A prefix table is then built from `code_length_code_lengths` and used to read up
to `max_symbol` code lengths.

  * Code \[0..15\] indicates literal code lengths.
    * Value 0 means no symbols have been coded.
    * Values \[1..15\] indicate the bit length of the respective code.
  * Code 16 repeats the previous non-zero value \[3..6\] times, that is,
    `3 + ReadBits(2)` times. If code 16 is used before a non-zero
    value has been emitted, a value of 8 is repeated.
  * Code 17 emits a streak of zeros of length \[3..10\], that is,
    `3 + ReadBits(3)` times.
  * Code 18 emits a streak of zeros of length \[11..138\], that is,
    `11 + ReadBits(7)` times.

Once code lengths are read, a prefix code for each symbol type (A, R, G, B,
distance) is formed using their respective alphabet sizes:

  * G channel: 256 + 24 + `color_cache_size`
  * Other literals (A, R, B): 256
  * Distance code: 40

#### 6.2.2 Decoding of Meta Prefix Codes

As noted earlier, the format allows the use of different prefix codes for
different blocks of the image. _Meta prefix codes_ are indexes identifying which
prefix codes to use in different parts of the image.

Meta prefix codes may be used _only_ when the image is being used in the
[role](#roles-of-image-data) of an _ARGB image_.

There are two possibilities for the meta prefix codes, indicated by a 1-bit
value:

  * If this bit is zero, there is only one meta prefix code used everywhere in
    the image. No more data is stored.
  * If this bit is one, the image uses multiple meta prefix codes. These meta
    prefix codes are stored as an _entropy image_ (described below).

**Entropy image:**

The entropy image defines which prefix codes are used in different parts of the
image, as described below.

The first 3 bits contain the `prefix_bits` value. The dimensions of the entropy
image are derived from `prefix_bits`.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int prefix_bits = ReadBits(3) + 2;
int prefix_xsize = DIV_ROUND_UP(xsize, 1 << prefix_bits);
int prefix_ysize = DIV_ROUND_UP(ysize, 1 << prefix_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `DIV_ROUND_UP` is as defined [earlier](#predictor-transform).

The next bits contain an entropy image of width `prefix_xsize` and height
`prefix_ysize`.

**Interpretation of Meta Prefix Codes:**

For any given pixel (x, y), there is a set of five prefix codes associated with
it. These codes are (in bitstream order):

  * **Prefix code #1**: used for the green channel, backward-reference length,
    and color cache.
  * **Prefix codes #2, #3, and #4**: used for the red, blue, and alpha channels,
    respectively.
  * **Prefix code #5**: used for the backward-reference distance.

From here on, we refer to this set as a **prefix code group**.

The number of prefix code groups in the ARGB image can be obtained by finding
the _largest meta prefix code_ from the entropy image:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_prefix_groups = max(entropy image) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `max(entropy image)` indicates the largest meta prefix code stored in the
entropy image.

As each prefix code group contains five prefix codes, the total number of prefix
codes is:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_prefix_codes = 5 * num_prefix_groups;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a pixel (x, y) in the ARGB image, we can obtain the corresponding prefix
codes to be used as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int position =
    (y >> prefix_bits) * prefix_xsize + (x >> prefix_bits);
int meta_prefix_code = (entropy_image[position] >> 8) & 0xffff;
PrefixCodeGroup prefix_group = prefix_code_groups[meta_prefix_code];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where we have assumed the existence of a `PrefixCodeGroup` structure, which
represents a set of five prefix codes. Also, `prefix_code_groups` is an array of
`PrefixCodeGroup` (of size `num_prefix_groups`).
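The per-pixel lookup can be sketched as a small function (names are ours; the meta prefix code sits in the green channel of the entropy image's ARGB value, that is, in bits 8..23):

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))

/* Map pixel (x, y) to its meta prefix code via the entropy image,
 * per the formulas above. */
int MetaPrefixCode(const uint32_t* entropy_image, int xsize,
                   int prefix_bits, int x, int y) {
  int prefix_xsize = DIV_ROUND_UP(xsize, 1 << prefix_bits);
  int position = (y >> prefix_bits) * prefix_xsize + (x >> prefix_bits);
  return (int)((entropy_image[position] >> 8) & 0xffff);
}
```

For a 100-pixel-wide image with `prefix_bits = 6` (64x64 blocks), the entropy image is two cells wide; pixels left of column 64 use the first cell, and the rest use the second.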

The decoder then uses prefix code group `prefix_group` to decode the pixel
(x, y) as explained in the [next section](#decoding-entropy-coded-image-data).

#### 6.2.3 Decoding Entropy-coded Image Data

\[AMENDED2\]

For the current position (x, y) in the image, the decoder first identifies the
corresponding prefix code group (as explained in the last section). Given the
prefix code group, the pixel is read and decoded as follows:

Read the next symbol S from the bitstream using prefix code #1. Note that S is
any integer in the range `0` to
`(256 + 24 + ` [`color_cache_size`](#color-cache-code)` - 1)`.

The interpretation of S depends on its value:

  1. if S < 256
     1. Use S as the green component.
     1. Read red from the bitstream using prefix code #2.
     1. Read blue from the bitstream using prefix code #3.
     1. Read alpha from the bitstream using prefix code #4.
  1. if S >= 256 && S < 256 + 24
     1. Use S - 256 as a length prefix code.
     1. Read extra bits for length from the bitstream.
     1. Determine backward-reference length L from length prefix code and the
        extra bits read.
     1. Read distance prefix code from the bitstream using prefix code #5.
     1. Read extra bits for distance from the bitstream.
     1. Determine backward-reference distance D from distance prefix code and
        the extra bits read.
     1. Copy L pixels (in scan-line order) starting from the pixel that is D
        pixels before the current position.
  1. if S >= 256 + 24
     1. Use S - (256 + 24) as the index into the color cache.
     1. Get ARGB color from the color cache at that index.
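A note on the copy in the second case: the distance D may be smaller than the length L, in which case the copy overlaps its own output and must proceed pixel by pixel, re-reading pixels it has just written. A minimal sketch (function name ours):

```c
#include <assert.h>
#include <stdint.h>

/* Copy L pixels, in scan-line order, from D pixels back. When D < L,
 * later iterations read pixels written by earlier ones, which is how
 * short patterns are repeated. */
void CopyBackward(uint32_t* pixels, int pos, int L, int D) {
  for (int i = 0; i < L; ++i) {
    pixels[pos + i] = pixels[pos + i - D];
  }
}
```

For example, with two pixels decoded so far, L = 4 and D = 2 repeat that two-pixel pattern twice.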


7 Overall Structure of the Format
---------------------------------

Below is a view into the format in Augmented Backus-Naur Form ([ABNF]). It does
not cover all details. End-of-image (EOI) is only implicitly coded into the
number of pixels (xsize * ysize).


### 7.1 Basic Structure

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
format       = RIFF-header image-size image-stream
RIFF-header  = "RIFF" 4OCTET "WEBP" "VP8L" 4OCTET %x2F
image-size   = 14BIT 14BIT ; width - 1, height - 1
image-stream = optional-transform spatially-coded-image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

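As a concrete illustration of `image-size`, the sketch below reads the two 14-bit fields with a least-significant-bit-first bit reader, matching the bit order described in the introduction (the `BitReader` type is ours, not part of the format):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  const uint8_t* buf;
  size_t bit_pos;  /* absolute bit position; LSB-first within each byte */
} BitReader;

/* Read n bits; the first bit read becomes the least-significant bit
 * of the result. */
static int ReadBits(BitReader* br, int n) {
  int val = 0;
  for (int i = 0; i < n; ++i, ++br->bit_pos) {
    int bit = (br->buf[br->bit_pos >> 3] >> (br->bit_pos & 7)) & 1;
    val |= bit << i;
  }
  return val;
}
```

With such a reader positioned at `image-size`, the dimensions are decoded as `width = ReadBits(14) + 1` followed by `height = ReadBits(14) + 1`.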

### 7.2 Structure of Transforms

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
optional-transform   =  (%b1 transform optional-transform) / %b0
transform            =  predictor-tx / color-tx / subtract-green-tx
transform            =/ color-indexing-tx

predictor-tx         =  %b00 predictor-image
predictor-image      =  3BIT ; sub-pixel code
                        entropy-coded-image

color-tx             =  %b01 color-image
color-image          =  3BIT ; sub-pixel code
                        entropy-coded-image

subtract-green-tx    =  %b10

color-indexing-tx    =  %b11 color-indexing-image
color-indexing-image =  8BIT ; color count
                        entropy-coded-image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### 7.3 Structure of the Image Data

\[AMENDED2\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
spatially-coded-image =  color-cache-info meta-prefix data
entropy-coded-image   =  color-cache-info data

color-cache-info      =  %b0
color-cache-info      =/ (%b1 4BIT) ; 1 followed by color cache size

meta-prefix           =  %b0 / (%b1 entropy-image)

data                  =  prefix-codes lz77-coded-image
entropy-image         =  3BIT ; subsample value
                         entropy-coded-image

prefix-codes          =  prefix-code-group *prefix-codes
prefix-code-group     =
    5prefix-code ; See "Interpretation of Meta Prefix Codes" to
                 ; understand what each of these five prefix
                 ; codes are for.

prefix-code           =  simple-prefix-code / normal-prefix-code
simple-prefix-code    =  ; see "Simple Code Length Code" for details
normal-prefix-code    =  code-length-code encoded-code-lengths
code-length-code      =  ; see section "Normal Code Length Code"

lz77-coded-image      =
    *((argb-pixel / lz77-copy / color-cache-code) lz77-coded-image)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A possible example sequence:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF-header image-size %b1 subtract-green-tx
%b1 predictor-tx %b0 color-cache-info
%b0 prefix-codes lz77-coded-image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ABNF]: https://www.rfc-editor.org/rfc/rfc5234
[canonical_huff]: https://en.wikipedia.org/wiki/Canonical_Huffman_code