<!--

Although you may be viewing an alternate representation, this document is
sourced in Markdown, a light-duty markup scheme, and is optimized for the
[kramdown](https://kramdown.gettalong.org/) transformer.

See the accompanying specs_generation.md. External link targets are referenced
at the end of this file.

-->

Specification for WebP Lossless Bitstream
=========================================

_Jyrki Alakuijala, Ph.D., Google, Inc., 2012-06-19_

Paragraphs marked as \[AMENDED\] were amended on 2014-09-16.

Paragraphs marked as \[AMENDED2\] were amended on 2022-05-13.

Paragraphs marked as \[AMENDED3\] were amended on 2022-11-21.

Abstract
--------

WebP lossless is an image format for lossless compression of ARGB images. The
lossless format stores and restores the pixel values exactly, including the
color values for pixels whose alpha value is 0. The format uses subresolution
images, recursively embedded into the format itself, for storing statistical
data about the images, such as the used entropy codes, spatial predictors,
color space conversion, and color table. LZ77, prefix coding, and a color cache
are used for compression of the bulk data. Decoding speeds faster than PNG have
been demonstrated, as well as 25% denser compression than can be achieved using
today's PNG format.


* TOC placeholder
{:toc}


1 Introduction
--------------

This document describes the compressed data representation of a WebP lossless
image. It is intended as a detailed reference for the WebP lossless encoder and
decoder implementation.

In this document, we extensively use C programming language syntax to describe
the bitstream, and assume the existence of a function for reading bits,
`ReadBits(n)`. The bytes are read in the natural order of the stream containing
them, and bits of each byte are read in least-significant-bit-first order.
When multiple bits are read at the same time, the integer is constructed from
the original data in the original order. The most significant bits of the
returned integer are also the most significant bits of the original data. Thus,
the statement

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(2);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

is equivalent to the two statements below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(1);
b |= ReadBits(1) << 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We assume that each color component (e.g., alpha, red, blue, and green) is
represented using an 8-bit byte. We define the corresponding type as uint8. A
whole ARGB pixel is represented by a type called uint32, an unsigned integer
consisting of 32 bits. In the code showing the behavior of the transformations,
the alpha value is codified in bits 31..24, red in bits 23..16, green in bits
15..8, and blue in bits 7..0, but implementations of the format are free to use
another representation internally.

Broadly, a WebP lossless image contains header data, transform information, and
actual image data. Headers contain the width and height of the image. A WebP
lossless image can go through four different types of transformation before
being entropy encoded. The transform information in the bitstream contains the
data required to apply the respective inverse transforms.


2 Nomenclature
--------------

ARGB
: A pixel value consisting of alpha, red, green, and blue values.

ARGB image
: A two-dimensional array containing ARGB pixels.

color cache
: A small hash-addressed array to store recently used colors, to be able to
  recall them with shorter codes.

color indexing image
: A one-dimensional image of colors that can be indexed using a small integer
  (up to 256 within WebP lossless).

color transform image
: A two-dimensional subresolution image containing data about correlations of
  color components.

distance mapping
: Changes LZ77 distances to have the smallest values for pixels in 2D
  proximity.

entropy image
: A two-dimensional subresolution image indicating which entropy coding should
  be used in a respective square in the image, i.e., each pixel is a meta
  prefix code.

prefix code
: A classic way to do entropy coding where a smaller number of bits are used
  for more frequent codes.

LZ77
: Dictionary-based sliding window compression algorithm that either emits
  symbols or describes them as sequences of past symbols.

meta prefix code
: A small integer (up to 16 bits) that indexes an element in the meta prefix
  table.

predictor image
: A two-dimensional subresolution image indicating which spatial predictor is
  used for a particular square in the image.

prefix coding
: A way to entropy code larger integers that codes a few bits of the integer
  using an entropy code and codifies the remaining bits raw. This allows for
  the descriptions of the entropy codes to remain relatively small even when
  the range of symbols is large.

scan-line order
: A processing order of pixels, left-to-right, top-to-bottom, starting from
  the left-hand-top pixel, proceeding to the right. Once a row is completed,
  continue from the left-hand column of the next row.


3 RIFF Header
-------------

The beginning of the header has the RIFF container. This consists of the
following 21 bytes:

 1. String "RIFF"
 2. A little-endian 32-bit value of the block length, the whole size of the
    block controlled by the RIFF header. Normally this equals the payload size
    (file size minus 8 bytes: 4 bytes for the 'RIFF' identifier and 4 bytes
    for storing the value itself).
 3. String "WEBP" (RIFF container name).
 4. String "VP8L" (chunk tag for lossless-encoded image data).
 5.
    A little-endian 32-bit value of the number of bytes in the lossless
    stream.
 6. One-byte signature 0x2f.

The first 28 bits of the bitstream specify the width and height of the image.
Width and height are decoded as 14-bit integers as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int image_width = ReadBits(14) + 1;
int image_height = ReadBits(14) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 14-bit dynamics for image size limit the maximum size of a WebP lossless
image to 16384✕16384 pixels.

The alpha_is_used bit is a hint only and should not impact decoding. It should
be set to 0 when all alpha values are 255 in the picture, and 1 otherwise.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int alpha_is_used = ReadBits(1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The version_number is a 3-bit code that must be set to 0. Any other value
should be treated as an error. \[AMENDED\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int version_number = ReadBits(3);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


4 Transformations
-----------------

Transformations are reversible manipulations of the image data that can reduce
the remaining symbolic entropy by modeling spatial and color correlations.
Transformations can make the final compression more dense.

An image can go through four types of transformations. A 1 bit indicates the
presence of a transform. Each transform is allowed to be used only once. The
transformations are used only for the main-level ARGB image: the subresolution
images have no transforms, not even the 0 bit indicating the end of
transforms.

Typically, an encoder would use these transforms to reduce the Shannon entropy
in the residual image. Also, the transform data can be decided based on entropy
minimization.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
while (ReadBits(1)) {  // Transform present.
  // Decode transform type.
  enum TransformType transform_type = ReadBits(2);
  // Decode transform data.
  ...
}

// Decode actual image data (Section 5).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a transform is present, then the next two bits specify the transform type.
There are four types of transforms.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
enum TransformType {
  PREDICTOR_TRANSFORM       = 0,
  COLOR_TRANSFORM           = 1,
  SUBTRACT_GREEN_TRANSFORM  = 2,
  COLOR_INDEXING_TRANSFORM  = 3,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform type is followed by the transform data. Transform data contains
the information required to apply the inverse transform and depends on the
transform type. Next, we describe the transform data for the different types.


### 4.1 Predictor Transform

The predictor transform can be used to reduce entropy by exploiting the fact
that neighboring pixels are often correlated. In the predictor transform, the
current pixel value is predicted from the pixels already decoded (in scan-line
order) and only the residual value (actual - predicted) is encoded. The
_prediction mode_ determines the type of prediction to use. We divide the image
into squares, and all the pixels in a square use the same prediction mode.

The first 3 bits of prediction data define the block width and height in
number of bits. The number of block columns, `block_xsize`, is used in indexing
two-dimensionally.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = (1 << size_bits);
int block_height = (1 << size_bits);
#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))
int block_xsize = DIV_ROUND_UP(image_width, 1 << size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform data contains the prediction mode for each block of the image.
All the `block_width * block_height` pixels of a block use the same prediction
mode.
The prediction modes are treated as pixels of an image and encoded using the
same techniques described in [Chapter 5](#image-data).

For a pixel _x, y_, one can compute the respective filter block address by:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int block_index = (y >> size_bits) * block_xsize +
                  (x >> size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are 14 different prediction modes. In each prediction mode, the current
pixel value is predicted from one or more neighboring pixels whose values are
already known.

We choose the neighboring pixels (TL, T, TR, and L) of the current pixel (P)
as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
O O O O O O O O O O O
O O O O O O O O O O O
O O O O TL T TR O O O O
O O O O L P X X X X X
X X X X X X X X X X X
X X X X X X X X X X X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where TL means top-left, T means top, TR means top-right, and L means left. At
the time of predicting a value for P, all pixels O, TL, T, TR, and L have
already been processed, and pixel P and all pixels X are unknown.

Given the above neighboring pixels, the different prediction modes are defined
as follows.

| Mode   | Predicted value of each channel of the current pixel      |
| ------ | --------------------------------------------------------- |
|  0     | 0xff000000 (represents solid black color in ARGB)         |
|  1     | L                                                         |
|  2     | T                                                         |
|  3     | TR                                                        |
|  4     | TL                                                        |
|  5     | Average2(Average2(L, TR), T)                              |
|  6     | Average2(L, TL)                                           |
|  7     | Average2(L, T)                                            |
|  8     | Average2(TL, T)                                           |
|  9     | Average2(T, TR)                                           |
| 10     | Average2(Average2(L, TL), Average2(T, TR))                |
| 11     | Select(L, T, TL)                                          |
| 12     | ClampAddSubtractFull(L, T, TL)                            |
| 13     | ClampAddSubtractHalf(Average2(L, T), TL)                  |

`Average2` is defined as follows for each ARGB component:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint8 Average2(uint8 a, uint8 b) {
  return (a + b) / 2;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Select predictor is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint32 Select(uint32 L, uint32 T, uint32 TL) {
  // L = left pixel, T = top pixel, TL = top-left pixel.

  // ARGB component estimates for prediction.
  int pAlpha = ALPHA(L) + ALPHA(T) - ALPHA(TL);
  int pRed = RED(L) + RED(T) - RED(TL);
  int pGreen = GREEN(L) + GREEN(T) - GREEN(TL);
  int pBlue = BLUE(L) + BLUE(T) - BLUE(TL);

  // Manhattan distances to estimates for left and top pixels.
  int pL = abs(pAlpha - ALPHA(L)) + abs(pRed - RED(L)) +
           abs(pGreen - GREEN(L)) + abs(pBlue - BLUE(L));
  int pT = abs(pAlpha - ALPHA(T)) + abs(pRed - RED(T)) +
           abs(pGreen - GREEN(T)) + abs(pBlue - BLUE(T));

  // Return either left or top, the one closer to the prediction.
  if (pL < pT) {  // \[AMENDED\]
    return L;
  } else {
    return T;
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The functions `ClampAddSubtractFull` and `ClampAddSubtractHalf` are performed
for each ARGB component as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Clamp the input value between 0 and 255.
int Clamp(int a) {
  return (a < 0) ? 0 : (a > 255) ?
      255 : a;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractFull(int a, int b, int c) {
  return Clamp(a + b - c);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractHalf(int a, int b) {
  return Clamp(a + (a - b) / 2);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are special handling rules for some border pixels. If there is a
predictor transform, regardless of the mode \[0..13\] for these pixels, the
predicted value for the left-topmost pixel of the image is 0xff000000, the
L-pixel for all pixels on the top row, and the T-pixel for all pixels on the
leftmost column.

\[AMENDED2\] Addressing the TR-pixel for pixels on the rightmost column is
exceptional. The pixels on the rightmost column are predicted by using the
modes \[0..13\], just like pixels not on the border, but the leftmost pixel on
the same row as the current pixel is instead used as the TR-pixel.


### 4.2 Color Transform

\[AMENDED2\]

The goal of the color transform is to decorrelate the R, G, and B values of
each pixel. The color transform keeps the green (G) value as it is, transforms
red (R) based on green, and transforms blue (B) based on green and then based
on red.

As is the case for the predictor transform, first the image is divided into
blocks, and the same transform mode is used for all the pixels in a block. For
each block, there are three types of color transform elements.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
  uint8 green_to_red;
  uint8 green_to_blue;
  uint8 red_to_blue;
} ColorTransformElement;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The actual color transformation is done by defining a color transform delta.
The color transform delta depends on the `ColorTransformElement`, which is the
same for all the pixels in a particular block. The delta is subtracted during
the color transform.
The inverse color transform is then just adding those deltas.

The color transform function is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void ColorTransform(uint8 red, uint8 blue, uint8 green,
                    ColorTransformElement *trans,
                    uint8 *new_red, uint8 *new_blue) {
  // Transformed values of red and blue components.
  int tmp_red = red;
  int tmp_blue = blue;

  // Applying the transform is just subtracting the transform deltas.
  tmp_red -= ColorTransformDelta(trans->green_to_red, green);
  tmp_blue -= ColorTransformDelta(trans->green_to_blue, green);
  tmp_blue -= ColorTransformDelta(trans->red_to_blue, red);

  *new_red = tmp_red & 0xff;
  *new_blue = tmp_blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`ColorTransformDelta` is computed using a signed 8-bit integer representing a
3.5-fixed-point number and a signed 8-bit RGB color channel (c) \[-128..127\].
It is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int8 ColorTransformDelta(int8 t, int8 c) {
  return (t * c) >> 5;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A conversion from the 8-bit unsigned representation (uint8) to the 8-bit
signed one (int8) is required before calling `ColorTransformDelta()`. It
should be performed using 8-bit two's complement (that is: uint8 range
\[128..255\] is mapped to the \[-128..-1\] range of its converted int8 value).

The multiplication is to be done using more precision (with at least 16-bit
dynamics). The sign extension property of the shift operation does not matter
here: only the lowest 8 bits are used from the result, and there the sign
extension shifting and unsigned shifting are consistent with each other.

Now we describe the contents of color transform data so that decoding can
apply the inverse color transform and recover the original red and blue
values.
The first 3 bits of the color transform data contain the width and height of
the image block in number of bits, just like the predictor transform:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = 1 << size_bits;
int block_height = 1 << size_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The remaining part of the color transform data contains `ColorTransformElement`
instances corresponding to each block of the image. `ColorTransformElement`
instances are treated as pixels of an image and encoded using the methods
described in [Chapter 5](#image-data).

During decoding, `ColorTransformElement` instances of the blocks are decoded
and the inverse color transform is applied on the ARGB values of the pixels.
As mentioned earlier, that inverse color transform is just adding
`ColorTransformElement` values to the red and blue channels. \[AMENDED3\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void InverseTransform(uint8 red, uint8 green, uint8 blue,
                      ColorTransformElement *trans,
                      uint8 *new_red, uint8 *new_blue) {
  // Transformed values of red and blue components.
  int tmp_red = red;
  int tmp_blue = blue;

  // Applying the inverse transform is just adding the
  // color transform deltas.
  tmp_red += ColorTransformDelta(trans->green_to_red, green);
  tmp_blue += ColorTransformDelta(trans->green_to_blue, green);
  tmp_blue +=
      ColorTransformDelta(trans->red_to_blue, tmp_red & 0xff);

  *new_red = tmp_red & 0xff;
  *new_blue = tmp_blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### 4.3 Subtract Green Transform

The subtract green transform subtracts green values from red and blue values
of each pixel. When this transform is present, the decoder needs to add the
green value to both red and blue. There is no data associated with this
transform.
The decoder applies the inverse transform as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) {
  *red = (*red + green) & 0xff;
  *blue = (*blue + green) & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This transform is redundant, as it can be modeled using the color transform,
but it is still often useful. Since it can extend the dynamics of the color
transform and there is no additional data here, the subtract green transform
can be coded using fewer bits than a full-blown color transform.


### 4.4 Color Indexing Transform

If there are not many unique pixel values, it may be more efficient to create
a color index array and replace the pixel values by the array's indices. The
color indexing transform achieves this. (In the context of WebP lossless, we
specifically do not call this a palette transform because a similar but more
dynamic concept exists in WebP lossless encoding: color cache.)

The color indexing transform checks for the number of unique ARGB values in
the image. If that number is below a threshold (256), it creates an array of
those ARGB values, which is then used to replace the pixel values with the
corresponding index: the green channel of the pixels are replaced with the
index, all alpha values are set to 255, and all red and blue values are set
to 0.

The transform data contains the color table size and the entries in the color
table. The decoder reads the color indexing transform data as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// 8-bit value for color table size.
int color_table_size = ReadBits(8) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The color table is stored using the image storage format itself. The color
table can be obtained by reading an image, without the RIFF header, image
size, and transforms, assuming a height of one pixel and a width of
`color_table_size`.
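Because the color table row is subtraction-coded (each decoded entry holds
per-component deltas from the previous color), the final colors must be
recovered by a prefix sum over the components. The following is a
non-normative sketch; the function name `UndoColorTableDeltas` is illustrative
and assumes the raw table row has already been decoded into ARGB words:

```c
#include <stdint.h>

// Sketch (not normative): undo the subtraction coding of a decoded
// color table row, in place. Each component of entry i becomes the
// low 8 bits of (previous final component + stored delta).
void UndoColorTableDeltas(uint32_t* table, int color_table_size) {
  for (int i = 1; i < color_table_size; ++i) {
    uint32_t prev = table[i - 1];
    uint32_t cur = table[i];
    uint32_t sum = 0;
    for (int shift = 0; shift < 32; shift += 8) {
      // Take the low 8 bits of the per-component sum; masking each
      // byte separately keeps carries from leaking between channels.
      sum |= (((prev >> shift) + (cur >> shift)) & 0xffu) << shift;
    }
    table[i] = sum;
  }
}
```

For example, a stored second entry of 0x00010101 after a first entry of
0xff000000 yields the final color 0xff010101, and the addition wraps around
modulo 256 per component.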
The color table is always subtraction-coded to reduce image entropy. The
deltas of palette colors typically contain much less entropy than the colors
themselves, leading to significant savings for smaller images. In decoding,
every final color in the color table can be obtained by adding the previous
color component values by each ARGB component separately and storing the
least significant 8 bits of the result.

The inverse transform for the image is simply replacing the pixel values
(which are indices to the color table) with the actual color table values.
The indexing is done based on the green component of the ARGB color.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Inverse transform.
argb = color_table[GREEN(argb)];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the index is equal to or larger than `color_table_size`, the argb color
value should be set to 0x00000000 (transparent black). \[AMENDED\]

When the color table is small (equal to or less than 16 colors), several
pixels are bundled into a single pixel. The pixel bundling packs several
(2, 4, or 8) pixels into a single pixel, reducing the image width
respectively. Pixel bundling allows for a more efficient joint distribution
entropy coding of neighboring pixels and gives some arithmetic coding-like
benefits to the entropy code, but it can only be used when there are 16 or
fewer unique values.

`color_table_size` specifies how many pixels are combined:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int width_bits;
if (color_table_size <= 2) {
  width_bits = 3;
} else if (color_table_size <= 4) {
  width_bits = 2;
} else if (color_table_size <= 16) {
  width_bits = 1;
} else {
  width_bits = 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`width_bits` has a value of 0, 1, 2, or 3. A value of 0 indicates no pixel
bundling is to be done for the image.
A value of 1 indicates that two pixels are combined, and each pixel has a
range of \[0..15\]. A value of 2 indicates that four pixels are combined, and
each pixel has a range of \[0..3\]. A value of 3 indicates that eight pixels
are combined, and each pixel has a range of \[0..1\], i.e., a binary value.

The values are packed into the green component as follows:

 * `width_bits` = 1: for every x value where x ≡ 0 (mod 2), a green value at
   x is positioned into the 4 least-significant bits of the green value at
   x / 2, and a green value at x + 1 is positioned into the 4
   most-significant bits of the green value at x / 2.
 * `width_bits` = 2: for every x value where x ≡ 0 (mod 4), a green value at
   x is positioned into the 2 least-significant bits of the green value at
   x / 4, and green values at x + 1 to x + 3 are positioned in order to the
   more significant bits of the green value at x / 4.
 * `width_bits` = 3: for every x value where x ≡ 0 (mod 8), a green value at
   x is positioned into the least-significant bit of the green value at
   x / 8, and green values at x + 1 to x + 7 are positioned in order to the
   more significant bits of the green value at x / 8.


5 Image Data
------------

Image data is an array of pixel values in scan-line order.

### 5.1 Roles of Image Data

We use image data in five different roles:

 1. ARGB image: Stores the actual pixels of the image.
 1. Entropy image: Stores the
    [meta prefix codes](#decoding-of-meta-prefix-codes). The red and green
    components of a pixel define the meta prefix code used in a particular
    block of the ARGB image.
 1. Predictor image: Stores the metadata for the
    [Predictor Transform](#predictor-transform). The green component of a
    pixel defines which of the 14 predictors is used within a particular
    block of the ARGB image.
 1. Color transform image.
    It is created by `ColorTransformElement` values
    (defined in [Color Transform](#color-transform)) for different blocks of
    the image. Each `ColorTransformElement` `'cte'` is treated as a pixel
    whose alpha component is `255`, red component is `cte.red_to_blue`, green
    component is `cte.green_to_blue`, and blue component is
    `cte.green_to_red`.
 1. Color indexing image: An array of size `color_table_size` (up to 256
    ARGB values) storing the metadata for the
    [Color Indexing Transform](#color-indexing-transform). This is stored as
    an image of width `color_table_size` and height `1`.

### 5.2 Encoding of Image Data

The encoding of image data is independent of its role.

The image is first divided into a set of fixed-size blocks (typically 16x16
blocks). Each of these blocks is modeled using its own entropy codes. Also,
several blocks may share the same entropy codes.

**Rationale:** Storing an entropy code incurs a cost. This cost can be
minimized if statistically similar blocks share an entropy code, thereby
storing that code only once. For example, an encoder can find similar blocks
by clustering them using their statistical properties, or by repeatedly
joining a pair of randomly selected clusters when doing so reduces the overall
amount of bits needed to encode the image.

Each pixel is encoded using one of three possible methods:

 1. Prefix-coded literal: each channel (green, red, blue, and alpha) is
    entropy-coded independently;
 2. LZ77 backward reference: a sequence of pixels is copied from elsewhere
    in the image; or
 3. Color cache code: using a short multiplicative hash code (color cache
    index) of a recently seen color.

The following subsections describe each of these in detail.

#### 5.2.1 Prefix Coded Literals

The pixel is stored as prefix-coded values of green, red, blue, and alpha (in
that order).
See [this section](#decoding-entropy-coded-image-data) for details.

#### 5.2.2 LZ77 Backward Reference

Backward references are tuples of _length_ and _distance code_:

 * Length indicates how many pixels in scan-line order are to be copied.
 * Distance code is a number indicating the position of a previously seen
   pixel, from which the pixels are to be copied. The exact mapping is
   described [below](#distance-mapping).

The length and distance values are stored using **LZ77 prefix coding**.

LZ77 prefix coding divides large integer values into two parts: the _prefix
code_ and the _extra bits_: the prefix code is stored using an entropy code,
while the extra bits are stored as they are (without an entropy code).

**Rationale:** This approach reduces the storage requirement for the entropy
code. Also, large values are usually rare, so extra bits would be used for
very few values in the image. Thus, this approach results in better
compression overall.

The following table denotes the prefix codes and extra bits used for storing
different ranges of values.

Note: The maximum backward reference length is limited to 4096. Hence, only
the first 24 prefix codes (with the respective extra bits) are meaningful for
length values. For distance values, however, all 40 prefix codes are valid.

| Value range     | Prefix code | Extra bits |
| --------------- | ----------- | ---------- |
| 1               | 0           | 0          |
| 2               | 1           | 0          |
| 3               | 2           | 0          |
| 4               | 3           | 0          |
| 5..6            | 4           | 1          |
| 7..8            | 5           | 1          |
| 9..12           | 6           | 2          |
| 13..16          | 7           | 2          |
| ...             | ...         | ...        |
| 3072..4096      | 23          | 10         |
| ...             | ...         | ...        |
| 524289..786432  | 38          | 18         |
| 786433..1048576 | 39          | 18         |

The pseudocode to obtain a (length or distance) value from the prefix code is
as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if (prefix_code < 4) {
  return prefix_code + 1;
}
int extra_bits = (prefix_code - 2) >> 1;
int offset = (2 + (prefix_code & 1)) << extra_bits;
return offset + ReadBits(extra_bits) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Distance Mapping:**
{:#distance-mapping}

As noted previously, a distance code is a number indicating the position of a
previously seen pixel, from which the pixels are to be copied. This subsection
defines the mapping between a distance code and the position of a previous
pixel.

Distance codes larger than 120 denote the pixel distance in scan-line order,
offset by 120.

The smallest distance codes \[1..120\] are special, and are reserved for a
close neighborhood of the current pixel. This neighborhood consists of 120
pixels:

 * Pixels that are 1 to 7 rows above the current pixel, and are up to 8
   columns to the left or up to 7 columns to the right of the current pixel.
   \[Total such pixels = `7 * (8 + 1 + 7) = 112`\].
 * Pixels that are in the same row as the current pixel, and are up to 8
   columns to the left of the current pixel. \[`8` such pixels\].
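Returning to the prefix coding for a moment: the pseudocode for recovering a
value can be written as a small self-contained function. This is a
non-normative sketch; `PrefixCodeToValue` is an illustrative name, and
`extra_bits_value` stands in for the result of `ReadBits(extra_bits)` so the
mapping can be checked without a bitstream:

```c
// Sketch (not normative): recover a length or distance value from its
// prefix code. extra_bits_value substitutes for ReadBits(extra_bits).
int PrefixCodeToValue(int prefix_code, int extra_bits_value) {
  if (prefix_code < 4) {
    return prefix_code + 1;  // Codes 0..3 map directly to values 1..4.
  }
  int extra_bits = (prefix_code - 2) >> 1;
  int offset = (2 + (prefix_code & 1)) << extra_bits;
  return offset + extra_bits_value + 1;
}
```

For instance, prefix code 6 carries two extra bits, so it covers the value
range 9..12, in agreement with the table above.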

The mapping between distance code `i` and the neighboring pixel offset
`(xi, yi)` is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2), (2, 0), (1, 2),
(-1, 2), (2, 1), (-2, 1), (2, 2), (-2, 2), (0, 3), (3, 0),
(1, 3), (-1, 3), (3, 1), (-3, 1), (2, 3), (-2, 3), (3, 2),
(-3, 2), (0, 4), (4, 0), (1, 4), (-1, 4), (4, 1), (-4, 1),
(3, 3), (-3, 3), (2, 4), (-2, 4), (4, 2), (-4, 2), (0, 5),
(3, 4), (-3, 4), (4, 3), (-4, 3), (5, 0), (1, 5), (-1, 5),
(5, 1), (-5, 1), (2, 5), (-2, 5), (5, 2), (-5, 2), (4, 4),
(-4, 4), (3, 5), (-3, 5), (5, 3), (-5, 3), (0, 6), (6, 0),
(1, 6), (-1, 6), (6, 1), (-6, 1), (2, 6), (-2, 6), (6, 2),
(-6, 2), (4, 5), (-4, 5), (5, 4), (-5, 4), (3, 6), (-3, 6),
(6, 3), (-6, 3), (0, 7), (7, 0), (1, 7), (-1, 7), (5, 5),
(-5, 5), (7, 1), (-7, 1), (4, 6), (-4, 6), (6, 4), (-6, 4),
(2, 7), (-2, 7), (7, 2), (-7, 2), (3, 7), (-3, 7), (7, 3),
(-7, 3), (5, 6), (-5, 6), (6, 5), (-6, 5), (8, 0), (4, 7),
(-4, 7), (7, 4), (-7, 4), (8, 1), (8, 2), (6, 6), (-6, 6),
(8, 3), (5, 7), (-5, 7), (7, 5), (-7, 5), (8, 4), (6, 7),
(-6, 7), (7, 6), (-7, 6), (8, 5), (7, 7), (-7, 7), (8, 6),
(8, 7)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example, the distance code `1` indicates an offset of `(0, 1)` for the
neighboring pixel, that is, the pixel above the current pixel (0-pixel
difference in the X-direction and 1-pixel difference in the Y-direction).
Similarly, the distance code `3` indicates the top-left pixel.

The decoder can convert a distance code `i` to a scan-line order distance
`dist` as follows:

\[AMENDED3\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(xi, yi) = distance_map[i - 1]
dist = xi + yi * xsize
if (dist < 1) {
  dist = 1
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `distance_map` is the mapping noted above and `xsize` is the width of
the image in pixels.
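The conversion above can be sketched in C. This is a non-normative
illustration: `DistanceCodeToScanlineDistance` and `kDistanceMapSketch` are
hypothetical names, and only the first eight offsets of the 120-entry mapping
are included, so the sketch handles codes 1..8 and codes above 120:

```c
// Sketch (not normative): convert a distance code to a scan-line-order
// distance. Only the first eight (xi, yi) offsets from the mapping are
// included; a real decoder needs all 120 entries.
typedef struct { int x, y; } PixelOffset;

static const PixelOffset kDistanceMapSketch[] = {
  {0, 1}, {1, 0}, {1, 1}, {-1, 1}, {0, 2}, {2, 0}, {1, 2}, {-1, 2}
};

int DistanceCodeToScanlineDistance(int code, int xsize) {
  if (code > 120) {
    return code - 120;  // Plain scan-line distance, offset by 120.
  }
  PixelOffset o = kDistanceMapSketch[code - 1];  // Codes are 1-based.
  int dist = o.x + o.y * xsize;
  return (dist < 1) ? 1 : dist;  // Clamp, as in the pseudocode above.
}
```

With `xsize = 100`, code 1 (the pixel directly above) yields a distance of
100, and code 4, offset `(-1, 1)`, yields 99.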


#### 5.2.3 Color Cache Coding
{:#color-cache-code}

Color cache stores a set of colors that have been recently used in the image.

**Rationale:** This way, the recently used colors can sometimes be referred to
more efficiently than emitting them using the other two methods (described in
[5.2.1](#prefix-coded-literals) and [5.2.2](#lz77-backward-reference)).

Color cache codes are stored as follows. First, there is a 1-bit value that
indicates if the color cache is used. If this bit is 0, no color cache codes
exist, and they are not transmitted in the prefix code that decodes the green
symbols and the length prefix codes. However, if this bit is 1, the color
cache size is read next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int color_cache_code_bits = ReadBits(4);
int color_cache_size = 1 << color_cache_code_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`color_cache_code_bits` defines the size of the color cache
(`1 << color_cache_code_bits`). The range of allowed values for
`color_cache_code_bits` is \[1..11\]. Compliant decoders must indicate a
corrupted bitstream for other values.

A color cache is an array of size `color_cache_size`. Each entry stores one
ARGB color. Colors are looked up by indexing them by
`(0x1e35a7bd * color) >> (32 - color_cache_code_bits)`. Only one lookup is
done in a color cache; there is no conflict resolution.

At the beginning of decoding or encoding of an image, all entries in all
color cache values are set to zero. The color cache code is converted to this
color at decoding time. The state of the color cache is maintained by
inserting every pixel, be it produced by backward referencing or as literals,
into the cache in the order they appear in the stream.


6 Entropy Code
--------------

### 6.1 Overview

Most of the data is coded using a [canonical prefix code][canonical_huff].
Hence, the codes are transmitted by sending the _prefix code lengths_, as
opposed to the actual _prefix codes_.

In particular, the format uses **spatially-variant prefix coding**. In other
words, different blocks of the image can potentially use different entropy
codes.

**Rationale**: Different areas of the image may have different characteristics.
So, allowing them to use different entropy codes provides more flexibility and
potentially better compression.

### 6.2 Details

The encoded image data consists of several parts:

 1. Decoding and building the prefix codes \[AMENDED2\]
 1. Meta prefix codes
 1. Entropy-coded image data

#### 6.2.1 Decoding and Building the Prefix Codes

There are several steps in decoding the prefix codes.

**Decoding the Code Lengths:**
{:#decoding-the-code-lengths}

This section describes how to read the prefix code lengths from the bitstream.

The prefix code lengths can be coded in two ways. The method used is specified
by a 1-bit value:

 * If this bit is 1, it is a _simple code length code_.
 * If this bit is 0, it is a _normal code length code_.

In both cases, there can be unused code lengths that are still part of the
stream. This may be inefficient, but it is allowed by the format.

**(i) Simple Code Length Code:**

\[AMENDED2\]

This variant is used in the special case when only 1 or 2 prefix symbols are in
the range \[0..255\] with code length `1`. All other prefix code lengths are
implicitly zeros.

The first bit indicates the number of symbols:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_symbols = ReadBits(1) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The symbol values follow.

The first symbol is coded using 1 or 8 bits, depending on the value of
`is_first_8bits`. The range is \[0..1\] or \[0..255\], respectively.
The second symbol, if present, is always assumed to be in the range \[0..255\]
and coded using 8 bits.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int is_first_8bits = ReadBits(1);
symbol0 = ReadBits(1 + 7 * is_first_8bits);
code_lengths[symbol0] = 1;
if (num_symbols == 2) {
  symbol1 = ReadBits(8);
  code_lengths[symbol1] = 1;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Note:** Another special case is when _all_ prefix code lengths are _zeros_ (an
empty prefix code). For example, a prefix code for distance can be empty if
there are no backward references. Similarly, prefix codes for alpha, red, and
blue can be empty if all pixels within the same meta prefix code are produced
using the color cache. However, this case doesn't need special handling, as
empty prefix codes can be coded as those containing a single symbol `0`.

**(ii) Normal Code Length Code:**

The code lengths of the prefix code fit in 8 bits and are read as follows.
First, `num_code_lengths` specifies the number of code lengths.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_code_lengths = 4 + ReadBits(4);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If `num_code_lengths` is greater than 19, the bitstream is invalid.
\[AMENDED3\]

The code lengths are themselves encoded using prefix codes: lower-level code
lengths, `code_length_code_lengths`, first have to be read. The rest of those
`code_length_code_lengths` (according to the order in `kCodeLengthCodeOrder`)
are zeros.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int kCodeLengthCodes = 19;
int kCodeLengthCodeOrder[kCodeLengthCodes] = {
  17, 18, 0, 1, 2, 3, 4, 5, 16, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
int code_length_code_lengths[kCodeLengthCodes] = { 0 };  // All zeros
for (i = 0; i < num_code_lengths; ++i) {
  code_length_code_lengths[kCodeLengthCodeOrder[i]] = ReadBits(3);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, if `ReadBits(1) == 0`, the maximum number of different read symbols is
`num_code_lengths`. Otherwise, it is defined as:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int length_nbits = 2 + 2 * ReadBits(3);
int max_symbol = 2 + ReadBits(length_nbits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A prefix table is then built from `code_length_code_lengths` and used to read
up to `max_symbol` code lengths:

 * Code \[0..15\] indicates literal code lengths.
   * Value 0 means no symbols have been coded.
   * Values \[1..15\] indicate the bit length of the respective code.
 * Code 16 repeats the previous non-zero value \[3..6\] times, that is,
   `3 + ReadBits(2)` times. If code 16 is used before a non-zero value has
   been emitted, a value of 8 is repeated.
 * Code 17 emits a streak of zeros of length \[3..10\], that is,
   `3 + ReadBits(3)` times.
 * Code 18 emits a streak of zeros of length \[11..138\], that is,
   `11 + ReadBits(7)` times.

Once the code lengths are read, a prefix code for each symbol type (A, R, G, B,
and distance) is formed using its respective alphabet size:

 * G channel: 256 + 24 + `color_cache_size`
 * Other literals (A, R, B): 256
 * Distance code: 40

#### 6.2.2 Decoding of Meta Prefix Codes

As noted earlier, the format allows the use of different prefix codes for
different blocks of the image. _Meta prefix codes_ are indexes identifying
which prefix codes to use in different parts of the image.
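Before going further, the repeat semantics of codes 16, 17, and 18 from the
previous section can be made concrete. The sketch below is non-normative: the
`LenSym` type and `ExpandCodeLengths` are hypothetical names, and it assumes
the code-length symbols (and their extra bits) have already been
entropy-decoded from the stream.

```c
/* A decoded code-length symbol (0..18) plus the value of its extra bits. */
typedef struct { int code; int extra; } LenSym;

/* Expands symbols into code lengths per the rules above.
   Returns the number of lengths written, or -1 on overflow. */
int ExpandCodeLengths(const LenSym* syms, int num_syms,
                      int* lengths, int max_out) {
  int prev = 8;  /* value repeated by code 16 before any non-zero length */
  int n = 0;
  for (int i = 0; i < num_syms; ++i) {
    const int code = syms[i].code;
    int repeat = 1, value = code;
    if (code == 16) { repeat = 3 + syms[i].extra; value = prev; }
    else if (code == 17) { repeat = 3 + syms[i].extra; value = 0; }
    else if (code == 18) { repeat = 11 + syms[i].extra; value = 0; }
    else if (code != 0) prev = code;  /* literal length in [1..15] */
    while (repeat-- > 0) {
      if (n >= max_out) return -1;
      lengths[n++] = value;
    }
  }
  return n;
}
```

For instance, the symbol sequence `5`, `16` (extra bits `1`) expands to five
code lengths of 5: the literal 5, then `3 + 1` repeats of it.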

Meta prefix codes may be used _only_ when the image is being used in the
[role](#roles-of-image-data) of an _ARGB image_.

There are two possibilities for the meta prefix codes, indicated by a 1-bit
value:

 * If this bit is zero, there is only one meta prefix code, used everywhere in
   the image. No more data is stored.
 * If this bit is one, the image uses multiple meta prefix codes. These meta
   prefix codes are stored as an _entropy image_ (described below).

**Entropy image:**

The entropy image defines which prefix codes are used in different parts of the
image, as described below.

The first 3 bits contain the `prefix_bits` value. The dimensions of the entropy
image are derived from `prefix_bits`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int prefix_bits = ReadBits(3) + 2;
int prefix_xsize = DIV_ROUND_UP(xsize, 1 << prefix_bits);
int prefix_ysize = DIV_ROUND_UP(ysize, 1 << prefix_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `DIV_ROUND_UP` is as defined [earlier](#predictor-transform).

The next bits contain an entropy image of width `prefix_xsize` and height
`prefix_ysize`.

**Interpretation of Meta Prefix Codes:**

For any given pixel (x, y), there is a set of five prefix codes associated with
it. These codes are (in bitstream order):

 * **Prefix code #1**: used for the green channel, backward-reference length,
   and color cache.
 * **Prefix codes #2, #3, and #4**: used for the red, blue, and alpha channels,
   respectively.
 * **Prefix code #5**: used for the backward-reference distance.

From here on, we refer to this set as a **prefix code group**.
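The pixel-to-block mapping implied by `prefix_bits` can be checked with a small
worked example. This sketch is non-normative; `BlockIndex` is a hypothetical
helper, and `DIV_ROUND_UP` restates the earlier definition.

```c
/* DIV_ROUND_UP as defined earlier in the specification. */
#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))

/* Hypothetical helper: index of the entropy-image cell covering pixel
   (x, y), for an image of width xsize. */
int BlockIndex(int x, int y, int prefix_bits, int xsize) {
  const int prefix_xsize = DIV_ROUND_UP(xsize, 1 << prefix_bits);
  return (y >> prefix_bits) * prefix_xsize + (x >> prefix_bits);
}
```

With `prefix_bits = 2`, the image is tiled into 4x4 pixel blocks, so a
10-pixel-wide image has `prefix_xsize = 3`, and pixel (5, 6) falls in the
entropy-image cell at row 1, column 1.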

The number of prefix code groups in the ARGB image can be obtained by finding
the _largest meta prefix code_ from the entropy image:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_prefix_groups = max(entropy image) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `max(entropy image)` indicates the largest prefix code stored in the
entropy image.

As each prefix code group contains five prefix codes, the total number of
prefix codes is:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_prefix_codes = 5 * num_prefix_groups;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a pixel (x, y) in the ARGB image, we can obtain the corresponding prefix
codes to be used as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int position =
    (y >> prefix_bits) * prefix_xsize + (x >> prefix_bits);
int meta_prefix_code = (entropy_image[position] >> 8) & 0xffff;
PrefixCodeGroup prefix_group = prefix_code_groups[meta_prefix_code];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, we have assumed the existence of a `PrefixCodeGroup` structure, which
represents a set of five prefix codes. Also, `prefix_code_groups` is an array
of `PrefixCodeGroup` (of size `num_prefix_groups`).

The decoder then uses prefix code group `prefix_group` to decode the pixel
(x, y), as explained in the [next section](#decoding-entropy-coded-image-data).

#### 6.2.3 Decoding Entropy-coded Image Data

\[AMENDED2\]

For the current position (x, y) in the image, the decoder first identifies the
corresponding prefix code group (as explained in the last section). Given the
prefix code group, the pixel is read and decoded as follows.

Read the next symbol S from the bitstream using prefix code #1. Note that S is
any integer in the range `0` to
`(256 + 24 + ` [`color_cache_size`](#color-cache-code)` - 1)`.

The interpretation of S depends on its value:

 1. if S < 256
    1.
Use S as the green component.
    1. Read red from the bitstream using prefix code #2.
    1. Read blue from the bitstream using prefix code #3.
    1. Read alpha from the bitstream using prefix code #4.
 1. if S >= 256 && S < 256 + 24
    1. Use S - 256 as a length prefix code.
    1. Read extra bits for the length from the bitstream.
    1. Determine the backward-reference length L from the length prefix code
       and the extra bits read.
    1. Read the distance prefix code from the bitstream using prefix code #5.
    1. Read extra bits for the distance from the bitstream.
    1. Determine the backward-reference distance D from the distance prefix
       code and the extra bits read.
    1. Copy L pixels (in scan-line order) starting from the pixel D pixels
       before the current position.
 1. if S >= 256 + 24
    1. Use S - (256 + 24) as the index into the color cache.
    1. Get the ARGB color from the color cache at that index.


7 Overall Structure of the Format
---------------------------------

Below is a view into the format in Augmented Backus-Naur Form ([ABNF]). It does
not cover all details. The end of the image (EOI) is only implicitly coded into
the number of pixels (xsize * ysize).
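Because the end of the image is implicit, a decoder simply counts emitted
pixels and stops once xsize * ysize have been produced. The sketch below is
non-normative: the `Op` type and `RunOps` are hypothetical stand-ins for
already-entropy-decoded symbols, not part of the format.

```c
#include <stdint.h>

/* Hypothetical stand-in for an already-entropy-decoded symbol. */
typedef struct {
  int is_copy;      /* 0: literal pixel, 1: backward reference */
  uint32_t argb;    /* literal value when is_copy == 0 */
  int dist, length; /* backward reference when is_copy == 1 */
} Op;

/* Emits pixels until xsize * ysize are produced (the implicit EOI).
   Returns the number of pixels produced, or -1 on a corrupt stream. */
int RunOps(const Op* ops, int num_ops, uint32_t* px, int xsize, int ysize) {
  const int total = xsize * ysize;
  int pos = 0;
  for (int i = 0; i < num_ops && pos < total; ++i) {
    if (!ops[i].is_copy) {
      px[pos++] = ops[i].argb;
    } else {
      if (ops[i].dist < 1 || ops[i].dist > pos ||
          pos + ops[i].length > total) return -1;
      for (int j = 0; j < ops[i].length; ++j, ++pos)
        px[pos] = px[pos - ops[i].dist];  /* may overlap when length > dist */
    }
  }
  return pos;
}
```

Note that a backward reference may overlap its own output (length greater than
distance), so the copy must proceed pixel by pixel in scan-line order.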


#### 7.1 Basic Structure

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
format = RIFF-header image-size image-stream
RIFF-header = "RIFF" 4OCTET "WEBP" "VP8L" 4OCTET %x2F
image-size = 14BIT 14BIT ; width - 1, height - 1
image-stream = optional-transform spatially-coded-image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### 7.2 Structure of Transforms

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
optional-transform = (%b1 transform optional-transform) / %b0
transform = predictor-tx / color-tx / subtract-green-tx
transform =/ color-indexing-tx

predictor-tx = %b00 predictor-image
predictor-image = 3BIT ; sub-pixel code
                  entropy-coded-image

color-tx = %b01 color-image
color-image = 3BIT ; sub-pixel code
              entropy-coded-image

subtract-green-tx = %b10

color-indexing-tx = %b11 color-indexing-image
color-indexing-image = 8BIT ; color count
                       entropy-coded-image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### 7.3 Structure of the Image Data

\[AMENDED2\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
spatially-coded-image = color-cache-info meta-prefix data
entropy-coded-image = color-cache-info data

color-cache-info = %b0
color-cache-info =/ (%b1 4BIT) ; 1 followed by color cache size

meta-prefix = %b0 / (%b1 entropy-image)

data = prefix-codes lz77-coded-image
entropy-image = 3BIT ; subsample value
                entropy-coded-image

prefix-codes = prefix-code-group *prefix-codes
prefix-code-group =
    5prefix-code ; See "Interpretation of Meta Prefix Codes" to
                 ; understand what each of these five prefix
                 ; codes is for.
prefix-code = simple-prefix-code / normal-prefix-code
simple-prefix-code = ; see "Simple Code Length Code" for details
normal-prefix-code = code-length-code encoded-code-lengths
code-length-code = ; see section "Normal Code Length Code"

lz77-coded-image =
    *((argb-pixel / lz77-copy / color-cache-code) lz77-coded-image)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A possible example sequence:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF-header image-size %b1 subtract-green-tx
%b1 predictor-tx %b0 color-cache-info
%b0 prefix-codes lz77-coded-image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ABNF]: https://www.rfc-editor.org/rfc/rfc5234
[canonical_huff]: https://en.wikipedia.org/wiki/Canonical_Huffman_code