1# GIF 2 3GIF (Graphics Interchange Format) is an image compression format for paletted 4still and animated images. It is specified in [the GIF89a 5specification](https://www.w3.org/Graphics/GIF/spec-gif89a.txt). 6 7 8# Wire Format Worked Example 9 10Consider `test/data/bricks-nodither.gif`. 11 12 offset xoffset ASCII hex binary 13 000000 0x0000 G 0x47 0b_0100_0111 14 000001 0x0001 I 0x49 0b_0100_1001 15 000002 0x0002 F 0x46 0b_0100_0110 16 000003 0x0003 8 0x38 0b_0011_1000 17 000004 0x0004 9 0x39 0b_0011_1001 18 000005 0x0005 a 0x61 0b_0110_0001 19 000006 0x0006 . 0xA0 0b_1010_0000 20 000007 0x0007 . 0x00 0b_0000_0000 21 000008 0x0008 x 0x78 0b_0111_1000 22 etc 23 014230 0x3796 . 0xD3 0b_1101_0011 24 014231 0x3797 . 0x02 0b_0000_0010 25 014232 0x3798 . 0x06 0b_0000_0110 26 014233 0x3799 . 0x04 0b_0000_0100 27 014234 0x379A . 0x00 0b_0000_0000 28 014235 0x379B ; 0x3B 0b_0011_1011 29 30Starting at the top, a magic identifier, either "GIF87a" or "GIF89a". In 31theory, some decoders can only handle the 1987 flavor. In practice, support for 32the 1989 flavor is ubiquitous. Anyway, the magic ID is: 33 34 offset xoffset ASCII hex binary 35 000000 0x0000 G 0x47 0b_0100_0111 36 000001 0x0001 I 0x49 0b_0100_1001 37 000002 0x0002 F 0x46 0b_0100_0110 38 000003 0x0003 8 0x38 0b_0011_1000 39 000004 0x0004 9 0x39 0b_0011_1001 40 000005 0x0005 a 0x61 0b_0110_0001 41 42 43## Logical Screen Descriptor 44 45The Logical Screen Descriptor is at least 7 bytes long: 46 47 offset xoffset ASCII hex binary 48 000006 0x0006 . 0xA0 0b_1010_0000 49 000007 0x0007 . 0x00 0b_0000_0000 50 000008 0x0008 x 0x78 0b_0111_1000 51 000009 0x0009 . 0x00 0b_0000_0000 52 000010 0x000A . 0xF7 0b_1111_0111 53 000011 0x000B . 0x00 0b_0000_0000 54 000012 0x000C . 0x00 0b_0000_0000 55 56The image size is 0x00A0 by 0x0078 pixels, or 160 by 120. The fifth byte is 57flags, the sixth is the background color index, the seventh is ignored. 58 59The high bit (0x80) and low three bits (0x07) of the flags indicate that we 60have a GCT (Global Color Table) and that it contains 1<<(7+1), or 256, 61elements. Global means that it applies to each frame in the possibly animated 62image unless a per-frame Local Color Table overrides it, and Color Table means 63a palette of up to 256 RGB (Red, Green, Blue) triples. That GCT follows 64immediately: 65 66 offset xoffset ASCII hex binary 67 000013 0x000D . 0x01 0b_0000_0001 68 000014 0x000E . 0x03 0b_0000_0011 69 000015 0x000F . 0x04 0b_0000_0100 70 000016 0x0010 . 0x00 0b_0000_0000 71 000017 0x0011 . 0x0C 0b_0000_1100 72 000018 0x0012 . 0x03 0b_0000_0011 73 etc 74 000250 0x00FA . 0x01 0b_0000_0001 75 000251 0x00FB $ 0x24 0b_0010_0100 76 000252 0x00FC c 0x63 0b_0110_0011 77 etc 78 000673 0x02A1 . 0x0F 0b_0000_1111 79 000674 0x02A2 . 0xA7 0b_1010_0111 80 000675 0x02A3 . 0xF6 0b_1111_0110 81 etc 82 000778 0x030A . 0x9F 0b_1001_1111 83 000779 0x030B . 0xB8 0b_1011_1000 84 000780 0x030C . 0xCD 0b_1100_1101 85 86Thus, color 0 in the palette is the RGB triple {0x01, 0x03, 0x04}, color 1 is 87{0x00, 0x0C, 0x03}, etc, color 79 is {0x01, 0x24, 0x63}, etc, color 220 is 88{0x0F, 0xA7, 0xF6}, etc, color 255 is {0x9F, 0xB8, 0xCD}. 89 90 91## Graphic Control Extension 92 93For this particular GIF, next comes an Extension Introducer byte (0x21) for a 94GCE (Graphic Control Extension) (0xF9), followed by one or more data blocks: 95 96 offset xoffset ASCII hex binary 97 000781 0x030D ! 0x21 0b_0010_0001 98 000782 0x030E . 0xF9 0b_1111_1001 99 000783 0x030F . 0x04 0b_0000_0100 100 000784 0x0310 . 0x00 0b_0000_0000 101 000785 0x0311 . 0x00 0b_0000_0000 102 000786 0x0312 . 0x00 0b_0000_0000 103 000787 0x0313 . 0x00 0b_0000_0000 104 000788 0x0314 . 0x00 0b_0000_0000 105 106The first block has 0x04 payload bytes. After those four bytes, the second 107block has 0x00 payload bytes, which marks the end of the data blocks. The 108payload bytes are flags, animation timing and transparency information that, 109for this simple, opaque, single-frame image, happen to be all zero. 110 111 112## Image Descriptor 113 114Image Descriptors form the bulk of a GIF image. A still image will have one 115Image Descriptor. An animated image will have many. Image Descriptors begin 116with an 0x2C byte, followed by at least nine more bytes for the {left, top, 117width, height} of this frame relative to the overall (potentially animated) 118image, plus a flags byte: 119 120 offset xoffset ASCII hex binary 121 000789 0x0315 , 0x2C 0b_0010_1100 122 000790 0x0316 . 0x00 0b_0000_0000 123 000791 0x0317 . 0x00 0b_0000_0000 124 000792 0x0318 . 0x00 0b_0000_0000 125 000793 0x0319 . 0x00 0b_0000_0000 126 000794 0x031A . 0xA0 0b_1010_0000 127 000795 0x031B . 0x00 0b_0000_0000 128 000796 0x031C x 0x78 0b_0111_1000 129 000797 0x031D . 0x00 0b_0000_0000 130 000798 0x031E . 0x00 0b_0000_0000 131 132The frame's origin is {0x0000, 0x0000} and its extent is {0x00A0, 0x0078}. The 133flags byte indicates a non-interlaced frame and that no LCT (Local Color Table) 134overrides the GCT. After the (lack of) LCT comes the LZW (Lempel Ziv Welch) 135compression's `log2(literal_width)`, discussed further below. 136 137 offset xoffset ASCII hex binary 138 000799 0x031F . 0x08 0b_0000_1000 139 140 141## Pixel Data 142 143Pixel data from the bulk of an Image Descriptor. Just like the GCE format, the 144payload is framed as a sequence of blocks, each block starting with a single 145byte count of the remaining bytes in the block, and the final block containing 146only a 0x00 byte. 147 148 offset xoffset ASCII hex binary 149 000800 0x0320 . 0xFE 0b_1111_1110 150 000801 0x0321 . 0x00 0b_0000_0000 151 000802 0x0322 . 0xB9 0b_1011_1001 152 000803 0x0323 . 0x09 0b_0000_1001 153 000804 0x0324 . 0x14 0b_0001_0100 154 000805 0x0325 . 0x98 0b_1001_1000 155 etc 156 001053 0x041D 6 0x36 0b_0011_0110 157 001054 0x041E . 0x1E 0b_0001_1110 158 001055 0x041F . 0xFE 0b_1111_1110 159 001056 0x0420 J 0x4A 0b_0100_1010 160 001057 0x0421 . 0xA4 0b_1010_0100 161 etc 162 001308 0x051C . 0x1F 0b_0001_1111 163 001309 0x051D . 0x81 0b_1000_0001 164 001310 0x051E . 0xFE 0b_1111_1110 165 001311 0x051F 5 0x35 0b_0011_0101 166 001312 0x0520 ] 0x5D 0b_0101_1101 167 etc 168 etc 169 etc 170 013803 0x35EB n 0x6E 0b_0110_1110 171 013804 0x35EC . 0xD7 0b_1101_0111 172 013805 0x35ED . 0xFE 0b_1111_1110 173 013806 0x35EE . 0x17 0b_0001_0111 174 013807 0x35EF . 0xB4 0b_1011_0100 175 etc 176 014058 0x36EA . 0xDB 0b_1101_1011 177 014059 0x36EB g 0x67 0b_0110_0111 178 014060 0x36EC . 0xAD 0b_1010_1101 179 014061 0x36ED . 0x81 0b_1000_0001 180 014062 0x36EE . 0xD6 0b_1101_0110 181 etc 182 014232 0x3798 . 0x06 0b_0000_0110 183 014233 0x3799 . 0x04 0b_0000_0100 184 014234 0x379A . 0x00 0b_0000_0000 185 186The specification allows for the block count to be 0xFF, but for whatever 187reason, the ImageMagick encoder and its command-line `convert` tool generated 188slightly shorter blocks, with lengths 0xFE, 0xFE, 0xFE, ..., 0xFE, 0xAD, 0x00. 189 190The payload wrapped by these block counts are LZW compressed, discussed below. 191 192 193## Trailer 194 195The final byte is 0x3B. 196 197 offset xoffset ASCII hex binary 198 014235 0x379B ; 0x3B 0b_0011_1011 199 200 201# LZW (Lempel Ziv Welch) Compression 202 203See `std/lzw/README.md` for a description of LZW: a general purpose compression 204algorithm. Below is a detailed decoding of the LZW data from the Wire Format 205Worked Example, deconstructed above. 206 207It is not an official format, but the 208`test/data/bricks-nodither.indexes.giflzw` file contains the extracted "Pixel 209Data" payload from `test/data/bricks-nodither.gif`, preceded by the one byte 210`log2(literal_width)`. Deriving a separate file, as a separate, contiguous 211stream, makes it easier to discuss the LZW compression format. 212 213 offset xoffset ASCII hex binary 214 000000 0x0000 . 0x08 0b_0000_1000 215 000001 0x0001 . 0x00 0b_0000_0000 216 000002 0x0002 . 0xB9 0b_1011_1001 217 000003 0x0003 . 0x09 0b_0000_1001 218 000004 0x0004 . 0x14 0b_0001_0100 219 000005 0x0005 . 0x98 0b_1001_1000 220 000006 0x0006 . 0xAD 0b_1010_1101 221 000007 0x0007 ^ 0x5E 0b_0101_1110 222 000008 0x0008 > 0x3E 0b_0011_1110 223 000009 0x0009 { 0x7B 0b_0111_1011 224 000010 0x000A . 0xDF 0b_1101_1111 225 000011 0x000B . 0xB8 0b_1011_1000 226 000012 0x000C } 0x7D 0b_0111_1101 227 000013 0x000D . 0xC9 0b_1100_1001 228 000014 0x000E . 0xA1 0b_1010_0001 229 000015 0x000F . 0xA3 0b_1010_0011 230 000016 0x0010 a 0x61 0b_0110_0001 231 000017 0x0011 . 0xC3 0b_1100_0011 232 000018 0x0012 0 0x30 0b_0011_0000 233 etc 234 000253 0x00FD 6 0x36 0b_0011_0110 235 000254 0x00FE . 0x1E 0b_0001_1110 236 000255 0x00FF J 0x4A 0b_0100_1010 237 000256 0x0100 . 0xA4 0b_1010_0100 238 etc 239 000507 0x01FB . 0x1F 0b_0001_1111 240 000508 0x01FC . 0x81 0b_1000_0001 241 000509 0x01FD 5 0x35 0b_0011_0101 242 000510 0x01FE ] 0x5D 0b_0101_1101 243 etc 244 etc 245 etc 246 012953 0x3299 n 0x6E 0b_0110_1110 247 012954 0x329A . 0xD7 0b_1101_0111 248 012955 0x329B . 0x17 0b_0001_0111 249 012956 0x329C . 0xB4 0b_1011_0100 250 etc 251 013207 0x3397 . 0xDB 0b_1101_1011 252 013208 0x3398 g 0x67 0b_0110_0111 253 013209 0x3399 . 0x81 0b_1000_0001 254 013210 0x339A . 0xD6 0b_1101_0110 255 etc 256 013380 0x3444 . 0x06 0b_0000_0110 257 013381 0x3445 . 0x04 0b_0000_0100 258 259 260## Codes 261 262LZW consists of a sequence of code values, in the range `[0, max]` inclusive. 263Each code decodes to either a literal value (one byte), a back-reference 264(multiple bytes), a 'clear code' (discussed below), or indicates the end of the 265data stream. 266 267The first `1<<l2lw` codes are literal codes, where `l2lw` is the 268`log2(literal_width)` mentioned above. For GIF's flavor of LZW, valid `l2lw` 269values are between 2 and 8 inclusive. 270 271For example, if `l2lw == 8`, then the first 256 codes are literal codes: the 0 272code decodes one 0x00 byte, the 1 code decodes one 0x01 byte, etc. The next 273code (256) is the clear code and the next code after that (257) is the end 274code. Any code higher than that denotes a back-reference, but any code above 275the current `max` value is invalid. 276 277`max`'s initial value is the same as the end code, and it increases by 1 with 278each code consumed, up until `max == 4095`. A clear code resets `max` to its 279initial value, and clears the meaning of higher back-reference codes. 280 281Codes take up a variable number of bits in the input stream, the `width`, 282depending on `max`. For example, if `max` is in the range [256, 511] then the 283next code takes 9 bits, if `max` is in the range [512, 1023] then `width == 28410`, and so on. For GIF, when a code spans multiple bytes, codes are formed 285Least Significant Bits first. For the 286`test/data/bricks-nodither.indexes.giflzw` example, `l2lw == 8` and so `width` 287starts at 9 bits. Printing the bits on byte boundaries give: 288 289 offset xoffset ASCII hex binary 290 000001 0x0001 . 0x00 0b_0000_0000 291 000002 0x0002 . 0xB9 0b_1011_1001 292 000003 0x0003 . 0x09 0b_0000_1001 293 000004 0x0004 . 0x14 0b_0001_0100 294 000005 0x0005 . 0x98 0b_1001_1000 295 000006 0x0006 . 0xAD 0b_1010_1101 296 000007 0x0007 ^ 0x5E 0b_0101_1110 297 000008 0x0008 > 0x3E 0b_0011_1110 298 000009 0x0009 { 0x7B 0b_0111_1011 299 000010 0x000A . 0xDF 0b_1101_1111 300 000011 0x000B . 0xB8 0b_1011_1000 301 000012 0x000C } 0x7D 0b_0111_1101 302 000013 0x000D . 0xC9 0b_1100_1001 303 000014 0x000E . 0xA1 0b_1010_0001 304 000015 0x000F . 0xA3 0b_1010_0011 305 000016 0x0010 a 0x61 0b_0110_0001 306 000017 0x0011 . 0xC3 0b_1100_0011 307 000018 0x0012 0 0x30 0b_0011_0000 308 etc 309 000147 0x0093 > 0x3E 0b_0011_1110 310 000148 0x0094 . 0xC4 0b_1100_0100 311 etc 312 000286 0x011E _ 0x5F 0b_0101_1111 313 000287 0x011F . 0xEA 0b_1110_1010 314 000288 0x0120 . 0x9C 0b_1001_1100 315 000289 0x0121 . 0xFC 0b_1111_1100 316 000290 0x0122 . 0xD4 0b_1101_0100 317 000291 0x0123 . 0xA4 0b_1010_0100 318 etc 319 005407 0x151F . 0xBB 0b_1011_1011 320 005408 0x1520 . 0xFB 0b_1111_1011 321 005409 0x1521 . 0x00 0b_0000_0000 322 005410 0x1522 . 0xE1 0b_1110_0001 323 005411 0x1523 . 0xA1 0b_1010_0001 324 005412 0x1524 . 0xC3 0b_1100_0011 325 005413 0x1525 @ 0x40 0b_0100_0000 326 etc 327 013378 0x3442 . 0xD3 0b_1101_0011 328 013379 0x3443 . 0x02 0b_0000_0010 329 013380 0x3444 . 0x06 0b_0000_0110 330 013381 0x3445 . 0x04 0b_0000_0100 331 332 333and on code boundaries give: 334 335 binary width code 336 0b_...._...._...._...1_0000_0000 9 0x0100 337 0b_...._...._...._..01_1011_100. 9 0x00DC 338 0b_...._...._...._.100_0000_10.. 9 0x0102 339 0b_...._...._...._1000_0001_0... 9 0x0102 340 0b_...._...._...0_1101_1001_.... 9 0x00D9 341 0b_...._...._..01_1110_101._.... 9 0x00F5 342 0b_...._...._.011_1110_01.._.... 9 0x00F9 343 0b_...._...._0111_1011_0..._.... 9 0x00F6 344 0b_...._...._...._...0_1101_1111 9 0x00DF 345 0b_...._...._...._..01_1011_100. 9 0x00DC 346 0b_...._...._...._.001_0111_11.. 9 0x005F 347 0b_...._...._...._0001_1100_1... 9 0x0039 348 0b_...._...._...0_0011_1010_.... 9 0x003A 349 0b_...._...._..10_0001_101._.... 9 0x010D 350 0b_...._...._.100_0011_01.._.... 9 0x010D 351 0b_...._...._0011_0000_1..._.... 9 0x0061 352 etc 353 0b_...._...._...._.100_0011_11.. 9 0x010F 354 etc 355 0b_...._...._.110_1010_01.._.... 9 0x01A9 356 0b_...._...._1001_1100_1..._.... 9 0x0139 357 0b_...._...._...._..00_1111_1100 10 0x00FC (implicit width increase) 358 0b_...._...._...._0100_1101_01.. 10 0x0135 359 etc 360 0b_...._...._1111_1011_1011_.... 12 0x0FBB 361 0b_...._...._...._0001_0000_0000 12 0x0100 (code == 0x0100 means clear) 362 0b_...._...._...0_0001_1110_.... 9 0x001E (explicit width reset) 363 0b_...._...._..00_0011_101._.... 9 0x001D 364 0b_...._...._.100_0000_11.._.... 9 0x0103 365 etc 366 0b_...._..10_0000_0010_11.._.... 12 0x080B 367 0b_...._...._..00_0100_0000_01.. 12 0x0101 368 369The implicit `width` increase is because `max` ticked over from 0x01FF to 3700x200. Processing these codes give: 371 372 code meaning max output 373 0x0100 clear 0x0101 . 374 0x00DC literal 0x0101 0xDC 375 0x0102 back-ref 0x0102 0xDC 0xDC 376 0x0102 back-ref 0x0103 0xDC 0xDC 377 0x00D9 literal 0x0104 0xD9 378 0x00F5 literal 0x0105 0xF5 379 0x00F9 literal 0x0106 0xF9 380 0x00F6 literal 0x0107 0xF6 381 0x00DF literal 0x0108 0xDF 382 0x00DC literal 0x0109 0xDC 383 0x005F literal 0x010A 0x5F 384 0x0039 literal 0x010B 0x39 385 0x003A literal 0x010C 0x3A 386 0x010D back-ref 0x010D 0x3A 0x3A 387 0x010D back-ref 0x010E 0x3A 0x3A 388 0x0061 literal 0x010F 0x61 389 etc 390 0x010F back-ref 0x0182 0x3A 0x3A 0x61 391 etc 392 0x01A9 back-ref 0x01FE 0xFB 0xFB 0xFB 0xFB 393 0x0139 back-ref 0x01FF 0xFB 0xFC 394 0x00FC literal 0x0200 0xFC 395 0x0135 back-ref 0x0201 0xFA 0xFA 396 etc 397 0x0FBB back-ref 0x0FFF 0xC1 0xC1 0xC1 0xC1 0xC1 0xC1 0xDF 398 0x0100 clear 0x0FFF . 399 0x001E literal 0x0101 0x1E 400 0x001D literal 0x0102 0x1D 401 0x0103 back-ref 0x0103 0x1D 0x1D 402 etc 403 0x080B back-ref 0x0896 0x4F 0x4F 0x4F 0x4F 0x4F 404 0x0101 end 0x0897 . 405 406The `max` column here is the `max` code allowed when processing that row's 407code. The initial clear code is redundant (as the initial value of `max` is 408already 0x0101 here), but some encoders produce them because the specification 409says that "encoders should output a Clear code as the first code of each image 410data stream". 411 412Other than clear and end codes, each code produces some output: a variable 413length prefix and a one byte suffix. For literal codes, the prefix is empty and 414the suffix is the literal. 415 416When `max` is greater than the end code (e.g. greater than 0x0101), the row 417defines a new back-reference, and that definition persists up until the next 418clear code. For example, the third, fourth, fifth, etc. rows define what the 4190x0102, 0x0103, 0x0104, etc. codes output. That code's output (prefix + suffix) 420is defined by the prefix being the previous row's output and the suffix being 421the first byte of the current row's output. 422 423For example, the 15th and 16th rows are: 424 425 code meaning max output 426 0x010D back-ref 0x010E 0x3A 0x3A 427 0x0061 literal 0x010F 0x61 428 429which defines the 0x010F code to output "0x3A 0x3A 0x61". Later on, an 0x010F 430code is encountered, and its output is exactly that. 431 432A row's code can be the code that that row itself defines, in which case the 433prefix is the previous row and the suffix is the first byte of the previous 434row. Consider the opening few codes: 435 436 code meaning max output 437 0x0100 clear 0x0101 . 438 0x00DC literal 0x0101 0xDC 439 0x0102 back-ref 0x0102 0xDC 0xDC 440 0x0102 back-ref 0x0103 0xDC 0xDC 441 0x00D9 literal 0x0104 0xD9 442 443The third row's code is 0x0102, which that row also defines. Its output is 444therefore the concatenation of "0xDC" and the first byte of "0xDC", both of 445those strings being the previous row's output. The fourth row executes the same 4460x0102 code (and outputs "0xDC 0xDC" again) and defines the 0x0103 code. The 447next code is a literal 0xD9, and hence the decoded output starts with five 0xDC 448bytes and then an 0xD9. 449 450 451## Actual Pixel Indexes 452 453It is also not an official format, but for reference, the 454`test/data/bricks-nodither.indexes` contains the decoded 455`test/data/bricks-nodither.indexes.giflzw` data: 456 457 offset xoffset ASCII hex binary 458 000000 0x0000 . 0xDC 0b_1101_1100 459 000001 0x0001 . 0xDC 0b_1101_1100 460 000002 0x0002 . 0xDC 0b_1101_1100 461 000003 0x0003 . 0xDC 0b_1101_1100 462 000004 0x0004 . 0xDC 0b_1101_1100 463 000005 0x0005 . 0xD9 0b_1101_1001 464 000006 0x0006 . 0xF5 0b_1111_0101 465 000007 0x0007 . 0xF9 0b_1111_1001 466 000008 0x0008 . 0xF6 0b_1111_0110 467 000009 0x0009 . 0xDF 0b_1101_1111 468 000010 0x000A . 0xDC 0b_1101_1100 469 000011 0x000B _ 0x5F 0b_0101_1111 470 000012 0x000C 9 0x39 0b_0011_1001 471 000013 0x000D : 0x3A 0b_0011_1010 472 000014 0x000E : 0x3A 0b_0011_1010 473 000015 0x000F : 0x3A 0b_0011_1010 474 000016 0x0010 : 0x3A 0b_0011_1010 475 000017 0x0011 : 0x3A 0b_0011_1010 476 000018 0x0012 a 0x61 0b_0110_0001 477 etc 478 019195 0x4AFB O 0x4F 0b_0100_1111 479 019196 0x4AFC O 0x4F 0b_0100_1111 480 019197 0x4AFD O 0x4F 0b_0100_1111 481 019198 0x4AFE O 0x4F 0b_0100_1111 482 019199 0x4AFF O 0x4F 0b_0100_1111 483 484We have 19200 = 160 × 120 pixels. The top row starts with five pixels with RGB 485values indexed by 0xDC (or 220 in decimal, and recall that the GCT maps color 486220 to {0x0F, 0xA7, 0xF6}), the bottom row ends with index 0x4F. 487 488Indeed, opening `test/data/bricks-nodither.gif` in an image editor should 489verify that the top left pixel's RGB is {0x0F, 0xA7, 0xF6}, and likewise the 490bottom right pixel's RGB is {0x01, 0x24, 0x63}. 491