• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# GIF
2
3GIF (Graphics Interchange Format) is an image compression format for paletted
4still and animated images. It is specified in [the GIF89a
5specification](https://www.w3.org/Graphics/GIF/spec-gif89a.txt).
6
7
8# Wire Format Worked Example
9
10Consider `test/data/bricks-nodither.gif`.
11
12    offset  xoffset ASCII   hex     binary
13    000000  0x0000  G       0x47    0b_0100_0111
14    000001  0x0001  I       0x49    0b_0100_1001
15    000002  0x0002  F       0x46    0b_0100_0110
16    000003  0x0003  8       0x38    0b_0011_1000
17    000004  0x0004  9       0x39    0b_0011_1001
18    000005  0x0005  a       0x61    0b_0110_0001
19    000006  0x0006  .       0xA0    0b_1010_0000
20    000007  0x0007  .       0x00    0b_0000_0000
21    000008  0x0008  x       0x78    0b_0111_1000
22    etc
23    014230  0x3796  .       0xD3    0b_1101_0011
24    014231  0x3797  .       0x02    0b_0000_0010
25    014232  0x3798  .       0x06    0b_0000_0110
26    014233  0x3799  .       0x04    0b_0000_0100
27    014234  0x379A  .       0x00    0b_0000_0000
28    014235  0x379B  ;       0x3B    0b_0011_1011
29
30Starting at the top, a magic identifier, either "GIF87a" or "GIF89a". In
31theory, some decoders can only handle the 1987 flavor. In practice, support for
32the 1989 flavor is ubiquitous. Anyway, the magic ID is:
33
34    offset  xoffset ASCII   hex     binary
35    000000  0x0000  G       0x47    0b_0100_0111
36    000001  0x0001  I       0x49    0b_0100_1001
37    000002  0x0002  F       0x46    0b_0100_0110
38    000003  0x0003  8       0x38    0b_0011_1000
39    000004  0x0004  9       0x39    0b_0011_1001
40    000005  0x0005  a       0x61    0b_0110_0001
41
42
43## Logical Screen Descriptor
44
45The Logical Screen Descriptor is at least 7 bytes long:
46
47    offset  xoffset ASCII   hex     binary
48    000006  0x0006  .       0xA0    0b_1010_0000
49    000007  0x0007  .       0x00    0b_0000_0000
50    000008  0x0008  x       0x78    0b_0111_1000
51    000009  0x0009  .       0x00    0b_0000_0000
52    000010  0x000A  .       0xF7    0b_1111_0111
53    000011  0x000B  .       0x00    0b_0000_0000
54    000012  0x000C  .       0x00    0b_0000_0000
55
56The image size is 0x00A0 by 0x0078 pixels, or 160 by 120. The fifth byte is
57flags, the sixth is the background color index, the seventh is ignored.
58
59The high bit (0x80) and low three bits (0x07) of the flags indicate that we
60have a GCT (Global Color Table) and that it contains 1<<(7+1), or 256,
61elements. Global means that it applies to each frame in the possibly animated
62image unless a per-frame Local Color Table overrides it, and Color Table means
63a palette of up to 256 RGB (Red, Green, Blue) triples. That GCT follows
64immediately:
65
66    offset  xoffset ASCII   hex     binary
67    000013  0x000D  .       0x01    0b_0000_0001
68    000014  0x000E  .       0x03    0b_0000_0011
69    000015  0x000F  .       0x04    0b_0000_0100
70    000016  0x0010  .       0x00    0b_0000_0000
71    000017  0x0011  .       0x0C    0b_0000_1100
72    000018  0x0012  .       0x03    0b_0000_0011
73    etc
74    000250  0x00FA  .       0x01    0b_0000_0001
75    000251  0x00FB  $       0x24    0b_0010_0100
76    000252  0x00FC  c       0x63    0b_0110_0011
77    etc
78    000673  0x02A1  .       0x0F    0b_0000_1111
79    000674  0x02A2  .       0xA7    0b_1010_0111
80    000675  0x02A3  .       0xF6    0b_1111_0110
81    etc
82    000778  0x030A  .       0x9F    0b_1001_1111
83    000779  0x030B  .       0xB8    0b_1011_1000
84    000780  0x030C  .       0xCD    0b_1100_1101
85
86Thus, color 0 in the palette is the RGB triple {0x01, 0x03, 0x04}, color 1 is
87{0x00, 0x0C, 0x03}, etc, color 79 is {0x01, 0x24, 0x63}, etc, color 220 is
88{0x0F, 0xA7, 0xF6}, etc, color 255 is {0x9F, 0xB8, 0xCD}.
89
90
91## Graphic Control Extension
92
93For this particular GIF, next comes an Extension Introducer byte (0x21) for a
94GCE (Graphic Control Extension) (0xF9), followed by one or more data blocks:
95
96    offset  xoffset ASCII   hex     binary
97    000781  0x030D  !       0x21    0b_0010_0001
98    000782  0x030E  .       0xF9    0b_1111_1001
99    000783  0x030F  .       0x04    0b_0000_0100
100    000784  0x0310  .       0x00    0b_0000_0000
101    000785  0x0311  .       0x00    0b_0000_0000
102    000786  0x0312  .       0x00    0b_0000_0000
103    000787  0x0313  .       0x00    0b_0000_0000
104    000788  0x0314  .       0x00    0b_0000_0000
105
106The first block has 0x04 payload bytes. After those four bytes, the second
107block has 0x00 payload bytes, which marks the end of the data blocks. The
108payload bytes are flags, animation timing and transparency information that,
109for this simple, opaque, single-frame image, happen to be all zero.
110
111
112## Image Descriptor
113
114Image Descriptors form the bulk of a GIF image. A still image will have one
115Image Descriptor. An animated image will have many. Image Descriptors begin
116with an 0x2C byte, followed by at least nine more bytes for the {left, top,
117width, height} of this frame relative to the overall (potentially animated)
118image, plus a flags byte:
119
120    offset  xoffset ASCII   hex     binary
121    000789  0x0315  ,       0x2C    0b_0010_1100
122    000790  0x0316  .       0x00    0b_0000_0000
123    000791  0x0317  .       0x00    0b_0000_0000
124    000792  0x0318  .       0x00    0b_0000_0000
125    000793  0x0319  .       0x00    0b_0000_0000
126    000794  0x031A  .       0xA0    0b_1010_0000
127    000795  0x031B  .       0x00    0b_0000_0000
128    000796  0x031C  x       0x78    0b_0111_1000
129    000797  0x031D  .       0x00    0b_0000_0000
130    000798  0x031E  .       0x00    0b_0000_0000
131
132The frame's origin is {0x0000, 0x0000} and its extent is {0x00A0, 0x0078}. The
133flags byte indicates a non-interlaced frame and that no LCT (Local Color Table)
134overrides the GCT. After the (lack of) LCT comes the LZW (Lempel Ziv Welch)
135compression's `log2(literal_width)`, discussed further below.
136
137    offset  xoffset ASCII   hex     binary
138    000799  0x031F  .       0x08    0b_0000_1000
139
140
141## Pixel Data
142
143Pixel data from the bulk of an Image Descriptor. Just like the GCE format, the
144payload is framed as a sequence of blocks, each block starting with a single
145byte count of the remaining bytes in the block, and the final block containing
146only a 0x00 byte.
147
148    offset  xoffset ASCII   hex     binary
149    000800  0x0320  .       0xFE    0b_1111_1110
150    000801  0x0321  .       0x00    0b_0000_0000
151    000802  0x0322  .       0xB9    0b_1011_1001
152    000803  0x0323  .       0x09    0b_0000_1001
153    000804  0x0324  .       0x14    0b_0001_0100
154    000805  0x0325  .       0x98    0b_1001_1000
155    etc
156    001053  0x041D  6       0x36    0b_0011_0110
157    001054  0x041E  .       0x1E    0b_0001_1110
158    001055  0x041F  .       0xFE    0b_1111_1110
159    001056  0x0420  J       0x4A    0b_0100_1010
160    001057  0x0421  .       0xA4    0b_1010_0100
161    etc
162    001308  0x051C  .       0x1F    0b_0001_1111
163    001309  0x051D  .       0x81    0b_1000_0001
164    001310  0x051E  .       0xFE    0b_1111_1110
165    001311  0x051F  5       0x35    0b_0011_0101
166    001312  0x0520  ]       0x5D    0b_0101_1101
167    etc
168    etc
169    etc
170    013803  0x35EB  n       0x6E    0b_0110_1110
171    013804  0x35EC  .       0xD7    0b_1101_0111
172    013805  0x35ED  .       0xFE    0b_1111_1110
173    013806  0x35EE  .       0x17    0b_0001_0111
174    013807  0x35EF  .       0xB4    0b_1011_0100
175    etc
176    014058  0x36EA  .       0xDB    0b_1101_1011
177    014059  0x36EB  g       0x67    0b_0110_0111
178    014060  0x36EC  .       0xAD    0b_1010_1101
179    014061  0x36ED  .       0x81    0b_1000_0001
180    014062  0x36EE  .       0xD6    0b_1101_0110
181    etc
182    014232  0x3798  .       0x06    0b_0000_0110
183    014233  0x3799  .       0x04    0b_0000_0100
184    014234  0x379A  .       0x00    0b_0000_0000
185
186The specification allows for the block count to be 0xFF, but for whatever
187reason, the ImageMagick encoder and its command-line `convert` tool generated
188slightly shorter blocks, with lengths 0xFE, 0xFE, 0xFE, ..., 0xFE, 0xAD, 0x00.
189
190The payload wrapped by these block counts are LZW compressed, discussed below.
191
192
193## Trailer
194
195The final byte is 0x3B.
196
197    offset  xoffset ASCII   hex     binary
198    014235  0x379B  ;       0x3B    0b_0011_1011
199
200
201# LZW (Lempel Ziv Welch) Compression
202
203See `std/lzw/README.md` for a description of LZW: a general purpose compression
204algorithm. Below is a detailed decoding of the LZW data from the Wire Format
205Worked Example, deconstructed above.
206
207It is not an official format, but the
208`test/data/bricks-nodither.indexes.giflzw` file contains the extracted "Pixel
209Data" payload from `test/data/bricks-nodither.gif`, preceded by the one byte
210`log2(literal_width)`. Deriving a separate file, as a separate, contiguous
211stream, makes it easier to discuss the LZW compression format.
212
213    offset  xoffset ASCII   hex     binary
214    000000  0x0000  .       0x08    0b_0000_1000
215    000001  0x0001  .       0x00    0b_0000_0000
216    000002  0x0002  .       0xB9    0b_1011_1001
217    000003  0x0003  .       0x09    0b_0000_1001
218    000004  0x0004  .       0x14    0b_0001_0100
219    000005  0x0005  .       0x98    0b_1001_1000
220    000006  0x0006  .       0xAD    0b_1010_1101
221    000007  0x0007  ^       0x5E    0b_0101_1110
222    000008  0x0008  >       0x3E    0b_0011_1110
223    000009  0x0009  {       0x7B    0b_0111_1011
224    000010  0x000A  .       0xDF    0b_1101_1111
225    000011  0x000B  .       0xB8    0b_1011_1000
226    000012  0x000C  }       0x7D    0b_0111_1101
227    000013  0x000D  .       0xC9    0b_1100_1001
228    000014  0x000E  .       0xA1    0b_1010_0001
229    000015  0x000F  .       0xA3    0b_1010_0011
230    000016  0x0010  a       0x61    0b_0110_0001
231    000017  0x0011  .       0xC3    0b_1100_0011
232    000018  0x0012  0       0x30    0b_0011_0000
233    etc
234    000253  0x00FD  6       0x36    0b_0011_0110
235    000254  0x00FE  .       0x1E    0b_0001_1110
236    000255  0x00FF  J       0x4A    0b_0100_1010
237    000256  0x0100  .       0xA4    0b_1010_0100
238    etc
239    000507  0x01FB  .       0x1F    0b_0001_1111
240    000508  0x01FC  .       0x81    0b_1000_0001
241    000509  0x01FD  5       0x35    0b_0011_0101
242    000510  0x01FE  ]       0x5D    0b_0101_1101
243    etc
244    etc
245    etc
246    012953  0x3299  n       0x6E    0b_0110_1110
247    012954  0x329A  .       0xD7    0b_1101_0111
248    012955  0x329B  .       0x17    0b_0001_0111
249    012956  0x329C  .       0xB4    0b_1011_0100
250    etc
251    013207  0x3397  .       0xDB    0b_1101_1011
252    013208  0x3398  g       0x67    0b_0110_0111
253    013209  0x3399  .       0x81    0b_1000_0001
254    013210  0x339A  .       0xD6    0b_1101_0110
255    etc
256    013380  0x3444  .       0x06    0b_0000_0110
257    013381  0x3445  .       0x04    0b_0000_0100
258
259
260## Codes
261
262LZW consists of a sequence of code values, in the range `[0, max]` inclusive.
263Each code decodes to either a literal value (one byte), a back-reference
264(multiple bytes), a 'clear code' (discussed below), or indicates the end of the
265data stream.
266
267The first `1<<l2lw` codes are literal codes, where `l2lw` is the
268`log2(literal_width)` mentioned above. For GIF's flavor of LZW, valid `l2lw`
269values are between 2 and 8 inclusive.
270
271For example, if `l2lw == 8`, then the first 256 codes are literal codes: the 0
272code decodes one 0x00 byte, the 1 code decodes one 0x01 byte, etc. The next
273code (256) is the clear code and the next code after that (257) is the end
274code. Any code higher than that denotes a back-reference, but any code above
275the current `max` value is invalid.
276
277`max`'s initial value is the same as the end code, and it increases by 1 with
278each code consumed, up until `max == 4095`. A clear code resets `max` to its
279initial value, and clears the meaning of higher back-reference codes.
280
281Codes take up a variable number of bits in the input stream, the `width`,
282depending on `max`. For example, if `max` is in the range [256, 511] then the
283next code takes 9 bits, if `max` is in the range [512, 1023] then `width ==
28410`, and so on. For GIF, when a code spans multiple bytes, codes are formed
285Least Significant Bits first. For the
286`test/data/bricks-nodither.indexes.giflzw` example, `l2lw == 8` and so `width`
287starts at 9 bits. Printing the bits on byte boundaries give:
288
289    offset  xoffset ASCII   hex     binary
290    000001  0x0001  .       0x00    0b_0000_0000
291    000002  0x0002  .       0xB9    0b_1011_1001
292    000003  0x0003  .       0x09    0b_0000_1001
293    000004  0x0004  .       0x14    0b_0001_0100
294    000005  0x0005  .       0x98    0b_1001_1000
295    000006  0x0006  .       0xAD    0b_1010_1101
296    000007  0x0007  ^       0x5E    0b_0101_1110
297    000008  0x0008  >       0x3E    0b_0011_1110
298    000009  0x0009  {       0x7B    0b_0111_1011
299    000010  0x000A  .       0xDF    0b_1101_1111
300    000011  0x000B  .       0xB8    0b_1011_1000
301    000012  0x000C  }       0x7D    0b_0111_1101
302    000013  0x000D  .       0xC9    0b_1100_1001
303    000014  0x000E  .       0xA1    0b_1010_0001
304    000015  0x000F  .       0xA3    0b_1010_0011
305    000016  0x0010  a       0x61    0b_0110_0001
306    000017  0x0011  .       0xC3    0b_1100_0011
307    000018  0x0012  0       0x30    0b_0011_0000
308    etc
309    000147  0x0093  >       0x3E    0b_0011_1110
310    000148  0x0094  .       0xC4    0b_1100_0100
311    etc
312    000286  0x011E  _       0x5F    0b_0101_1111
313    000287  0x011F  .       0xEA    0b_1110_1010
314    000288  0x0120  .       0x9C    0b_1001_1100
315    000289  0x0121  .       0xFC    0b_1111_1100
316    000290  0x0122  .       0xD4    0b_1101_0100
317    000291  0x0123  .       0xA4    0b_1010_0100
318    etc
319    005407  0x151F  .       0xBB    0b_1011_1011
320    005408  0x1520  .       0xFB    0b_1111_1011
321    005409  0x1521  .       0x00    0b_0000_0000
322    005410  0x1522  .       0xE1    0b_1110_0001
323    005411  0x1523  .       0xA1    0b_1010_0001
324    005412  0x1524  .       0xC3    0b_1100_0011
325    005413  0x1525  @       0x40    0b_0100_0000
326    etc
327    013378  0x3442  .       0xD3    0b_1101_0011
328    013379  0x3443  .       0x02    0b_0000_0010
329    013380  0x3444  .       0x06    0b_0000_0110
330    013381  0x3445  .       0x04    0b_0000_0100
331
332
333and on code boundaries give:
334
335    binary                              width   code
336    0b_...._...._...._...1_0000_0000     9      0x0100
337    0b_...._...._...._..01_1011_100.     9      0x00DC
338    0b_...._...._...._.100_0000_10..     9      0x0102
339    0b_...._...._...._1000_0001_0...     9      0x0102
340    0b_...._...._...0_1101_1001_....     9      0x00D9
341    0b_...._...._..01_1110_101._....     9      0x00F5
342    0b_...._...._.011_1110_01.._....     9      0x00F9
343    0b_...._...._0111_1011_0..._....     9      0x00F6
344    0b_...._...._...._...0_1101_1111     9      0x00DF
345    0b_...._...._...._..01_1011_100.     9      0x00DC
346    0b_...._...._...._.001_0111_11..     9      0x005F
347    0b_...._...._...._0001_1100_1...     9      0x0039
348    0b_...._...._...0_0011_1010_....     9      0x003A
349    0b_...._...._..10_0001_101._....     9      0x010D
350    0b_...._...._.100_0011_01.._....     9      0x010D
351    0b_...._...._0011_0000_1..._....     9      0x0061
352    etc
353    0b_...._...._...._.100_0011_11..     9      0x010F
354    etc
355    0b_...._...._.110_1010_01.._....     9      0x01A9
356    0b_...._...._1001_1100_1..._....     9      0x0139
357    0b_...._...._...._..00_1111_1100    10      0x00FC  (implicit width increase)
358    0b_...._...._...._0100_1101_01..    10      0x0135
359    etc
360    0b_...._...._1111_1011_1011_....    12      0x0FBB
361    0b_...._...._...._0001_0000_0000    12      0x0100  (code == 0x0100 means clear)
362    0b_...._...._...0_0001_1110_....     9      0x001E  (explicit width reset)
363    0b_...._...._..00_0011_101._....     9      0x001D
364    0b_...._...._.100_0000_11.._....     9      0x0103
365    etc
366    0b_...._..10_0000_0010_11.._....    12      0x080B
367    0b_...._...._..00_0100_0000_01..    12      0x0101
368
369The implicit `width` increase is because `max` ticked over from 0x01FF to
3700x200. Processing these codes give:
371
372    code      meaning   max     output
373    0x0100    clear     0x0101  .
374    0x00DC    literal   0x0101  0xDC
375    0x0102    back-ref  0x0102  0xDC 0xDC
376    0x0102    back-ref  0x0103  0xDC 0xDC
377    0x00D9    literal   0x0104  0xD9
378    0x00F5    literal   0x0105  0xF5
379    0x00F9    literal   0x0106  0xF9
380    0x00F6    literal   0x0107  0xF6
381    0x00DF    literal   0x0108  0xDF
382    0x00DC    literal   0x0109  0xDC
383    0x005F    literal   0x010A  0x5F
384    0x0039    literal   0x010B  0x39
385    0x003A    literal   0x010C  0x3A
386    0x010D    back-ref  0x010D  0x3A 0x3A
387    0x010D    back-ref  0x010E  0x3A 0x3A
388    0x0061    literal   0x010F  0x61
389    etc
390    0x010F    back-ref  0x0182  0x3A 0x3A 0x61
391    etc
392    0x01A9    back-ref  0x01FE  0xFB 0xFB 0xFB 0xFB
393    0x0139    back-ref  0x01FF  0xFB 0xFC
394    0x00FC    literal   0x0200  0xFC
395    0x0135    back-ref  0x0201  0xFA 0xFA
396    etc
397    0x0FBB    back-ref  0x0FFF  0xC1 0xC1 0xC1 0xC1 0xC1 0xC1 0xDF
398    0x0100    clear     0x0FFF  .
399    0x001E    literal   0x0101  0x1E
400    0x001D    literal   0x0102  0x1D
401    0x0103    back-ref  0x0103  0x1D 0x1D
402    etc
403    0x080B    back-ref  0x0896  0x4F 0x4F 0x4F 0x4F 0x4F
404    0x0101    end       0x0897  .
405
406The `max` column here is the `max` code allowed when processing that row's
407code. The initial clear code is redundant (as the initial value of `max` is
408already 0x0101 here), but some encoders produce them because the specification
409says that "encoders should output a Clear code as the first code of each image
410data stream".
411
412Other than clear and end codes, each code produces some output: a variable
413length prefix and a one byte suffix. For literal codes, the prefix is empty and
414the suffix is the literal.
415
416When `max` is greater than the end code (e.g. greater than 0x0101), the row
417defines a new back-reference, and that definition persists up until the next
418clear code. For example, the third, fourth, fifth, etc. rows define what the
4190x0102, 0x0103, 0x0104, etc. codes output. That code's output (prefix + suffix)
420is defined by the prefix being the previous row's output and the suffix being
421the first byte of the current row's output.
422
423For example, the 15th and 16th rows are:
424
425    code      meaning   max     output
426    0x010D    back-ref  0x010E  0x3A 0x3A
427    0x0061    literal   0x010F  0x61
428
429which defines the 0x010F code to output "0x3A 0x3A 0x61". Later on, an 0x010F
430code is encountered, and its output is exactly that.
431
432A row's code can be the code that that row itself defines, in which case the
433prefix is the previous row and the suffix is the first byte of the previous
434row. Consider the opening few codes:
435
436    code      meaning   max     output
437    0x0100    clear     0x0101  .
438    0x00DC    literal   0x0101  0xDC
439    0x0102    back-ref  0x0102  0xDC 0xDC
440    0x0102    back-ref  0x0103  0xDC 0xDC
441    0x00D9    literal   0x0104  0xD9
442
443The third row's code is 0x0102, which that row also defines. Its output is
444therefore the concatenation of "0xDC" and the first byte of "0xDC", both of
445those strings being the previous row's output. The fourth row executes the same
4460x0102 code (and outputs "0xDC 0xDC" again) and defines the 0x0103 code. The
447next code is a literal 0xD9, and hence the decoded output starts with five 0xDC
448bytes and then an 0xD9.
449
450
451## Actual Pixel Indexes
452
453It is also not an official format, but for reference, the
454`test/data/bricks-nodither.indexes` contains the decoded
455`test/data/bricks-nodither.indexes.giflzw` data:
456
457    offset  xoffset ASCII   hex     binary
458    000000  0x0000  .       0xDC    0b_1101_1100
459    000001  0x0001  .       0xDC    0b_1101_1100
460    000002  0x0002  .       0xDC    0b_1101_1100
461    000003  0x0003  .       0xDC    0b_1101_1100
462    000004  0x0004  .       0xDC    0b_1101_1100
463    000005  0x0005  .       0xD9    0b_1101_1001
464    000006  0x0006  .       0xF5    0b_1111_0101
465    000007  0x0007  .       0xF9    0b_1111_1001
466    000008  0x0008  .       0xF6    0b_1111_0110
467    000009  0x0009  .       0xDF    0b_1101_1111
468    000010  0x000A  .       0xDC    0b_1101_1100
469    000011  0x000B  _       0x5F    0b_0101_1111
470    000012  0x000C  9       0x39    0b_0011_1001
471    000013  0x000D  :       0x3A    0b_0011_1010
472    000014  0x000E  :       0x3A    0b_0011_1010
473    000015  0x000F  :       0x3A    0b_0011_1010
474    000016  0x0010  :       0x3A    0b_0011_1010
475    000017  0x0011  :       0x3A    0b_0011_1010
476    000018  0x0012  a       0x61    0b_0110_0001
477    etc
478    019195  0x4AFB  O       0x4F    0b_0100_1111
479    019196  0x4AFC  O       0x4F    0b_0100_1111
480    019197  0x4AFD  O       0x4F    0b_0100_1111
481    019198  0x4AFE  O       0x4F    0b_0100_1111
482    019199  0x4AFF  O       0x4F    0b_0100_1111
483
484We have 19200 = 160 × 120 pixels. The top row starts with five pixels with RGB
485values indexed by 0xDC (or 220 in decimal, and recall that the GCT maps color
486220 to {0x0F, 0xA7, 0xF6}), the bottom row ends with index 0x4F.
487
488Indeed, opening `test/data/bricks-nodither.gif` in an image editor should
489verify that the top left pixel's RGB is {0x0F, 0xA7, 0xF6}, and likewise the
490bottom right pixel's RGB is {0x01, 0x24, 0x63}.
491