# Zstandard Seekable Format

### Notices

Copyright (c) 2017-present Facebook, Inc.

Permission is granted to copy and distribute this document
for any purpose and without charge,
including translations into other languages
and incorporation into compilations,
provided that the copyright notice and this notice are preserved,
and that any substantive changes or deletions from the original
are clearly marked.
Distribution of this document is unlimited.

### Version
0.1.0 (11/04/17)

## Introduction
This document defines a format for storing compressed data so that subranges of the data can be decompressed efficiently, without decompressing the entire stream.
This is done by splitting the input data into frames,
each of which is compressed independently,
and so can be decompressed independently.
Decompression then takes advantage of a provided 'seek table', which allows the decompressor to jump immediately to the desired data.  This is done in a way that is compatible with the original Zstandard format by placing the seek table in a Zstandard skippable frame.

### Overall conventions
In this document:
- Square brackets, i.e. `[` and `]`, indicate optional fields or parameters.
- The naming convention for identifiers is `Mixed_Case_With_Underscores`.
- All numeric fields are little-endian unless specified otherwise.

## Format

The format consists of a number of frames (Zstandard compressed frames and skippable frames), followed by a final skippable frame at the end containing the seek table.

### Seek Table Format
The structure of the seek table frame is as follows:

|`Skippable_Magic_Number`|`Frame_Size`|`[Seek_Table_Entries]`|`Seek_Table_Footer`|
|------------------------|------------|----------------------|-------------------|
| 4 bytes                | 4 bytes    | 8 or 12 bytes each   | 9 bytes           |

__`Skippable_Magic_Number`__

Value : 0x184D2A5E.
This is for compatibility with [Zstandard skippable frames].
Since it is legal for other Zstandard skippable frames to use the same
magic number, a decoder should not identify a seek table frame
by this value alone.

__`Frame_Size`__

The total size of the skippable frame, not including the `Skippable_Magic_Number` or `Frame_Size`.
This is for compatibility with [Zstandard skippable frames].

[Zstandard skippable frames]: https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md#skippable-frames

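As an illustration of how `Frame_Size` follows from the field sizes in the table above, here is a minimal Python sketch. The function and constant names are hypothetical and are not part of the format or of any library; the sketch assumes only the sizes stated in this document (8 or 12 bytes per entry, 9 bytes of footer).

```python
import struct

SKIPPABLE_MAGIC_NUMBER = 0x184D2A5E
# Number_Of_Frames (4) + Seek_Table_Descriptor (1) + Seekable_Magic_Number (4)
SEEK_TABLE_FOOTER_SIZE = 9


def seek_table_frame_size(number_of_frames: int, has_checksum: bool) -> int:
    """Return Frame_Size: entries plus footer, excluding the 8-byte header."""
    entry_size = 12 if has_checksum else 8
    return number_of_frames * entry_size + SEEK_TABLE_FOOTER_SIZE


def skippable_header(number_of_frames: int, has_checksum: bool) -> bytes:
    """Pack Skippable_Magic_Number and Frame_Size as little-endian 32-bit fields."""
    frame_size = seek_table_frame_size(number_of_frames, has_checksum)
    return struct.pack("<II", SKIPPABLE_MAGIC_NUMBER, frame_size)
```

For example, a seek table describing 3 frames without checksums would have a `Frame_Size` of 3 * 8 + 9 = 33 bytes.
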
#### `Seek_Table_Footer`
The seek table footer format is as follows:

|`Number_Of_Frames`|`Seek_Table_Descriptor`|`Seekable_Magic_Number`|
|------------------|-----------------------|-----------------------|
| 4 bytes          | 1 byte                | 4 bytes               |

__`Seekable_Magic_Number`__

Value : 0x8F92EAB1.
These must be the last bytes of the compressed file, so that decoders
can find them efficiently and determine whether a seek table is actually present.

__`Number_Of_Frames`__

The number of stored frames in the data.

__`Seek_Table_Descriptor`__

A bitfield describing the format of the seek table.

| Bit number | Field name                |
| ---------- | ----------                |
| 7          | `Checksum_Flag`           |
| 6-2        | `Reserved_Bits`           |
| 1-0        | `Unused_Bits`             |

While only `Checksum_Flag` currently exists, there are 7 other bits in this field that can be used for future changes to the format,
for example the addition of inline dictionaries.

__`Checksum_Flag`__

If the checksum flag is set, each of the seek table entries contains a 4-byte checksum of the uncompressed data contained in its frame.

`Reserved_Bits` are not currently used but may be used in the future for breaking changes, so a compliant decoder should ensure they are set to 0.  `Unused_Bits` may be used in the future for non-breaking changes, so a compliant decoder should not interpret these bits.

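The footer layout and the descriptor rules above can be sketched in Python as follows. This is only an illustrative reading of the 9-byte footer, not a reference decoder; `parse_seek_table_footer` is a hypothetical name, and the choice to raise `ValueError` is an assumption of this sketch.

```python
import struct

SEEKABLE_MAGIC_NUMBER = 0x8F92EAB1
FOOTER_SIZE = 9  # 4 + 1 + 4 bytes


def parse_seek_table_footer(data: bytes) -> tuple[int, bool]:
    """Return (Number_Of_Frames, Checksum_Flag) read from the last 9 bytes."""
    if len(data) < FOOTER_SIZE:
        raise ValueError("too short to contain a seek table footer")
    # "<IBI" is little-endian with no padding: u32, u8, u32.
    number_of_frames, descriptor, magic = struct.unpack("<IBI", data[-FOOTER_SIZE:])
    if magic != SEEKABLE_MAGIC_NUMBER:
        raise ValueError("no seek table: Seekable_Magic_Number not found")
    checksum_flag = bool(descriptor & 0x80)   # bit 7
    reserved_bits = (descriptor >> 2) & 0x1F  # bits 6-2
    if reserved_bits != 0:
        raise ValueError("Reserved_Bits are set; format not understood")
    # Bits 1-0 (Unused_Bits) are deliberately not interpreted.
    return number_of_frames, checksum_flag
```

Because the magic number sits in the last 4 bytes of the file, a reader only needs those 9 trailing bytes to decide whether a seek table is present and how large its entries are.
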
#### __`Seek_Table_Entries`__

`Seek_Table_Entries` consists of `Number_Of_Frames` entries, one for each frame in the data (not including the seek table frame), each of the following form, in sequence:

|`Compressed_Size`|`Decompressed_Size`|`[Checksum]`|
|-----------------|-------------------|------------|
| 4 bytes         | 4 bytes           | 4 bytes    |

__`Compressed_Size`__

The compressed size of the frame.
The cumulative sum of the `Compressed_Size` fields of frames `0` to `i` gives the offset in the compressed file of frame `i+1`.

__`Decompressed_Size`__

The size of the decompressed data contained in the frame.  For skippable or otherwise empty frames, this value is 0.

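The cumulative-sum rule can be sketched in Python as below. The same prefix sums over `Decompressed_Size` give each frame's position in the decompressed data, which is one way a decoder might pick the frame to start from. The helper names are hypothetical, and the use of binary search is a choice of this sketch rather than something the format requires.

```python
from bisect import bisect_right
from itertools import accumulate


def frame_offsets(sizes: list[int]) -> list[int]:
    """Start offset of each frame: offsets[i] is the sum of sizes[0..i-1]."""
    return list(accumulate(sizes, initial=0))[:-1]


def find_frame(decompressed_sizes: list[int], position: int) -> int:
    """Index of the frame holding the byte at `position` in the decompressed data."""
    total = sum(decompressed_sizes)
    if not 0 <= position < total:
        raise IndexError("position outside the decompressed data")
    starts = frame_offsets(decompressed_sizes)
    # bisect_right skips past zero-sized (skippable or empty) frames that
    # share a start offset with the frame that actually holds the byte.
    return bisect_right(starts, position) - 1
```
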
__`Checksum`__

Only present if `Checksum_Flag` is set in the `Seek_Table_Descriptor`.  Value : the least significant 32 bits of the XXH64 digest of the uncompressed data, stored in little-endian format.

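To make the entry layout concrete, here is a hedged Python sketch that decodes `Seek_Table_Entries` with or without the optional `Checksum` field. `SeekEntry`, `parse_entries` and `checksum_matches` are hypothetical names. The 64-bit XXH64 digest is assumed to be computed elsewhere (the Python standard library does not provide XXH64), so the sketch only shows the truncation to 32 bits.

```python
import struct
from typing import NamedTuple, Optional


class SeekEntry(NamedTuple):
    compressed_size: int
    decompressed_size: int
    checksum: Optional[int]  # None when Checksum_Flag is clear


def parse_entries(table: bytes, number_of_frames: int, has_checksum: bool) -> list[SeekEntry]:
    """Decode Seek_Table_Entries from the bytes between the header and the footer."""
    entry_format = "<III" if has_checksum else "<II"
    entry_size = struct.calcsize(entry_format)  # 12 or 8 bytes
    if len(table) != number_of_frames * entry_size:
        raise ValueError("seek table length does not match Number_Of_Frames")
    entries = []
    for fields in struct.iter_unpack(entry_format, table):
        checksum = fields[2] if has_checksum else None
        entries.append(SeekEntry(fields[0], fields[1], checksum))
    return entries


def checksum_matches(entry: SeekEntry, xxh64_digest: int) -> bool:
    """Compare the stored Checksum with the low 32 bits of a 64-bit XXH64 digest."""
    return entry.checksum == (xxh64_digest & 0xFFFFFFFF)
```

The length check ties the entries back to `Number_Of_Frames` from the footer, so a mismatch between the two is detected before any frame is decompressed.
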
## Version Changes
- 0.1.0: initial version