Lines Matching full:dictionary

40  * Zstd dictionary builder
44 * Why should I use a dictionary?
51 * structure, you can train a dictionary ahead of time on some samples of
52 * these files. Then, zstd can use the dictionary to find repetitions that are
55 * When is a dictionary useful?
59 * The larger a file is, the less benefit a dictionary will have. Generally,
60 * we don't expect dictionary compression to be effective past 100KB. And the
61 * smaller a file is, the more we would expect the dictionary to help.
63 * How do I use a dictionary?
66 * Simply pass the dictionary to the zstd compressor with
67 * `ZSTD_CCtx_loadDictionary()`. The same dictionary must then be passed to
68 * the decompressor, using `ZSTD_DCtx_loadDictionary()`.
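To make that concrete, here is a minimal sketch (not part of this header) of the round trip, using the single-shot context APIs from zstd.h; the helper names are invented for illustration and error handling is trimmed.

    #include <zstd.h>

    /* Compress `src` with a dictionary held in `dictBuf`. */
    static size_t compress_with_dict(void* dst, size_t dstCap,
                                     const void* src, size_t srcSize,
                                     const void* dictBuf, size_t dictSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t ret;
        if (cctx == NULL) return (size_t)-1;  /* allocation failed */
        ret = ZSTD_CCtx_loadDictionary(cctx, dictBuf, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
        ZSTD_freeCCtx(cctx);
        return ret;  /* compressed size, or an error code (test with ZSTD_isError()) */
    }

    /* Decompression must load the same dictionary. */
    static size_t decompress_with_dict(void* dst, size_t dstCap,
                                       const void* src, size_t srcSize,
                                       const void* dictBuf, size_t dictSize)
    {
        ZSTD_DCtx* const dctx = ZSTD_createDCtx();
        size_t ret;
        if (dctx == NULL) return (size_t)-1;  /* allocation failed */
        ret = ZSTD_DCtx_loadDictionary(dctx, dictBuf, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_decompressDCtx(dctx, dst, dstCap, src, srcSize);
        ZSTD_freeDCtx(dctx);
        return ret;
    }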
72 * What is a zstd dictionary?
75 * A zstd dictionary has two pieces: Its header, and its content. The header
76 * contains a magic number, the dictionary ID, and entropy tables. These
81 * What is a raw content dictionary?
84 * A raw content dictionary is just bytes. It doesn't have a zstd dictionary
85 * header, a dictionary ID, or entropy tables. Any buffer is a valid raw
86 * content dictionary.
88 * How do I train a dictionary?
92 * other. If you have several use cases, you could try to train one dictionary
93 * per use case.
96 * dictionary. There are a few advanced versions of this function, but this
97 * is a great starting point. If you want to further tune your dictionary
101 * If the dictionary training function fails, that is likely because you
102 * either passed too few samples, or a dictionary would not be effective
103 * for your data. Look at the messages that the dictionary trainer printed;
104 * if they don't mention too few samples, then a dictionary would not be effective.
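As a sketch of that flow (the helper is illustrative, not part of this header): concatenate the samples into one flat buffer, record each sample's size, and hand both to `ZDICT_trainFromBuffer()`. A `dictBufferCapacity` of around 100KB is a reasonable starting point, per the sizing guidance below.

    #include <stdio.h>
    #include <zdict.h>

    /* Train a dictionary from samples concatenated into `samplesBuffer`,
     * with the size of each sample listed, in order, in `samplesSizes`. */
    static size_t train_dictionary(void* dictBuffer, size_t dictBufferCapacity,
                                   const void* samplesBuffer,
                                   const size_t* samplesSizes, unsigned nbSamples)
    {
        size_t const dictSize = ZDICT_trainFromBuffer(dictBuffer, dictBufferCapacity,
                                                      samplesBuffer, samplesSizes, nbSamples);
        if (ZDICT_isError(dictSize)) {
            /* Typically: too few samples, or a dictionary would not help this data. */
            fprintf(stderr, "dictionary training failed: %s\n",
                    ZDICT_getErrorName(dictSize));
        }
        return dictSize;
    }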
106 * How large should my dictionary be?
109 * A reasonable dictionary size, the `dictBufferCapacity`, is about 100KB.
110 * The zstd CLI defaults to a 110KB dictionary. You likely don't need a
111 * dictionary larger than that. But, most use cases can get away with a
112 * smaller dictionary. The advanced dictionary builders can automatically
113 * shrink the dictionary for you, and select the smallest size that
115 * A smaller dictionary can save memory, and potentially speed up
118 * How many samples should I provide to the dictionary builder?
121 * We generally recommend passing ~100x the size of the dictionary
125 * samples can slow down the dictionary builder.
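As a rough worked example of that guideline: for the ~110KB dictionary the CLI defaults to, 100x comes to roughly 110KB x 100 ≈ 11MB of total sample data.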
127 * How do I determine if a dictionary will be effective?
130 * Simply train a dictionary and try it out. You can use zstd's built in
131 * benchmarking tool to test the dictionary effectiveness.
133 * # Benchmark levels 1-3 without a dictionary
134 * zstd -b1e3 -r /path/to/my/files
135 * # Benchmark levels 1-3 with a dictionary
136 * zstd -b1e3 -r /path/to/my/files -D /path/to/my/dictionary
138 * When should I retrain a dictionary?
141 * You should retrain a dictionary when its effectiveness drops. Dictionary
145 * retrain dictionaries, and if the new dictionary performs significantly
146 * better than the old dictionary, we will ship the new dictionary.
148 * I have a raw content dictionary, how do I turn it into a zstd dictionary?
151 * If you have a raw content dictionary, e.g. by manually constructing it, or
152 * using a third-party dictionary builder, you can turn it into a zstd
153 * dictionary by using `ZDICT_finalizeDictionary()`. You'll also have to
154 * provide some samples of the data. It will add the zstd header to the
155 * raw content, which contains a dictionary ID and entropy tables, which
156 * will improve compression ratio, and allow zstd to write the dictionary ID
157 * into the frame, if you so choose.
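A minimal sketch of that conversion (the wrapper name and the chosen compression level are illustrative): zero-initialize a `ZDICT_params_t`, set the level you plan to compress with, leave `dictID` at 0 to get a random ID, and pass representative samples alongside the raw content.

    #include <string.h>
    #include <zdict.h>

    /* Wrap raw dictionary content in a zstd dictionary header. */
    static size_t make_zstd_dictionary(void* dstDictBuffer, size_t maxDictSize,
                                       const void* rawContent, size_t rawContentSize,
                                       const void* samplesBuffer,
                                       const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_params_t params;
        memset(&params, 0, sizeof(params));
        params.compressionLevel = 3;   /* tune for the level you will actually use */
        params.dictID = 0;             /* 0 = let zstd pick a random dictionary ID */
        return ZDICT_finalizeDictionary(dstDictBuffer, maxDictSize,
                                        rawContent, rawContentSize,
                                        samplesBuffer, samplesSizes, nbSamples,
                                        params);  /* dictionary size, or an error code */
    }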
159 * Do I have to use zstd's dictionary builder?
162 * No! You can construct dictionary content however you please; it is just
163 * bytes. It will always be valid as a raw content dictionary. If you want
164 * a zstd dictionary, which can improve compression ratio, use
165 * `ZDICT_finalizeDictionary()`.
167 * What is the attack surface of a zstd dictionary?
172 * the dictionary is. However, if an attacker can control the dictionary
173 * used during decompression, they can cause zstd to generate arbitrary bytes,
174 * just like if they controlled the compressed data.
180 * Train a dictionary from an array of samples.
185 * The resulting dictionary will be saved into `dictBuffer`.
186 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
188 * Note: Dictionary training will fail if there are not enough samples to construct a
189 * dictionary, or if most of the samples are too small (< 8 bytes being the lower limit).
190 * If dictionary training fails, you should use zstd without a dictionary, as the dictionary
191 * would've been ineffective anyway. If you believe your samples would benefit from a dictionary
194 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
197 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.
207 * NOTE: The zstd format reserves some dictionary IDs for future use.
208 * You may use them in private settings, but be warned that they
209 * may be used by zstd in a public dictionary registry in the future.
210 * These dictionary IDs are:
211 *   - low range  : <= 32767
212 *   - high range : >= (2^31)
217 * Given a custom content as a basis for dictionary, and a set of samples,
218 * finalize dictionary by adding headers and statistics according to the zstd
219 * dictionary format.
224 * should be representative of what you will compress with this dictionary.
228 * compression level differ, so tuning the dictionary for the compression level
231 * You can set an explicit dictionary ID in `parameters`, or allow us to pick
232 * a random dictionary ID for you, but we can't guarantee no collisions.
237 * is presumed that the most profitable content is at the end of the dictionary,
242 * @return: size of dictionary stored into `dstDictBuffer` (<= `maxDictSize`),
258 ZDICTLIB_API unsigned ZDICT_getDictID(const void* dictBuffer, size_t dictSize); /**< extracts dictID; @return zero if error (not a valid dictionary) */
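For illustration (the helper name is invented), a raw content dictionary or an invalid buffer reports an ID of zero:

    #include <stdio.h>
    #include <zdict.h>

    static void print_dict_id(const void* dictBuffer, size_t dictSize)
    {
        unsigned const id = ZDICT_getDictID(dictBuffer, dictSize);
        if (id == 0)
            printf("no dictionary ID (raw content, or not a valid zstd dictionary)\n");
        else
            printf("dictionary ID: %u\n", id);
    }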
288 unsigned shrinkDict; /* Train dictionaries to shrink in size, starting from the minimum size, and select the smallest dictionary that is at most shrinkDictMaxRegression% worse than the largest dictionary. 0 means no shrinking, 1 means shrinking. */
289 unsigned shrinkDictMaxRegression; /* Sets the maximum regression, so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the max-size dictionary. */
301 unsigned shrinkDict; /* Train dictionaries to shrink in size, starting from the minimum size, and select the smallest dictionary that is at most shrinkDictMaxRegression% worse than the largest dictionary. 0 means no shrinking, 1 means shrinking. */
302 unsigned shrinkDictMaxRegression; /* Sets the maximum regression, so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the max-size dictionary. */
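A sketch of enabling shrinking through these fields (the values chosen are illustrative, and this assumes the experimental section of zdict.h, which requires defining ZDICT_STATIC_LINKING_ONLY to expose the cover parameter structs and trainers):

    #define ZDICT_STATIC_LINKING_ONLY
    #include <string.h>
    #include <zdict.h>

    /* Parameters for the COVER optimizer: zeros mean "use defaults / search",
     * and the shrink fields ask it to also try smaller dictionary sizes. */
    static ZDICT_cover_params_t make_cover_params(void)
    {
        ZDICT_cover_params_t params;
        memset(&params, 0, sizeof(params));
        params.shrinkDict = 1;                /* enable shrinking */
        params.shrinkDictMaxRegression = 1;   /* accept up to 1% worse ratio (illustrative) */
        params.zParams.compressionLevel = 3;  /* level you plan to compress with (illustrative) */
        return params;
    }

The resulting struct would then be handed to the COVER trainers documented below.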
308 * Train a dictionary from an array of samples using the COVER algorithm.
311 * The resulting dictionary will be saved into `dictBuffer`.
312 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
316 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
319 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.
330 * dictionary constructed with those parameters is stored in `dictBuffer`.
337 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
349 * Train a dictionary from an array of samples using a modified version of the COVER algorithm.
354 * The resulting dictionary will be saved into `dictBuffer`.
355 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
359 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
362 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.
373 * dictionary constructed with those parameters is stored in `dictBuffer`.
381 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
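Putting it together for the fastCover optimizer (the helper name is illustrative; this also assumes the experimental ZDICT_STATIC_LINKING_ONLY section): zeroed parameters let it search `k` and `d`, and on success it writes the parameters it settled on back into the struct.

    #define ZDICT_STATIC_LINKING_ONLY
    #include <stdio.h>
    #include <string.h>
    #include <zdict.h>

    static size_t train_fastcover(void* dictBuffer, size_t dictBufferCapacity,
                                  const void* samplesBuffer,
                                  const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_fastCover_params_t params;
        size_t dictSize;
        memset(&params, 0, sizeof(params));   /* 0 = defaults; optimizer searches k and d */
        params.zParams.compressionLevel = 3;  /* illustrative */
        dictSize = ZDICT_optimizeTrainFromBuffer_fastCover(dictBuffer, dictBufferCapacity,
                                                           samplesBuffer, samplesSizes, nbSamples,
                                                           &params);
        if (ZDICT_isError(dictSize))
            fprintf(stderr, "fastCover training failed: %s\n", ZDICT_getErrorName(dictSize));
        return dictSize;
    }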
393 unsigned selectivityLevel; /* 0 means default; larger => select more => larger dictionary */
398 * Train a dictionary from an array of samples.
401 * The resulting dictionary will be saved into `dictBuffer`.
403 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
406 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
409 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.