Lines Matching full:dictionary

40  * Zstd dictionary builder
44 * Why should I use a dictionary?
51 * structure, you can train a dictionary ahead of time on some samples of
52 * these files. Then, zstd can use the dictionary to find repetitions that are
55 * When is a dictionary useful?
59 * The larger a file is, the less benefit a dictionary will have. Generally,
60 * we don't expect dictionary compression to be effective past 100KB. And the
61 * smaller a file is, the more we would expect the dictionary to help.
63 * How do I use a dictionary?
66 * Simply pass the dictionary to the zstd compressor with
67 * `ZSTD_CCtx_loadDictionary()`. The same dictionary must then be passed to
68 * the decompressor, using `ZSTD_DCtx_loadDictionary()`.
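To make that concrete, here is a minimal sketch (not part of this header) of the round trip, using the single-shot context APIs from zstd.h; the helper names are invented for illustration and error handling is trimmed.

    #include <zstd.h>

    /* Compress `src` with a dictionary held in `dictBuf`. */
    static size_t compress_with_dict(void* dst, size_t dstCap,
                                     const void* src, size_t srcSize,
                                     const void* dictBuf, size_t dictSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t ret;
        if (cctx == NULL) return (size_t)-1;  /* allocation failed */
        ret = ZSTD_CCtx_loadDictionary(cctx, dictBuf, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
        ZSTD_freeCCtx(cctx);
        return ret;  /* compressed size, or an error code (test with ZSTD_isError()) */
    }

    /* Decompression must load the same dictionary. */
    static size_t decompress_with_dict(void* dst, size_t dstCap,
                                       const void* src, size_t srcSize,
                                       const void* dictBuf, size_t dictSize)
    {
        ZSTD_DCtx* const dctx = ZSTD_createDCtx();
        size_t ret;
        if (dctx == NULL) return (size_t)-1;  /* allocation failed */
        ret = ZSTD_DCtx_loadDictionary(dctx, dictBuf, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_decompressDCtx(dctx, dst, dstCap, src, srcSize);
        ZSTD_freeDCtx(dctx);
        return ret;
    }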
72 * What is a zstd dictionary?
75 * A zstd dictionary has two pieces: Its header, and its content. The header
76 * contains a magic number, the dictionary ID, and entropy tables. These
81 * What is a raw content dictionary?
84 * A raw content dictionary is just bytes. It doesn't have a zstd dictionary
85 * header, a dictionary ID, or entropy tables. Any buffer is a valid raw
86 * content dictionary.
88 * How do I train a dictionary?
92 * other. If you have several use cases, you could try to train one dictionary
93 * per use case.
96 * dictionary. There are a few advanced versions of this function, but this
97 * is a great starting point. If you want to further tune your dictionary
101 * If the dictionary training function fails, that is likely because you
102 * either passed too few samples, or a dictionary would not be effective
103 * for your data. Look at the messages that the dictionary trainer printed;
104 * if they don't mention too few samples, then a dictionary would not be effective.
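As a sketch of that flow (the helper is illustrative, not part of this header): concatenate the samples into one flat buffer, record each sample's size, and hand both to `ZDICT_trainFromBuffer()`. A `dictBufferCapacity` of around 100KB is a reasonable starting point, per the sizing guidance below.

    #include <stdio.h>
    #include <zdict.h>

    /* Train a dictionary from samples concatenated into `samplesBuffer`,
     * with the size of each sample listed, in order, in `samplesSizes`. */
    static size_t train_dictionary(void* dictBuffer, size_t dictBufferCapacity,
                                   const void* samplesBuffer,
                                   const size_t* samplesSizes, unsigned nbSamples)
    {
        size_t const dictSize = ZDICT_trainFromBuffer(dictBuffer, dictBufferCapacity,
                                                      samplesBuffer, samplesSizes, nbSamples);
        if (ZDICT_isError(dictSize)) {
            /* Typically: too few samples, or a dictionary would not help this data. */
            fprintf(stderr, "dictionary training failed: %s\n",
                    ZDICT_getErrorName(dictSize));
        }
        return dictSize;
    }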
106 * How large should my dictionary be?
109 * A reasonable dictionary size, the `dictBufferCapacity`, is about 100KB.
110 * The zstd CLI defaults to a 110KB dictionary. You likely don't need a
111 * dictionary larger than that. But, most use cases can get away with a
112 * smaller dictionary. The advanced dictionary builders can automatically
113 * shrink the dictionary for you, and select the smallest size that
115 * A smaller dictionary can save memory, and potentially speed up
118 * How many samples should I provide to the dictionary builder?
121 * We generally recommend passing ~100x the size of the dictionary
125 * samples can slow down the dictionary builder.
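As a rough worked example of that guideline: for the ~110KB dictionary the CLI defaults to, 100x comes to roughly 110KB x 100 ≈ 11MB of total sample data.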
127 * How do I determine if a dictionary will be effective?
130 * Simply train a dictionary and try it out. You can use zstd's built in
131 * benchmarking tool to test the dictionary effectiveness.
133 * # Benchmark levels 1-3 without a dictionary
134 * zstd -b1e3 -r /path/to/my/files
135 * # Benchmark levels 1-3 with a dictionary
136 * zstd -b1e3 -r /path/to/my/files -D /path/to/my/dictionary
138 * When should I retrain a dictionary?
141 * You should retrain a dictionary when its effectiveness drops. Dictionary
145 * retrain dictionaries, and if the new dictionary performs significantly
146 * better than the old dictionary, we will ship the new dictionary.
148 * I have a raw content dictionary, how do I turn it into a zstd dictionary?
151 * If you have a raw content dictionary, e.g. by manually constructing it, or
152 * using a third-party dictionary builder, you can turn it into a zstd
153 * dictionary by using `ZDICT_finalizeDictionary()`. You'll also have to
154 * provide some samples of the data. It will add the zstd header to the
155 * raw content, which contains a dictionary ID and entropy tables, which
156 * will improve compression ratio, and allow zstd to write the dictionary ID
157 * into the frame, if you so choose.
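A minimal sketch of that conversion (the wrapper name and the chosen compression level are illustrative): zero-initialize a `ZDICT_params_t`, set the level you plan to compress with, leave `dictID` at 0 to get a random ID, and pass representative samples alongside the raw content.

    #include <string.h>
    #include <zdict.h>

    /* Wrap raw dictionary content in a zstd dictionary header. */
    static size_t make_zstd_dictionary(void* dstDictBuffer, size_t maxDictSize,
                                       const void* rawContent, size_t rawContentSize,
                                       const void* samplesBuffer,
                                       const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_params_t params;
        memset(&params, 0, sizeof(params));
        params.compressionLevel = 3;   /* tune for the level you will actually use */
        params.dictID = 0;             /* 0 = let zstd pick a random dictionary ID */
        return ZDICT_finalizeDictionary(dstDictBuffer, maxDictSize,
                                        rawContent, rawContentSize,
                                        samplesBuffer, samplesSizes, nbSamples,
                                        params);  /* dictionary size, or an error code */
    }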
159 * Do I have to use zstd's dictionary builder?
162 * No! You can construct dictionary content however you please; it is just
163 * bytes. It will always be valid as a raw content dictionary. If you want
164 * a zstd dictionary, which can improve compression ratio, use
165 * `ZDICT_finalizeDictionary()`.
167 * What is the attack surface of a zstd dictionary?
172 * the dictionary is. However, if an attacker can control the dictionary
173 * used during decompression, they can cause zstd to generate arbitrary bytes,
174 * just like if they controlled the compressed data.
180 * Train a dictionary from an array of samples.
185 * The resulting dictionary will be saved into `dictBuffer`.
186 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
188 * Note: Dictionary training will fail if there are not enough samples to construct a
189 * dictionary, or if most of the samples are too small (< 8 bytes being the lower limit).
190 * If dictionary training fails, you should use zstd without a dictionary, as the dictionary
191 * would've been ineffective anyway. If you believe your samples would benefit from a dictionary
194 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
197 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.
207 * NOTE: The zstd format reserves some dictionary IDs for future use.
208 * You may use them in private settings, but be warned that they
209 * may be used by zstd in a public dictionary registry in the future.
210 * These dictionary IDs are:
211 *   - low range  : <= 32767
212 *   - high range : >= (2^31)
217 * Given a custom content as a basis for dictionary, and a set of samples,
218 * finalize dictionary by adding headers and statistics according to the zstd
219 * dictionary format.
224 * should be representative of what you will compress with this dictionary.
228 * compression level differ, so tuning the dictionary for the compression level
231 * You can set an explicit dictionary ID in `parameters`, or allow us to pick
232 * a random dictionary ID for you, but we can't guarantee no collisions.
237 * is presumed that the most profitable content is at the end of the dictionary,
242 * @return: size of dictionary stored into `dstDictBuffer` (<= `maxDictSize`),
258 ZDICTLIB_API unsigned ZDICT_getDictID(const void* dictBuffer, size_t dictSize); /**< extracts dictID; @return zero if error (not a valid dictionary) */
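For illustration (the helper name is invented), a raw content dictionary or an invalid buffer reports an ID of zero:

    #include <stdio.h>
    #include <zdict.h>

    static void print_dict_id(const void* dictBuffer, size_t dictSize)
    {
        unsigned const id = ZDICT_getDictID(dictBuffer, dictSize);
        if (id == 0)
            printf("no dictionary ID (raw content, or not a valid zstd dictionary)\n");
        else
            printf("dictionary ID: %u\n", id);
    }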
288 unsigned shrinkDict; /* Train dictionaries to shrink in size, starting from the minimum size, and select the smallest dictionary that is at most shrinkDictMaxRegression% worse than the largest dictionary. 0 means no shrinking, 1 means shrinking. */
289 unsigned shrinkDictMaxRegression; /* Sets the maximum regression, so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the max-size dictionary. */
301 unsigned shrinkDict; /* Train dictionaries to shrink in size, starting from the minimum size, and select the smallest dictionary that is at most shrinkDictMaxRegression% worse than the largest dictionary. 0 means no shrinking, 1 means shrinking. */
302 unsigned shrinkDictMaxRegression; /* Sets the maximum regression, so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the max-size dictionary. */
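A sketch of enabling shrinking through these fields (the values chosen are illustrative, and this assumes the experimental section of zdict.h, which requires defining ZDICT_STATIC_LINKING_ONLY to expose the cover parameter structs and trainers):

    #define ZDICT_STATIC_LINKING_ONLY
    #include <string.h>
    #include <zdict.h>

    /* Parameters for the COVER optimizer: zeros mean "use defaults / search",
     * and the shrink fields ask it to also try smaller dictionary sizes. */
    static ZDICT_cover_params_t make_cover_params(void)
    {
        ZDICT_cover_params_t params;
        memset(&params, 0, sizeof(params));
        params.shrinkDict = 1;                /* enable shrinking */
        params.shrinkDictMaxRegression = 1;   /* accept up to 1% worse ratio (illustrative) */
        params.zParams.compressionLevel = 3;  /* level you plan to compress with (illustrative) */
        return params;
    }

The resulting struct would then be handed to the COVER trainers documented below.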
308 * Train a dictionary from an array of samples using the COVER algorithm.
311 * The resulting dictionary will be saved into `dictBuffer`.
312 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
316 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
319 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.
330 * dictionary constructed with those parameters is stored in `dictBuffer`.
337 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
349 * Train a dictionary from an array of samples using a modified version of the COVER algorithm.
354 * The resulting dictionary will be saved into `dictBuffer`.
355 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
359 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
362 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.
373 * dictionary constructed with those parameters is stored in `dictBuffer`.
381 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
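Putting it together for the fastCover optimizer (the helper name is illustrative; this also assumes the experimental ZDICT_STATIC_LINKING_ONLY section): zeroed parameters let it search `k` and `d`, and on success it writes the parameters it settled on back into the struct.

    #define ZDICT_STATIC_LINKING_ONLY
    #include <stdio.h>
    #include <string.h>
    #include <zdict.h>

    static size_t train_fastcover(void* dictBuffer, size_t dictBufferCapacity,
                                  const void* samplesBuffer,
                                  const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_fastCover_params_t params;
        size_t dictSize;
        memset(&params, 0, sizeof(params));   /* 0 = defaults; optimizer searches k and d */
        params.zParams.compressionLevel = 3;  /* illustrative */
        dictSize = ZDICT_optimizeTrainFromBuffer_fastCover(dictBuffer, dictBufferCapacity,
                                                           samplesBuffer, samplesSizes, nbSamples,
                                                           &params);
        if (ZDICT_isError(dictSize))
            fprintf(stderr, "fastCover training failed: %s\n", ZDICT_getErrorName(dictSize));
        return dictSize;
    }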
393 unsigned selectivityLevel; /* 0 means default; larger => select more => larger dictionary */
398 * Train a dictionary from an array of samples.
401 * The resulting dictionary will be saved into `dictBuffer`.
403 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
406 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
409 * It's recommended that the total size of all samples be about 100x the target size of the dictionary.