# third_party_lzma

## Description

---
LZMA SDK provides the documentation, samples, header files,
libraries, and tools you need to develop applications that
use 7z / LZMA / LZMA2 / XZ compression.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was designed to maximize the compression ratio while
keeping decompression fast and the memory requirements for
decompression low.

LZMA2 is an LZMA-based compression method. It provides better
multithreading support for compression than LZMA, plus some other improvements.

7z is a file format for data compression and file archiving.
It is the main file format of the 7-Zip compression program (www.7-zip.org).
The 7z format supports different compression methods: LZMA, LZMA2 and others.
7z also supports AES-256 based encryption.

XZ is a file format for data compression that uses LZMA2 compression.
The XZ format provides additional features: SHA/CRC checks, filters for
an improved compression ratio, and splitting into blocks and streams.

---

## Software Architecture

---
Source code:

| Format / algorithm | C | C++ | C# | Java |
| :------ | :----- | :----- | :----- | :----- |
| LZMA compression and decompression | ✓ | ✓ | ✓ | ✓ |
| LZMA2 compression and decompression | ✓ | ✓ |  |  |
| XZ compression and decompression | ✓ | ✓ |  |  |
| 7z decompression | ✓ | ✓ |  |  |
| 7z compression |  | ✓ |  |  |
| small SFXs for installers (7z decompression) | ✓ |  |  |  |
| SFXs and SFXs for installers (7z decompression) |  | ✓ |  |  |

---
Source code structure:

```bash
/third_party/lzma
├── Asm                             # asm files (optimized code for CRC calculation and Intel-AES encryption)
│   ├── arm
│   ├── arm64
│   └── x86
├── C                               # C files (compression / decompression and other)
│   └── Util
│       ├── 7z                      # 7z decoder program (decoding 7z files)
│       ├── Lzma                    # LZMA program (file->file LZMA encoder/decoder)
│       ├── LzmaLib                 # LZMA library (.DLL for Windows)
│       └── SfxSetup                # small SFX module for installers
├── CPP
│   ├── Common                      # common files for C++ projects
│   ├── Windows                     # common files for Windows related code
│   └── 7zip                        # files related to 7-Zip
│       ├── Archive                 # files related to archiving
│       │   ├── Common              # common files for archive handling
│       │   └── 7z                  # 7z C++ Encoder/Decoder
│       ├── Bundles                 # Modules that are bundles of other modules (files)
│       │   ├── Alone7z             # 7zr.exe: Standalone 7-Zip console program (reduced version)
│       │   ├── Format7zExtractR    # 7zxr.dll: Reduced version of 7z DLL: extracting from 7z/LZMA/BCJ/BCJ2.
│       │   ├── Format7zR           # 7zr.dll:  Reduced version of 7z DLL: extracting/compressing to 7z/LZMA/BCJ/BCJ2
│       │   ├── LzmaCon             # lzma.exe: LZMA compression/decompression
│       │   ├── LzmaSpec            # example code for LZMA Specification
│       │   ├── SFXCon              # 7zCon.sfx: Console 7z SFX module
│       │   ├── SFXSetup            # 7zS.sfx: 7z SFX module for installers
│       │   └── SFXWin              # 7z.sfx: GUI 7z SFX module
│       ├── Common                  # common files for 7-Zip
│       ├── Compress                # files for compression/decompression
│       ├── Crypto                  # files for encryption / decryption
│       └── UI                      # User Interface files
│           ├── Client7z            # Test application for 7za.dll, 7zr.dll, 7zxr.dll
│           ├── Common              # Common UI files
│           ├── Console             # Code for console program (7z.exe)
│           ├── Explorer            # Some code from 7-Zip Shell extension
│           ├── FileManager         # Some GUI code from 7-Zip File Manager
│           └── GUI                 # Some GUI code from 7-Zip
├── CS
│   └── 7zip
│       ├── Common                  # some common files for 7-Zip
│       └── Compress                # files related to compression/decompression
│           ├── LZ                  # files related to LZ (Lempel-Ziv) compression algorithm
│           ├── LZMA                # LZMA compression/decompression
│           ├── LzmaAlone           # file->file LZMA compression/decompression
│           └── RangeCoder          # Range Coder (special code of compression/decompression)
├── DOC
│   ├── 7zC.txt                     # 7z ANSI-C Decoder description
│   ├── 7zFormat.txt                # 7z Format description
│   ├── installer.txt               # information about 7-Zip for installers
│   ├── lzma-history.txt            # history of LZMA SDK
│   ├── lzma-sdk.txt                # LZMA SDK description
│   ├── lzma-specification.txt      # Specification of LZMA
│   ├── lzma.txt                    # LZMA compression description
│   └── Methods.txt                 # Compression method IDs for .7z
└── Java
    └── SevenZip
        └── Compression             # files related to compression/decompression
            ├── LZ                  # files related to LZ (Lempel-Ziv) compression algorithm
            ├── LZMA                # LZMA compression/decompression
            └── RangeCoder          # Range Coder (special code of compression/decompression)
```

---

## NOTICES / LICENSE

LZMA SDK is written and placed in the public domain by Igor Pavlov.

Some code in LZMA SDK is based on public domain code from other developers:

  1) PPMd var.H (2001): Dmitry Shkarin
  2) SHA-256: Wei Dai (Crypto++ library)

Anyone is free to copy, modify, publish, use, compile, sell, or distribute the
original LZMA SDK code, either in source code form or as a compiled binary, for
any purpose, commercial or non-commercial, and by any means.

LZMA SDK code is compatible with open source licenses. For example, you can
include it in GNU GPL or GNU LGPL code.

## Build

### ***UNIX/Linux version***

7-Zip can be compiled with different compilers: gcc and clang.
Some critical parts of the 7-Zip code also exist in two versions: in C and in assembler.
If you compile the version with assembler code, you get a faster 7-Zip binary.

7-Zip's assembler code uses the following syntax for different platforms:

#### *arm64: GNU assembler for ARM64 with preprocessor*

The syntax of the arm64 assembler code in 7-Zip is supported by GCC and Clang for ARM64.

#### *x86 and x86_64 (AMD64)*

Two programs support MASM syntax on Linux: Asmc Macro Assembler and JWasm.
JWasm currently does not support some CPU instructions used in 7-Zip,
so you must install Asmc Macro Assembler on Linux if you want to compile the fastest version of 7-Zip for x86 and x86-64: [https://github.com/nidud/asmc](https://github.com/nidud/asmc)

### ***Building commands***

Different binaries can be compiled from the 7-Zip source.
There are 2 main files in the folder for compiling:

- `makefile`: used to compile the Windows version of 7-Zip with the `nmake` command
- `makefile.gcc`: used to compile the Linux/macOS versions of 7-Zip with the `make` command

First, change the current folder to the folder that contains `makefile.gcc`:

```bash
    cd CPP/7zip/Bundles/Alone7z
```

Then build with `makefile.gcc` using the command:

```bash
    make -j -f makefile.gcc
```

There are also additional `*.mak` files in the folder "CPP/7zip/" that can be used to compile
7-Zip binaries with optimized code and optimizing options.

To compile with GCC without assembler:

```bash
  cd CPP/7zip/Bundles/Alone7z
  make -j -f ../../cmpl_gcc.mak
```

You can also change some compiler options in the mak files:

- `cmpl_gcc.mak`
- `var_gcc.mak`
- `warn_gcc.mak`

## Interface Usage

This section describes the LZMA encoding and decoding functions written in C.

Note: you can also read the LZMA Specification (lzma-specification.txt in the LZMA SDK).

You can also look at the source code for LZMA encoding and decoding:

  ***C/Util/Lzma/LzmaUtil.c***

### ***LZMA compressed file format***

```bash
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data
```
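
To make the layout concrete, here is a minimal sketch of a parser for the 13-byte header. It assumes the usual encoding of the properties byte, `(pb * 5 + lp) * 9 + lc`, as described in the LZMA specification. `LzmaHeaderInfo` and `parse_lzma_header` are hypothetical names used only for this illustration and are not part of the SDK. For a header starting with `5D 00 00 80 00` followed by eight `FF` bytes, the sketch yields lc=3, lp=0, pb=2, an 8 MiB dictionary and an unknown uncompressed size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded header fields (not an SDK type). */
typedef struct {
  unsigned lc, lp, pb;     /* LZMA properties */
  uint32_t dictSize;       /* dictionary size in bytes */
  uint64_t unpackSize;     /* uncompressed size; meaningful only if sizeKnown */
  int sizeKnown;           /* 0 when the size field is -1 (all bits set) */
} LzmaHeaderInfo;

/* Parses the 13-byte header. Returns 0 on success, -1 if the
   properties byte is outside the valid range. */
static int parse_lzma_header(const uint8_t hdr[13], LzmaHeaderInfo *info)
{
  unsigned d;
  int i;
  assert(hdr != NULL && info != NULL);

  /* Offset 0: lc, lp, pb packed as (pb * 5 + lp) * 9 + lc. */
  d = hdr[0];
  if (d >= 9 * 5 * 5)
    return -1;
  info->lc = d % 9;
  d /= 9;
  info->lp = d % 5;
  info->pb = d / 5;

  /* Offset 1: dictionary size, 4 bytes, little endian. */
  info->dictSize = 0;
  for (i = 0; i < 4; i++)
    info->dictSize |= (uint32_t)hdr[1 + i] << (8 * i);

  /* Offset 5: uncompressed size, 8 bytes, little endian. */
  info->unpackSize = 0;
  for (i = 0; i < 8; i++)
    info->unpackSize |= (uint64_t)hdr[5 + i] << (8 * i);
  info->sizeKnown = (info->unpackSize != UINT64_MAX);
  return 0;
}
```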

**ANSI-C LZMA Decoder**

Please note that the interfaces for the ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version of the LZMA SDK
from the sourceforge.net site.

To use the ANSI-C LZMA Decoder you need the following files:

```bash
  LzmaDec.h
  LzmaDec.c
  7zTypes.h
  Precomp.h
  Compiler.h
```

See the example code:
  C/Util/Lzma/LzmaUtil.c

**Memory requirements for LZMA decoding**

1. Stack usage of the LZMA decoding function for local variables is no larger than 200-400 bytes.
2. The LZMA Decoder uses a dictionary buffer and an internal state structure.
3. The internal state structure consumes state_size = (4 + (1.5 << (lc + lp))) KB, that is, 4 + 1.5 * 2^(lc + lp) KB. With the default settings (lc=3, lp=0), state_size = 16 KB.
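
The state size formula in item 3 can be checked with a few lines of arithmetic. The sketch below assumes that "KB" means 1024 bytes and that lc and lp stay within the ranges allowed by the LZMA format (lc up to 8, lp up to 4). `lzma_state_size_bytes` is a hypothetical helper for illustration, not an SDK function.

```c
#include <assert.h>
#include <stddef.h>

/* state_size = (4 + 1.5 * 2^(lc + lp)) KB, expressed in bytes
   (assuming 1 KB = 1024 bytes, so 1.5 KB = 1536 bytes). */
static size_t lzma_state_size_bytes(unsigned lc, unsigned lp)
{
  assert(lc <= 8 && lp <= 4);   /* ranges allowed by the LZMA format */
  return (size_t)4096 + ((size_t)1536 << (lc + lp));
}
```

With the defaults, `lzma_state_size_bytes(3, 0)` returns 16384, which matches the 16 KB quoted above.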

### ***How To decompress data***

The LZMA Decoder (ANSI-C version) supports 2 interfaces:

**1)** Single-call Decompressing

**2)** Multi-call State Decompressing (zlib-like interface)

**You must use an external allocator:**

Example:

```c
#include <stdlib.h>   /* malloc, free */
#include "7zTypes.h"  /* ISzAlloc */

void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };
```

The `p = p;` statement is there only to suppress compiler warnings about the unused parameter.

#### ***Single-call Decompressing***

1. When to use: RAM->RAM decompressing
2. Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
3. Compile defines: no defines
4. Memory Requirements:

- Input buffer: compressed size
- Output buffer: uncompressed size
- LZMA Internal Structures: state_size (16 KB for default settings)

**Interface:**

```c
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  /*
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END, when you know that
                           current output buffer covers last bytes of stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Returns:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).
  */
```

  If the LZMA decoder sees the end marker before reaching the output limit, it returns an OK result,
  and the output value of destLen will be less than the output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:

   1. Check the result and the "status" variable.
   2. Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
   3. Check that output(srcLen) = compressedSize, if you know the real compressedSize.
       You must use the correct finish mode in that case.

#### ***Multi-call State Decompressing (zlib-like interface)***

1. When to use: file->file decompressing
2. Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
3. Memory Requirements:

- Buffer for input stream: any size (for example, 16 KB)
- Buffer for output stream: any size (for example, 16 KB)
- LZMA Internal Structures: state_size (16 KB for default settings)
- LZMA dictionary (dictionary size is encoded in LZMA properties header)

**1)** Read the LZMA properties (5 bytes) and the uncompressed size (8 bytes, little-endian) into the header:

```c
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));
```

**2)** Allocate the CLzmaDec structures (state + dictionary) using the LZMA properties:

```c
  CLzmaDec state;
  LzmaDec_Construct(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;
```

**3)** Initialize the LzmaDec structure before any new LZMA stream, then call LzmaDec_DecodeToBuf in a loop:

```c
  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    /* In: destLen / srcLen hold the buffer sizes; out: the bytes actually processed. */
    int res = LzmaDec_DecodeToBuf(&state, dest, &destLen,
        src, &srcLen, finishMode, &status);
    ...
  }
```

**4)** Free all allocated structures:

```c
  LzmaDec_Free(&state, &g_Alloc);
```

See the example code:
  C/Util/Lzma/LzmaUtil.c
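
The buffer bookkeeping around the decode call in step 3 is the part most easily gotten wrong, so here is a self-contained sketch of it. To keep the block compilable without the SDK, the decoder is replaced by `stub_decode_to_buf`, a hypothetical stand-in that merely copies bytes but follows the same in/out convention for the length arguments; `MemReader`, `MemWriter` and `stream_loop` are likewise illustrative names. Real code would call `LzmaDec_DecodeToBuf` with a finish mode and also inspect the returned status, as `LzmaUtil.c` does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IN_BUF_SIZE  16
#define OUT_BUF_SIZE 8

/* Stand-in for the decoder: copies instead of decoding. On entry *srcLen and
   *destLen are limits; on exit they hold the bytes consumed and produced. */
static int stub_decode_to_buf(unsigned char *dest, size_t *destLen,
                              const unsigned char *src, size_t *srcLen)
{
  size_t n = (*srcLen < *destLen) ? *srcLen : *destLen;
  memcpy(dest, src, n);
  *srcLen = n;
  *destLen = n;
  return 0;
}

/* Memory-backed stand-ins for the input and output files. */
typedef struct { const unsigned char *data; size_t size, pos; } MemReader;
typedef struct { unsigned char *data; size_t cap, pos; } MemWriter;

static size_t mem_read(MemReader *r, unsigned char *buf, size_t max)
{
  size_t n;
  assert(r->pos <= r->size);
  n = r->size - r->pos;
  if (n > max)
    n = max;
  memcpy(buf, r->data + r->pos, n);
  r->pos += n;
  return n;
}

static int mem_write(MemWriter *w, const unsigned char *buf, size_t n)
{
  if (n > w->cap - w->pos)
    return -1;
  memcpy(w->data + w->pos, buf, n);
  w->pos += n;
  return 0;
}

/* Returns 0 when the input is exhausted, -1 on error. */
static int stream_loop(MemReader *in, MemWriter *out)
{
  unsigned char inBuf[IN_BUF_SIZE], outBuf[OUT_BUF_SIZE];
  size_t inPos = 0, inSize = 0;
  for (;;)
  {
    size_t inProcessed, outProcessed;
    if (inPos == inSize)            /* input buffer drained: refill it */
    {
      inSize = mem_read(in, inBuf, IN_BUF_SIZE);
      inPos = 0;
    }
    inProcessed = inSize - inPos;   /* in: input bytes available */
    outProcessed = OUT_BUF_SIZE;    /* in: output space available */
    if (stub_decode_to_buf(outBuf, &outProcessed, inBuf + inPos, &inProcessed) != 0)
      return -1;
    inPos += inProcessed;           /* out: input bytes consumed */
    if (mem_write(out, outBuf, outProcessed) != 0)
      return -1;
    if (inProcessed == 0 && outProcessed == 0)
      return 0;                     /* no progress: nothing left to process */
  }
}
```

The buffers are deliberately tiny and of different sizes so that both the refill path and the partially consumed input path are exercised. In practice you would use something like the 16 KB buffers suggested above.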

### ***How To compress data***

1. Compile files:

```bash
  7zTypes.h
  Threads.h
  LzmaEnc.h
  LzmaEnc.c
  LzFind.h
  LzFind.c
  LzFindMt.h
  LzFindMt.c
  LzHash.h
```

2. Memory Requirements:

- (dictSize * 11.5 + 6 MB) + state_size

3. The LZMA Encoder can use two memory allocators:

- alloc: for small arrays.
- allocBig: for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for better compression speed. Note that Windows has a poor implementation of Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.
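
As a worked example of the memory formula in item 2, the sketch below evaluates `(dictSize * 11.5 + 6 MB) + state_size` in bytes. It assumes that MB and KB mean 2^20 and 2^10 bytes (the text does not say) and reuses the decoder's state_size formula, (4 + 1.5 * 2^(lc + lp)) KB. `lzma_enc_memory_bytes` is a hypothetical helper, not an SDK function, and the result is only the estimate given by the formula.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated encoder memory in bytes: (dictSize * 11.5 + 6 MB) + state_size. */
static uint64_t lzma_enc_memory_bytes(uint32_t dictSize, unsigned lc, unsigned lp)
{
  uint64_t stateSize, dictPart;
  assert(lc <= 8 && lp <= 4);                        /* ranges allowed by the LZMA format */
  stateSize = 4096u + ((uint64_t)1536 << (lc + lp)); /* (4 + 1.5 * 2^(lc+lp)) KB */
  dictPart = (uint64_t)dictSize * 23 / 2;            /* dictSize * 11.5 */
  return dictPart + 6u * 1024 * 1024 + stateSize;
}
```

For a 16 MiB dictionary with lc=3 and lp=0, this gives 199245824 bytes, or roughly 190 MiB.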

#### ***Single-call Compression with callbacks***

See the example code:
  C/Util/Lzma/LzmaUtil.c

When to use: file->file compressing

**1)** Implement the callback structures for the interfaces:

```c
/* Interfaces to implement: ISeqInStream, ISeqOutStream, ICompressProgress, ISzAlloc */

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) {  p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;
```

**2)** Create the CLzmaEncHandle object:

```c
  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;
```

**3)** Initialize the CLzmaEncProps properties:

```c
  CLzmaEncProps props;
  LzmaEncProps_Init(&props);
```

  Then you can change some properties in that structure.

**4)** Send the LZMA properties to the LZMA Encoder:

```c
  res = LzmaEnc_SetProps(enc, &props);
```

**5)** Write the encoded properties to the header:

```c
    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    /* Append the uncompressed size as 8 little-endian bytes. */
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);
```

**6)** Call the encoding function:

```c
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);
```

**7)** Destroy the LZMA Encoder object:

```c
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);
```

If a callback function returns an error code, LzmaEnc_Encode also returns that code,
or it can return a code like SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.
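
The header assembly in step 5 can be tried in isolation. The following sketch builds the 13-byte header from five already encoded property bytes and a file size, using the same little-endian loop. It is an illustration only: `DEMO_PROPS_SIZE` is a local stand-in for `LZMA_PROPS_SIZE` (5 bytes, per the decoder section), `build_lzma_header` is a hypothetical helper, and in real code the property bytes come from `LzmaEnc_WriteProperties`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PROPS_SIZE 5   /* stand-in for LZMA_PROPS_SIZE */

/* Fills out[] with props followed by fileSize as 8 little-endian bytes.
   Returns the total header size (13). */
static size_t build_lzma_header(uint8_t out[DEMO_PROPS_SIZE + 8],
                                const uint8_t props[DEMO_PROPS_SIZE],
                                uint64_t fileSize)
{
  size_t n = DEMO_PROPS_SIZE;
  int i;
  assert(out != NULL && props != NULL);
  memcpy(out, props, DEMO_PROPS_SIZE);
  for (i = 0; i < 8; i++)
    out[n++] = (uint8_t)(fileSize >> (8 * i));   /* least significant byte first */
  return n;
}
```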

---

#### ***Single-call RAM->RAM Compression***

Single-call RAM->RAM compression is similar to compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

```c
SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);
/*
Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)
*/
```

**Defines**

```bash
_LZMA_SIZE_OPT          - Enable some optimizations in LZMA Decoder to get smaller executable code.
_LZMA_PROB32            - It can increase the speed on some 32-bit CPUs, but memory usage for
                          some structures will be doubled in that case.
_LZMA_UINT32_IS_ULONG   - Define it if int is 16-bit on your compiler and long is 32-bit.
_LZMA_NO_SYSTEM_SIZE_T  - Define it if you don't want to use size_t type.
_7ZIP_PPMD_SUPPPORT     - Define it if you want to support PPMD method in ANSI-C .7z decoder.
```

**C++ LZMA Encoder/Decoder**

The C++ LZMA code uses COM-like interfaces, so if you want to use it, it helps to study the basics of COM/OLE.

The C++ LZMA code is just a wrapper over the ANSI-C code.

**C++ Notes**

If you use some C++ code folders in 7-Zip (for example, the C++ code for .7z handling),
you must check that you work correctly with the "new" operator.

7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:

```cpp
void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}
```

If you use a version of MSVC that throws an exception from the "new" operator, you can compile without
"NewHandler.cpp", so the standard exception will be used. Some 7-Zip code
catches any exception in internal code and converts it to an HRESULT code,
so you don't need to catch CNewException if you call the COM interfaces of 7-Zip.

### ***Interface Examples:***

See the example code: C/Util/Lzma/LzmaUtil.c

```bash
    cd C/Util/Lzma
    make -j -f makefile.gcc
    # output: ./_o/7lzma
```

```bash
    LZMA-C 22.01 (x64) : Igor Pavlov : Public domain : 2022-07-15

    Usage:  lzma <e|d> inputFile outputFile
    e: encode file
    d: decode file
```

## Contribution

[https://sourceforge.net/p/sevenzip/_list/tickets](https://sourceforge.net/p/sevenzip/_list/tickets)

## Repositories Involved

[**developtools\hiperf**](https://gitee.com/openharmony/developtools_hiperf)
543[https://sourceforge.net/p/sevenzip/_list/tickets](https://sourceforge.net/p/sevenzip/_list/tickets)
544
545## Repositories Involved
546
547[**developtools\hiperf**](https://gitee.com/openharmony/developtools_hiperf)
548