# third_party_lzma

## Description

---
LZMA SDK provides the documentation, samples, header files,
libraries, and tools you need to develop applications that
use 7z / LZMA / LZMA2 / XZ compression.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was improved to maximize the compression ratio while
keeping high decompression speed and low memory requirements for
decompression.

LZMA2 is an LZMA-based compression method. LZMA2 provides better
multithreading support for compression than LZMA, along with some other improvements.

7z is a file format for data compression and file archiving.
7z is the main file format of the 7-Zip compression program (www.7-zip.org).
The 7z format supports different compression methods: LZMA, LZMA2 and others.
7z also supports AES-256 based encryption.

XZ is a file format for data compression that uses LZMA2 compression.
The XZ format provides additional features: SHA/CRC checks, filters for
improved compression ratio, and splitting into blocks and streams.

---

## Software Architecture

---
Source code:

| format/algorithm | C | C++ | C# | Java |
| :------ | :---------| :----- | :----- | :----- |
| LZMA compression and decompression | ✓ | ✓ | ✓ | ✓ |
| LZMA2 compression and decompression | ✓ | ✓ | | |
| XZ compression and decompression | ✓ | ✓ | | |
| 7z decompression | ✓ | ✓ | | |
| 7z compression | | ✓ | | |
| small SFXs for installers (7z decompression) | ✓ | | | |
| SFXs and SFXs for installers (7z decompression) | | ✓ | | |

---
Source code structure

```bash
/third_party/lzma
├── Asm                           # asm files (optimized code for CRC calculation and Intel-AES encryption)
│   ├── arm
│   ├── arm64
│   └── x86
├── C                             # C files (compression / decompression and other)
│   └── Util
│       ├── 7z                    # 7z decoder program (decoding 7z files)
│       ├── Lzma                  # LZMA program (file->file LZMA encoder/decoder)
│       ├── LzmaLib               # LZMA library (.DLL for Windows)
│       └── SfxSetup              # small SFX module for installers
├── CPP
│   ├── Common                    # common files for C++ projects
│   ├── Windows                   # common files for Windows related code
│   └── 7zip                      # files related to 7-Zip
│       ├── Archive               # files related to archiving
│       │   ├── Common            # common files for archive handling
│       │   └── 7z                # 7z C++ Encoder/Decoder
│       ├── Bundles               # Modules that are bundles of other modules (files)
│       │   ├── Alone7z           # 7zr.exe: Standalone 7-Zip console program (reduced version)
│       │   ├── Format7zExtractR  # 7zxr.dll: Reduced version of 7z DLL: extracting from 7z/LZMA/BCJ/BCJ2
│       │   ├── Format7zR         # 7zr.dll: Reduced version of 7z DLL: extracting/compressing to 7z/LZMA/BCJ/BCJ2
│       │   ├── LzmaCon           # lzma.exe: LZMA compression/decompression
│       │   ├── LzmaSpec          # example code for LZMA Specification
│       │   ├── SFXCon            # 7zCon.sfx: Console 7z SFX module
│       │   ├── SFXSetup          # 7zS.sfx: 7z SFX module for installers
│       │   └── SFXWin            # 7z.sfx: GUI 7z SFX module
│       ├── Common                # common files for 7-Zip
│       ├── Compress              # files for compression/decompression
│       ├── Crypto                # files for encryption / decryption
│       └── UI                    # User Interface files
│           ├── Client7z          # Test application for 7za.dll, 7zr.dll, 7zxr.dll
│           ├── Common            # Common UI files
│           ├── Console           # Code for console program (7z.exe)
│           ├── Explorer          # Some code from 7-Zip Shell extension
│           ├── FileManager       # Some GUI code from 7-Zip File Manager
│           └── GUI               # Some GUI code from 7-Zip
├── CS
│   └── 7zip
│       ├── Common                # some common files for 7-Zip
│       └── Compress              # files related to compression/decompression
│           ├── LZ                # files related to LZ (Lempel-Ziv) compression algorithm
│           ├── LZMA              # LZMA compression/decompression
│           ├── LzmaAlone         # file->file LZMA compression/decompression
│           └── RangeCoder        # Range Coder (special code of compression/decompression)
├── DOC
│   ├── 7zC.txt                   # 7z ANSI-C Decoder description
│   ├── 7zFormat.txt              # 7z Format description
│   ├── installer.txt             # information about 7-Zip for installers
│   ├── lzma-history.txt          # history of LZMA SDK
│   ├── lzma-sdk.txt              # LZMA SDK description
│   ├── lzma-specification.txt    # Specification of LZMA
│   ├── lzma.txt                  # LZMA compression description
│   └── Methods.txt               # Compression method IDs for .7z
└── Java
    └── SevenZip
        └── Compression           # files related to compression/decompression
            ├── LZ                # files related to LZ (Lempel-Ziv) compression algorithm
            ├── LZMA              # LZMA compression/decompression
            └── RangeCoder        # Range Coder (special code of compression/decompression)
```

---

## NOTICES / LICENSE

LZMA SDK is written and placed in the public domain by Igor Pavlov.

Some code in LZMA SDK is based on public domain code from other developers:

  1) PPMd var.H (2001): Dmitry Shkarin
  2) SHA-256: Wei Dai (Crypto++ library)

Anyone is free to copy, modify, publish, use, compile, sell, or distribute the
original LZMA SDK code, either in source code form or as a compiled binary, for
any purpose, commercial or non-commercial, and by any means.

LZMA SDK code is compatible with open source licenses. For example, you can
include it in GNU GPL or GNU LGPL code.

## Build

### ***UNIX/Linux version***

7-Zip can be compiled with different compilers: gcc and clang.
The 7-Zip code also contains two versions of some critical parts: one in C and one in assembler.
If you compile the version with assembler code, you will get a faster 7-Zip binary.

7-Zip's assembler code uses the following syntax for different platforms:

#### *arm64: GNU assembler for ARM64 with preprocessor*

The syntax of the arm64 assembler code in 7-Zip is supported by GCC and Clang for ARM64.

#### *x86 and x86_64 (AMD64)*

Two programs support MASM syntax on Linux:
Asmc Macro Assembler and JWasm. However, JWasm currently doesn't support some CPU instructions used in 7-Zip.
So you must install Asmc Macro Assembler on Linux if you want to compile the fastest version of 7-Zip for x86 and x86-64: [https://github.com/nidud/asmc](https://github.com/nidud/asmc)

### ***Building commands***

Different binaries can be compiled from the 7-Zip source.
Each build folder contains 2 main files for compiling:

- `makefile` - used to compile the Windows version of 7-Zip with the `nmake` command
- `makefile.gcc` - used to compile the Linux/macOS versions of 7-Zip with the `make` command

First, change the current folder to the folder that contains `makefile.gcc`:

```bash
 cd CPP/7zip/Bundles/Alone7z
```

Then you can build with `makefile.gcc` using the command:

```bash
 make -j -f makefile.gcc
```

There are also additional "*.mak" files in the folder "CPP/7zip/" that can be used to compile
7-Zip binaries with optimized code and optimizing options.

To compile with GCC without assembler:

```bash
 cd CPP/7zip/Bundles/Alone7z
 make -j -f ../../cmpl_gcc.mak
```

You can also change some compiler options in the mak files:

- cmpl_gcc.mak
- var_gcc.mak
- warn_gcc.mak

## Interface Usage

This section describes the LZMA encoding and decoding functions written in C.

Note: you can also read the LZMA Specification (lzma-specification.txt from LZMA SDK).

You can also look at the source code for LZMA encoding and decoding:

 ***C/Util/Lzma/LzmaUtil.c***

### ***LZMA compressed file format***

```bash
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data
```

ANSI-C LZMA Decoder

Please note that the interfaces for the ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version of LZMA SDK
from the sourceforge.net site.

To use the ANSI-C LZMA Decoder you need the following files:

```bash
 LzmaDec.h
 LzmaDec.c
 7zTypes.h
 Precomp.h
 Compiler.h
```

See the example code:
 C/Util/Lzma/LzmaUtil.c

Memory requirements for LZMA decoding

1. Stack usage of the LZMA decoding function for local variables is not larger than 200-400 bytes.
2. The LZMA Decoder uses a dictionary buffer and an internal state structure.
3. The internal state structure consumes state_size = (4 + (1.5 << (lc + lp))) KB. With the default settings (lc=3, lp=0), state_size = 16 KB.

### ***How To decompress data***

The LZMA Decoder (ANSI-C version) supports 2 interfaces:

**1)** Single-call Decompressing

**2)** Multi-call State Decompressing (zlib-like interface)

**You must provide an external allocator:**

Example:

```c
#include <stdlib.h>  /* malloc, free */

void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };
```

The `p = p;` statement is there only to suppress unused-parameter compiler warnings.

#### ***Single-call Decompressing***

1. When to use: RAM->RAM decompressing
2. Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
3. Compile defines: no defines
4. Memory Requirements:

- Input buffer: compressed size
- Output buffer: uncompressed size
- LZMA Internal Structures: state_size (16 KB for default settings)

**Interface:**

```c
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END, when you know that
                           current output buffer covers last bytes of stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Returns:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).
```

  If the LZMA decoder sees the end marker before reaching the output limit, it returns SZ_OK,
  and the output value of destLen will be less than the output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:

  1. Check the result and the "status" variable.
  2. Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
  3. Check that output(srcLen) = compressedSize, if you know the real compressedSize.
     You must use the correct finish mode in that case.

#### ***Multi-call State Decompressing (zlib-like interface)***

1. When to use: file->file decompressing
2. Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
3. Memory Requirements:

- Buffer for input stream: any size (for example, 16 KB)
- Buffer for output stream: any size (for example, 16 KB)
- LZMA Internal Structures: state_size (16 KB for default settings)
- LZMA dictionary (dictionary size is encoded in LZMA properties header)

**1)** Read the LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) into a header buffer:

```c
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));
```

**2)** Allocate the CLzmaDec structures (state + dictionary) using the LZMA properties:

```c
  CLzmaDec state;
  LzmaDec_Constr(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;
```

**3)** Init the LzmaDec structure before any new LZMA stream, then call LzmaDec_DecodeToBuf in a loop:

```c
  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    /* dest/destLen and src/srcLen describe the current output and input buffers */
    res = LzmaDec_DecodeToBuf(&state, dest, &destLen,
        src, &srcLen, finishMode);
    ...
  }
```

**4)** Free all allocated structures:

```c
  LzmaDec_Free(&state, &g_Alloc);
```

See the example code:
 C/Util/Lzma/LzmaUtil.c

### ***How To compress data***

1. Compile files:

```bash
  7zTypes.h
  Threads.h
  LzmaEnc.h
  LzmaEnc.c
  LzFind.h
  LzFind.c
  LzFindMt.h
  LzFindMt.c
  LzHash.h
```

2. Memory Requirements:

- (dictSize * 11.5 + 6 MB) + state_size

3. The LZMA Encoder can use two memory allocators:

- alloc - for small arrays.
- allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for better compression speed. Note that the Windows implementation of Large RAM Pages is poor.
It's OK to use the same allocator for alloc and allocBig.
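
As a rough illustration of the encoder memory formula above, the sketch below turns `(dictSize * 11.5 + 6 MB) + state_size` into a byte count. The helper names `lzma_state_size_bytes` and `lzma_enc_mem_estimate` are hypothetical and are not part of the LZMA SDK. The sketch also assumes that KB and MB mean 1024 and 1024 * 1024 bytes, and that state_size follows the decoder formula (4 + (1.5 << (lc + lp))) KB given earlier.

```c
#include <stdint.h>

/* Hypothetical helper (not an SDK function): internal state size in bytes,
   following state_size = (4 + (1.5 << (lc + lp))) KB, assuming 1 KB = 1024 bytes. */
static uint64_t lzma_state_size_bytes(unsigned lc, unsigned lp)
{
    /* 1.5 KB is 1536 bytes, so (1.5 << n) KB is (1536 << n) bytes. */
    return 4096u + ((uint64_t)1536 << (lc + lp));
}

/* Hypothetical helper: (dictSize * 11.5 + 6 MB) + state_size, in bytes.
   Assumes 1 MB = 1024 * 1024 bytes; dictSize / 2 rounds down for odd sizes. */
static uint64_t lzma_enc_mem_estimate(uint64_t dict_size, unsigned lc, unsigned lp)
{
    uint64_t dict_part = dict_size * 11 + dict_size / 2;  /* dictSize * 11.5 */
    return dict_part + 6u * 1024u * 1024u + lzma_state_size_bytes(lc, lp);
}
```

For a 16 MB dictionary with lc=3 and lp=0, `lzma_enc_mem_estimate(16u << 20, 3, 0)` evaluates to 199245824 bytes, or roughly 190 MB. Treat this as an estimate for sizing purposes only; the actual allocation depends on the encoder settings.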

#### ***Single-call Compression with callbacks***

See the example code:
 C/Util/Lzma/LzmaUtil.c

When to use: file->file compressing

**1)** You must implement callback structures for the interfaces:

```c
/* Interfaces to implement:
     ISeqInStream
     ISeqOutStream
     ICompressProgress
     ISzAlloc */

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) { p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;
```

**2)** Create the CLzmaEncHandle object:

```c
  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;
```

**3)** Initialize the CLzmaEncProps properties:

```c
  CLzmaEncProps props;
  LzmaEncProps_Init(&props);
```

  Then you can change some properties in that structure.
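
If you change lc, lp, or pb, it can help to see how they end up in the first header byte ("lc, lp, pb in encoded form" in the file format table). The sketch below uses the usual LZMA packing `(pb * 5 + lp) * 9 + lc`. The helpers `lzma_pack_props` and `lzma_unpack_props` are hypothetical names used for illustration only; the SDK performs this encoding itself when it writes the properties. The value pb=2 in the usage note is assumed to be the encoder default.

```c
#include <stdint.h>

/* Hypothetical helpers (not SDK functions) for the first LZMA header byte. */

/* Pack lc, lp, pb into one byte: (pb * 5 + lp) * 9 + lc.
   Returns -1 if a value is outside the ranges lc 0..8, lp 0..4, pb 0..4. */
static int lzma_pack_props(unsigned lc, unsigned lp, unsigned pb)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return -1;
    return (int)((pb * 5 + lp) * 9 + lc);
}

/* Reverse the packing. Returns 0 on success, or -1 if the byte
   is not a valid properties value (it must be below 9 * 5 * 5). */
static int lzma_unpack_props(uint8_t d, unsigned *lc, unsigned *lp, unsigned *pb)
{
    if (d >= 9 * 5 * 5)
        return -1;
    *lc = d % 9;
    d /= 9;
    *lp = d % 5;
    *pb = d / 5;
    return 0;
}
```

With lc=3, lp=0, pb=2, `lzma_pack_props(3, 0, 2)` yields 0x5D, which is why many .lzma files start with that byte.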

**4)** Send the LZMA properties to the LZMA Encoder:

```c
  res = LzmaEnc_SetProps(enc, &props);
```

**5)** Write the encoded properties to the header:

```c
    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    /* append the uncompressed size as 8 little-endian bytes */
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);
```

**6)** Call the encoding function:

```c
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);
```

**7)** Destroy the LZMA Encoder object:

```c
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);
```

If a callback function returns an error code, LzmaEnc_Encode also returns that code,
or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.

---

#### ***Single-call RAM->RAM Compression***

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

```c
SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);
Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)
```

Defines

```bash
_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.
_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.
_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.
_LZMA_NO_SYSTEM_SIZE_T  - Define it if you don't want to use size_t type.
_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support PPMD method in ANSI-C .7z decoder.
```

C++ LZMA Encoder/Decoder

The C++ LZMA code uses COM-like interfaces, so if you want to use it, you may want to study the basics of COM/OLE.

The C++ LZMA code is just a wrapper over the ANSI-C code.

C++ Notes

If you use some of the C++ code folders in 7-Zip (for example, the C++ code for .7z handling),
you must check that you work correctly with the "new" operator.

7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:

```cpp
void * operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}
```

If you use an MSVC version that throws an exception from the "new" operator, you can compile without
"NewHandler.cpp", and the standard exception will be used. Some code in
7-Zip catches any exception in internal code and converts it to an HRESULT code,
so you don't need to catch CNewException if you call the COM interfaces of 7-Zip.

### ***Interface Examples:***

See the example code: C/Util/Lzma/LzmaUtil.c

```bash
 cd C/Util/Lzma
 make -j -f makefile.gcc
 # output: ./_o/7lzma
```

```bash
 LZMA-C 22.01 (x64) : Igor Pavlov : Public domain : 2022-07-15

 Usage:  lzma <e|d> inputFile outputFile
   e: encode file
   d: decode file
```

## Contribution

[https://sourceforge.net/p/sevenzip/_list/tickets](https://sourceforge.net/p/sevenzip/_list/tickets)

## Repositories Involved

[**developtools\hiperf**](https://gitee.com/openharmony/developtools_hiperf)