<?xml version="1.0"?>
<!--

   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

-->
<document>
  <properties>
    <title>Commons Compress User Guide</title>
    <author email="dev@commons.apache.org">Commons Documentation Team</author>
  </properties>
  <body>
    <section name="General Notes">

      <subsection name="Archivers and Compressors">
        <p>Commons Compress calls all formats that compress a single
          stream of data compressor formats, while all formats that
          collect multiple entries inside a single (potentially
          compressed) archive are archiver formats.</p>

        <p>The compressor formats supported are gzip, bzip2, xz, lzma,
          Pack200, DEFLATE, Brotli, DEFLATE64, Zstandard and Z; the
          archiver formats are 7z, ar, arj, cpio, dump, tar and zip.
          Pack200 is a special case as it can only compress JAR
          files.</p>

        <p>We currently only provide read support for arj, dump,
          Brotli, DEFLATE64 and Z. arj can only read uncompressed
          archives; 7z can read archives with many of the compression
          and encryption algorithms supported by 7z but doesn't
          support encryption when writing archives.</p>
      </subsection>

      <subsection name="Buffering">
        <p>The stream classes all wrap around streams provided by the
          calling code and work on them directly without any
          additional buffering. On the other hand most of them will
          benefit from buffering, so it is highly recommended that
          users wrap their streams
          in <code>Buffered<em>(In|Out)</em>putStream</code>s before
          using the Commons Compress API.</p>

      </subsection>

      <subsection name="Factories">

        <p>Compress provides factory methods to create input/output
          streams based on the name of the compressor or archiver
          format as well as factory methods that try to guess the
          format of an input stream.</p>

        <p>To create a compressor writing to a given output by using
          the algorithm name:</p>
        <source><![CDATA[
CompressorOutputStream gzippedOut = new CompressorStreamFactory()
    .createCompressorOutputStream(CompressorStreamFactory.GZIP, myOutputStream);
]]></source>

        <p>Make the factory guess the input format for a given
          archiver stream:</p>
        <source><![CDATA[
ArchiveInputStream input = new ArchiveStreamFactory()
    .createArchiveInputStream(originalInput);
]]></source>

        <p>Make the factory guess the input format for a given
          compressor stream:</p>
        <source><![CDATA[
CompressorInputStream input = new CompressorStreamFactory()
    .createCompressorInputStream(originalInput);
]]></source>

        <p>Note that there is no way to detect the lzma or Brotli
          formats, so only the two-arg version of
          <code>createCompressorInputStream</code> can be used for
          them. Prior to Compress 1.9 the .Z format was not
          auto-detected either.</p>
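        <p>For example, reading an lzma compressed stream by naming
          the format explicitly - a minimal sketch, reusing the
          hypothetical <code>originalInput</code> stream from the
          examples above:</p>
        <source><![CDATA[
CompressorInputStream lzmaIn = new CompressorStreamFactory()
    .createCompressorInputStream(CompressorStreamFactory.LZMA, originalInput);
]]></source>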
      </subsection>

      <subsection name="Restricting Memory Usage">
        <p>Starting with Compress 1.14
          <code>CompressorStreamFactory</code> has an optional
          constructor argument that can be used to set an upper limit
          of memory that may be used while decompressing or
          compressing a stream. As of 1.14 this setting only affects
          decompressing Z, XZ and LZMA compressed streams.</p>
        <p>For the Snappy and LZ4 formats the amount of memory used
          during compression is directly proportional to the window
          size.</p>
      </subsection>

      <subsection name="Statistics">
        <p>Starting with Compress 1.17 most of the
          <code>CompressorInputStream</code> implementations as well as
          <code>ZipArchiveInputStream</code> and all streams returned by
          <code>ZipFile.getInputStream</code> implement the
          <code>InputStreamStatistics</code>
          interface. <code>SevenZFile</code> provides statistics for the
          current entry via the
          <code>getStatisticsForCurrentEntry</code> method. This
          interface can be used to track progress while extracting a
          stream or to detect potential <a
          href="https://en.wikipedia.org/wiki/Zip_bomb">zip bombs</a>
          when the compression ratio becomes suspiciously large.</p>
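        <p>A minimal sketch of such a check - the
          <code>compressorInputStream</code> variable stands for any
          stream implementing the interface and the
          <code>MAX_RATIO</code> threshold is an assumption of this
          example, pick a limit that suits your use case:</p>
        <source><![CDATA[
InputStreamStatistics stats = (InputStreamStatistics) compressorInputStream;
long compressed = stats.getCompressedCount();
long uncompressed = stats.getUncompressedCount();
// a suspiciously high ratio may indicate a zip bomb
if (compressed > 0 && uncompressed / (double) compressed > MAX_RATIO) {
    throw new IOException("compression ratio looks like a zip bomb");
}
]]></source>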
      </subsection>

    </section>
    <section name="Archivers">

      <subsection name="Unsupported Features">
        <p>Many of the supported formats have developed different
          dialects and extensions and some formats allow for features
          (not yet) supported by Commons Compress.</p>

        <p>The <code>ArchiveInputStream</code> class provides a method
          <code>canReadEntryData</code> that will return false if
          Commons Compress can detect that an archive uses a feature
          that is not supported by the current implementation. If it
          returns false you should not try to read the entry but skip
          over it.</p>

      </subsection>

      <subsection name="Entry Names">
        <p>All archive formats provide meta data about the individual
          archive entries via instances of <code>ArchiveEntry</code> (or
          rather subclasses of it). When reading from an archive the
          information provided by the <code>getName</code> method is the
          raw name as stored inside of the archive. There is no
          guarantee the name represents a relative file name or even a
          valid file name on your target operating system at all. You
          should double check the outcome when you try to create file
          names from entry names.</p>
      </subsection>

      <subsection name="Common Extraction Logic">
        <p>Apart from 7z all formats provide a subclass of
          <code>ArchiveInputStream</code> that can be used to read an
          archive. For 7z <code>SevenZFile</code> provides a similar API
          that does not represent a stream as our implementation
          requires random access to the input and cannot be used for
          general streams. The ZIP implementation can benefit a lot from
          random access as well, see the <a
          href="zip.html#ZipArchiveInputStream_vs_ZipFile">zip
          page</a> for details.</p>

        <p>Assuming you want to extract an archive to a target
          directory you'd call <code>getNextEntry</code>, verify the
          entry can be read, construct a sane file name from the entry's
          name, create a <code>File</code> and write all contents to
          it - here <code>IOUtils.copy</code> may come in handy. You do
          so for every entry until <code>getNextEntry</code> returns
          <code>null</code>.</p>

        <p>A skeleton might look like:</p>

        <source><![CDATA[
File targetDir = ...
try (ArchiveInputStream i = ... create the stream for your format, use buffering...) {
    ArchiveEntry entry = null;
    while ((entry = i.getNextEntry()) != null) {
        if (!i.canReadEntryData(entry)) {
            // log something?
            continue;
        }
        String name = fileName(targetDir, entry);
        File f = new File(name);
        if (entry.isDirectory()) {
            if (!f.isDirectory() && !f.mkdirs()) {
                throw new IOException("failed to create directory " + f);
            }
        } else {
            File parent = f.getParentFile();
            if (!parent.isDirectory() && !parent.mkdirs()) {
                throw new IOException("failed to create directory " + parent);
            }
            try (OutputStream o = Files.newOutputStream(f.toPath())) {
                IOUtils.copy(i, o);
            }
        }
    }
}
]]></source>

        <p>where the hypothetical <code>fileName</code> method is
          written by you and provides the absolute name for the file
          that is going to be written on disk. Here you should perform
          checks that ensure the resulting file name actually is a valid
          file name on your operating system or belongs to a file inside
          of <code>targetDir</code> when using the entry's name as
          input.</p>
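        <p>A minimal sketch of such a <code>fileName</code> method -
          it is not part of the Commons Compress API, you would adapt
          it to your own needs - that rejects entry names escaping the
          target directory might look like:</p>
        <source><![CDATA[
private static String fileName(File targetDir, ArchiveEntry entry) throws IOException {
    String canonicalTarget = targetDir.getCanonicalPath();
    File f = new File(targetDir, entry.getName());
    String canonicalFile = f.getCanonicalPath();
    // refuse names that would end up outside of targetDir (e.g. via "..")
    if (!canonicalFile.startsWith(canonicalTarget + File.separator)) {
        throw new IOException("entry " + entry.getName() + " would escape the target directory");
    }
    return canonicalFile;
}
]]></source>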
        <p>If you want to combine an archive format with a compression
          format - like when reading a "tar.gz" file - you wrap the
          <code>ArchiveInputStream</code> around a
          <code>CompressorInputStream</code>, for example:</p>

        <source><![CDATA[
try (InputStream fi = Files.newInputStream(Paths.get("my.tar.gz"));
     InputStream bi = new BufferedInputStream(fi);
     InputStream gzi = new GzipCompressorInputStream(bi);
     ArchiveInputStream o = new TarArchiveInputStream(gzi)) {
}
]]></source>

      </subsection>

      <subsection name="Common Archival Logic">
        <p>Apart from 7z all formats that support writing provide a
          subclass of <code>ArchiveOutputStream</code> that can be used
          to create an archive. For 7z <code>SevenZOutputFile</code>
          provides a similar API that does not represent a stream as our
          implementation requires random access to the output and cannot
          be used for general streams. The
          <code>ZipArchiveOutputStream</code> class will benefit from
          random access as well but can be used for non-seekable streams
          - but not all features will be available and the archive size
          might be slightly bigger, see <a
          href="zip.html#ZipArchiveOutputStream">the zip page</a> for
          details.</p>

        <p>Assuming you want to add a collection of files to an
          archive, you can first use <code>createArchiveEntry</code> for
          each file. In general this will set a few flags (usually the
          last modified time, the size and the information whether this
          is a file or directory) based on the <code>File</code>
          instance. Alternatively you can create the
          <code>ArchiveEntry</code> subclass corresponding to your
          format directly. Often you may want to set additional flags
          like file permissions or owner information before adding the
          entry to the archive.</p>

        <p>Next you use <code>putArchiveEntry</code> in order to add
          the entry and then start using <code>write</code> to add the
          content of the entry - here <code>IOUtils.copy</code> may
          come in handy. Finally you invoke
          <code>closeArchiveEntry</code> once you've written all content
          and before you add the next entry.</p>

        <p>Once all entries have been added you'd invoke
          <code>finish</code> and finally <code>close</code> the
          stream.</p>

        <p>A skeleton might look like:</p>

        <source><![CDATA[
Collection<File> filesToArchive = ...
try (ArchiveOutputStream o = ... create the stream for your format ...) {
    for (File f : filesToArchive) {
        // maybe skip directories for formats like AR that don't store directories
        ArchiveEntry entry = o.createArchiveEntry(f, entryName(f));
        // potentially add more flags to entry
        o.putArchiveEntry(entry);
        if (f.isFile()) {
            try (InputStream i = Files.newInputStream(f.toPath())) {
                IOUtils.copy(i, o);
            }
        }
        o.closeArchiveEntry();
    }
    o.finish();
}
]]></source>

        <p>where the hypothetical <code>entryName</code> method is
          written by you and provides the name for the entry as it is
          going to be written to the archive.</p>

        <p>If you want to combine an archive format with a compression
          format - like when creating a "tar.gz" file - you wrap the
          <code>ArchiveOutputStream</code> around a
          <code>CompressorOutputStream</code>, for example:</p>

        <source><![CDATA[
try (OutputStream fo = Files.newOutputStream(Paths.get("my.tar.gz"));
     OutputStream gzo = new GzipCompressorOutputStream(fo);
     ArchiveOutputStream o = new TarArchiveOutputStream(gzo)) {
}
]]></source>

      </subsection>

      <subsection name="7z">

        <p>Note that Commons Compress currently only supports a subset
          of the compression and encryption algorithms used for 7z
          archives. Writing supports only uncompressed entries, LZMA,
          LZMA2, BZIP2 and Deflate; in addition to those, reading
          supports AES-256/SHA-256 and DEFLATE64.</p>

        <p>Multipart archives are not supported at all.</p>

        <p>7z archives can use multiple compression and encryption
          methods as well as filters combined as a pipeline of methods
          for its entries. Prior to Compress 1.8 you could only specify
          a single method when creating archives - reading archives
          that use more than one method has always been possible.
          Starting with Compress 1.8 it is possible to configure the
          full pipeline using the <code>setContentMethods</code> method
          of <code>SevenZOutputFile</code>. Methods are specified in the
          order they appear inside the pipeline when creating the
          archive; you can also specify certain parameters for some of
          the methods - see the Javadocs of
          <code>SevenZMethodConfiguration</code> for details.</p>
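        <p>For example, to compress executables with a BCJ x86 filter
          followed by LZMA2 using a 4 MiB dictionary - a minimal
          sketch; consult the Javadocs for the options each method
          accepts:</p>
        <source><![CDATA[
SevenZOutputFile sevenZOutput = new SevenZOutputFile(file);
// methods are applied in the order given: first the filter, then LZMA2
sevenZOutput.setContentMethods(Arrays.asList(
    new SevenZMethodConfiguration(SevenZMethod.BCJ_X86_FILTER),
    new SevenZMethodConfiguration(SevenZMethod.LZMA2, 1 << 22)));
]]></source>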
        <p>When reading entries from an archive the
          <code>getContentMethods</code> method of
          <code>SevenZArchiveEntry</code> will properly represent the
          compression/encryption/filter methods but may fail to
          determine the configuration options used. As of Compress 1.8
          only the dictionary size used for LZMA2 can be read.</p>

        <p>Currently solid compression - compressing multiple files
          as a single block to benefit from patterns repeating across
          files - is only supported when reading archives. This also
          means the compression ratio will likely be worse when using
          Commons Compress compared to the native 7z executable.</p>

        <p>Reading or writing requires a
          <code>SeekableByteChannel</code> that will be obtained
          transparently when reading from or writing to a file. The
          class
          <code>org.apache.commons.compress.utils.SeekableInMemoryByteChannel</code>
          allows you to read from or write to an in-memory archive.</p>

        <p>Adding an entry to a 7z archive:</p>
        <source><![CDATA[
SevenZOutputFile sevenZOutput = new SevenZOutputFile(file);
SevenZArchiveEntry entry = sevenZOutput.createArchiveEntry(fileToArchive, name);
sevenZOutput.putArchiveEntry(entry);
sevenZOutput.write(contentOfEntry);
sevenZOutput.closeArchiveEntry();
]]></source>

        <p>Uncompressing a given 7z archive (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"));
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    sevenZFile.read(content, offset, content.length - offset);
}
]]></source>

        <p>Uncompressing a given in-memory 7z archive:</p>
        <source><![CDATA[
byte[] inputData; // 7z archive contents
SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(inputData);
SevenZFile sevenZFile = new SevenZFile(inMemoryByteChannel);
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
sevenZFile.read(); // read current entry's data
]]></source>
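        <p>Writing to an in-memory archive works the same way - a
          minimal sketch; note that
          <code>SeekableInMemoryByteChannel.array()</code> returns the
          backing array, which may be larger than the data actually
          written, so the sketch trims it to the channel's size:</p>
        <source><![CDATA[
SeekableInMemoryByteChannel channel = new SeekableInMemoryByteChannel();
byte[] archive;
try (SevenZOutputFile sevenZOutput = new SevenZOutputFile(channel)) {
    SevenZArchiveEntry entry = sevenZOutput.createArchiveEntry(fileToArchive, name);
    sevenZOutput.putArchiveEntry(entry);
    sevenZOutput.write(contentOfEntry);
    sevenZOutput.closeArchiveEntry();
    sevenZOutput.finish();
    archive = Arrays.copyOf(channel.array(), (int) channel.size());
}
]]></source>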
        <h4><a name="Encrypted 7z Archives"></a>Encrypted 7z Archives</h4>

        <p>Currently Compress supports reading but not writing of
          encrypted archives. When reading an encrypted archive a
          password has to be provided to one of
          <code>SevenZFile</code>'s constructors. If you try to read
          an encrypted archive without specifying a password a
          <code>PasswordRequiredException</code> (a subclass of
          <code>IOException</code>) will be thrown.</p>

        <p>When specifying the password as a <code>byte[]</code> one
          common mistake is to use the wrong encoding when creating
          the <code>byte[]</code> from a <code>String</code>. The
          <code>SevenZFile</code> class expects the bytes to
          correspond to the UTF-16LE encoding of the password. An
          example of reading an encrypted archive is</p>

        <source><![CDATA[
SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"), "secret".getBytes(StandardCharsets.UTF_16LE));
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    sevenZFile.read(content, offset, content.length - offset);
}
]]></source>

        <p>Starting with Compress 1.17 new constructors have been
          added that accept the password as a <code>char[]</code>
          rather than a <code>byte[]</code>. We recommend you use these
          in order to avoid the problem above.</p>

        <source><![CDATA[
SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"), "secret".toCharArray());
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    sevenZFile.read(content, offset, content.length - offset);
}
]]></source>

      </subsection>

      <subsection name="ar">

        <p>In addition to the information stored
          in <code>ArchiveEntry</code> an <code>ArArchiveEntry</code>
          stores information about the owner user and group as well as
          Unix permissions.</p>

        <p>Adding an entry to an ar archive:</p>
        <source><![CDATA[
ArArchiveEntry entry = new ArArchiveEntry(name, size);
arOutput.putArchiveEntry(entry);
arOutput.write(contentOfEntry);
arOutput.closeArchiveEntry();
]]></source>

        <p>Reading entries from an ar archive:</p>
        <source><![CDATA[
ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    arInput.read(content, offset, content.length - offset);
}
]]></source>

        <p>Traditionally the AR format doesn't allow file names longer
          than 16 characters. There are two variants that circumvent
          this limitation in different ways, the GNU/SVR4 and the BSD
          variant. Commons Compress 1.0 to 1.2 can only read archives
          using the GNU/SVR4 variant; support for the BSD variant has
          been added in Commons Compress 1.3. Commons Compress 1.3
          also optionally supports writing archives with file names
          longer than 16 characters using the BSD dialect (see the
          example after the table below), writing the SVR4/GNU dialect
          is not supported.</p>

        <table>
          <thead>
            <tr>
              <th>Version of Apache Commons Compress</th>
              <th>Support for Traditional AR Format</th>
              <th>Support for GNU/SVR4 Dialect</th>
              <th>Support for BSD Dialect</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>1.0 to 1.2</td>
              <td>read/write</td>
              <td>read</td>
              <td>-</td>
            </tr>
            <tr>
              <td>1.3 and later</td>
              <td>read/write</td>
              <td>read</td>
              <td>read/write</td>
            </tr>
          </tbody>
        </table>
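        <p>To write entries with names longer than 16 characters you
          have to opt in to the BSD dialect explicitly - a minimal
          sketch:</p>
        <source><![CDATA[
ArArchiveOutputStream arOutput = new ArArchiveOutputStream(out);
// without this call adding an entry with a long name throws an IOException
arOutput.setLongFileMode(ArArchiveOutputStream.LONGFILE_BSD);
]]></source>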
        <p>It is not possible to detect the end of an AR archive in a
          reliable way so <code>ArArchiveInputStream</code> will read
          until it reaches the end of the stream or fails to parse the
          stream's content as AR entries.</p>

      </subsection>

      <subsection name="arj">

        <p>Note that Commons Compress doesn't support compressed,
          encrypted or multi-volume ARJ archives, yet.</p>

        <p>Uncompressing a given arj archive (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
ArjArchiveEntry entry = arjInput.getNextEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    arjInput.read(content, offset, content.length - offset);
}
]]></source>
      </subsection>

      <subsection name="cpio">

        <p>In addition to the information stored
          in <code>ArchiveEntry</code> a <code>CpioArchiveEntry</code>
          stores various attributes including information about the
          original owner and permissions.</p>

        <p>The cpio package supports the "new portable" as well as the
          "old" format of CPIO archives in their binary, ASCII and
          "with CRC" variants.</p>

        <p>Adding an entry to a cpio archive:</p>
        <source><![CDATA[
CpioArchiveEntry entry = new CpioArchiveEntry(name, size);
cpioOutput.putArchiveEntry(entry);
cpioOutput.write(contentOfEntry);
cpioOutput.closeArchiveEntry();
]]></source>

        <p>Reading entries from a cpio archive:</p>
        <source><![CDATA[
CpioArchiveEntry entry = cpioInput.getNextCPIOEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    cpioInput.read(content, offset, content.length - offset);
}
]]></source>

        <p>Traditionally CPIO archives are written in blocks of 512
          bytes - the block size is a configuration parameter of the
          <code>Cpio*Stream</code>'s constructors. Starting with version
          1.5 <code>CpioArchiveInputStream</code> will consume the
          padding written to fill the current block when the end of the
          archive is reached. Unfortunately many CPIO implementations
          use larger block sizes so there may be more zero-byte padding
          left inside the original input stream after the archive has
          been consumed completely.</p>

      </subsection>

      <subsection name="jar">
        <p>In general, JAR archives are ZIP files, so the JAR package
          supports all options provided by the <a href="#zip">ZIP</a>
          package.</p>

        <p>To be interoperable JAR archives should always be created
          using the UTF-8 encoding for file names (which is the
          default).</p>

        <p>Archives created using <code>JarArchiveOutputStream</code>
          will implicitly add a <code>JarMarker</code> extra field to
          the very first archive entry of the archive which will make
          Solaris recognize them as Java archives and allows them to
          be used as executables.</p>

        <p>Note that <code>ArchiveStreamFactory</code> doesn't
          distinguish ZIP archives from JAR archives, so if you use
          the one-argument <code>createArchiveInputStream</code>
          method on a JAR archive, it will still return the more
          generic <code>ZipArchiveInputStream</code>.</p>
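        <p>If you need a <code>JarArchiveInputStream</code> you can
          request the JAR format from the factory explicitly - a short
          sketch, reusing the hypothetical <code>originalInput</code>
          stream from the factory examples:</p>
        <source><![CDATA[
ArchiveInputStream jarInput = new ArchiveStreamFactory()
    .createArchiveInputStream(ArchiveStreamFactory.JAR, originalInput);
]]></source>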
        <p>The <code>JarArchiveEntry</code> class contains fields for
          certificates and attributes that are planned to be supported
          in the future but are not supported as of Compress 1.0.</p>

        <p>Adding an entry to a jar archive:</p>
        <source><![CDATA[
JarArchiveEntry entry = new JarArchiveEntry(name);
entry.setSize(size);
jarOutput.putArchiveEntry(entry);
jarOutput.write(contentOfEntry);
jarOutput.closeArchiveEntry();
]]></source>

        <p>Reading entries from a jar archive:</p>
        <source><![CDATA[
JarArchiveEntry entry = jarInput.getNextJarEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    jarInput.read(content, offset, content.length - offset);
}
]]></source>
      </subsection>

      <subsection name="dump">

        <p>In addition to the information stored
          in <code>ArchiveEntry</code> a <code>DumpArchiveEntry</code>
          stores various attributes including information about the
          original owner and permissions.</p>

        <p>As of Commons Compress 1.3 only dump archives using the
          new-fs format - this is the most common variant - are
          supported. Right now this library supports uncompressed and
          ZLIB compressed archives and cannot write archives at
          all.</p>

        <p>Reading entries from a dump archive:</p>
        <source><![CDATA[
DumpArchiveEntry entry = dumpInput.getNextDumpEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    dumpInput.read(content, offset, content.length - offset);
}
]]></source>

        <p>Prior to version 1.5 <code>DumpArchiveInputStream</code>
          would close the original input once it had read the last
          record. Starting with version 1.5 it will not close the
          stream implicitly.</p>

      </subsection>

      <subsection name="tar">

        <p>The TAR package has a <a href="tar.html">dedicated
          documentation page</a>.</p>

        <p>Adding an entry to a tar archive:</p>
        <source><![CDATA[
TarArchiveEntry entry = new TarArchiveEntry(name);
entry.setSize(size);
tarOutput.putArchiveEntry(entry);
tarOutput.write(contentOfEntry);
tarOutput.closeArchiveEntry();
]]></source>

        <p>Reading entries from a tar archive:</p>
        <source><![CDATA[
TarArchiveEntry entry = tarInput.getNextTarEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    tarInput.read(content, offset, content.length - offset);
}
]]></source>
      </subsection>

      <subsection name="zip">
        <p>The ZIP package has a <a href="zip.html">dedicated
          documentation page</a>.</p>

        <p>Adding an entry to a zip archive:</p>
        <source><![CDATA[
ZipArchiveEntry entry = new ZipArchiveEntry(name);
entry.setSize(size);
zipOutput.putArchiveEntry(entry);
zipOutput.write(contentOfEntry);
zipOutput.closeArchiveEntry();
]]></source>

        <p><code>ZipArchiveOutputStream</code> can use some internal
          optimizations exploiting <code>SeekableByteChannel</code> if it
          knows it is writing to a seekable output rather than a
          non-seekable stream. If you are writing to a file, you should
          use the constructor that accepts a <code>File</code> or
          <code>SeekableByteChannel</code> argument rather
          than the one using an <code>OutputStream</code> or the
          factory method in <code>ArchiveStreamFactory</code>.</p>

        <p>Reading entries from a zip archive:</p>
        <source><![CDATA[
ZipArchiveEntry entry = zipInput.getNextZipEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
    zipInput.read(content, offset, content.length - offset);
}
]]></source>

        <p>Reading entries from a zip archive using the
          recommended <code>ZipFile</code> class:</p>
        <source><![CDATA[
ZipArchiveEntry entry = zipFile.getEntry(name);
InputStream content = zipFile.getInputStream(entry);
try {
    READ UNTIL content IS EXHAUSTED
} finally {
    content.close();
}
]]></source>

        <p>Reading entries from an in-memory zip archive using
          <code>SeekableInMemoryByteChannel</code> and the
          <code>ZipFile</code> class:</p>
        <source><![CDATA[
byte[] inputData; // zip archive contents
SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(inputData);
ZipFile zipFile = new ZipFile(inMemoryByteChannel);
ZipArchiveEntry archiveEntry = zipFile.getEntry("entryName");
InputStream inputStream = zipFile.getInputStream(archiveEntry);
inputStream.read(); // read data from the input stream
]]></source>

        <p>Creating a zip file with multiple threads:</p>

        <p>A simple implementation to create a zip file might look like
          this:</p>

        <source><![CDATA[
public class ScatterSample {

    ParallelScatterZipCreator scatterZipCreator = new ParallelScatterZipCreator();
    ScatterZipOutputStream dirs = ScatterZipOutputStream.fileBased(File.createTempFile("scatter-dirs", "tmp"));

    public ScatterSample() throws IOException {
    }

    public void addEntry(ZipArchiveEntry zipArchiveEntry, InputStreamSupplier streamSupplier) throws IOException {
        if (zipArchiveEntry.isDirectory() && !zipArchiveEntry.isUnixSymlink()) {
            dirs.addArchiveEntry(ZipArchiveEntryRequest.createZipArchiveEntryRequest(zipArchiveEntry, streamSupplier));
        } else {
            scatterZipCreator.addArchiveEntry(zipArchiveEntry, streamSupplier);
        }
    }

    public void writeTo(ZipArchiveeOutputStream zipArchiveOutputStream)
        throws IOException, ExecutionException, InterruptedException {
        dirs.writeTo(zipArchiveOutputStream);
        dirs.close();
        scatterZipCreator.writeTo(zipArchiveOutputStream);
    }
}
]]></source>
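        <p>Using it could look like the following sketch - the entry
          name, its content and the output file are assumptions of
          this example; note that every entry handed to
          <code>ParallelScatterZipCreator</code> needs a compression
          method set:</p>
        <source><![CDATA[
ScatterSample scatterSample = new ScatterSample();
ZipArchiveEntry archiveEntry = new ZipArchiveEntry("file.txt");
archiveEntry.setMethod(ZipEntry.DEFLATED);
// the InputStreamSupplier provides the entry's content when it is compressed
scatterSample.addEntry(archiveEntry, () -> new ByteArrayInputStream(contentOfEntry));
try (ZipArchiveOutputStream zipOutput = new ZipArchiveOutputStream(new File("result.zip"))) {
    scatterSample.writeTo(zipOutput);
}
]]></source>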
      </subsection>

    </section>
    <section name="Compressors">

      <subsection name="Concatenated Streams">
        <p>For the bzip2, gzip and xz formats as well as the framed
          lz4 format a single compressed file
          may actually consist of several streams that will be
          concatenated by the command line utilities when decompressing
          them. Starting with Commons Compress 1.4 the
          <code>*CompressorInputStream</code>s for these formats support
          concatenated streams as well, but they won't do so by
          default. You must use the two-arg constructor and explicitly
          enable the support.</p>
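        <p>For gzip this looks like the following sketch, reusing the
          hypothetical <code>in</code> stream of the examples below;
          the other formats provide equivalent constructors:</p>
        <source><![CDATA[
// true means: keep decompressing concatenated streams until the end
// of the input instead of stopping after the first gzip stream
GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in, true);
]]></source>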
      </subsection>

      <subsection name="Brotli">

        <p>The implementation of this package is provided by the
          <a href="https://github.com/google/brotli">Google Brotli dec</a>
          library.</p>

        <p>Uncompressing a given Brotli compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.br"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
BrotliCompressorInputStream brIn = new BrotliCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = brIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
brIn.close();
]]></source>
      </subsection>

      <subsection name="bzip2">

        <p>Note that <code>BZip2CompressorOutputStream</code> keeps
          hold of some big data structures in memory. While it is
          recommended for <em>any</em> stream that you close it as soon
          as you no longer need it, this is even more important
          for <code>BZip2CompressorOutputStream</code>.</p>

        <p>Uncompressing a given bzip2 compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.bz2"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = bzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
bzIn.close();
]]></source>

        <p>Compressing a given file using bzip2 (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.bz2"));
BufferedOutputStream out = new BufferedOutputStream(fout);
BZip2CompressorOutputStream bzOut = new BZip2CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    bzOut.write(buffer, 0, n);
}
bzOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="DEFLATE">

        <p>The implementation of the DEFLATE/INFLATE code used by this
          package is provided by the <code>java.util.zip</code> package
          of the Java class library.</p>

        <p>Uncompressing a given DEFLATE compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("some-file"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
DeflateCompressorInputStream defIn = new DeflateCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = defIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
defIn.close();
]]></source>

        <p>Compressing a given file using DEFLATE (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("some-file"));
BufferedOutputStream out = new BufferedOutputStream(fout);
DeflateCompressorOutputStream defOut = new DeflateCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    defOut.write(buffer, 0, n);
}
defOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="DEFLATE64">

        <p>Uncompressing a given DEFLATE64 compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("some-file"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
Deflate64CompressorInputStream defIn = new Deflate64CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = defIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
defIn.close();
]]></source>

      </subsection>

      <subsection name="gzip">

        <p>The implementation of the DEFLATE/INFLATE code used by this
          package is provided by the <code>java.util.zip</code> package
          of the Java class library.</p>

        <p>Uncompressing a given gzip compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.gz"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = gzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
gzIn.close();
]]></source>

        <p>Compressing a given file using gzip (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.gz"));
BufferedOutputStream out = new BufferedOutputStream(fout);
GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    gzOut.write(buffer, 0, n);
}
gzOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="LZ4">

        <p>There are two different "formats" used for <a
          href="http://lz4.github.io/lz4/">lz4</a>. The format called
          "block format" only contains the raw compressed data while the
          other provides a higher level "frame format" - Commons
          Compress offers two different stream classes for reading or
          writing either format.</p>
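        <p>The block format is handled by
          <code>BlockLZ4Compressor(In|Out)putStream</code> - a minimal
          sketch for reading raw block data, analogous to the framed
          examples below; the file name is an assumption of this
          example:</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("some-file.lz4-block"));
BlockLZ4CompressorInputStream blockIn = new BlockLZ4CompressorInputStream(new BufferedInputStream(fin));
]]></source>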
        <p>Uncompressing a given frame LZ4 file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.lz4"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
FramedLZ4CompressorInputStream zIn = new FramedLZ4CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();
]]></source>

        <p>Compressing a given file using the LZ4 frame format (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.lz4"));
BufferedOutputStream out = new BufferedOutputStream(fout);
FramedLZ4CompressorOutputStream lzOut = new FramedLZ4CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    lzOut.write(buffer, 0, n);
}
lzOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="lzma">

        <p>The implementation of this package is provided by the
          public domain <a href="https://tukaani.org/xz/java.html">XZ
          for Java</a> library.</p>

        <p>Uncompressing a given lzma compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.lzma"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
LZMACompressorInputStream lzmaIn = new LZMACompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = lzmaIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
lzmaIn.close();
]]></source>

        <p>Compressing a given file using lzma (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.lzma"));
BufferedOutputStream out = new BufferedOutputStream(fout);
LZMACompressorOutputStream lzOut = new LZMACompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    lzOut.write(buffer, 0, n);
}
lzOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="Pack200">

        <p>The Pack200 package has a <a href="pack200.html">dedicated
          documentation page</a>.</p>

        <p>The implementation of this package is provided by
          the <code>java.util.zip</code> package of the Java class
          library.</p>

        <p>Uncompressing a given pack200 compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.pack"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.jar"));
Pack200CompressorInputStream pIn = new Pack200CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = pIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
pIn.close();
]]></source>

        <p>Compressing a given jar using pack200 (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.jar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.pack"));
BufferedOutputStream out = new BufferedOutputStream(fout);
Pack200CompressorOutputStream pOut = new Pack200CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    pOut.write(buffer, 0, n);
}
pOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="Snappy">

        <p>There are two different "formats" used for <a
          href="https://github.com/google/snappy/">Snappy</a>, one only
          contains the raw compressed data while the other provides a
          higher level "framing format" - Commons Compress offers two
          different stream classes for reading either format.</p>

        <p>Starting with 1.12 we've added support for different
          dialects of the framing format that can be specified when
          constructing the stream. The <code>STANDARD</code> dialect
          follows the "framing format" specification while the
          <code>IWORK_ARCHIVE</code> dialect can be used to parse IWA
          files that are part of Apple's iWork 13 format. If no dialect
          has been specified, <code>STANDARD</code> is used. Only the
          <code>STANDARD</code> format can be detected by
          <code>CompressorStreamFactory</code>.</p>
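        <p>Picking a dialect is a matter of choosing the corresponding
          constructor - a minimal sketch for reading an IWA file,
          where <code>iwaStream</code> is an assumed input stream for
          the file to parse:</p>
        <source><![CDATA[
FramedSnappyCompressorInputStream snIn =
    new FramedSnappyCompressorInputStream(iwaStream, FramedSnappyDialect.IWORK_ARCHIVE);
]]></source>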
        <p>Uncompressing a given framed Snappy file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.sz"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
FramedSnappyCompressorInputStream zIn = new FramedSnappyCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();
]]></source>

        <p>Compressing a given file using framed Snappy (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.sz"));
BufferedOutputStream out = new BufferedOutputStream(fout);
FramedSnappyCompressorOutputStream snOut = new FramedSnappyCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    snOut.write(buffer, 0, n);
}
snOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="XZ">

        <p>The implementation of this package is provided by the
          public domain <a href="https://tukaani.org/xz/java.html">XZ
          for Java</a> library.</p>

        <p>When you try to open an XZ stream for reading using
          <code>CompressorStreamFactory</code>, Commons Compress will
          check whether the XZ for Java library is available. Starting
          with Compress 1.9 the result of this check will be cached
          unless Compress finds OSGi classes in its classpath. You can
          use <code>XZUtils#setCacheXZAvailability</code> to override
          this default behavior.</p>
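        <p>For example, a one-line sketch that disables the cache:</p>
        <source><![CDATA[
XZUtils.setCacheXZAvailability(false);
]]></source>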
        <p>Uncompressing a given XZ compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.xz"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
XZCompressorInputStream xzIn = new XZCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = xzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
xzIn.close();
]]></source>

        <p>Compressing a given file using XZ (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.xz"));
BufferedOutputStream out = new BufferedOutputStream(fout);
XZCompressorOutputStream xzOut = new XZCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    xzOut.write(buffer, 0, n);
}
xzOut.close();
in.close();
]]></source>

      </subsection>

      <subsection name="Z">

        <p>Uncompressing a given Z compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.Z"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
ZCompressorInputStream zIn = new ZCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();
]]></source>

      </subsection>

      <subsection name="Zstandard">

        <p>The implementation of this package is provided by the
          <a href="https://github.com/luben/zstd-jni">Zstandard JNI</a>
          library.</p>

        <p>Uncompressing a given Zstandard compressed file (you would
          certainly add exception handling and make sure all streams
          get closed properly):</p>
        <source><![CDATA[
InputStream fin = Files.newInputStream(Paths.get("archive.tar.zstd"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
ZstdCompressorInputStream zsIn = new ZstdCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zsIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zsIn.close();
]]></source>

        <p>Compressing a given file using the Zstandard format (you
          would certainly add exception handling and make sure all
          streams get closed properly):</p>
        <source><![CDATA[
InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.zstd"));
BufferedOutputStream out = new BufferedOutputStream(fout);
ZstdCompressorOutputStream zOut = new ZstdCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    zOut.write(buffer, 0, n);
}
zOut.close();
in.close();
]]></source>

      </subsection>
    </section>

    <section name="Extending Commons Compress">

      <p>
        Starting in release 1.13, it is possible to add Compressor- and ArchiverStream implementations using
        Java's <a href="https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html">ServiceLoader</a>
        mechanism.
      </p>

      <subsection name="Extending Commons Compress Compressors">

        <p>
          To provide your own compressor, you must make available on the classpath a file called
          <code>META-INF/services/org.apache.commons.compress.compressors.CompressorStreamProvider</code>.
        </p>
        <p>
          This file MUST contain one fully-qualified class name per line.
        </p>
        <p>
          For example:
        </p>
        <pre>org.apache.commons.compress.compressors.TestCompressorStreamProvider</pre>
        <p>
          This class MUST implement the Commons Compress interface
          <a href="apidocs/org/apache/commons/compress/compressors/CompressorStreamProvider.html">org.apache.commons.compress.compressors.CompressorStreamProvider</a>.
        </p>
      </subsection>

      <subsection name="Extending Commons Compress Archivers">

        <p>
          To provide your own archiver, you must make available on the classpath a file called
          <code>META-INF/services/org.apache.commons.compress.archivers.ArchiveStreamProvider</code>.
        </p>
        <p>
          This file MUST contain one fully-qualified class name per line.
        </p>
        <p>
          For example:
        </p>
        <pre>org.apache.commons.compress.archivers.TestArchiveStreamProvider</pre>
        <p>
          This class MUST implement the Commons Compress interface
          <a href="apidocs/org/apache/commons/compress/archivers/ArchiveStreamProvider.html">org.apache.commons.compress.archivers.ArchiveStreamProvider</a>.
        </p>
      </subsection>

    </section>
  </body>
</document>