<?xml version="1.0"?>
<!--

   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

-->
<document>
  <properties>
    <title>Commons Compress TAR package</title>
    <author email="dev@commons.apache.org">Commons Documentation Team</author>
  </properties>
  <body>
    <section name="The TAR package">

      <p>In addition to the information stored
      in <code>ArchiveEntry</code> a <code>TarArchiveEntry</code>
      stores various attributes including information about the
      original owner and permissions.</p>

      <p>There are several different dialects of the TAR format, maybe
      even different TAR formats.  The tar package contains special
      cases in order to read many of the existing dialects and will by
      default try to create archives in the original format (often
      called "ustar").  This original format didn't support file names
      longer than 100 characters or entries bigger than 8 GiB, and the
      tar package will by default fail if you try to write an entry
      that goes beyond those limits.
      "ustar" is the common denominator of
      all the existing tar dialects and is understood by most of the
      existing tools.</p>

      <p>The tar package does not support the full POSIX tar standard
      nor the more modern GNU extensions of said standard.</p>

      <subsection name="Long File Names">

        <p>The <code>longFileMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files with
        names longer than 100 characters are handled.  The possible
        choices are:</p>

        <ul>
          <li><code>LONGFILE_ERROR</code>: throw an exception if such a
          file is added.  This is the default.</li>
          <li><code>LONGFILE_TRUNCATE</code>: truncate such names.</li>
          <li><code>LONGFILE_GNU</code>: use the GNU tar variant, now
          referred to as "oldgnu", for storing such names.  If you choose
          the GNU tar option, the archive cannot be extracted using
          many other tar implementations like the ones of OpenBSD,
          Solaris or Mac OS X.</li>
          <li><code>LONGFILE_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1.  Most modern tar
          implementations are able to extract such archives. <em>since
          Commons Compress 1.4</em></li>
        </ul>

        <p><code>TarArchiveInputStream</code> will recognize the GNU
        tar as well as the POSIX extensions (starting with Commons
        Compress 1.2) for long file names and read the longer names
        transparently.</p>
      </subsection>

      <subsection name="Big Numeric Values">

        <p>The <code>bigNumberMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files larger
        than 8 GiB or with other big numeric values that can't be
        encoded in traditional header fields are handled.  The
        possible choices are:</p>

        <ul>
          <li><code>BIGNUMBER_ERROR</code>: throw an exception if such an
          entry is added.
          This is the default.</li>
          <li><code>BIGNUMBER_STAR</code>: use a variant first
          introduced by Jörg Schilling's <a
          href="http://developer.berlios.de/projects/star">star</a>
          and later adopted by GNU and BSD tar.  This method is not
          supported by all implementations.</li>
          <li><code>BIGNUMBER_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1.  Most modern tar
          implementations are able to extract such archives.</li>
        </ul>

        <p>Starting with Commons Compress 1.4
        <code>TarArchiveInputStream</code> will recognize the star as
        well as the POSIX extensions for big numeric values and read them
        transparently.</p>
      </subsection>

      <subsection name="File Name Encoding">
        <p>The original ustar format only supports 7-bit ASCII file
        names; later implementations use the platform's default
        encoding to encode file names.  The POSIX standard recommends
        using PAX extension headers for non-ASCII file names
        instead.</p>

        <p>Commons Compress 1.1 to 1.3 assumed file names would be
        encoded using ISO-8859-1.  Starting with Commons Compress 1.4
        you can specify the encoding to expect (or to use when writing)
        as a parameter to <code>TarArchiveInputStream</code>
        (<code>TarArchiveOutputStream</code>); the default is now the
        platform's default encoding.</p>

        <p>Since Commons Compress 1.4 another optional parameter -
        <code>addPaxHeadersForNonAsciiNames</code> - of
        <code>TarArchiveOutputStream</code> controls whether PAX
        extension headers will be written for non-ASCII file names.
        By default they are not written, to save space.
        <code>TarArchiveInputStream</code> will read them
        transparently if present.</p>
      </subsection>

      <subsection name="Sparse files">

        <p><code>TarArchiveInputStream</code> will recognize sparse
        file entries stored using the "oldgnu" format
        (<code>--sparse-version=0.0</code> in GNU tar) but is not
        able to extract them correctly.  <a href="#Unsupported Features"><code>canReadEntryData</code></a> will return false
        on such entries.  The other variants of sparse files
        currently cannot be detected at all.</p>
      </subsection>

      <subsection name="Consuming Archives Completely">

        <p>The end of a tar archive is signalled by two consecutive
        records of all zeros.  Unfortunately not all tar
        implementations adhere to this and some only write one record
        to end the archive.  Commons Compress will always write two
        records but stop reading an archive as soon as it finds one
        record of all zeros.</p>

        <p>Prior to version 1.5 this could leave the second EOF record
        inside the stream when <code>getNextEntry</code> or
        <code>getNextTarEntry</code> returned <code>null</code>.
        Starting with version 1.5 <code>TarArchiveInputStream</code>
        will try to read a second record as well if present,
        effectively consuming the archive completely.</p>

      </subsection>

      <subsection name="PAX Extended Header">
        <p>The tar package has supported reading PAX extended headers
        since 1.3 for local headers and 1.11 for global headers.
        The following entries of PAX headers are applied when reading:</p>

        <dl>
          <dt>path</dt>
          <dd>set the entry's name</dd>

          <dt>linkpath</dt>
          <dd>set the entry's link name</dd>

          <dt>gid</dt>
          <dd>set the entry's group id</dd>

          <dt>gname</dt>
          <dd>set the entry's group name</dd>

          <dt>uid</dt>
          <dd>set the entry's user id</dd>

          <dt>uname</dt>
          <dd>set the entry's user name</dd>

          <dt>size</dt>
          <dd>set the entry's size</dd>

          <dt>mtime</dt>
          <dd>set the entry's modification time</dd>

          <dt>SCHILY.devminor</dt>
          <dd>set the entry's minor device number</dd>

          <dt>SCHILY.devmajor</dt>
          <dd>set the entry's major device number</dd>
        </dl>

        <p>In addition, some fields that GNU tar and star use to
        signal sparse entries are supported and are used by the
        <code>is*GNUSparse</code> and <code>isStarSparse</code>
        methods.</p>

        <p>Some PAX extra headers may be set when writing archives,
        for example for non-ASCII names or big numeric values.  This
        depends on various settings of the output stream - see the
        previous sections.</p>

        <p>Since 1.15 you can directly access all PAX extension
        headers that have been found when reading an entry or specify
        extra headers to be written to a (local) PAX extended header
        entry.</p>

        <p>Some hints if you try to set extended headers:</p>

        <ul>
          <li>PAX header keywords should be ASCII.  star/gnutar
          (SCHILY.xattr.*) do not check for this.  libarchive/bsdtar
          (LIBARCHIVE.xattr.*) uses URL encoding.</li>
          <li>PAX header values should be encoded as UTF-8 characters
          (including trailing <code>\0</code>).  star/gnutar
          (SCHILY.xattr.*) do not check for this.
          libarchive/bsdtar
          (LIBARCHIVE.xattr.*) encodes values using Base64.</li>
          <li>libarchive/bsdtar will read SCHILY.xattr headers, but
          will not generate them.</li>
          <li>gnutar will complain about LIBARCHIVE.xattr (and any
          other unknown) headers and will neither encode nor decode
          them.</li>
        </ul>
      </subsection>

    </section>
  </body>
</document>