SQUASHFS 4.0 FILESYSTEM
=======================

Squashfs is a compressed read-only filesystem for Linux.
It uses zlib compression to compress files, inodes and directories.
Inodes in the system are very small and all blocks are packed to minimise
data overhead. Block sizes greater than 4K are supported up to a maximum
of 1 Mbyte (default block size 128K).

Squashfs is intended for general read-only filesystem use, for archival
use (i.e. in cases where a .tar.gz file may be used), and in constrained
block device/memory systems (e.g. embedded systems) where low overhead is
needed.

Mailing list: squashfs-devel@lists.sourceforge.net
Web site: www.squashfs.org

1. FILESYSTEM FEATURES
----------------------

Squashfs filesystem features versus Cramfs:

                                Squashfs        Cramfs

Max filesystem size:            2^64            16 MiB
Max file size:                  ~ 2 TiB         16 MiB
Max files:                      unlimited       unlimited
Max directories:                unlimited       unlimited
Max entries per directory:      unlimited       unlimited
Max block size:                 1 MiB           4 KiB
Metadata compression:           yes             no
Directory indexes:              yes             no
Sparse file support:            yes             no
Tail-end packing (fragments):   yes             no
Exportable (NFS etc.):          yes             no
Hard link support:              yes             no
"." and ".." in readdir:        yes             no
Real inode numbers:             yes             no
32-bit uids/gids:               yes             no
File creation time:             yes             no
Xattr and ACL support:          no              no

Squashfs compresses data, inodes and directories. In addition, inode and
directory data are highly compacted, and packed on byte boundaries. Each
compressed inode is on average 8 bytes in length (the exact length varies with
file type: regular file, directory, symbolic link, and block/char device
inodes have different sizes).

2. USING SQUASHFS
-----------------

As squashfs is a read-only filesystem, the mksquashfs program must be used to
create populated squashfs filesystems. This and other squashfs utilities
can be obtained from http://www.squashfs.org.
Usage instructions can also be
obtained from this site.


3. SQUASHFS FILESYSTEM DESIGN
-----------------------------

A squashfs filesystem consists of seven parts, packed together on a byte
alignment:

         ---------------
        |  superblock   |
        |---------------|
        |  datablocks   |
        |  & fragments  |
        |---------------|
        |  inode table  |
        |---------------|
        |   directory   |
        |     table     |
        |---------------|
        |   fragment    |
        |     table     |
        |---------------|
        |    export     |
        |     table     |
        |---------------|
        |    uid/gid    |
        |  lookup table |
         ---------------

Compressed data blocks are written to the filesystem as files are read from
the source directory, and checked for duplicates. Once all file data has been
written, the completed inode, directory, fragment, export and uid/gid lookup
tables are written.

3.1 Inodes
----------

Metadata (inodes and directories) are compressed in 8Kbyte blocks. Each
compressed block is prefixed by a two byte length; the top bit is set if the
block is stored uncompressed. A block will be uncompressed if the -noI option
is set, or if the compressed block was larger than the uncompressed block.

Inodes are packed into the metadata blocks and are not aligned to block
boundaries, so an inode can straddle two compressed blocks. Inodes are
identified by a 48-bit number which encodes the location of the compressed
metadata block containing the inode, and the byte offset into that
(decompressed) block where the inode is placed (<block, offset>).

To maximise compression there are different inodes for each file type
(regular file, directory, device, etc.), the inode contents and length
varying with the type.

To further maximise compression, two types of regular file inode and
directory inode are defined: inodes optimised for frequently occurring
regular files and directories, and extended types where extra
information has to be stored.

3.2 Directories
---------------

Like inodes, directories are packed into compressed metadata blocks, stored
in a directory table. Directories are accessed using the start address of
the metablock containing the directory and the offset into the
decompressed block (<block, offset>).

Directories are organised in a slightly complex way, and are not simply
a list of file names. The organisation takes advantage of the
fact that (in most cases) the inodes of the files will be in the same
compressed metadata block, and therefore can share the start block.
Directories are therefore organised in a two level list: a directory
header containing the shared start block value, and a sequence of directory
entries, each of which uses that shared start block. A new directory header
is written whenever the inode start block changes. The directory
header/directory entry list is repeated as many times as necessary.

Directories are sorted, and can contain a directory index to speed up
file lookup. Directory indexes store one entry per metablock, each entry
storing the index/filename mapping to the first directory header
in each metadata block. Directories are sorted in alphabetical order,
and at lookup the index is scanned linearly looking for the first filename
alphabetically larger than the filename being looked up. The filename, if
present, must then be in the metadata block referenced by the preceding
index entry, so the location of that metadata block has been found.
The general idea of the index is to ensure only one metadata block needs to
be decompressed to do a lookup, irrespective of the length of the directory.
This scheme has the advantage that it doesn't require extra memory overhead
and doesn't require much extra storage on disk.

3.3 File data
-------------

Regular files consist of a sequence of contiguous compressed blocks, and/or a
compressed fragment block (tail-end packed block).
The compressed size
of each datablock is stored in a block list contained within the
file inode.

To speed up access to datablocks when reading 'large' files (256 Mbytes or
larger), the code implements an index cache that caches the mapping from
block index to datablock location on disk.

The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
retaining a simple and space-efficient block list on disk. The cache
is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
The index cache is designed to be memory efficient, and by default uses
16 KiB.

3.4 Fragment lookup table
-------------------------

Regular files can contain a fragment index which is mapped to a fragment
location on disk and compressed size using a fragment lookup table. This
fragment lookup table is itself stored compressed into metadata blocks.
A second index table is used to locate these. For speed of access (and
because it is small), this second index table is read at mount time and
cached in memory.

3.5 Uid/gid lookup table
------------------------

For space efficiency regular files store uid and gid indexes, which are
converted to 32-bit uids/gids using an id lookup table. This table is
stored compressed into metadata blocks. A second index table is used to
locate these. For speed of access (and because it is small), this second
index table is read at mount time and cached in memory.

3.6 Export table
----------------

To enable Squashfs filesystems to be exportable (via NFS etc.), filesystems
can optionally (disabled with the -no-exports Mksquashfs option) contain
an inode number to inode disk location lookup table.
This is required to
enable Squashfs to map inode numbers passed in filehandles to the inode
location on disk, which is necessary when the export code reinstantiates
expired/flushed inodes.

This table is stored compressed into metadata blocks. A second index table is
used to locate these. For speed of access (and because it is small), this
second index table is read at mount time and cached in memory.


4. TODOS AND OUTSTANDING ISSUES
-------------------------------

4.1 Todo list
-------------

Implement Xattr and ACL support. The Squashfs 4.0 filesystem layout has hooks
for these but the code has not been written. Once the code has been written
the existing layout should not require modification.

4.2 Squashfs internal cache
---------------------------

Blocks in Squashfs are compressed. To avoid repeatedly decompressing
recently accessed data, Squashfs uses two small caches, one for metadata
and one for fragments.

The caches are not used for file datablocks; these are decompressed and
cached in the page cache in the normal way. The caches are used to
temporarily cache fragment and metadata blocks which have been read as a
result of a metadata (i.e. inode or directory) or fragment access. Because
metadata and fragments are packed together into blocks (to gain greater
compression), the read of a particular piece of metadata or fragment will
retrieve other metadata/fragments which have been packed with it, and
because of locality of reference these may be read in the near future.
Temporarily caching them ensures they are available for near future access
without requiring an additional read and decompress.

In the future this internal cache may be replaced with an implementation which
uses the kernel page cache. Because the page cache operates on page sized
units this may introduce additional complexity in terms of locking and
associated race conditions.