Overview
========

EROFS stands for Enhanced Read-Only File System. Unlike other read-only
file systems, it is designed for flexibility and scalability while being
kept simple and high-performance.

It is designed as a better filesystem solution for the following scenarios:
 - read-only storage media, or

 - part of a fully trusted read-only solution, which needs to be immutable
   and bit-for-bit identical to the official golden image for a release
   due to security and other considerations, and

 - devices that want to save extra storage space with guaranteed end-to-end
   performance by using reduced metadata and transparent file compression,
   especially embedded devices with limited memory (e.g. smartphones).

Here are the main features of EROFS:
 - Little endian on-disk design;

 - Currently 4KB block size (nobh) and therefore maximum 16TB address space;

 - Metadata & data could be mixed by design;

 - 2 inode versions for different requirements:
                          v1            v2
   Inode metadata size:    32 bytes      64 bytes
   Max file size:          4 GB          16 EB (also limited by max. vol size)
   Max uids/gids:          65536         4294967296
   File creation time:     no            yes (64 + 32-bit timestamp)
   Max hardlinks:          65536         4294967296
   Metadata reserved:      4 bytes       14 bytes

 - Support extended attributes (xattrs) as an option;

 - Support xattr inline and tail-end data inline for all files;

 - Support POSIX.1e ACLs by using xattrs;

 - Support transparent file compression as an option:
   LZ4 algorithm with 4 KB fixed-output compression for high performance;

The following git tree provides the file system user-space tools under
development (e.g. the formatting tool mkfs.erofs):
>> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git

Bug reports and patches are welcome; please send them to the following
linux-erofs mailing list:
>> linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>

Mount options
=============

(no)user_xattr         Set up Extended User Attributes. Note: xattr is enabled
                       by default if CONFIG_EROFS_FS_XATTR is selected.
(no)acl                Set up POSIX Access Control List. Note: acl is enabled
                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
cache_strategy=%s      Select a strategy for cached decompression from now on:
                         disabled: In-place I/O decompression only;
                        readahead: Cache the last incomplete compressed physical
                                   cluster for further reading. It still does
                                   in-place I/O decompression for the rest of
                                   the compressed physical clusters;
                       readaround: Cache both ends of incomplete compressed
                                   physical clusters for further reading.
                                   It still does in-place I/O decompression
                                   for the rest of the compressed physical
                                   clusters.

On-disk details
===============

Summary
-------
Unlike other read-only file systems, an EROFS volume is designed
to be as simple as possible:

                                |-> aligned with the block size
   ____________________________________________________________
  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
  |_|__|_|_____|__________|_____|______|__________|_____|______|
  0 +1K

All data areas should be aligned with the block size, but metadata areas
may not. All metadata can currently be observed in two different spaces (views):
 1. Inode metadata space
    Each valid inode should be aligned with an inode slot, which is a fixed
    value (32 bytes) and designed to be kept in line with v1 inode size.

    Each inode can be directly found with the following formula:
         inode offset = meta_blkaddr * block_size + 32 * nid

                                |-> aligned with 8B
                                           |-> followed closely
    + meta_blkaddr blocks                                      |-> another slot
     _____________________________________________________________________
    |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
    |________|_______|(optional)|(optional)|__(optional)_|_____|__________
             |-> aligned with the inode slot size
                  .                .
                .                      .
              .                            .
            .                                  .
          .                                        .
        .                                              .
      .____________________________________________________|-> aligned with 4B
      | xattr_ibody_header | shared xattrs | inline xattrs |
      |____________________|_______________|_______________|
      |->    12 bytes    <-|->x * 4 bytes<-|               .
                       .                   .                 .
                   .                       .                   .
               .                           .                     .
           ._______________________________.______________________.
           | id | id | id | id |  ... | id | ent | ... | ent| ... |
           |____|____|____|____|______|____|_____|_____|____|_____|
                                           |-> aligned with 4B
                                                       |-> aligned with 4B

    An inode can be 32 or 64 bytes; the two sizes are distinguished by a
    common field which all inode versions have -- i_advise:

        __________________               __________________
       |     i_advise     |             |     i_advise     |
       |__________________|             |__________________|
       |        ...       |             |        ...       |
       |                  |             |                  |
       |__________________| 32 bytes    |                  |
                                        |                  |
                                        |__________________| 64 bytes

    Xattrs, extents and inline data follow the corresponding inode with
    proper alignment, and each is optional depending on the data mapping.
    _Currently_ 3 valid data mappings are supported:

     1) flat file data without data inline (no extent);
     2) fixed-output size data compression (must have extents);
     3) flat file data with tail-end data inline (no extent);

    The size of the optional xattrs is indicated by i_xattr_count in the inode
    header. Large xattrs or xattrs shared by many different files can be
    stored in shared xattrs metadata rather than inlined right after the inode.

 2. Shared xattrs metadata space
    The shared xattrs space is similar to the above inode space. It starts at
    a specific block indicated by xattr_blkaddr, and its entries are organized
    one by one with proper alignment.

    Each shared xattr can also be directly found by the following formula:
         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id

                           |-> aligned with 4 bytes
    + xattr_blkaddr blocks                     |-> aligned with 4 bytes
     _________________________________________________________________________
    |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
    |________|_____________|_____________|_____|______________|_______________

Directories
-----------
All directories are now organized in a compact on-disk format. Note that
each directory block is divided into index and name areas in order to support
random file lookup, and all directory entries are _strictly_ recorded in
alphabetical order in order to support an improved prefix binary search
algorithm (refer to the related source code for details).
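
The lookup described above can be sketched in a few lines of Python. This is
only an illustration, not the kernel implementation: it performs a plain
binary search rather than the improved prefix variant, and it assumes a
12-byte little-endian dirent (64-bit nid, 16-bit nameoff, 8-bit file type,
8-bit reserved), a layout that this document does not spell out and that
should be checked against erofs_fs.h. build_block() and lookup() are
hypothetical helper names. The sketch relies on the property, noted after the
directory block diagram, that nameoff0 also gives the number of entries.

```python
import struct

# Assumed dirent layout (verify against erofs_fs.h):
# little-endian nid (u64), nameoff (u16), file_type (u8), reserved (u8).
DIRENT = struct.Struct("<QHBB")


def build_block(entries, block_size=4096):
    """Hypothetical helper: pack (name, nid) pairs into one directory block."""
    entries = sorted(entries)              # names are stored in sorted order
    base = len(entries) * DIRENT.size      # nameoff0 = N * sizeof(dirent)
    index, names = b"", b""
    for name, nid in entries:
        index += DIRENT.pack(nid, base + len(names), 0, 0)
        names += name
    return (index + names).ljust(block_size, b"\0")


def lookup(block, target):
    """Binary-search one directory block; return the nid or None."""
    nameoff0 = DIRENT.unpack_from(block, 0)[1]
    count = nameoff0 // DIRENT.size        # entry count derived from nameoff0

    def name_at(i):
        start = DIRENT.unpack_from(block, i * DIRENT.size)[1]
        if i + 1 < count:
            end = DIRENT.unpack_from(block, (i + 1) * DIRENT.size)[1]
        else:
            end = len(block)               # last name runs to the block end
        return block[start:end].rstrip(b"\0")  # it may carry trailing '\0'

    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        name = name_at(mid)
        if name == target:
            return DIRENT.unpack_from(block, mid * DIRENT.size)[0]
        if name < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# Illustrative nids only; they do not come from a real image.
block = build_block([(b"usr", 40), (b"bin", 36), (b"etc", 38)])
print(lookup(block, b"etc"))   # 38
print(lookup(block, b"var"))   # None
```

Because the index area has fixed-size entries, the search touches only
O(log N) dirents and never scans the name area linearly.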

                 ___________________________
                /                           |
               /              ______________|________________
              /              /              | nameoff1       | nameoffN-1
 ____________.______________._______________v________________v__________
| dirent | dirent | ... | dirent | filename | filename | ... | filename |
|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
     \                           ^
      \                          |                           * could have
       \                         |                             trailing '\0'
        \________________________| nameoff0

                             Directory block

Note that apart from the offset of the first filename, nameoff0 also indicates
the total number of directory entries in this block, so there is no need to
introduce another on-disk field for it.

Compression
-----------
Currently, EROFS supports 4KB fixed-output clustersize transparent file
compression, as illustrated below:

         |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
         clusterofs                      clusterofs            clusterofs
         |                               |                     |   logical data
_________v_______________________________v_____________________v_______________
... |    .        |             |        .    |             |  .          | ...
____|____.________|_____________|________.____|_____________|__.__________|____
    |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
         size          size          size          size          size
          .                           .                  .                   .
           .                      .                 .                 .
            .                 .               .               .
      _______._____________._____________._____________._____________________
         ... |             |             |             | ... physical data
      _______|_____________|_____________|_____________|_____________________
             |-> cluster <-|-> cluster <-|-> cluster <-|
                  size          size          size

Currently each on-disk physical cluster can contain at most 4KB of
(un)compressed data. For each logical cluster, there is a corresponding
on-disk index to describe its cluster type, physical cluster address, etc.

See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
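
Since there is one index per logical cluster, a reader first has to work out
which logical clusters a byte range covers. The arithmetic can be sketched as
below, assuming the 4KB cluster size stated above. logical_cluster() and
clusters_spanned() are hypothetical names, and the sketch does not decode the
on-disk index itself, whose format is defined in erofs_fs.h.

```python
CLUSTER_SIZE = 4096  # 4KB cluster size, as stated in this section


def logical_cluster(offset):
    """Return (cluster number, byte offset inside that cluster)."""
    return divmod(offset, CLUSTER_SIZE)


def clusters_spanned(offset, length):
    """Return the logical cluster numbers covered by a read (length > 0)."""
    first = offset // CLUSTER_SIZE
    last = (offset + length - 1) // CLUSTER_SIZE
    return range(first, last + 1)


print(logical_cluster(10000))             # (2, 1808)
print(list(clusters_spanned(4000, 200)))  # [0, 1]
```

Each cluster number returned here selects one on-disk index, which in turn
describes the cluster type and the physical cluster holding the data.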
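
The two offset formulas given in the Summary section can also be checked with
a short calculation. This is a minimal sketch assuming the 4KB block size
mentioned in the Overview; the function names and the sample values are
hypothetical and do not come from a real image.

```python
BLOCK_SIZE = 4096      # 4KB block size, as stated in the Overview
INODE_SLOT_SIZE = 32   # inode slot size, equal to the v1 inode size
XATTR_UNIT = 4         # shared xattrs are addressed in 4-byte units


def inode_offset(meta_blkaddr, nid):
    """inode offset = meta_blkaddr * block_size + 32 * nid"""
    return meta_blkaddr * BLOCK_SIZE + INODE_SLOT_SIZE * nid


def xattr_offset(xattr_blkaddr, xattr_id):
    """xattr offset = xattr_blkaddr * block_size + 4 * xattr_id"""
    return xattr_blkaddr * BLOCK_SIZE + XATTR_UNIT * xattr_id


print(inode_offset(1, 36))   # 5248
print(xattr_offset(2, 10))   # 8232
```

Both lookups are a single multiply-and-add, so no separate inode table or
xattr table has to be walked.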