.. SPDX-License-Identifier: GPL-2.0

================================================
ZoneFS - Zone filesystem for Zoned block devices
================================================

Introduction
============
zonefs is a very simple file system exposing each zone of a zoned block device
as a file. Unlike a regular POSIX-compliant file system with native zoned block
device support (e.g. f2fs), zonefs does not hide the sequential write
constraint of zoned block devices from the user. Files representing sequential
write zones of the device must be written sequentially, starting from the end
of the file (append only writes).
As such, zonefs is in essence closer to a raw block device access interface
than to a full-featured POSIX file system. The goal of zonefs is to simplify
the implementation of zoned block device support in applications by replacing
raw block device file accesses with a richer file API, avoiding relying on
direct block device file ioctls which may be more obscure to developers. One
example of this approach is the implementation of LSM (log-structured merge)
tree structures (such as used in RocksDB and LevelDB) on zoned block devices,
allowing SSTables to be stored in a zone file similarly to a regular file
system rather than as a range of sectors of the entire disk.
Zoned block devices
-------------------
Zoned storage devices belong to a class of storage devices with an address
space that is divided into zones. A zone is a group of consecutive LBAs, and
zones may be of different types:

* Conventional zones: there are no access constraints to LBAs belonging to
  conventional zones. Any read or write access can be executed, similarly to a
  regular block device.
* Sequential zones: these zones accept random reads but must be written
  sequentially. Each sequential zone has a write pointer maintained by the
  device that keeps track of the mandatory start LBA position of the next
  write to the device. As a result of this write constraint, LBAs in a
  sequential zone cannot be overwritten. Sequential zones must first be erased
  using a special command (zone reset) before rewriting.

The most common form of zoned storage today uses the SCSI Zoned
Block Commands (ZBC) and Zoned ATA Commands (ZAC) interfaces on Shingled
Magnetic Recording (SMR) HDDs.
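The zone configuration of a zoned block device can be inspected from user
space with the blkzone utility from util-linux (a minimal sketch; the device
name is illustrative)::

    # blkzone report /dev/sdX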
Zonefs Overview
===============

Zonefs exposes the zones of a zoned block device as files. The files
representing zones are grouped and organized based on their zone type, forming
sub-directories. This file structure is built entirely using zone information
provided by the device and so does not require any complex on-disk metadata
structure.
On-disk metadata
----------------
zonefs on-disk metadata is reduced to an immutable super block which
persistently stores a magic number and optional feature flags and values. On
mount, zonefs uses blkdev_report_zones() to obtain the device zone
configuration and populates the mount point with a static file tree solely
based on this information.
The super block is always written on disk at sector 0. The first zone of the
device storing the super block is never exposed as a zone file by zonefs. If
the zone containing the super block is a sequential zone, the mkzonefs format
tool always "finishes" the zone, that is, it transitions the zone to a full
state to make it read-only, preventing any data write.
Zone type sub-directories
-------------------------
Files representing zones of the same type are grouped together under the same
sub-directory automatically created on mount.
For conventional zones, the sub-directory "cnv" is used. This directory is
however created if and only if the device has usable conventional zones. If
the device only has a single conventional zone at sector 0, the zone will not
be exposed as a file as it will be used to store the zonefs super block. For
such devices, the "cnv" sub-directory will not be created.
For sequential write zones, the sub-directory "seq" is used.
98 "seq" sub-directories.
Zone files
----------
Zone files are named using the number of the file within the set of files of a
particular type. That is, both the "cnv" and "seq" directories contain files
named "0", "1", "2", etc. File numbers also represent increasing zone start
sectors on the device.

All read and write operations to zone files are not allowed beyond the file
maximum size, that is, beyond the zone capacity. Any access exceeding the zone
capacity is failed with the -EFBIG error.
Creating, deleting, renaming or modifying any attribute of files and
sub-directories is not allowed.
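For instance, with a zonefs file system mounted at /mnt as in the examples at
the end of this document, any attempt at modifying the file tree fails (a
sketch; exact error messages may vary)::

    # mkdir /mnt/some-dir
    mkdir: cannot create directory '/mnt/some-dir': Operation not permitted
    # rm /mnt/seq/0
    rm: cannot remove '/mnt/seq/0': Operation not permitted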
Conventional zone files
-----------------------

The size of conventional zone files is fixed to the size of the zone they
represent. Conventional zone files cannot be truncated.

These files can be randomly read and written using any type of I/O operation:
buffered I/Os, direct I/Os, memory mapped I/Os (mmap), etc.
Sequential zone files
---------------------
The size of sequential zone files grouped in the "seq" sub-directory represents
the file's zone write pointer position relative to the zone start sector.

Sequential zone files can only be written sequentially, starting from the file
end, that is, write operations can only be append writes. Zonefs makes no
attempt at accepting random writes and will fail any write request that has a
start offset not corresponding to the end of the file, or to the end of the
last write issued and still in-flight (for asynchronous I/O operations).
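For instance, a direct I/O write at an offset other than the end of the file
is rejected. A sketch, assuming a zonefs mount at /mnt as in the examples at
the end of this document and an empty zone file whose size, and thus write
pointer position, is 0::

    # dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 seek=8 conv=notrunc oflag=direct
    dd: error writing '/mnt/seq/0': Invalid argument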
Since dirty page writeback by the page cache does not guarantee a sequential
write pattern, zonefs prevents buffered writes and writeable shared mappings
on sequential files. Only direct I/O writes are accepted for these files.
zonefs relies on the sequential delivery of write requests to the device
implemented by the block layer elevator. An elevator implementing the
sequential write feature for zoned block devices (the ELEVATOR_F_ZBD_SEQ_WRITE
elevator feature) must be used. This type of elevator (e.g. mq-deadline) is
set by default for zoned block devices on device initialization.
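The scheduler selected for a disk can be verified, and changed if needed,
through sysfs (a sketch; the device name and the list of available schedulers
are illustrative)::

    # cat /sys/block/sdX/queue/scheduler
    [mq-deadline] kyber bfq none
    # echo mq-deadline > /sys/block/sdX/queue/scheduler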
Format options
--------------

Several optional features of zonefs can be enabled at format time:

* Conventional zone aggregation: ranges of contiguous conventional zones can
  be aggregated into a single larger file instead of the default one file per
  zone.
* File ownership: the owner UID and GID of zone files are by default 0 (root)
  but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be changed.
IO error handling
-----------------
Zoned block devices may fail I/O requests for reasons similar to regular block
devices, e.g. due to bad sectors. However, in addition to such known I/O
failure patterns, the standards governing zoned block devices behavior define
additional conditions that result in I/O errors.
* A zone may transition to the read-only condition (BLK_ZONE_COND_READONLY):
  While the data already written in the zone is still readable, the zone can
  no longer be written. No user action on the zone (zone management command or
  read/write access) can change the zone condition back to a normal read/write
  state. While the reasons for the device to transition a zone to read-only
  are not defined by the standards, a typical cause for such a transition
  would be a defective write head on an HDD (all zones under this head are
  changed to read-only).
* A zone may transition to the offline condition (BLK_ZONE_COND_OFFLINE):
  An offline zone cannot be read nor written. No user action can transition an
  offline zone back to an operational good state. Similarly to zone read-only
  transitions, the reasons for a drive to transition a zone to the offline
  condition are undefined. A typical cause would be a defective read-write head
  on an HDD causing all zones on the platter under the broken head to be
  inaccessible.
* Unaligned write errors: these errors result from the host issuing write
  requests with a start sector that does not correspond to the zone write
  pointer position when the write request is executed by the device. Even
  though zonefs enforces sequential file writes for sequential zones,
  unaligned write errors may still happen in the case of a partial failure of
  a very large direct I/O operation split into multiple BIOs/requests, or of
  asynchronous I/O operations. If one of the write requests within the set of
  sequential write requests issued to the device fails, all write requests
  queued after it will become unaligned and fail.

* Delayed write errors: similarly to regular block devices, if the device side
  write cache is enabled, write errors may occur in ranges of previously
  completed writes when the device write cache is flushed, e.g. on fsync().
All I/O errors detected by zonefs are notified to the user with an error code
return for the system call that triggered or detected the error. The recovery
actions taken by zonefs in response to I/O errors depend on the I/O type (read
vs write) and on the reason for the error.

* For read I/O errors, zonefs does not execute any particular recovery action,
  unless the file zone is in a bad condition or the file inode size is
  inconsistent with the file zone write pointer position.
* For write I/O errors, zonefs I/O error recovery is always executed.
* A zone condition change to read-only or offline also always triggers zonefs
  I/O error recovery.
Zonefs minimal I/O error recovery may change a file size and file access
permissions.

* File size changes: immediate or delayed write errors in a sequential zone
  file may cause the file inode size to be inconsistent with the amount of
  data successfully written in the file zone. For instance, the partial
  failure of a multi-BIO large write operation will cause the zone write
  pointer to advance partially, even though the entire write operation will be
  reported as failed to the user. In such case, the file inode size must be
  advanced to reflect the zone write pointer change and eventually allow the
  user to restart writing at the end of the file.
* Access permission changes: a zone condition change to read-only is indicated
  with a change in the file access permissions to render the file read-only.
  This disables changes to the file attributes and data modifications. For
  offline zones, all permissions (read and write) to the file are disabled.
Further action taken by zonefs I/O error recovery can be controlled by the
user with the "errors=xxx" mount option. The table below summarizes the result
of zonefs I/O error processing depending on the mount option and on the zone
conditions::

    +--------------+-----------+-----------------------------------------+
    |              |           |            Post error state             |
    | "errors=xxx" |  device   |                 access permissions      |
    |    mount     |   zone    | file         file          device zone  |
    |    option    | condition | size     read    write    read    write |
    +--------------+-----------+-----------------------------------------+
    |              | good      | fixed    yes     no       yes     yes   |
    | remount-ro   | read-only | as is    yes     no       yes     no    |
    | (default)    | offline   |   0      no      no       no      no    |
    +--------------+-----------+-----------------------------------------+
    |              | good      | fixed    yes     no       yes     yes   |
    | zone-ro      | read-only | as is    yes     no       yes     no    |
    |              | offline   |   0      no      no       no      no    |
    +--------------+-----------+-----------------------------------------+
    |              | good      |   0      no      no       yes     yes   |
    | zone-offline | read-only |   0      no      no       yes     no    |
    |              | offline   |   0      no      no       no      no    |
    +--------------+-----------+-----------------------------------------+
    |              | good      | fixed    yes     yes      yes     yes   |
    | repair       | read-only | as is    yes     no       yes     no    |
    |              | offline   |   0      no      no       no      no    |
    +--------------+-----------+-----------------------------------------+
Further notes:

* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
  error processing if no errors mount option is specified.
* With the "errors=remount-ro" mount option, the change of the file access
  permissions to read-only applies to all files. The file system is remounted
  read-only.
* File access permission changes to read-only due to the device transitioning
  zones to the read-only condition are permanent. Remounting or reformatting
  the device will not re-enable file write access.
* File access permission changes implied by the remount-ro, zone-ro and
  zone-offline mount options are temporary for zones in a good condition.
  Unmounting and remounting the file system will restore the previous default
  (format time values) access rights to the files affected.
* The repair mount option triggers only the minimal set of I/O error recovery
  actions, that is, file size fixes for zones in a good condition. Zones
  indicated as being read-only or offline by the device still imply changes to
  the zone file access permissions as noted in the table above.
Mount options
-------------

zonefs defines the "errors=<behavior>" mount option to allow the user to
specify zonefs behavior in response to I/O errors, inode size inconsistencies
or zone condition changes. The defined behaviors are as follows:
* remount-ro (default)
* zone-ro
* zone-offline
* repair
The run-time I/O error actions defined for each behavior are detailed in the
previous section. Mount time I/O errors will cause the mount operation to fail.
The handling of read-only zones also differs between mount-time and run-time.
If a read-only zone is found at mount time, the zone is always treated in the
same manner as offline zones, that is, all accesses are disabled and the zone
file size set to 0. This is necessary as the write pointer of read-only zones
is defined as invalid by the ZBC and ZAC standards, making it impossible to
discover the amount of data that has been written to the zone. In the case of
a read-only zone discovered at run-time, as indicated in the previous section,
the size of the zone file is left unchanged from its last updated value.
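For example, to have an I/O error affect only the zone file that hit the error
rather than the whole file system, the zone-ro behavior can be selected at
mount time (a sketch; the device name is illustrative)::

    # mount -t zonefs -o errors=zone-ro /dev/sdX /mnt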
zonefs also defines the "explicit-open" mount option. A zoned block device
(e.g. an NVMe Zoned Namespace device) may have limits on the number of zones
that can be active, that is, zones that are in the implicit open, explicit
open or closed conditions. This potential limitation translates into a risk
for applications to see write I/O errors due to this limit being exceeded if
the zone of a file is not already active when a write request is issued by
the user.

To avoid these potential errors, the "explicit-open" mount option forces zones
to be made active using an open zone command when a file is opened for writing
for the first time. If the zone open command succeeds, the application is then
guaranteed that write requests can be processed. Conversely, the
"explicit-open" mount option will result in a zone close command being issued
to the device on the last close() of a zone file if the zone is not full nor
empty.
Zonefs User Space Tools
=======================

The mkzonefs tool is used to format zoned block devices for use with zonefs.
This tool is available on Github at:

https://github.com/damien-lemoal/zonefs-tools
zonefs-tools also includes a test suite which can be run against any zoned
block device, including null_blk block devices created with zoned mode.
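For instance, a small zoned null_blk device suitable for such testing can be
created through configfs (a minimal sketch, assuming a kernel with null_blk
configfs support; the device size and zone size shown, in MB, are arbitrary)::

    # modprobe null_blk nr_devices=0
    # mkdir /sys/kernel/config/nullb/nullb0
    # echo 4096 > /sys/kernel/config/nullb/nullb0/size
    # echo 1 > /sys/kernel/config/nullb/nullb0/zoned
    # echo 64 > /sys/kernel/config/nullb/nullb0/zone_size
    # echo 1 > /sys/kernel/config/nullb/nullb0/memory_backed
    # echo 1 > /sys/kernel/config/nullb/nullb0/power
    # mkzonefs /dev/nullb0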
Examples
--------
The following formats a 15TB host-managed SMR HDD with 256 MB zones
with the conventional zones aggregation feature enabled::
    # mkzonefs -o aggr_cnv /dev/sdX
    # mount -t zonefs /dev/sdX /mnt
    # ls -l /mnt/
    total 0
    dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
    dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
The size of the zone files sub-directories indicates the number of files
existing for each type of zones. In this example, there is only one
conventional zone file (all conventional zones are aggregated under a single
file)::
    # ls -l /mnt/cnv
    total 137101312
    -rw-r----- 1 root root 140391743488 Nov 25 13:23 0
This aggregated conventional zone file can be used as a regular file::

    # mkfs.ext4 /mnt/cnv/0
    # mount -o loop /mnt/cnv/0 /data
382 The "seq" sub-directory grouping files for sequential write zones has in this
    # ls -lv /mnt/seq
    total 14511243264
    -rw-r----- 1 root root 0 Nov 25 13:23 0
    -rw-r----- 1 root root 0 Nov 25 13:23 1
    -rw-r----- 1 root root 0 Nov 25 13:23 2
    ...
    -rw-r----- 1 root root 0 Nov 25 13:23 55354
    -rw-r----- 1 root root 0 Nov 25 13:23 55355
For sequential write zone files, the file size changes as data is appended at
the end of the file, similarly to any regular file system::

    # dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
    1+0 records in
    1+0 records out

    # ls -l /mnt/seq/0
    -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
The written file can be truncated to the zone size, preventing any further
write operation::

    # truncate -s 268435456 /mnt/seq/0
    # ls -l /mnt/seq/0
    -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
Truncation to 0 size allows freeing the file zone storage space and restarting
append-writes to the file::

    # truncate -s 0 /mnt/seq/0
    # ls -l /mnt/seq/0
    -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
Since files are statically mapped to zones on the disk, the number of blocks
of a file as reported by stat() and fstat() indicates the capacity of the file
zone::

    # stat /mnt/seq/0
      File: /mnt/seq/0
      Size: 0               Blocks: 524288     IO Block: 4096   regular empty file
    Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2019-11-25 13:23:57.048971997 +0900
    Modify: 2019-11-25 13:52:25.553805765 +0900
    Change: 2019-11-25 13:52:25.553805765 +0900
     Birth: -
The number of blocks of the file ("Blocks") in units of 512B blocks gives the
maximum file size of 268435456 B, or 256 MB, corresponding to the device zone
capacity in this example. Of note is that the "IO block" field always
indicates the minimum I/O size for writes and corresponds to the device
physical sector size.
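The maximum file size can thus be computed directly from the reported block
count (a small shell sketch; 524288 blocks of 512 B each)::

    # echo $(( $(stat -c %b /mnt/seq/0) * 512 ))
    268435456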