
.. SPDX-License-Identifier: GPL-2.0-only

Design of dm-vdo
================
The dm-vdo (virtual data optimizer) target provides inline deduplication,
compression, zero-block elimination, and thin provisioning. A dm-vdo target

production environments ever since. It was made open-source in 2017 after

dm-vdo. For usage, see vdo.rst in the same directory as this file.

The design of dm-vdo is based on the idea that deduplication is a two-part

storing multiple copies of those duplicates. Therefore, dm-vdo has two main
structures involved in a single write operation to a vdo target is larger
than most other targets. Furthermore, because vdo must operate on small

design attempts to be lock-free.

each zone has an implicit lock on the structures it manages for all its

reflected in the on-disk representation of each data structure. Therefore,
trade-off between the storage saved and the resources expended to achieve

Each block of data is hashed to produce a 16-byte block name. An index
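As a minimal sketch of the naming scheme described above: every data block maps to a fixed-size 16-byte name, and identical blocks yield identical names, which is the property deduplication relies on. The hash function here (BLAKE2) is an illustrative stand-in, not vdo's actual hash.

```python
import hashlib

BLOCK_SIZE = 4096  # vdo's data block size

def block_name(block: bytes) -> bytes:
    """Map a data block to a 16-byte block name. Illustrative only:
    vdo uses its own 128-bit hash, not BLAKE2."""
    assert len(block) == BLOCK_SIZE
    return hashlib.blake2b(block, digest_size=16).digest()

# Two blocks with identical contents produce the same name.
a = bytes(BLOCK_SIZE)
b = bytes(BLOCK_SIZE)
print(block_name(a) == block_name(b))  # True
print(len(block_name(a)))              # 16
```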
that data on the underlying storage. However, it is not possible to

because it is too costly to update the index when a block is over-written

with the blocks, which is difficult to do efficiently in block-based

reading. The records are written to a series of record pages based on the

locality should be on a small number of pages, reducing the I/O required to

there is a record for this name, it will be on the indicated page. Closed
chapters are read-only structures and their contents are never altered in
memory-efficient structure called a delta index. Instead of storing the

splitting its key space into many sub-lists, each starting at a fixed key
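The core idea of a delta index, as described above, is to store each key in a sorted list as the difference from its predecessor rather than as a full value, since small deltas need far fewer bits. A minimal sketch of that encoding (ignoring vdo's sub-list splitting and bit-level packing):

```python
def delta_encode(sorted_keys):
    """Store each key as the difference from its predecessor.
    Small deltas can be packed into fewer bits than full keys."""
    deltas, prev = [], 0
    for k in sorted_keys:
        deltas.append(k - prev)
        prev = k
    return deltas

def delta_decode(deltas):
    """Recover the original keys by accumulating the deltas."""
    keys, prev = [], 0
    for d in deltas:
        prev += d
        keys.append(prev)
    return keys

keys = [5, 9, 23, 24, 40]
print(delta_encode(keys))                        # [5, 4, 14, 1, 16]
print(delta_decode(delta_encode(keys)) == keys)  # True
```

Splitting the key space into sub-lists, each anchored at a fixed key value, bounds how many deltas a lookup must accumulate before reaching its target.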
longer be in the index by the time the second write begins (assuming there

indexing, the memory requirements do not increase. The trade-off is

duplicate data, sparse indexing will detect 97-99% of the deduplication
fields and data to track vdo-specific information. A struct vio maintains a
memory and are written out, a block at a time in oldest-dirtied-order, only
when there is a need to reclaim slab journal space. The write operations

zones" in round-robin fashion. If there are P physical zones, then slab n
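The round-robin assignment of slabs to physical zones described above reduces to simple modular arithmetic, shown here as a sketch:

```python
def zone_for_slab(slab_number: int, physical_zones: int) -> int:
    """Round-robin assignment: with P physical zones, slab n is
    managed by zone n mod P."""
    return slab_number % physical_zones

# With P = 3 zones, slabs 0..6 map to zones:
print([zone_for_slab(n, 3) for n in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```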
and write requests can be serviced, perhaps with degraded performance,

0-811 belong to tree 0, logical addresses 812-1623 belong to tree 1, and so
on. The interleaving is maintained all the way up to the 60 root nodes.
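The interleaving described above can be sketched as arithmetic: runs of 812 consecutive logical addresses are dealt out to the 60 trees in order, wrapping around after all 60 roots have received a run (the constants come from the text; the wrap-around after 60 runs is implied by the interleaving being maintained up to the root nodes).

```python
ENTRIES_PER_RUN = 812  # logical addresses per run, per the text above
ROOT_COUNT = 60        # number of block map trees

def tree_for_logical(lbn: int) -> int:
    """Runs of 812 consecutive logical addresses are interleaved
    across the 60 trees: 0-811 -> tree 0, 812-1623 -> tree 1, ..."""
    return (lbn // ENTRIES_PER_RUN) % ROOT_COUNT

print(tree_for_logical(0))         # 0
print(tree_for_logical(811))       # 0
print(tree_for_logical(812))       # 1
print(tree_for_logical(812 * 60))  # 0 (wraps after 60 runs)
```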
need to pre-allocate space for the entire set of logical mappings and also

time, and is large enough to hold all the non-leaf pages of the entire
slab depot. Each write request causes an entry to be made in the journal.

before each journal block write to ensure that the physical data for the
new block mappings in that block are stable on storage, and journal block

entries themselves are stable. The journal entry and the data write it
represents must be stable on disk before the other metadata structures may
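The ordering constraint excerpted above can be made concrete with a small sketch: the data write must be made stable (by a flush) before its journal entry is written, and the journal entry must itself be stable before any dependent metadata update. All names here are illustrative, not vdo's internal identifiers.

```python
# Ordered log of I/O operations; a "flush" makes all prior writes stable,
# and "FUA" marks a write that is stable as soon as it completes.
log = []

def write_data(block):
    log.append(("data", block))

def flush():
    log.append(("flush",))

def write_journal_entry(entry):
    log.append(("journal", entry, "FUA"))

def update_metadata(entry):
    # Only legal once the journal entry for this mapping is stable.
    assert ("journal", entry, "FUA") in log
    log.append(("metadata", entry))

write_data("pbn 1024")
flush()  # data stable before its journal entry is written
write_journal_entry("lbn 7 -> pbn 1024")
update_metadata("lbn 7 -> pbn 1024")
print([op[0] for op in log])  # ['data', 'flush', 'journal', 'metadata']
```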
*Write Path*

All write I/O to vdo is asynchronous. Each bio will be acknowledged as soon
as vdo has done enough work to guarantee that it can complete the write
eventually. Generally, the data for acknowledged but unflushed write I/O

requires data to be stable on storage, it must issue a flush or write the

Application write bios follow the steps outlined below.
the data_vio if it is a write and the data is not all zeroes. The data

2. The data_vio places a claim (the "logical lock") on the logical address

already operating on that logical address, it waits until the previous

This stage requires the data_vio to get an implicit lock on the

a. If any page-node in the tree has not yet been allocated, it must be
allocated before the write can continue. This step requires the
data_vio to lock the page-node that needs to be allocated. This

that causes other data_vios to wait for the allocation process to

waiting on this allocation also proceed.

b. In the steady-state case, the block map tree nodes will already be

data_vio can write its data somewhere even if deduplication and
compression are not possible. This stage gets an implicit lock on a

struct pbn_lock (the "physical block lock") on the free block. The

claims that data_vios can have on physical blocks. The pbn_lock is

sub-component of the slab and are thus also covered by the implicit

needs to complete the write. The application bio can safely be
acknowledged at this point. The acknowledgment happens on a separate

tracked in step 2. This hashtable is covered by the implicit lock on

data_vio will wait for the agent to complete its work and then share

step 8h and attempts to write its data directly. This can happen if two

obtain a physical block lock on the indicated physical address, in

agent and any other data_vios waiting on it will record this

it has an allocated physical block (from step 3) that it can write

compress, the data_vio will continue to step 8h to write its data

data_vios. All compression operations require the implicit lock on

wait in the packer for an arbitrarily long time for other data_vios

evict waiting data_vios when continuing to wait would cause

data_vio will proceed to step 8h to write its data directly.

The data_vio obtains an implicit lock on the physical zone and

step 3. It will write its data to that allocated physical block.

on the block map cache structures, covered by the implicit logical zone

recovery journal lock. The data_vio will wait in the journal until all

and flushed to ensure the transaction is stable on storage.

holding a lock on the affected physical slab, covered by its implicit

logical-to-physical mapping in the block map to point to the new
physical block. At this point the write operation is complete.
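The write-path steps excerpted above can be summarized, at the level of detail visible in these fragments, as an ordered sequence of stages each data_vio moves through. The stage names are illustrative labels, not vdo's internal identifiers:

```python
# Ordered stages of a vdo write, as described in the text above.
WRITE_STAGES = [
    "acquire data_vio and copy in the data",         # step 1
    "acquire the logical lock",                      # step 2
    "traverse/allocate the block map tree",          # step 3
    "allocate a physical block (pbn_lock)",          # bio acked after this
    "deduplicate via the index and hash lock",
    "attempt compression in the packer",
    "write the data directly if dedupe/compression fail",  # step 8h
    "add a recovery journal entry; wait for it to be stable",
    "update slab reference counts",
    "update the block map logical-to-physical mapping",
]
for i, stage in enumerate(WRITE_STAGES, 1):
    print(f"{i}. {stage}")
```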
1 and 2 in the write path to obtain a data_vio and lock its logical
address. If there is already a write data_vio in progress for that logical

data from the write data_vio and return it. Otherwise, it will look up the
logical-to-physical mapping by traversing the block map tree as in step 3,

as small as 512 bytes. Processing a write that is smaller than 4K requires
a read-modify-write operation that reads the relevant 4K block, copies the

write operation for the modified data block. The read and write stages of
this operation are nearly identical to the normal read and write
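The read-modify-write handling of sub-4K bios described above can be sketched as follows; `read_block` and `write_block` stand in for vdo's normal read and write paths, and the toy in-memory store is purely illustrative.

```python
BLOCK_SIZE = 4096  # vdo's data block size

def read_modify_write(read_block, write_block, offset: int, data: bytes):
    """Service a sub-4K write: read the containing 4K block, splice in
    the new bytes at `offset`, and write the whole block back."""
    assert offset + len(data) <= BLOCK_SIZE
    block = bytearray(read_block())
    block[offset:offset + len(data)] = data
    write_block(bytes(block))

# A toy backing store holding a single 4K block:
store = {"blk": bytes(BLOCK_SIZE)}
read_modify_write(lambda: store["blk"],
                  lambda b: store.update(blk=b),
                  offset=512, data=b"\xff" * 512)
print(store["blk"][512:1024] == b"\xff" * 512)  # True (modified region)
print(store["blk"][:512] == bytes(512))         # True (rest untouched)
```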
recovery journal. During the pre-resume phase of the next start, the

*Read-only Rebuild*

If a vdo encounters an unrecoverable error, it will enter read-only mode.

to the possibility that data has been lost. During a read-only rebuild, the