Lines Matching refs:we
61 For ROM, this means we need to keep our design simple and reuse code paths
62 where possible. For RAM we have a stronger requirement: all RAM usage is
71 however they often share and borrow features from each other. If we look at
72 power-loss resilience and wear leveling, we can narrow these down to a handful
75 1. First we have the non-resilient, block based filesystems, such as [FAT] and
109 2. In a completely different direction, we have logging filesystems, such as
126 Logging filesystems are beautifully elegant. With a checksum, we can easily
131 The main downside is performance. If we look at garbage collection, the
144 filesystem. [ext4] and [NTFS] are good examples. Here, we take a normal
145 block based filesystem and add a bounded log where we note every change
184 4. Last but not least we have copy-on-write (COW) filesystems, such as
188 our new block. This recursively pushes all of our problems upwards until we
231 If we look at existing filesystems, there are two interesting design patterns
236 Can we work around these limitations?
239 can't avoid these costs, _but_ if we put an upper bound on the size we can at
244 In the case of COW data structures, we can try twisting the definition a bit.
249 each level we divide the propagation of wear by _n_. With a sufficiently
253 have weaknesses that limit their usefulness. But if we merge the two they can
305 incrementally program new data onto erased blocks, but we need to erase a full
306 block at a time. This means that in order for our circular buffer to work, we
310 do we store references to these logs? Because the blocks themselves are erased
313 pair. This has the added advantage that we can change out blocks in the
314 metadata pair independently, and we don't reduce our block granularity for
317 In order to determine which metadata block is the most recent, we store a
318 revision count that we compare using [sequence arithmetic][wikipedia-sna]
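The sequence-arithmetic comparison of revision counts can be sketched in C (a minimal sketch; `rev_is_newer` is an illustrative name, not littlefs's actual API):

```c
#include <stdint.h>
#include <stdbool.h>

// Sequence (serial number) arithmetic sketch: treat the revision count as
// a point on a circle, so a counter that wraps past UINT32_MAX still
// compares as newer than an older, smaller count.
static bool rev_is_newer(uint32_t a, uint32_t b) {
    // the signed interpretation of the difference picks the "shorter way
    // around the circle" ordering
    return (int32_t)(a - b) > 0;
}
```

This lets the revision count live in a fixed-width field without ever needing a reset.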
351 So how do we atomically update our metadata pairs? Atomicity (a type of
353 Error detection can be provided with a checksum, and in littlefs's case we
358 append more entries, we can simply append the entries to the log. Because
359 we don't overwrite the original entries (remember rewriting flash requires
360 an erase), we still have the original entries if we lose power during the
383 operation. What we can do instead is group multiple entries into a commit
405 2. If our block _is_ full of entries, we need to somehow remove outdated
407 collection, but because littlefs has multiple garbage collectors, we
411 simple. We want to avoid RAM consumption, so we use a sort of brute force
412 solution where for each entry we check to see if a newer entry has been
413 written. If the entry is the most recent, we append it to our new block. This
414 is where having two blocks becomes important: if we lose power, we still have
417 During this compaction step we also erase the metadata block and increment
418 the revision count. Because we can commit multiple entries at once, we can
441 3. If our block is full of entries _and_ we can't find any garbage, then what?
443 more space is available, but because we have small logs, overflowing a log
446 Instead, we split our original metadata pair into two metadata pairs, each
449 associated with larger logs, we form a linked list of small bounded logs.
453 Despite writing to two metadata pairs, we can still maintain power
502 Clearly we need to be more aggressive than waiting for our metadata pair to
509 garbage collection). If we look at the amortized runtime complexity of updating
510 this log we get this formula:
514 If we let ![r] be the ratio of static space to the size of our log in bytes, we
526 Assuming 100 byte entries in a 4 KiB log, we can graph this using the entry
531 So at 50% usage, we're seeing an average of 2x cost per update, and at 75%
532 usage, we're already at an average of 4x cost per update.
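A rough closed form consistent with these figures: if a fraction r of the log is static data that must be recopied on every compaction, each update costs about 1/(1-r) on average. This is a simplification for illustration, not littlefs's exact formula:

```c
// Approximate amortized cost multiplier per update for a log where a
// fraction r of the space holds static entries that every compaction
// must recopy. r = 0.5 gives 2x, r = 0.75 gives 4x, and the cost blows
// up as r approaches 1 (a full log).
static double update_cost(double r) {
    return 1.0 / (1.0 - r);   // valid for r in [0, 1)
}
```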
535 to be full, we split the metadata pair once we exceed 50% capacity. We do this
536 lazily, waiting until we need to compact before checking if we fit in our 50%
542 If we look at metadata pairs and linked-lists of metadata pairs at a high
550 metadata pairs will only be 1/2 full. If we include the overhead of the second
554 provide a mechanism for performing atomic updates, but we need a separate
560 storage cost. But we can work around this storage cost by only using the
581 So, what can we do? First let's consider storing files in a simple COW
582 linked-list. Appending a block, which is the basis for writing files, means we
584 operation, which means we need to update the second-to-last block, and then the
585 third-to-last, and so on until we've copied out the entire file.
596 To avoid a full copy during appends, we can store the data backwards. Appending
598 updated. If we update a block in the middle, we still need to copy the
616 Fortunately we can do better. Instead of a singly linked list, littlefs
666 To get _to_ the block with the most pointers, we can perform the same steps
668 note is that this optimal path occurs naturally if we greedily choose the
671 So now we have a [COW] data structure that is cheap to append with a runtime
673 that this runtime is also divided by the amount of data we can store in a
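The core invariant of the CTZ skip-list is that block n holds ctz(n)+1 back-pointers: every block points one block back, every 2nd block also two back, every 4th block four back, and so on. A sketch using the GCC/Clang builtin (function name is illustrative):

```c
// Number of back-pointers stored in block n of a CTZ skip-list.
// Block 1 has 1 pointer, block 2 has 2, block 4 has 3, etc.
static unsigned ctz_pointer_count(unsigned n) {
    return (unsigned)__builtin_ctz(n) + 1;   // undefined for n == 0
}
```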
678 This is a new data structure, so we still have several questions. What is the
680 we store a CTZ skip-list in our metadata pairs?
682 To find the storage overhead, we can look at the data structure as multiple
685 the previous. As we approach infinity, the storage overhead forms a geometric
691 Because our file size is limited by the word width we use to store sizes, we can
692 also solve for the maximum number of pointers we would ever need to store in a
693 block. If we set the overhead of pointers equal to the block size, we get the
708 is larger than 104 bytes, we can avoid the extra logic needed to handle
711 The last question is how do we store CTZ skip-lists? We need a pointer to the
714 index + offset pair. So in theory we can store only a single pointer and size.
728 However, despite the integration of a bitwise operation, we can actually reduce
745 infinity, we end up with an average overhead of 2 pointers, which matches our
750 Now we can substitute into our original equation to find a more efficient
755 Unfortunately, the popcount function is non-injective, so we can't solve this
756 equation for our index. But what we can do is solve for an ![n'] index that
759 is smaller than our integer resolution. As it turns out, we only need to
764 Now that we have our index ![n], we can just plug it back into the above
766 overflow, but we can avoid this by rearranging the equation a bit:
771 Now we can find both our block index and offset from a size in _O(1)_, letting
839 So we now have the framework for an atomic, wear leveling filesystem. Small two
843 But now we need to look at the [elephant] in the room. Where do all these
888 languages, there are only a handful of data structures we need to traverse.
904 strategy here, as it violates our constant RAM requirement, but we may be able
919 a brute force traversal. Instead of a bitmap the size of storage, we keep track
921 allocation, we take blocks from the lookahead buffer. If the lookahead buffer
922 is empty, we scan the filesystem for more free blocks, populating our lookahead
923 buffer. In each scan we use an increasing offset, circling the storage as
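The lookahead scheme can be sketched with a single word as the bitmap (sizes and names here are illustrative, not littlefs's actual constants or API):

```c
#include <stdint.h>

#define BLOCK_COUNT 64   // total blocks on the device (illustrative)
#define LOOKAHEAD   32   // blocks tracked per scan window

static uint32_t lookahead;   // one bit per block in the current window
static uint32_t window;      // first block covered by the window

// called during the filesystem scan for every block found in use
static void mark_used(uint32_t block) {
    uint32_t off = (block + BLOCK_COUNT - window) % BLOCK_COUNT;
    if (off < LOOKAHEAD) {
        lookahead |= (uint32_t)1 << off;
    }
}

// take the next free block from the window; -1 means the window is
// exhausted and we must advance it and rescan the filesystem
static int alloc_block(void) {
    for (uint32_t i = 0; i < LOOKAHEAD; i++) {
        if (!(lookahead & ((uint32_t)1 << i))) {
            lookahead |= (uint32_t)1 << i;
            return (int)((window + i) % BLOCK_COUNT);
        }
    }
    return -1;
}
```

The tradeoff is RAM for scan frequency: a larger window means fewer full-filesystem scans per allocation.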
978 All writes must be sourced by some form of data in RAM, so immediately after we
979 write to a block, we can read the data back and verify that it was written
980 correctly. If we find that the data on disk does not match the copy we have in
981 RAM, a write error has occurred and we most likely have a bad block.
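The program-then-verify step can be sketched against a simulated block device (the array-backed `disk` stands in for real prog/read driver hooks, which this sketch does not model):

```c
#include <string.h>
#include <stdbool.h>

#define BLOCK_SIZE  16
#define BLOCK_COUNT 4

// Simulated storage for illustration; a real driver would go through the
// block device's program and read operations instead.
static unsigned char disk[BLOCK_COUNT][BLOCK_SIZE];

// "program" a block, then immediately read it back and compare against
// the copy still in RAM; false signals a write error (likely a bad block)
static bool prog_block(int block, const unsigned char *data) {
    memcpy(disk[block], data, BLOCK_SIZE);
    return memcmp(disk[block], data, BLOCK_SIZE) == 0;
}

// re-check a block's contents against the RAM copy
static bool verify_block(int block, const unsigned char *data) {
    return memcmp(disk[block], data, BLOCK_SIZE) == 0;
}
```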
983 Once we detect a bad block, we need to recover from it. In the case of write
984 errors, we have a copy of the corrupted data in RAM, so all we need to do is
992 nothing prevents us from triggering a COW operation as soon as we find a bad
1173 cycle we'll eventually find a new block where a write succeeds. If we don't,
1174 that means that all blocks in our storage are bad, and we've reached the end of
1183 a copy of the data lingering around in RAM, so we need a way to reconstruct the
1191 errors approaches the Hamming bound, we may still be able to detect errors, but
1192 can no longer fix the data. If we've reached this point the block is
1210 To avoid read errors, we need to be proactive, as opposed to reactive as we
1214 some threshold, but is still recoverable. With ECC we can do this at write
1225 1. [Dynamic wear leveling][wikipedia-dynamic-wear-leveling], where we
1229 2. [Static wear leveling][wikipedia-static-wear-leveling], where we
1231 we need to consider all blocks, including blocks that already contain data.
1239 means is that we don’t actively track wear; instead we rely on a uniform
1246 is powered, in which case we allocate the blocks linearly, circling the device.
1249 Instead, we start the allocator at a random offset every time we mount the
1293 random numbers when the filesystem is modified. This is exactly what we want
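One way to sketch deriving such a seed is to xor together the revision counts encountered while scanning metadata pairs, since those change on every modification (function name and exact derivation are illustrative):

```c
#include <stdint.h>

// Fold the revision counts seen during mount into a pseudorandom seed
// for the allocator's starting offset; costs no extra storage or RAM.
static uint32_t seed_from_revisions(const uint32_t *revs, int n) {
    uint32_t seed = 0;
    for (int i = 0; i < n; i++) {
        seed ^= revs[i];
    }
    return seed;
}
```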
1309 Now that we have our building blocks out of the way, we can start looking at
1312 The first step: How do we actually store our files?
1315 so following the precedent found in other filesystems we could give each file
1338 the CTZ skip-list, we find ourselves using a full 3 blocks. On most NOR flash
1378 metadata pair, we can store multiple files in a single metadata pair. One way
1424 The second improvement we can make is noticing that for very small files, our
1429 In this case, we can store the file directly in our directory's metadata pair.
1470 Once the file exceeds 1/4 the block size, we switch to a CTZ skip-list. This
1478 Now we just need directories to store our files. As mentioned above we want
1480 complications we need to sort out.
1483 store an unlimited number of files in each directory, and we don't need to
1517 skip-lists, we're not limited to strict COW operations. One thing we can do is
1546 whenever we want to manipulate the directory tree, multiple pointers need to be
1555 With the possibility of orphans, we can build power-loss resilient operations
1688 In addition to normal directory tree operations, we can use orphans to evict
1690 erases. If we lose power while evicting a metadata block we may end up with
1810 Fortunately, we only need to check for orphans on the first allocation after
1813 If we only had some sort of global state, then we could also store a flag and
1814 avoid searching for orphans unless we knew we were specifically interrupted
1823 In littlefs we can atomically commit to directories, but we can't create
1828 filesystems. As a filesystem designed for power-loss, it's important we get
1831 So what can we do?
1836 it isn't user facing and we can handle the corner cases internally.
1842 - In a previous version of littlefs we tried to solve this problem by going
1876 To update the global state from a metadata pair, we take the global state we
1907 To make this efficient, we always keep a copy of the global state in RAM. We
1912 RAM and a delta in an unbounded number of metadata pairs. Even if we reset
1913 the global state to its initial value, we can't easily clean up the deltas on
1914 disk. For this reason, it's very important that we keep the size of global
1920 Now we can solve the move problem. We can create global state describing our
1921 move atomically with the creation of the new file, and we can clear this move
1995 If, after building our global state during mount, we find information
1996 describing an ongoing move, we know we lost power during a move and the file
1998 we can resolve the move using the information in the global state to remove
2027 We can also move directories the same way we move files. There is the threaded
2104 Global state gives us a powerful tool we can use to solve the move problem.
2107 global state gives us a bit of persistent state we can use for some other