xfs-delayed-logging-design.rst - OpenGrok cross reference for /Documentation/filesystems/xfs/xfs-delayed-logging-design.rst

Lines Matching +full:record +full:- +full:size
1 .. SPDX-License-Identifier: GPL-2.0
33 details logged are made up of the changes to in-core structures rather than
34 on-disk structures. Other objects - typically buffers - have their physical
63 The type and size of reservation must be matched to the modification taking
64 place.  This means that permanent transactions can be used for one-shot
65 modifications, but one-shot reservations cannot be used for permanent
68 In the code, a one-shot transaction pattern looks somewhat like this::
97 While this might look similar to a one-shot transaction, there is an important
123 the on-disk journal.
165 transaction, we have to reserve enough space to record a full leaf-to-root split
183 For one-shot transactions, a single unit space reservation is all that is
185 also have a "log count" that affects the size of the reservation that is to be
190 transaction rolling mechanism to re-reserve space on every transaction roll. We
194 For example, an inode allocation is typically two transactions - one to
205 means we can roll the transaction multiple times before we have to re-reserve
210 re-reserve physical space in the log. This is somewhat complex, and requires
219 of a cycle number - the number of times the log has been overwritten - and the
233 reservations currently held by active transactions. It is a purely in-memory
251 - and it mostly does track exactly the same location as the reserve grant head -
269 grant head does not track physical space - it only accounts for the amount of
278 xfs_trans_commit() calls, while the physical log space reservation - tracked by
279 the write head - is then reserved separately by a call to xfs_log_reserve()
287 "Re-logging" the locked items on every transaction roll ensures that the items
292 move the tail of the log forwards to free up write grant space. Re-logging the
294 making cannot self-deadlock.
303 Re-logging Explained
309 method called "re-logging". Conceptually, this is quite simple - all it requires
334 implement long-running, multiple-commit permanent transactions.
338 of reservation size limitations. Hence a rolling extent removal transaction
347 the log - repeated operations to the same objects write the same changes to
357 in memory - batching them, if you like - to minimise the impact of the log IO on
360 The limitation on asynchronous transaction throughput is the number and size of
362 buffers available and the size of each is 32kB - the size can be increased up
366 that can be made to the filesystem at any point in time - if all the log
383 but only one of those copies needs to be there - the last one "D", as it
402 actually relatively easy to do - all the changes to logged items are already
410 metadata changes from the size and number of log buffers available. In other
438 	4. No on-disk format change (metadata or log format).
446 ---------------
463 The solution is relatively simple - it just took a long time to recognise it.
486     Object    +---------------------------------------------+
487     Vector 1      +----+
488     Vector 2                    +----+
489     Vector 3                                   +----------+
493     Log Buffer    +-V1-+-V2-+----V3----+
497     Object    +---------------------------------------------+
498     Vector 1      +----+
499     Vector 2                    +----+
500     Vector 3                                   +----------+
504     Memory Buffer +-V1-+-V2-+----V3----+
505     Vector 1      +----+
506     Vector 2           +----+
507     Vector 3                +----------+
518 buffer writing (i.e. double encapsulation). This would be an on-disk format
525 self-describing object that can be passed to the log buffer write code to be
527 Hence we avoid needing a new on-disk format to handle items that have been
532 ----------------
534 Now that we can record transactional changes in memory in a form that allows
543 and as such are stored in the Active Item List (AIL) which is an LSN-ordered
561 its place in the list and re-inserted at the tail. This is entirely arbitrary
562 and done to make it easy for debugging - the last items in the list are the
569 ----------------------------
576 log replay - all the changes in all the objects in a given transaction must
582 transaction. Fortunately, the XFS log code has no fixed limit on the size of a
584 the transaction cannot be larger than just under half the size of the log.  The
591 size of a checkpoint to be slightly less than half the log.
593 Apart from this size requirement, a checkpoint transaction looks no different
594 to any other transaction - it contains a transaction header, a series of
595 formatted log items and a commit record at the tail. From a recovery
596 perspective, the checkpoint transaction is also no different - just a lot
598 might need to tune the recovery transaction object hash size.
606 the transaction commit record, but tracking this requires us to have a
607 per-checkpoint context that travels through the log write process through to
638 	Log Item <-> log vector 1	-> memory buffer
639 	   |				-> vector array
641 	Log Item <-> log vector 2	-> memory buffer
642 	   |				-> vector array
647 	Log Item <-> log vector N-1	-> memory buffer
648 	   |				-> vector array
650 	Log Item <-> log vector N	-> memory buffer
651 					-> vector array
659 	log vector 1	-> memory buffer
660 	   |		-> vector array
661 	   |		-> Log Item
663 	log vector 2	-> memory buffer
664 	   |		-> vector array
665 	   |		-> Log Item
670 	log vector N-1	-> memory buffer
671 	   |		-> vector array
672 	   |		-> Log Item
674 	log vector N	-> memory buffer
675 			-> vector array
676 			-> Log Item
683 attached to the log buffer that the commit record was written to along with a
703 --------------------------------------
710 re-using a freed metadata extent for a data extent), a special, optimised log
713 To do this, transactions need to record the LSN of the commit record of the
720 As discussed in the checkpoint section, delayed logging uses per-checkpoint
725 atomic counter - we can just take the current context sequence number and add
739 the checkpoint context records the LSN of the commit record for the checkpoint,
740 we can also wait on the log buffer that contains the commit record, thereby
754 else for such serialisation - it only matters when we do a log force.
767 ------------------------------------------------
777 usage of the transaction. The reservation accounts for log record headers,
781 the size of the transaction and the number of regions being logged (the number
785 inode changes. If you modify lots of inode cores (e.g. ``chmod -R g+w *``), then
792 buffer format structure for each buffer - roughly 800 vectors or 1.51MB total
805 problematic. Typically log record headers use at least 16KB of log space per
810 reservation of around 150KB, which is a non-trivial amount of space.
812 A static reservation needs to manipulate the log grant counters - we can take a
840 As mentioned earlier, transactions can't grow to more than half the size of the
841 log. Hence as part of the reservation growing, we need to also check the size
842 of the reservation against the maximum allowed transaction size. If we reach
859 ---------------------------------
875 That is, we now have a many-to-one relationship between transaction commit and
883 pin the object the first time it is inserted into the CIL - if it is already in
900 ---------------------------------------
910 points in the design - the three important ones are:
917 that we have a many-to-one interaction here. That is, the only restriction on
924 relatively long period of time - the pinning of log items needs to be done
932 really needs to be a sleeping lock - if the CIL flush takes the lock, we do not
941 compared to transaction commit for asynchronous transaction workloads - only
942 time will tell if using a read-write semaphore for exclusion will limit
954 The final serialisation point is the checkpoint commit record ordering code
958 before writing the commit record. This loop walks the list of committing
960 record write. As a result it needs a lock and a wait variable. Log force
965 events they are waiting for are different. The checkpoint commit record
967 (obtained through completion of a commit record write) while log force
979 -----------------
992 		Record modifications in log item
1019 Essentially, steps 1-6 operate independently from step 7, which is also
1020 independent of steps 8-9. An item can be locked in steps 1-6 or steps 8-9
1021 at the same time step 7 is occurring, but only steps 1-6 or 8-9 can occur
1023 and steps 1-6 are re-entered, then the item is relogged. Only when steps 8-9
1037 		Record modifications in log item
1075 logging methods are in the middle of the life cycle - they still have the same
1081 As a result of this zero-impact "insertion" of delayed logging infrastructure