xfs-delayed-logging-design.txt - OpenGrok cross reference for /kernel/linux/linux-4.19/Documentation/filesystems/xfs-delayed-logging-design.txt

2 --------------------------
4 Introduction to Re-logging in XFS
5 ---------------------------------
9 logged are made up of the changes to in-core structures rather than on-disk
10 structures. Other objects - typically buffers - have their physical changes
21 "re-logging". Conceptually, this is quite simple - all it requires is that any
45 (increasing) LSN of each subsequent transaction - the LSN is effectively a
48 This relogging is also used to implement long-running, multiple-commit
62 the log - repeated operations to the same objects write the same changes to
71 doing aggregation of transactions in memory - batching them, if you like - to
76 buffers available and the size of each is 32kB - the size can be increased up
80 that can be made to the filesystem at any point in time - if all the log
82 the current batch completes. It is now common for a single current CPU core to
88 -------------------------
97 but only one of those copies needs to be there - the last one "D", as it
116 actually relatively easy to do - all the changes to logged items are already
152 	4. No on-disk format change (metadata or log format).
157 -----------------------
176 The solution is relatively simple - it just took a long time to recognise it.
179 simply copies the memory these vectors point to into the log buffer during
199 Object    +---------------------------------------------+
200 Vector 1      +----+
201 Vector 2                    +----+
202 Vector 3                                   +----------+
206 Log Buffer    +-V1-+-V2-+----V3----+
210 Object    +---------------------------------------------+
211 Vector 1      +----+
212 Vector 2                    +----+
213 Vector 3                                   +----------+
217 Memory Buffer +-V1-+-V2-+----V3----+
218 Vector 1      +----+
219 Vector 2           +----+
220 Vector 3                +----------+
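
The second diagram above can be sketched in C as follows. This is only an
illustrative model, not the XFS implementation: struct region_vec and
format_regions() are hypothetical names invented for this example. It copies
each changed region of an object into one contiguous memory buffer and then
re-points every vector at its copy, so the vectors describe the buffer rather
than the live object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one log vector region: an address and a length. */
struct region_vec {
	const void	*addr;
	size_t		len;
};

/*
 * Copy every region into buf, packed back to back, and rewrite each vector
 * to point at its copy inside buf. Returns the number of bytes used. If the
 * regions do not fit, returns 0 and leaves the vectors untouched.
 */
static size_t format_regions(struct region_vec *vecs, int nvecs,
			     char *buf, size_t buflen)
{
	size_t	total = 0;
	size_t	used = 0;
	int	i;

	assert(vecs != NULL && buf != NULL);

	/* Size everything first so a short buffer never leaves a partial copy. */
	for (i = 0; i < nvecs; i++)
		total += vecs[i].len;
	if (total > buflen)
		return 0;

	for (i = 0; i < nvecs; i++) {
		memcpy(buf + used, vecs[i].addr, vecs[i].len);
		vecs[i].addr = buf + used;	/* vector now describes the copy */
		used += vecs[i].len;
	}
	return used;
}
```

After a successful call the object can be modified again without affecting
what the vectors describe, because they no longer reference the object's
memory.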
228 buffer is to support splitting vectors across log buffer boundaries correctly.
231 buffer writing (i.e. double encapsulation). This would be an on-disk format
238 self-describing object that can be passed to the log buffer write code to be
239 handled in exactly the same manner as the existing log vectors are handled.
240 Hence we avoid needing a new on-disk format to handle items that have been
255 and as such are stored in the Active Item List (AIL) which is an LSN-ordered
273 its place in the list and re-inserted at the tail. This is entirely arbitrary
274 and done to make it easy for debugging - the last items in the list are the
287 log replay - all the changes in all the objects in a given transaction must
305 to any other transaction - it contains a transaction header, a series of
307 perspective, the checkpoint transaction is also no different - just a lot
312 items are stored as log vectors, we can use the existing log buffer writing
316 way it separates the writing of the transaction contents (the log vectors) from
318 per-checkpoint context that travels through the log write process through to
339 to store the list of log vectors that need to be written into the transaction.
340 Hence log vectors need to be able to be chained together to allow them to be
349 	Log Item <-> log vector 1	-> memory buffer
350 	   |				-> vector array
352 	Log Item <-> log vector 2	-> memory buffer
353 	   |				-> vector array
358 	Log Item <-> log vector N-1	-> memory buffer
359 	   |				-> vector array
361 	Log Item <-> log vector N	-> memory buffer
362 					-> vector array
370 	log vector 1	-> memory buffer
371 	   |		-> vector array
372 	   |		-> Log Item
374 	log vector 2	-> memory buffer
375 	   |		-> vector array
376 	   |		-> Log Item
381 	log vector N-1	-> memory buffer
382 	   |		-> vector array
383 	   |		-> Log Item
385 	log vector N	-> memory buffer
386 			-> vector array
387 			-> Log Item
401 efficient way to track vectors, even though it seems like the natural way to do
403 vectors and break the link between the log item and the log vector means that
405 the log vector chaining. If we track by the log vectors, then we only need to
409 vectors in one checkpoint transaction. I'd guess this is a "measure and
420 re-using a freed metadata extent for a data extent), a special, optimised log
430 As discussed in the checkpoint section, delayed logging uses per-checkpoint
435 atomic counter - we can just take the current context sequence number and add
464 else for such serialisation - it only matters when we do a log force.
491 of log vectors in the transaction).
494 inode changes. If you modify lots of inode cores (e.g. chmod -R g+w *), then
496 format structure. That is, two vectors totaling roughly 150 bytes. If we modify
497 10,000 inodes, we have about 1.5MB of metadata to write in 20,000 vectors. Each
501 buffer format structure for each buffer - roughly 800 vectors or 1.51MB total
519 reservation of around 150KB, which is a non-trivial amount of space.
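
The inode figures quoted earlier (two vectors totaling roughly 150 bytes per
inode, 10,000 inodes modified) can be checked with a small calculation. The
sketch below is illustrative only: struct log_estimate and
estimate_inode_logging() are hypothetical names, and the byte count covers
the logged metadata alone, using decimal megabytes as the text does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how many vectors and how many metadata bytes. */
struct log_estimate {
	uint64_t	vectors;
	uint64_t	bytes;
};

/*
 * Estimate what a batch of inode changes adds to a checkpoint, given the
 * number of vectors and the metadata bytes logged for each inode.
 */
static struct log_estimate estimate_inode_logging(uint64_t inodes,
						  uint64_t vecs_per_inode,
						  uint64_t bytes_per_inode)
{
	struct log_estimate	est;

	est.vectors = inodes * vecs_per_inode;
	est.bytes = inodes * bytes_per_inode;
	return est;
}

/* Self-check against the numbers used in the text. */
static void check_text_example(void)
{
	struct log_estimate	est = estimate_inode_logging(10000, 2, 150);

	assert(est.vectors == 20000);	/* 20,000 vectors */
	assert(est.bytes == 1500000);	/* about 1.5MB of metadata */
}
```

With 10,000 inodes, two vectors each and 150 bytes each, this gives 20,000
vectors and 1,500,000 bytes, matching the "about 1.5MB" figure in the text.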
521 A static reservation needs to manipulate the log grant counters - we can take a
540 available in their reservation for this as they have already reserved the
583 That is, we now have a many-to-one relationship between transaction commit and
591 pin the object the first time it is inserted into the CIL - if it is already in
613 there was only one CPU using it, but it does not slow down either.
617 points in the design - the three important ones are:
624 that we have a many-to-one interaction here. That is, the only restriction on
628 128MB log, which means that it is generally one per CPU in a machine.
631 relatively long period of time - the pinning of log items needs to be done
639 really needs to be a sleeping lock - if the CIL flush takes the lock, we do not
640 want every other CPU in the machine spinning on the CIL lock. Given that
648 compared to transaction commit for asynchronous transaction workloads - only
649 time will tell if using a read-write semaphore for exclusion will limit
664 an ordering loop after writing all the log vectors into the log buffers but
725 Essentially, steps 1-6 operate independently from step 7, which is also
726 independent of steps 8-9. An item can be locked in steps 1-6 or steps 8-9
727 at the same time step 7 is occurring, but only steps 1-6 or 8-9 can occur
729 and steps 1-6 are re-entered, then the item is relogged. Only when steps 8-9
756 		Chain log vectors and buffers together
759 		write log vectors into log
781 logging methods are in the middle of the life cycle - they still have the same
787 As a result of this zero-impact "insertion" of delayed logging infrastructure