resctrl_ui.rst - OpenGrok cross reference for /Documentation/x86/resctrl

1 .. SPDX-License-Identifier: GPL-2.0
9 :Authors: - Fenghua Yu <fenghua.yu@intel.com>
10           - Tony Luck <tony.luck@intel.com>
11           - Vikas Shivappa <vikas.shivappa@intel.com>
22 CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
24 CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
31  # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
36 	Enable code/data prioritization in L3 cache allocations.
38 	Enable code/data prioritization in L2 cache allocations.
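The optional mount flags shown above can be combined freely. As a minimal sketch (the helper name ``resctrl_mount_argv`` is hypothetical and not part of any kernel tooling), the command line could be assembled like this:

```python
def resctrl_mount_argv(cdp=False, cdpl2=False, mba_mbps=False,
                       mountpoint="/sys/fs/resctrl"):
    """Build the argv for mounting resctrl with the selected options.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    options = []
    if cdp:
        options.append("cdp")        # code/data prioritization in L3
    if cdpl2:
        options.append("cdpl2")      # code/data prioritization in L2
    if mba_mbps:
        options.append("mba_MBps")
    argv = ["mount", "-t", "resctrl", "resctrl"]
    if options:
        argv += ["-o", ",".join(options)]
    argv.append(mountpoint)
    return argv


if __name__ == "__main__":
    print(" ".join(resctrl_mount_argv(cdp=True, mba_mbps=True)))
```

Passing the resulting list to ``subprocess.run`` as root would perform the mount; that step is left out here on purpose.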
46 monitoring, only control, or both monitoring and control.  Cache
47 pseudo-locking is a unique way of using cache control to "pin" or
48 "lock" data in the cache. Details can be found in
49 "Cache Pseudo-Locking".
67 Cache resource (L3/L2) subdirectory contains the following files
84 		setting up exclusive cache partitions. Note that
86 		own settings for cache use which can over-ride
118 			      Corresponding region is pseudo-locked. No
138 		non-linear. This field is purely informational
156 		counter can be considered for re-use.
169 	mask f7 has non-consecutive 1-bits
212 	When the resource group is in pseudo-locked mode this file will
214 	pseudo-locked region.
225 	Each resource has its own line and format - see below for details.
236 	cache pseudo-locked region is created by first writing
237 	"pseudo-locksetup" to the "mode" file before writing the cache
238 	pseudo-locked region's schemata to the resource group's "schemata"
239 	file. On successful pseudo-locked region creation the mode will
240 	automatically change to "pseudo-locked".
256 -------------------------
261 1) If the task is a member of a non-default group, then the schemata
271 -------------------------
272 1) If a task is a member of a MON group, or non-default CTRL_MON group
283 Notes on cache occupancy monitoring and control
286 this only affects *new* cache allocations by the task. E.g. you may have
287 a task in a monitor group showing 3 MB of cache occupancy. If you move
290 the new group zero. When the task accesses locations still in cache from
292 you will likely see the occupancy in the old group go down as cache lines
293 are evicted and re-used while the occupancy in the new group rises as
294 the task accesses memory and loads into the cache are counted based on
297 The same applies to cache allocation control. Moving a task to a group
298 with a smaller cache partition will not evict any cache lines. The
308 max_threshold_occupancy - generic concepts
309 ------------------------------------------
312 the RMID still tags the cache lines of the previous user of the RMID.
313 Hence such RMIDs are placed on a limbo list and checked again to see whether the cache
315 limbo RMIDs that are not yet ready to be used, the user may see an -EBUSY
321 Schemata files - general concepts
322 ---------------------------------
327 Cache IDs
328 ---------
329 On current generation systems there is one L3 cache per socket and L2
332 caches on a socket, multiple cores could share an L2 cache. So instead
334 a resource we use a "Cache ID". At a given cache level this will be a
337 CPU, look in /sys/devices/system/cpu/cpu*/cache/index*/id
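As an illustration only, the sysfs files named above could be collected into a table with a short script. The function name ``read_cache_ids`` and its ``cpu_root`` parameter are assumptions made for this sketch; only the sysfs path pattern comes from the text.

```python
import glob
import os
import re


def read_cache_ids(cpu_root="/sys/devices/system/cpu"):
    """Return {(cpu number, cache index): cache ID} read from sysfs.

    cpu_root is a parameter so the function can be pointed at a copy
    of the directory tree; the default is the path named in the text.
    """
    ids = {}
    pattern = os.path.join(cpu_root, "cpu*", "cache", "index*", "id")
    for path in glob.glob(pattern):
        match = re.search(r"cpu(\d+)/cache/index(\d+)/id$", path)
        if match is None:
            continue  # e.g. a non-CPU directory that happens to match cpu*
        with open(path) as f:
            ids[(int(match.group(1)), int(match.group(2)))] = int(f.read())
    return ids


if __name__ == "__main__":
    for (cpu, index), cache_id in sorted(read_cache_ids().items()):
        print(f"cpu{cpu} index{index} -> cache ID {cache_id}")
```

Which ``index`` directory corresponds to L2 or L3 varies by platform; the ``level`` file next to ``id`` reports it.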
339 Cache Bit Masks (CBM)
340 ---------------------
341 For cache resources we describe the portion of the cache that is available
343 by each cpu model (and may be different for different cache levels). It
347 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
348 and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
349 of the capacity of the cache. You could partition the cache into four
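The contiguity rule and the per-bit capacity arithmetic can be checked with a few lines of Python. This is a sketch under the rules stated in this section; the function names are hypothetical and the kernel's own validation is not reproduced here.

```python
def is_contiguous(mask):
    """True if the set bits of mask form a single unbroken run."""
    if mask <= 0:
        return False
    lowest = mask & -mask          # isolate the lowest set bit
    run = mask // lowest           # drop the trailing zero bits
    return (run & (run + 1)) == 0  # a solid run of ones plus 1 is a power of two


def cache_fraction(mask, width):
    """Fraction of the cache covered by mask on a width-bit CBM."""
    return bin(mask).count("1") / width


def equal_partitions(width, parts):
    """Split a width-bit CBM into non-overlapping contiguous masks of equal size."""
    if width % parts:
        raise ValueError("width must be divisible by parts")
    size = width // parts
    chunk = (1 << size) - 1
    return [chunk << (i * size) for i in range(parts)]


if __name__ == "__main__":
    for mask in (0x3, 0x6, 0xC, 0x5, 0x9, 0xA):
        print(hex(mask), is_contiguous(mask))
    print([hex(m) for m in equal_partitions(20, 4)])
```

With a 20-bit mask, ``cache_fraction(0x1, 20)`` gives 0.05, matching the 5% per bit quoted above.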
411 ----------------------------------------------------------------
417 ------------------------------------------------------------------
425 ------------------------
438 ------------------------------------------
440 Memory b/w domain is L3 cache.
446 ---------------------------------------------
448 Memory bandwidth domain is L3 cache.
454 ---------------------------------
468 Cache Pseudo-Locking
470 CAT enables a user to specify the amount of cache space that an
471 application can fill. Cache pseudo-locking builds on the fact that a
472 CPU can still read and write data pre-allocated outside its current
473 allocated area on a cache hit. With cache pseudo-locking, data can be
474 preloaded into a reserved portion of cache that no application can
475 fill, and from that point on will only serve cache hits. The cache
476 pseudo-locked memory is made accessible to user space where an
480 The creation of a cache pseudo-locked region is triggered by a request
482 to be pseudo-locked. The cache pseudo-locked region is created as follows:
484 - Create a CAT allocation CLOSNEW with a CBM matching the schemata
485   from the user of the cache region that will contain the pseudo-locked
487   on the system and no future overlap with this cache region is allowed
488   while the pseudo-locked region exists.
489 - Create a contiguous region of memory of the same size as the cache
491 - Flush the cache, disable hardware prefetchers, disable preemption.
492 - Make CLOSNEW the active CLOS and touch the allocated memory to load
493   it into the cache.
494 - Set the previous CLOS as active.
495 - At this point the closid CLOSNEW can be released - the cache
496   pseudo-locked region is protected as long as its CBM does not appear in
497   any CAT allocation. Even though the cache pseudo-locked region will from
499   any CLOS will be able to access the memory in the pseudo-locked region since
500   the region continues to serve cache hits.
501 - The contiguous region of memory loaded into the cache is exposed to
502   user-space as a character device.
504 Cache pseudo-locking increases the probability that data will remain
505 in the cache via carefully configuring the CAT feature and controlling
507 cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
508 "locked" data from cache. Power management C-states may shrink or
509 power off cache. Deeper C-states will automatically be restricted on
510 pseudo-locked region creation.
512 It is required that an application using a pseudo-locked region runs
514 with the cache on which the pseudo-locked region resides. A sanity check
515 within the code will not allow an application to map pseudo-locked memory
516 unless it runs with affinity to cores associated with the cache on which the
517 pseudo-locked region resides. The sanity check is only done during the
521 Pseudo-locking is accomplished in two stages:
524    of cache that should be dedicated to pseudo-locking. At this time an
526    cache portion, and exposed as a character device.
527 2) During the second stage a user-space application maps (mmap()) the
528    pseudo-locked memory into its address space.
530 Cache Pseudo-Locking Interface
531 ------------------------------
532 A pseudo-locked region is created using the resctrl interface as follows:
535 2) Change the new resource group's mode to "pseudo-locksetup" by writing
536    "pseudo-locksetup" to the "mode" file.
537 3) Write the schemata of the pseudo-locked region to the "schemata" file. All
541 On successful pseudo-locked region creation the "mode" file will contain
542 "pseudo-locked" and a new character device with the same name as the resource
544 by user space in order to obtain access to the pseudo-locked memory region.
546 An example of cache pseudo-locked region creation and usage can be found below.
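The interface steps above (create the group, write "pseudo-locksetup" to "mode", write the schemata) could be scripted roughly as follows. This is a minimal sketch: the function name and the ``root`` parameter are assumptions, it must run as root on a mounted resctrl filesystem to have any real effect, and error handling is omitted.

```python
import os


def create_pseudo_locked_region(name, schemata, root="/sys/fs/resctrl"):
    """Follow the documented steps and return the group's resulting mode.

    On a real resctrl mount the returned string is expected to be
    "pseudo-locked" when region creation succeeded.
    """
    group = os.path.join(root, name)
    os.mkdir(group)                                   # create the resource group
    with open(os.path.join(group, "mode"), "w") as f:
        f.write("pseudo-locksetup\n")                 # request pseudo-lock setup
    with open(os.path.join(group, "schemata"), "w") as f:
        f.write(schemata + "\n")                      # region to pseudo-lock
    with open(os.path.join(group, "mode")) as f:
        return f.read().strip()


if __name__ == "__main__":
    import tempfile
    # Dry run against a scratch directory, which only records the writes.
    scratch = tempfile.mkdtemp()
    print(create_pseudo_locked_region("newlock", "L2:1=0x3", root=scratch))
```

Against a scratch directory the mode stays "pseudo-locksetup", because nothing there plays the kernel's role of switching it to "pseudo-locked".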
548 Cache Pseudo-Locking Debugging Interface
549 ----------------------------------------
550 The pseudo-locking debugging interface is enabled by default (if
554 location is present in the cache. The pseudo-locking debugging interface uses
555 the tracing infrastructure to provide two ways to measure cache residency of
556 the pseudo-locked region:
560    example below). In this test the pseudo-locked region is traversed at
562    are disabled. This also provides a substitute visualization of cache
564 2) Cache hit and miss measurements using model specific precision counters if
565    available. Depending on the levels of cache on the system the pseudo_lock_l2
568 When a pseudo-locked region is created a new debugfs directory is created for
570 write-only file, pseudo_lock_measure, is present in this directory. The
571 measurement of the pseudo-locked region depends on the number written to this
579      writing "2" to the pseudo_lock_measure file will trigger the L2 cache
580      residency (cache hits and misses) measurement captured in the
583      writing "3" to the pseudo_lock_measure file will trigger the L3 cache
584      residency (cache hits and misses) measurement captured in the
592 In this example a pseudo-locked region named "newlock" was created. Here is
624 Example of cache hits/misses debugging
626 In this example a pseudo-locked region named "newlock" was created on the L2
627 cache of a platform. Here is how we can obtain details of the cache hits
639   #                              _-----=> irqs-off
640   #                             / _----=> need-resched
641   #                            | / _---=> hardirq/softirq
642   #                            || / _--=> preempt-depth
644   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
646   pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0
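A trace line such as the ``pseudo_lock_l2`` entry above can be reduced to a hit ratio with a small parser. The sketch below assumes the ``hits=... miss=...`` layout shown in that line; the function name is hypothetical.

```python
import re

# Matches the event name and counters as they appear in the trace output above.
TRACE_RE = re.compile(
    r"pseudo_lock_l(?P<level>[23]): hits=(?P<hits>\d+) miss=(?P<miss>\d+)"
)


def parse_residency(line):
    """Return cache level, hits, misses and hit ratio, or None if no match."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    hits = int(match.group("hits"))
    miss = int(match.group("miss"))
    total = hits + miss
    return {
        "level": int(match.group("level")),
        "hits": hits,
        "miss": miss,
        "hit_ratio": hits / total if total else None,
    }


if __name__ == "__main__":
    sample = ("pseudo_lock_mea-1672  [002] ....  3132.860500: "
              "pseudo_lock_l2: hits=4097 miss=0")
    print(parse_residency(sample))
```

For the sample line this reports 4097 hits, 0 misses and a hit ratio of 1.0.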
654 On a two socket machine (one L3 cache per socket) with just four bits
655 for cache bit masks, minimum b/w of 10% with a memory bandwidth
659   # mount -t resctrl resctrl /sys/fs/resctrl
669 "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
670 Tasks in group "p1" use the "lower" 50% of cache on both sockets.
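To make the "lower" and "upper" halves concrete, the sketch below derives the two masks for a 4-bit CBM and formats them as L3 schemata lines. The helper names are hypothetical; the ``L3:<cache id>=<hex mask>;...`` layout follows the schemata lines used elsewhere in this document.

```python
def half_masks(width):
    """Return (lower, upper) masks that each cover half of a width-bit CBM."""
    half = width // 2
    lower = (1 << half) - 1
    upper = lower << (width - half)
    return lower, upper


def format_cache_schemata(resource, masks_by_cache_id):
    """Format e.g. {0: 0x3, 1: 0xc} as 'L3:0=3;1=c'."""
    domains = ";".join(
        f"{cache_id}={mask:x}"
        for cache_id, mask in sorted(masks_by_cache_id.items())
    )
    return f"{resource}:{domains}"


if __name__ == "__main__":
    lower, upper = half_masks(4)
    print(format_cache_schemata("L3", {0: lower, 1: upper}))  # group p0
    print(format_cache_schemata("L3", {0: lower, 1: lower}))  # group p1
```

The two printed lines are ``L3:0=3;1=c`` and ``L3:0=3;1=3``.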
675 Note that unlike cache masks, memory b/w cannot specify whether these
692 Again two sockets, but this time with a more realistic 20-bit mask.
695 processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy
696 neighbors, each of the two real-time tasks exclusively occupies one quarter
697 of L3 cache on socket 0.
700   # mount -t resctrl resctrl /sys/fs/resctrl
704 50% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
710 it access to the "top" 25% of the cache on socket 0.
723   # taskset -cp 1 1234
725 Ditto for the second real time task (with the remaining 25% of cache)::
730   # taskset -cp 2 5678
739   # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
745   # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
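A schemata string like the one written above is easy to take apart for inspection. The parser below is a sketch that assumes cache resources carry hexadecimal masks and resources whose name starts with "MB" carry decimal values; it is not a complete parser for every resource type.

```python
def parse_schemata(text):
    """Parse schemata text into {resource: {domain id: value}}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        resource, _, domains = line.partition(":")
        # Assumption: "MB" values are decimal, everything else is a hex mask.
        base = 10 if resource.startswith("MB") else 16
        entries = {}
        for domain in domains.split(";"):
            domain_id, _, value = domain.partition("=")
            entries[int(domain_id)] = int(value, base)
        result[resource] = entries
    return result


if __name__ == "__main__":
    parsed = parse_schemata("L3:0=f8000;1=fffff\nMB:0=20;1=100")
    for resource, entries in parsed.items():
        print(resource, entries)
```

Here cache ID 0 receives mask 0xf8000, the top quarter of a 20-bit CBM, while cache ID 1 keeps the full mask.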
749 A single socket system which has real-time tasks running on cores 4-7 and
750 non real-time workload assigned to cores 0-3. The real-time tasks share text
756   # mount -t resctrl resctrl /sys/fs/resctrl
760 50% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
766 to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
773 Finally we move cores 4-7 over to the new group and make sure that the
774 kernel and the tasks running there get 50% of the cache. They should
775 also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
776 siblings and only the real time threads are scheduled on the cores 4-7.
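Moving cores into a group is done by writing a CPU bitmask to the group's "cpus" file. As a sketch of the arithmetic only (the function name is hypothetical), the hexadecimal mask for a set of logical CPUs can be computed like this:

```python
def cpus_to_hex_mask(cpus):
    """Return the hex bitmask string with one bit set per logical CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    print(cpus_to_hex_mask(range(4, 8)))  # cores 4-7
    print(cpus_to_hex_mask(range(0, 4)))  # cores 0-3
```

Cores 4-7 give ``f0`` and cores 0-3 give ``f``.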
784 mode allowing sharing of their cache allocations. If one resource group
785 configures a cache allocation then nothing prevents another resource group
789 system with two L2 cache instances that can be configured with an 8-bit
791 25% of each cache instance.
794   # mount -t resctrl resctrl /sys/fs/resctrl/
798 cache::
811   -sh: echo: write error: Invalid argument
838 The bit_usage will reflect how the cache is used::
846   -sh: echo: write error: Invalid argument
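The rejected writes in this example come down to an overlap test between capacity bitmasks. The sketch below shows only that test, for an 8-bit CBM with 25% of it held exclusively; the function name is hypothetical and the kernel performs additional checks that are not modelled here.

```python
def overlaps_exclusive(candidate, exclusive_masks):
    """True if candidate shares any bit with a mask held in exclusive mode."""
    return any(candidate & mask for mask in exclusive_masks)


if __name__ == "__main__":
    exclusive = [0x03]                          # lowest 25% of an 8-bit CBM
    print(overlaps_exclusive(0x0F, exclusive))  # shares bits 0 and 1
    print(overlaps_exclusive(0xFC, exclusive))  # uses only the other 75%
```

A candidate for which this returns True is the kind of write that fails with "Invalid argument" above.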
850 Example of Cache Pseudo-Locking
852 Lock a portion of L2 cache from cache id 1 using CBM 0x3. Pseudo-locked
857   # mount -t resctrl resctrl /sys/fs/resctrl/
860 Ensure that there are bits available that can be pseudo-locked. Since only
861 unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
870 Create a new resource group that will be associated with the pseudo-locked
871 region, indicate that it will be used for a pseudo-locked region, and
872 configure the requested pseudo-locked region capacity bitmask::
875   # echo pseudo-locksetup > newlock/mode
878 On success the resource group's mode will change to pseudo-locked, the
879 bit_usage will reflect the pseudo-locked region, and the character device
880 exposing the pseudo-locked region will exist::
883   pseudo-locked
886   # ls -l /dev/pseudo_lock/newlock
887   crw------- 1 root root 243, 0 Apr  3 05:01 /dev/pseudo_lock/newlock
892   * Example code to access one page of pseudo-locked cache region
905   * cores associated with the pseudo-locked region. Here the cpu
942     /* Application interacts with pseudo-locked memory @mapping */
956 ----------------------------
961 As an example, the allocation of an exclusive reservation of L3 cache
964   1. Read the CBM masks from each directory or the per-resource "bit_usage"
995   $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl
999   $ cat create-dir.sh
1001   mask = function-of(output.txt)
1005   $ flock /sys/fs/resctrl/ ./create-dir.sh
1024       exit(-1);
1036       exit(-1);
1048       exit(-1);
1057     if (fd == -1) {
1059       exit(-1);
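The same advisory locking used by the flock commands and the C fragments above can be written in Python with ``fcntl.flock``. This is a sketch, not a drop-in tool: the context manager name and its parameters are assumptions, and the lock is only effective if every cooperating program takes it.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def resctrl_lock(path="/sys/fs/resctrl", exclusive=False):
    """Hold a shared (read) or exclusive (write) flock on a directory."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile
    # Demonstrated on a scratch directory so it runs without resctrl mounted.
    with resctrl_lock(tempfile.mkdtemp(), exclusive=True):
        print("exclusive lock held")
```

Readers would use the default shared lock while scanning the directory tree, and writers would pass ``exclusive=True`` before changing any group.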
1073 ----------------------
1080 ------------------------------------------------------------------------
1081 On a two socket machine (one L3 cache per socket) with just four bits
1082 for cache bit masks::
1084   # mount -t resctrl resctrl /sys/fs/resctrl
1096 "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
1097 Tasks in group "p1" use the "lower" 50% of cache on both sockets.
1124 --------------------------------------------
1125 On a two socket machine (one L3 cache per socket)::
1127   # mount -t resctrl resctrl /sys/fs/resctrl
1144 ---------------------------------------------------------------------
1151 This can also be used to profile a job's cache size footprint before being
1155   # mount -t resctrl resctrl /sys/fs/resctrl
1179 -----------------------------------
1181 A single socket system which has real time tasks running on cores 4-7
1182 and non real time tasks on other cpus. We want to monitor the cache
1186   # mount -t resctrl resctrl /sys/fs/resctrl
1190 Move the cpus 4-7 over to p1::