Lines Matching +full:analog +full:- +full:pass +full:- +full:through

1 .. _cgroup-v2:
11 conventions of cgroup v2. It describes all userland-visible aspects
14 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
19 1-1. Terminology
20 1-2. What is cgroup?
22 2-1. Mounting
23 2-2. Organizing Processes and Threads
24 2-2-1. Processes
25 2-2-2. Threads
26 2-3. [Un]populated Notification
27 2-4. Controlling Controllers
28 2-4-1. Enabling and Disabling
29 2-4-2. Top-down Constraint
30 2-4-3. No Internal Process Constraint
31 2-5. Delegation
32 2-5-1. Model of Delegation
33 2-5-2. Delegation Containment
34 2-6. Guidelines
35 2-6-1. Organize Once and Control
36 2-6-2. Avoid Name Collisions
38 3-1. Weights
39 3-2. Limits
40 3-3. Protections
41 3-4. Allocations
43 4-1. Format
44 4-2. Conventions
45 4-3. Core Interface Files
47 5-1. CPU
48 5-1-1. CPU Interface Files
49 5-2. Memory
50 5-2-1. Memory Interface Files
51 5-2-2. Usage Guidelines
52 5-2-3. Memory Ownership
53 5-3. IO
54 5-3-1. IO Interface Files
55 5-3-2. Writeback
56 5-3-3. IO Latency
57 5-3-3-1. How IO Latency Throttling Works
58 5-3-3-2. IO Latency Interface Files
59 5-3-4. IO Priority
60 5-4. PID
61 5-4-1. PID Interface Files
62 5-5. Cpuset
63 5-5-1. Cpuset Interface Files
64 5-6. Device
65 5-7. RDMA
66 5-7-1. RDMA Interface Files
67 5-8. HugeTLB
68 5-8-1. HugeTLB Interface Files
69 5-9. Misc
70 5-9-1. Miscellaneous cgroup Interface Files
71 5-9-2. Migration and Ownership
72 5-10. Others
73 5-10-1. perf_event
74 5-N. Non-normative information
75 5-N-1. CPU controller root cgroup process behaviour
76 5-N-2. IO controller root cgroup process behaviour
78 6-1. Basics
79 6-2. The Root and Views
80 6-3. Migration and setns(2)
81 6-4. Interaction with Other Namespaces
83 P-1. Filesystem Support for Writeback
86 R-1. Multiple Hierarchies
87 R-2. Thread Granularity
88 R-3. Competition Between Inner Nodes and Threads
89 R-4. Other Interface Issues
90 R-5. Controller Issues and Remedies
91 R-5-1. Memory
98 -----------
107 ---------------
113 cgroup is largely composed of two parts - the core and controllers.
129 hierarchical - if a controller is enabled on a cgroup, it affects all
131 sub-hierarchy of the cgroup. When a controller is enabled on a nested
141 --------
146 # mount -t cgroup2 none $MOUNT_POINT
156 is no longer referenced in its current hierarchy. Because per-cgroup
163 to inter-controller dependencies, other controllers may need to be
183 through remount from the init namespace. The mount option is
184 ignored on non-init namespace mounts. Please refer to the
200 modified through remount from the init namespace. The mount
201 option is ignored on non-init namespace mounts.
209 behavior but is a mount-option to avoid regressing setups
223 controller. The pre-allocated pool does not belong to anyone.
243 The option restores v1-like behavior of pids.events:max, that is only
251 --------------------------------
257 A child cgroup can be created by creating a sub-directory::
262 structure. Each cgroup has a read-writable interface file
264 belong to the cgroup one-per-line. The PIDs are not ordered and the
295 0::/test-cgroup/test-cgroup-nested
302 0::/test-cgroup/test-cgroup-nested (deleted)
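The two entries matched above have the cgroup v2 shape ``0::$PATH``, with `` (deleted)`` appended once the cgroup has been removed. A minimal Python sketch of splitting such a line follows; the function name ``parse_cgroup_v2_entry`` is hypothetical, and the sketch assumes the input is a single line as read from "/proc/$PID/cgroup".

```python
def parse_cgroup_v2_entry(line):
    """Split one cgroup v2 line from /proc/$PID/cgroup into (path, deleted).

    Hypothetical helper for illustration; not part of any kernel or libc API.
    """
    # v2 entries look like "0::/some/path": hierarchy ID 0, empty controller list.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    if hierarchy_id != "0" or controllers != "":
        raise ValueError("not a cgroup v2 entry: %r" % line)

    # A removed cgroup is reported with a trailing " (deleted)" marker.
    suffix = " (deleted)"
    deleted = path.endswith(suffix)
    if deleted:
        path = path[: -len(suffix)]
    return path, deleted
```

Note that the suffix check is only a heuristic: a live cgroup whose directory name itself ends in `` (deleted)`` would be misreported by this sketch.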
328 constraint - threaded controllers can be enabled on non-leaf cgroups
352 - As the cgroup will join the parent's resource domain. The parent
355 - When the parent is an unthreaded domain, it must not have any domain
359 Topology-wise, a cgroup can be in an invalid state. Please consider
362 A (threaded domain) - B (threaded) - C (domain, just created)
377 threads in the cgroup. Except that the operations are per-thread
378 instead of per-process, "cgroup.threads" has the same format and
400 between threads in a non-leaf cgroup and its child cgroups. Each
406 - cpu
407 - cpuset
408 - perf_event
409 - pids
412 --------------------------
414 Each non-root cgroup has a "cgroup.events" file which contains a
415 "populated" field indicating whether the cgroup's sub-hierarchy has
419 example, to start a clean-up operation after all processes of a given
420 sub-hierarchy have exited. The populated state updates and
421 notifications are recursive. Consider the following sub-hierarchy
425 A(4) - B(0) - C(1)
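The recursive "populated" rule can be modeled in a few lines of Python, reading the numbers in parentheses as the count of processes in each cgroup. This is only an illustrative model: the dictionaries and the ``populated`` function are hypothetical names, and the real value is read from the "populated" field of "cgroup.events".

```python
def populated(children, nr_procs, name):
    """Return 1 if `name` or any descendant holds a live process, else 0."""
    if nr_procs[name] > 0:
        return 1
    # An empty cgroup is still populated if any cgroup below it is.
    if any(populated(children, nr_procs, child) for child in children.get(name, [])):
        return 1
    return 0


# The A(4) - B(0) - C(1) chain: A contains B, B contains C.
children = {"A": ["B"], "B": ["C"]}
nr_procs = {"A": 4, "B": 0, "C": 1}
before = {name: populated(children, nr_procs, name) for name in nr_procs}

# Once the single process in C exits, B and C both become unpopulated.
nr_procs["C"] = 0
after = {name: populated(children, nr_procs, name) for name in nr_procs}
```

In this model ``before`` has all three cgroups populated, while ``after`` leaves only A populated, which is the point at which a clean-up of B's sub-hierarchy could start.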
435 -----------------------
449 # echo "+cpu +memory -io" > cgroup.subtree_control
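As a rough model of how such a write is interpreted, the Python sketch below applies a space-separated list of ``+name`` and ``-name`` tokens to a set of enabled controllers. The function name ``apply_subtree_control`` is hypothetical. The sketch assumes that the last operation on a given controller wins and that a malformed request changes nothing; it does not model the kernel's other checks, such as controller availability.

```python
def apply_subtree_control(enabled, request):
    """Return the controller set after applying a "+cpu -io"-style request.

    Illustrative model only; the real work is done by the kernel when
    "cgroup.subtree_control" is written.
    """
    result = set(enabled)  # work on a copy so a bad request leaves `enabled` intact
    for token in request.split():
        op, name = token[0], token[1:]
        if op not in ("+", "-") or not name:
            raise ValueError("bad token: %r" % token)
        if op == "+":
            result.add(name)
        else:
            result.discard(name)
    return result
```

For example, starting from ``{"io"}`` and applying ``"+cpu +memory -io"`` yields a set containing only ``cpu`` and ``memory``.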
458 Consider the following sub-hierarchy. The enabled controllers are
461 A(cpu,memory) - B(memory) - C()
475 controller interface files - anything which doesn't start with
479 Top-down Constraint
482 Resources are distributed top-down and a cgroup can further distribute
484 parent. This means that all non-root "cgroup.subtree_control" files
494 Non-root cgroups can distribute domain resources to their children
509 refer to the Non-normative information section in the Controllers
522 ----------
544 delegated, the user can build sub-hierarchy under the directory,
548 happens in the delegated sub-hierarchy, nothing can escape the
552 cgroups in or nesting depth of a delegated sub-hierarchy; however,
559 A delegated sub-hierarchy is contained in the sense that processes
560 can't be moved into or out of the sub-hierarchy by the delegatee.
563 requiring the following conditions for a process with a non-root euid
567 - The writer must have write access to the "cgroup.procs" file.
569 - The writer must have write access to the "cgroup.procs" file of the
573 processes around freely in the delegated sub-hierarchy it can't pull
574 in from or push out to outside the sub-hierarchy.
580 ~~~~~~~~~~~~~ - C0 - C00
583 ~~~~~~~~~~~~~ - C1 - C10
590 will be denied with -EACCES.
595 is not reachable, the migration is rejected with -ENOENT.
599 ----------
607 inherent trade-offs between migration and various hot paths in terms
613 resource structure once on start-up. Dynamic adjustments to resource
614 distribution can be made by changing controller configuration through
646 -------
652 work-conserving. Due to the dynamic nature, this model is usually
667 .. _cgroupv2-limits-distributor:
670 ------
673 Limits can be over-committed - the sum of the limits of children can
678 As limits can be over-committed, all configuration combinations are
685 .. _cgroupv2-protections-distributor:
688 -----------
693 soft boundaries. Protections can also be over-committed in which case
700 As protections can be over-committed, all configuration combinations
704 "memory.low" implements best-effort memory protection and is an
709 -----------
712 resource. Allocations can't be over-committed - the sum of the
719 As allocations can't be over-committed, some configuration
724 "cpu.rt.max" hard-allocates realtime slices and is an example of this
732 ------
737 New-line separated values
745 (when read-only or multiple values can be written at once)
771 -----------
773 - Settings for a single feature should be contained in a single file.
775 - The root cgroup should be exempt from resource control and thus
778 - The default time unit is microseconds. If a different unit is ever
781 - A parts-per quantity should use a percentage decimal with at least
782 two digit fractional part - e.g. 13.40.
784 - If a controller implements weight based resource distribution, its
790 - If a controller implements an absolute resource guarantee and/or
799 - If a setting has a configurable default value and keyed specific
813 # cat cgroup-example-interface-file
819 # echo 125 > cgroup-example-interface-file
823 # echo "default 125" > cgroup-example-interface-file
827 # echo "8:16 170" > cgroup-example-interface-file
831 # echo "8:0 default" > cgroup-example-interface-file
832 # cat cgroup-example-interface-file
836 - For events which are not very high frequency, an interface file
843 --------------------
848 A read-write single value file which exists on non-root
854 - "domain" : A normal valid domain cgroup.
856 - "domain threaded" : A threaded domain cgroup which is
859 - "domain invalid" : A cgroup which is in an invalid state.
863 - "threaded" : A threaded cgroup which is a member of a
870 A read-write new-line separated values file which exists on
874 the cgroup one-per-line. The PIDs are not ordered and the
883 - It must have write access to the "cgroup.procs" file.
885 - It must have write access to the "cgroup.procs" file of the
888 When delegating a sub-hierarchy, write access to this file
896 A read-write new-line separated values file which exists on
900 the cgroup one-per-line. The TIDs are not ordered and the
909 - It must have write access to the "cgroup.threads" file.
911 - The cgroup that the thread is currently in must be in the
914 - It must have write access to the "cgroup.procs" file of the
917 When delegating a sub-hierarchy, write access to this file
921 A read-only space separated values file which exists on all
928 A read-write space separated values file which exists on all
935 Space separated list of controllers prefixed with '+' or '-'
937 name prefixed with '+' enables the controller and '-'
943 A read-only flat-keyed file which exists on non-root cgroups.
955 A read-write single value file. The default is "max".
962 A read-write single value file. The default is "max".
969 A read-only flat-keyed file with the following entries:
995 A read-only flat-keyed file which exists in non-root cgroups.
1013 A read-write single value file which exists on non-root cgroups.
1036 create new sub-cgroups.
1039 A write-only single value file which exists in non-root cgroups.
1051 the whole thread-group.
1054 A read-write single value file whose allowed values are "0" and "1".
1058 Writing "1" to the file will re-enable the cgroup PSI accounting.
1062 and doesn't need to pass enablement via ancestors from root.
1066 This may cause non-negligible overhead for some workloads when under
1068 be used to disable PSI accounting in the non-leaf cgroups.
1071 A read-write nested-keyed file.
1079 .. _cgroup-v2-cpu:
1082 ---
1113 A read-only flat-keyed file.
1118 - usage_usec
1119 - user_usec
1120 - system_usec
1124 - nr_periods
1125 - nr_throttled
1126 - throttled_usec
1127 - nr_bursts
1128 - burst_usec
1131 A read-write single value file which exists on non-root
1141 A read-write single value file which exists on non-root
1144 The nice value is in the range [-20, 19].
1153 A read-write two value file which exists on non-root cgroups.
1165 A read-write single value file which exists on non-root
1171 A read-write nested-keyed file.
1177 A read-write single value file which exists on non-root cgroups.
1192 A read-write single value file which exists on non-root cgroups.
1203 A read-write single value file which exists on non-root cgroups.
1206 This is the cgroup analog of the per-task SCHED_IDLE sched policy.
1215 ------
1223 While not completely water-tight, all major memory usages by a given
1228 - Userland memory - page cache and anonymous memory.
1230 - Kernel data structures such as dentries and inodes.
1232 - TCP socket buffers.
1245 A read-only single value file which exists on non-root
1252 A read-write single value file which exists on non-root
1278 A read-write single value file which exists on non-root
1281 Best-effort memory protection. If the memory usage of a
1301 A read-write single value file which exists on non-root
1315 A read-write single value file which exists on non-root
1324 In default configuration regular 0-order allocations always
1329 as -ENOMEM or silently ignore in cases like disk readahead.
1332 A write-only nested-keyed file which exists for all cgroups.
1343 specified amount, -EAGAIN is returned.
1364 A read-write single value file which exists on non-root cgroups.
1369 A write of any non-empty string to this file resets it to the
1370 current memory usage for subsequent reads through the same
1374 A read-write single value file which exists on non-root
1384 Tasks with the OOM protection (oom_score_adj set to -1000)
1392 A read-only flat-keyed file which exists on non-root cgroups.
1406 boundary is over-committed.
1426 considered as an option, e.g. for failed high-order
1442 A read-only flat-keyed file which exists on non-root cgroups.
1445 types of memory, type-specific details, and other information
1454 If the entry has no per-node counter (or is not shown in
1455 memory.numa_stat), we use 'npn' (non-per-node) as the tag
1483 Amount of memory used for storing per-cpu kernel
1493 Amount of cached filesystem data that is swap-backed,
1530 Amount of memory, swap-backed and filesystem-backed,
1536 the value for the foo counter, since the foo counter is type-based, not
1537 list-based.
1548 Amount of memory used for storing in-kernel data
1626 Number of zero-filled pages swapped out with I/O skipped due to the
1677 A read-only nested-keyed file which exists on non-root cgroups.
1680 types of memory, type-specific details, and other information
1702 A read-only single value file which exists on non-root
1709 A read-write single value file which exists on non-root
1714 allow userspace to implement custom out-of-memory procedures.
1725 A read-write single value file which exists on non-root cgroups.
1730 A write of any non-empty string to this file resets it to the
1731 current memory usage for subsequent reads through the same
1735 A read-write single value file which exists on non-root
1742 A read-only flat-keyed file which exists on non-root cgroups.
1758 because of running out of swap system-wide or max
1767 A read-only single value file which exists on non-root
1774 A read-write single value file which exists on non-root
1782 A read-write single value file. The default value is "1".
1800 A read-only nested-keyed file.
1810 Over-committing on high limit (sum of high limits > available memory)
1824 pressure - how much the workload is being impacted due to lack of
1825 memory - is necessary to determine whether a workload needs more
1839 To which cgroup the area will be charged is non-deterministic; however,
1850 --
1855 only if cfq-iosched is in use and neither scheme is available for
1856 blk-mq devices.
1863 A read-only nested-keyed file.
1883 A read-write nested-keyed file which exists only on the root
1895 enable Weight-based control enable
1927 devices which show wide temporary behavior changes - e.g. a
1938 A read-write nested-keyed file which exists only on the root
1951 model The cost model in use - "linear"
1977 generate device-specific coefficients.
1980 A read-write flat-keyed file which exists on non-root cgroups.
2000 A read-write nested-keyed file which exists on non-root
2014 When writing, any number of nested key-value pairs can be
2039 A read-only nested-keyed file.
2048 Page cache is dirtied through buffered writes and shared mmaps and
2058 writes out dirty pages for the memory domain. Both system-wide and
2059 per-cgroup dirty memory states are examined and the more restrictive
2097 memory controller and system-wide clean memory.
2130 your real setting, setting at 10-15% higher than the value in io.stat.
2140 - Queue depth throttling. This is the number of outstanding IOs a group is
2144 - Artificial delay induction. There are certain types of IO that cannot be
2191 no-change
2194 promote-to-rt
2195 For requests that have a non-RT I/O priority class, change it into RT.
2199 restrict-to-be
2209 none-to-rt
2210 Deprecated. Just an alias for promote-to-rt.
2214 +----------------+---+
2215 | no-change | 0 |
2216 +----------------+---+
2217 | promote-to-rt | 1 |
2218 +----------------+---+
2219 | restrict-to-be | 2 |
2220 +----------------+---+
2222 +----------------+---+
2226 +-------------------------------+---+
2228 +-------------------------------+---+
2229 | IOPRIO_CLASS_RT (real-time) | 1 |
2230 +-------------------------------+---+
2232 +-------------------------------+---+
2234 +-------------------------------+---+
2238 - If I/O priority class policy is promote-to-rt, change the request I/O
2241 - If I/O priority class policy is not promote-to-rt, translate the I/O priority
2247 ---
2266 A read-write single value file which exists on non-root
2272 A read-only single value file which exists on non-root cgroups.
2278 A read-only single value file which exists on non-root cgroups.
2284 A read-only flat-keyed file which exists on non-root cgroups. Unless
2302 through fork() or clone(). These will return -EAGAIN if the creation
2307 ------
2314 memory placement to reduce cross-node memory access and contention
2325 A read-write multiple values file which exists on non-root
2326 cpuset-enabled cgroups.
2333 The CPU numbers are comma-separated numbers or ranges.
2337 0-4,6,8-10
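A comma-separated list of numbers and ranges such as the one above can be expanded with a short Python sketch. The function name ``parse_cpu_list`` is hypothetical, and the sketch assumes ranges are inclusive and written low-to-high.

```python
def parse_cpu_list(text):
    """Expand a list such as "0-4,6,8-10" into a sorted list of integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string, i.e. an empty list
        first, sep, last = part.partition("-")
        low = int(first)
        high = int(last) if sep else low  # a bare number is a one-element range
        if high < low:
            raise ValueError("descending range: %r" % part)
        cpus.update(range(low, high + 1))
    return sorted(cpus)
```

With this sketch, ``parse_cpu_list("0-4,6,8-10")`` expands to CPUs 0 through 4, 6, and 8 through 10. The same routine would handle a memory node list written in the same comma-and-range form.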
2340 setting as the nearest cgroup ancestor with a non-empty
2347 A read-only multiple values file which exists on all
2348 cpuset-enabled cgroups.
2364 A read-write multiple values file which exists on non-root
2365 cpuset-enabled cgroups.
2372 The memory node numbers are comma-separated numbers or ranges.
2376 0-1,3
2379 setting as the nearest cgroup ancestor with a non-empty
2386 Setting a non-empty value to "cpuset.mems" causes memory of
2398 A read-only multiple values file which exists on all
2399 cpuset-enabled cgroups.
2414 A read-write multiple values file which exists on non-root
2415 cpuset-enabled cgroups.
2448 A read-only multiple values file which exists on all non-root
2449 cpuset-enabled cgroups.
2461 A read-only and root cgroup only multiple values file.
2468 A read-write single value file which exists on non-root
2469 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2475 "member" Non-root member of a partition
2480 A cpuset partition is a collection of cpuset-enabled cgroups with
2487 There are two types of partitions - local and remote. A local
2503 be changed. All other non-root cgroups start out as "member".
2516 two possible states - valid or invalid. An invalid partition
2527 "member" Non-root member of a partition
2554 A valid non-root parent partition may distribute out all its CPUs
2573 A user can pre-configure certain CPUs to an isolated state
2580 -----------------
2591 on the return value the attempt will succeed or fail with -EPERM.
2596 If the program returns 0, the attempt fails with -EPERM, otherwise it
2604 ----
2613 A read-write nested-keyed file that exists for all the cgroups
2634 A read-only file that describes current resource usage.
2643 -------
2660 A read-only flat-keyed file which exists on non-root cgroups.
2673 use hugetlb pages are included. The per-node values are in bytes.
2676 ----
2698 A read-only flat-keyed file shown only in the root cgroup. It shows
2707 A read-only flat-keyed file shown in all cgroups. It shows
2715 A read-only flat-keyed file shown in all cgroups. It shows the
2724 A read-write flat-keyed file shown in non-root cgroups. Allowed
2743 A read-only flat-keyed file which exists on non-root cgroups. The
2766 ------
2777 Non-normative information
2778 -------------------------
2794 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2810 ------
2829 The path '/batchjobs/container_id1' can be considered as system-data
2834 # ls -l /proc/self/ns/cgroup
2835 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2841 # ls -l /proc/self/ns/cgroup
2842 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2846 When some thread from a multi-threaded process unshares its cgroup
2858 ------------------
2869 # ~/unshare -c # unshare cgroupns in some cgroup
2877 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2908 ----------------------
2937 ---------------------------------
2940 running inside a non-init cgroup namespace::
2942 # mount -t cgroup2 none $MOUNT_POINT
2949 the view of cgroup hierarchy by namespace-private cgroupfs mount
2962 --------------------------------
2965 address_space_operations->writepage[s]() to annotate bio's using the
2982 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2999 - Multiple hierarchies including named ones are not supported.
3001 - None of the v1 mount options are supported.
3003 - The "tasks" file is removed and "cgroup.procs" is not sorted.
3005 - "cgroup.clone_children" is removed.
3007 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
3015 --------------------
3068 ------------------
3076 Generally, in-process knowledge is available only to the process
3077 itself; thus, unlike service-level organization of processes,
3084 sub-hierarchies and control resource distributions along them. This
3085 effectively raised cgroup to the status of a syscall-like API exposed
3095 that the process would actually be operating on its own sub-hierarchy.
3099 system-management pseudo filesystem. cgroup ended up with interface
3102 individual applications through the ill-defined delegation mechanism
3104 without going through the required scrutiny.
3112 -------------------------------------------
3123 cycles and the number of internal threads fluctuated - the ratios
3139 clearly defined. There were attempts to add ad-hoc behaviors and
3153 ----------------------
3157 was how an empty cgroup was notified - a userland helper binary was
3160 to in-kernel event delivery filtering mechanism further complicating
3182 ------------------------------
3189 global reclaim prefers is opt-in, rather than opt-out. The costs for
3196 the soft limit reclaim pass is so aggressive that it not just
3199 becomes self-defeating.
3201 The memory.low boundary on the other hand is a top-down allocated
3239 new limit is met - or the task writing to memory.max is killed.
3248 groups can sabotage swapping by other means - such as referencing its
3249 anonymous memory in a tight loop - and an admin cannot assume full