Lines Matching +full:over +full:- +full:sampling
9 conventions of cgroup v2. It describes all userland-visible aspects
12 v1 is available under Documentation/admin-guide/cgroup-v1/.
17 1-1. Terminology
18 1-2. What is cgroup?
20 2-1. Mounting
21 2-2. Organizing Processes and Threads
22 2-2-1. Processes
23 2-2-2. Threads
24 2-3. [Un]populated Notification
25 2-4. Controlling Controllers
26 2-4-1. Enabling and Disabling
27 2-4-2. Top-down Constraint
28 2-4-3. No Internal Process Constraint
29 2-5. Delegation
30 2-5-1. Model of Delegation
31 2-5-2. Delegation Containment
32 2-6. Guidelines
33 2-6-1. Organize Once and Control
34 2-6-2. Avoid Name Collisions
36 3-1. Weights
37 3-2. Limits
38 3-3. Protections
39 3-4. Allocations
41 4-1. Format
42 4-2. Conventions
43 4-3. Core Interface Files
45 5-1. CPU
46 5-1-1. CPU Interface Files
47 5-2. Memory
48 5-2-1. Memory Interface Files
49 5-2-2. Usage Guidelines
50 5-2-3. Memory Ownership
51 5-3. IO
52 5-3-1. IO Interface Files
53 5-3-2. Writeback
54 5-3-3. IO Latency
55 5-3-3-1. How IO Latency Throttling Works
56 5-3-3-2. IO Latency Interface Files
57 5-4. PID
58 5-4-1. PID Interface Files
59 5-5. Cpuset
60 5-5-1. Cpuset Interface Files
61 5-6. Device
62 5-7. RDMA
63 5-7-1. RDMA Interface Files
64 5-8. Misc
65 5-8-1. perf_event
66 5-N. Non-normative information
67 5-N-1. CPU controller root cgroup process behaviour
68 5-N-2. IO controller root cgroup process behaviour
70 6-1. Basics
71 6-2. The Root and Views
72 6-3. Migration and setns(2)
73 6-4. Interaction with Other Namespaces
75 P-1. Filesystem Support for Writeback
78 R-1. Multiple Hierarchies
79 R-2. Thread Granularity
80 R-3. Competition Between Inner Nodes and Threads
81 R-4. Other Interface Issues
82 R-5. Controller Issues and Remedies
83 R-5-1. Memory
90 -----------
99 ---------------
105 cgroup is largely composed of two parts - the core and controllers.
121 hierarchical - if a controller is enabled on a cgroup, it affects all
123 sub-hierarchy of the cgroup. When a controller is enabled on a nested
133 --------
138 # mount -t cgroup2 none $MOUNT_POINT
148 is no longer referenced in its current hierarchy. Because per-cgroup
155 to inter-controller dependencies, other controllers may need to be
177 ignored on non-init namespace mounts. Please refer to the
187 option is ignored on non-init namespace mounts.
191 --------------------------------
197 A child cgroup can be created by creating a sub-directory::
202 structure. Each cgroup has a read-writable interface file
204 belong to the cgroup one-per-line. The PIDs are not ordered and the
235 0::/test-cgroup/test-cgroup-nested
242 0::/test-cgroup/test-cgroup-nested (deleted)
268 constraint - threaded controllers can be enabled on non-leaf cgroups
292 - As the cgroup will join the parent's resource domain. The parent
295 - When the parent is an unthreaded domain, it must not have any domain
299 Topology-wise, a cgroup can be in an invalid state. Please consider
302 A (threaded domain) - B (threaded) - C (domain, just created)
317 threads in the cgroup. Except that the operations are per-thread
318 instead of per-process, "cgroup.threads" has the same format and
340 between threads in a non-leaf cgroup and its child cgroups. Each
345 --------------------------
347 Each non-root cgroup has a "cgroup.events" file which contains a
348 "populated" field indicating whether the cgroup's sub-hierarchy has
352 example, to start a clean-up operation after all processes of a given
353 sub-hierarchy have exited. The populated state updates and
354 notifications are recursive. Consider the following sub-hierarchy
358 A(4) - B(0) - C(1)
368 -----------------------
382 # echo "+cpu +memory -io" > cgroup.subtree_control
391 Consider the following sub-hierarchy. The enabled controllers are
394 A(cpu,memory) - B(memory) - C()
408 controller interface files - anything which doesn't start with
412 Top-down Constraint
415 Resources are distributed top-down and a cgroup can further distribute
417 parent. This means that all non-root "cgroup.subtree_control" files
427 Non-root cgroups can distribute domain resources to their children
442 refer to the Non-normative information section in the Controllers
455 ----------
475 delegated, the user can build a sub-hierarchy under the directory,
479 happens in the delegated sub-hierarchy, nothing can escape the
483 cgroups in or nesting depth of a delegated sub-hierarchy; however,
490 A delegated sub-hierarchy is contained in the sense that processes
491 can't be moved into or out of the sub-hierarchy by the delegatee.
494 requiring the following conditions for a process with a non-root euid
498 - The writer must have write access to the "cgroup.procs" file.
500 - The writer must have write access to the "cgroup.procs" file of the
504 processes around freely in the delegated sub-hierarchy it can't pull
505 in from or push out to outside the sub-hierarchy.
511 ~~~~~~~~~~~~~ - C0 - C00
514 ~~~~~~~~~~~~~ - C1 - C10
521 will be denied with -EACCES.
526 is not reachable, the migration is rejected with -ENOENT.
530 ----------
538 inherent trade-offs between migration and various hot paths in terms
544 resource structure once on start-up. Dynamic adjustments to resource
577 -------
583 work-conserving. Due to the dynamic nature, this model is usually
599 ------
602 Limits can be over-committed - the sum of the limits of children can
607 As limits can be over-committed, all configuration combinations are
616 -----------
621 soft boundaries. Protections can also be over-committed in which case
628 As protections can be over-committed, all configuration combinations
632 "memory.low" implements best-effort memory protection and is an
637 -----------
640 resource. Allocations can't be over-committed - the sum of the
647 As allocations can't be over-committed, some configuration
652 "cpu.rt.max" hard-allocates realtime slices and is an example of this
660 ------
665 New-line separated values
673 (when read-only or multiple values can be written at once)
699 -----------
701 - Settings for a single feature should be contained in a single file.
703 - The root cgroup should be exempt from resource control and thus
708 - The default time unit is microseconds. If a different unit is ever
711 - A parts-per quantity should use a percentage decimal with at least
712 two digit fractional part - e.g. 13.40.
714 - If a controller implements weight based resource distribution, its
720 - If a controller implements an absolute resource guarantee and/or
729 - If a setting has a configurable default value and keyed specific
743 # cat cgroup-example-interface-file
749 # echo 125 > cgroup-example-interface-file
753 # echo "default 125" > cgroup-example-interface-file
757 # echo "8:16 170" > cgroup-example-interface-file
761 # echo "8:0 default" > cgroup-example-interface-file
762 # cat cgroup-example-interface-file
766 - For events which are not very high frequency, an interface file
773 --------------------
779 A read-write single value file which exists on non-root
785 - "domain" : A normal valid domain cgroup.
787 - "domain threaded" : A threaded domain cgroup which is
790 - "domain invalid" : A cgroup which is in an invalid state.
794 - "threaded" : A threaded cgroup which is a member of a
801 A read-write new-line separated values file which exists on
805 the cgroup one-per-line. The PIDs are not ordered and the
814 - It must have write access to the "cgroup.procs" file.
816 - It must have write access to the "cgroup.procs" file of the
819 When delegating a sub-hierarchy, write access to this file
827 A read-write new-line separated values file which exists on
831 the cgroup one-per-line. The TIDs are not ordered and the
840 - It must have write access to the "cgroup.threads" file.
842 - The cgroup that the thread is currently in must be in the
845 - It must have write access to the "cgroup.procs" file of the
848 When delegating a sub-hierarchy, write access to this file
852 A read-only space separated values file which exists on all
859 A read-write space separated values file which exists on all
866 Space separated list of controllers prefixed with '+' or '-'
868 name prefixed with '+' enables the controller and '-'
874 A read-only flat-keyed file which exists on non-root cgroups.
886 A read-write single value file. The default is "max".
893 A read-write single value file. The default is "max".
900 A read-only flat-keyed file with the following entries:
918 A read-write single value file which exists on non-root cgroups.
941 create new sub-cgroups.
947 ---
975 A read-only flat-keyed file which exists on non-root cgroups.
980 - usage_usec
981 - user_usec
982 - system_usec
986 - nr_periods
987 - nr_throttled
988 - throttled_usec
991 A read-write single value file which exists on non-root
997 A read-write single value file which exists on non-root
1000 The nice value is in the range [-20, 19].
1009 A read-write two value file which exists on non-root cgroups.
1021 A read-only nested-keyed file which exists on non-root cgroups.
1027 A read-write single value file which exists on non-root cgroups.
1042 A read-write single value file which exists on non-root cgroups.
1055 ------
1063 While not completely water-tight, all major memory usages by a given
1068 - Userland memory - page cache and anonymous memory.
1070 - Kernel data structures such as dentries and inodes.
1072 - TCP socket buffers.
1085 A read-only single value file which exists on non-root
1092 A read-write single value file which exists on non-root
1118 A read-write single value file which exists on non-root
1121 Best-effort memory protection. If the memory usage of a
1140 A read-write single value file which exists on non-root
1145 over the high boundary, the processes of the cgroup are
1148 Going over the high limit never invokes the OOM killer and
1152 A read-write single value file which exists on non-root
1158 Under certain circumstances, the usage may go over the limit
1166 A read-write single value file which exists on non-root
1176 Tasks with the OOM protection (oom_score_adj set to -1000)
1184 A read-only flat-keyed file which exists on non-root cgroups.
1198 boundary is over-committed.
1210 about to go over the max boundary. If direct reclaim
1221 userspace as -ENOMEM or silently ignored in cases like
1226 considered as an option, e.g. for failed high-order
1239 A read-only flat-keyed file which exists on non-root cgroups.
1242 types of memory, type-specific details, and other information
1263 Amount of memory used for storing in-kernel data
1270 Amount of cached filesystem data that is swap-backed,
1289 Amount of memory, swap-backed and filesystem-backed,
1360 A read-only single value file which exists on non-root
1367 A read-write single value file which exists on non-root
1374 A read-only flat-keyed file which exists on non-root cgroups.
1381 to go over the max boundary and swap allocation
1386 because of running out of swap system-wide or max
1395 A read-only nested-keyed file which exists on non-root cgroups.
1405 Over-committing on high limit (sum of high limits > available memory)
1419 pressure - how much the workload is being impacted due to lack of
1420 memory - is necessary to determine whether a workload needs more
1434 Which cgroup the area will be charged to is nondeterministic; however,
1435 over time, the memory area is likely to end up in a cgroup which has
1445 --
1450 only if cfq-iosched is in use and neither scheme is available for
1451 blk-mq devices.
1458 A read-only nested-keyed file which exists on non-root
1479 A read-write nested-keyed file which exists only on the root
1491 enable Weight-based control enable
1523 devices which show wide temporary behavior changes - e.g. a
1534 A read-write nested-keyed file which exists only on the root
1547 model The cost model in use - "linear"
1573 generate device-specific coefficients.
1576 A read-write flat-keyed file which exists on non-root cgroups.
1596 A read-write nested-keyed file which exists on non-root
1610 When writing, any number of nested key-value pairs can be
1635 A read-only nested-keyed file which exists on non-root cgroups.
1654 writes out dirty pages for the memory domain. Both system-wide and
1655 per-cgroup dirty memory states are examined and the more restrictive
1673 cgroup becomes the majority over a certain period of time, switches
1678 changes over time, use cases where multiple cgroups write to a single
1693 memory controller and system-wide clean memory.
1726 your real setting, setting at 10-15% higher than the value in io.stat.
1736 - Queue depth throttling. This is the number of outstanding IOs a group is
1740 - Artificial delay induction. There are certain types of IO that cannot be
1771 bound by the sampling interval. The decay rate interval can be
1776 The sampling window size in milliseconds. This is the minimum
1781 ---
1800 A read-write single value file which exists on non-root
1806 A read-only single value file which exists on all cgroups.
1816 through fork() or clone(). These will return -EAGAIN if the creation
1821 ------
1828 memory placement to reduce cross-node memory access and contention
1839 A read-write multiple values file which exists on non-root
1840 cpuset-enabled cgroups.
1847 The CPU numbers are comma-separated numbers or ranges.
1851 0-4,6,8-10
1854 setting as the nearest cgroup ancestor with a non-empty
1861 A read-only multiple values file which exists on all
1862 cpuset-enabled cgroups.
1878 A read-write multiple values file which exists on non-root
1879 cpuset-enabled cgroups.
1886 The memory node numbers are comma-separated numbers or ranges.
1890 0-1,3
1893 setting as the nearest cgroup ancestor with a non-empty
1901 A read-only multiple values file which exists on all
1902 cpuset-enabled cgroups.
1917 A read-write single value file which exists on non-root
1918 cpuset-enabled cgroups. This flag is owned by the parent cgroup
1923 "root" - a partition root
1924 "member" - a non-root member of a partition
1965 "member" Non-root member of a partition
1991 -----------------
2002 the attempt will succeed or fail with -EPERM.
2007 If the program returns 0, the attempt fails with -EPERM, otherwise
2015 ----
2024 A read-write nested-keyed file that exists for all the cgroups
2045 A read-only file that describes current resource usage.
2055 ----
2066 Non-normative information
2067 -------------------------
2083 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2099 ------
2118 The path '/batchjobs/container_id1' can be considered as system-data
2123 # ls -l /proc/self/ns/cgroup
2124 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2130 # ls -l /proc/self/ns/cgroup
2131 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2135 When some thread from a multi-threaded process unshares its cgroup
2147 ------------------
2158 # ~/unshare -c # unshare cgroupns in some cgroup
2166 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2197 ----------------------
2226 ---------------------------------
2229 running inside a non-init cgroup namespace::
2231 # mount -t cgroup2 none $MOUNT_POINT
2238 the view of cgroup hierarchy by namespace-private cgroupfs mount
2251 --------------------------------
2254 address_space_operations->writepage[s]() to annotate bio's using the
2271 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2288 - Multiple hierarchies including named ones are not supported.
2290 - None of the v1 mount options are supported.
2292 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2294 - "cgroup.clone_children" is removed.
2296 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2304 --------------------
2357 ------------------
2365 Generally, in-process knowledge is available only to the process
2366 itself; thus, unlike service-level organization of processes,
2373 sub-hierarchies and control resource distributions along them. This
2374 effectively raised cgroup to the status of a syscall-like API exposed
2384 that the process would actually be operating on its own sub-hierarchy.
2388 system-management pseudo filesystem. cgroup ended up with interface
2391 individual applications through the ill-defined delegation mechanism
2401 -------------------------------------------
2412 cycles and the number of internal threads fluctuated - the ratios
2421 control over internal threads, it was with serious drawbacks. It
2428 clearly defined. There were attempts to add ad-hoc behaviors and
2442 ----------------------
2446 was how an empty cgroup was notified - a userland helper binary was
2449 to in-kernel event delivery filtering mechanism further complicating
2471 ------------------------------
2478 global reclaim prefers is opt-in, rather than opt-out. The costs for
2488 becomes self-defeating.
2490 The memory.low boundary on the other hand is a top-down allocated
2528 new limit is met - or the task writing to memory.max is killed.
2531 control over swap space.
2537 groups can sabotage swapping by other means - such as referencing its
2538 anonymous memory in a tight loop - and an admin cannot assume full