memory.rst - /Documentation/admin-guide/cgroup-v1/memory.rst

18       we call it "memory cgroup". When you see git-log and source code, you'll
30    Memory-hungry applications can be isolated and limited to a smaller
42 Current Status: linux-2.6.34-mmotm (development version of April 2010)
46  - accounting and limiting the usage of anonymous pages, file caches and swap caches.
47  - pages are linked to per-memcg LRU exclusively, and there is no global LRU.
48  - optionally, memory+swap usage can be accounted and limited.
49  - hierarchical accounting
50  - soft limit
51  - moving (recharging) the account when a task is moved is selectable.
52  - usage threshold notifier
53  - memory pressure notifier
54  - oom-killer disable knob and oom-notifier
55  - Root cgroup has no limit controls.
59  <cgroup-v1-memory-kernel-extension>`)
87  memory.force_empty		     trigger forced page reclaim
138 suggested that we handle both page cache and RSS together. Another request was
140 at version 6; it combines both mapped (RSS) and unmapped Page
162 -----------
170 ---------------
172 .. code-block::
175 		+--------------------+
178 		+--------------------+
181            +---------------+  |        +---------------+
184            +---------------+  |        +---------------+
186                               + --------------+
188            +---------------+           +------+--------+
189            | page          +---------->  page_cgroup|
191            +---------------+           +---------------+
199 3. Each page has a pointer to the page_cgroup, which in turn knows the
206 If everything goes well, a page metadata structure called page_cgroup is
208 (*) The page_cgroup structure is allocated at boot/memory-hotplug time.
211 ------------------------
213 All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
218 for earlier. A file page will be accounted for as Page Cache when it's
219 inserted into the inode (xarray). While it's mapped into the page tables of
222 An RSS page is unaccounted when it's fully unmapped. A PageCache page is
226 A swapped-in page is accounted after it is added to the swapcache.
228 Note: The kernel does swapin-readahead and reads multiple swapped-out pages at once.
229 Since a page's memcg is recorded in swap regardless of whether memsw is enabled, the page will
232 At page migration, accounting information is kept.
234 Note: we only account pages on the LRU because our purpose is to control the amount
235 of used pages; pages not on the LRU tend to be out of the VM's control.
237 2.3 Shared Page Accounting
238 --------------------------
241 cgroup that first touches a page is accounted for the page. The principle
243 page will eventually get charged for it (once it is uncharged from
244 the cgroup that brought it in -- this will happen on memory pressure).
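A toy model may make the first-touch rule concrete. The C sketch below is a hypothetical illustration: `struct toy_page`, `struct toy_cgroup`, `toy_touch()` and `toy_uncharge()` are invented for this example and are not kernel interfaces. It only tracks which cgroup holds the charge for a single shared page.

```c
#include <assert.h>

/* Hypothetical toy model of the "first touch" charging rule; these
 * structs and functions are illustrations, not kernel interfaces. */
#define NO_OWNER (-1)

struct toy_page {
    int owner;            /* id of the cgroup charged for the page, or NO_OWNER */
};

struct toy_cgroup {
    long charged_pages;   /* pages currently charged to this cgroup */
};

/* Cgroup `cg` touches the page: only an uncharged page gets charged,
 * so later sharers of an already-charged page are not charged. */
void toy_touch(struct toy_page *pg, struct toy_cgroup *cgs, int cg)
{
    if (pg->owner == NO_OWNER) {
        pg->owner = cg;
        cgs[cg].charged_pages++;
    }
}

/* Drop the owner's charge, e.g. when reclaim under memory pressure
 * frees the page; the next cgroup to touch it becomes the owner. */
void toy_uncharge(struct toy_page *pg, struct toy_cgroup *cgs)
{
    if (pg->owner != NO_OWNER) {
        assert(cgs[pg->owner].charged_pages > 0);
        cgs[pg->owner].charged_pages--;
        pg->owner = NO_OWNER;
    }
}
```

Two cgroups sharing one page leave the whole charge with the first toucher; after `toy_uncharge()`, whichever cgroup touches the page next is charged for it.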
246 But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
251 --------------------------------------
258  - memory.memsw.usage_in_bytes.
259  - memory.memsw.limit_in_bytes.
273 The global LRU (kswapd) can swap out arbitrary pages. Swap-out means
282 When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
283 in this cgroup. In that case, swap-out will not be done by the cgroup routine and file
289 -----------
296 cgroup. (See :ref:`10. OOM Control <cgroup-v1-memory-oom-control>` below.)
299 pages that are selected for reclaiming come from the per-cgroup LRU
310 (See :ref:`oom_control <cgroup-v1-memory-oom-control>` section)
313 -----------
318     mm->page_table_lock or split pte_lock
319       folio_memcg_lock (memcg->move_lock)
320         mapping->i_pages lock
321           lruvec->lru_lock.
323 Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
324 lruvec->lru_lock; the folio LRU flag is cleared before
325 isolating a page from its LRU under lruvec->lru_lock.
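The nesting above is a lock hierarchy: each lock is taken only while holding locks listed above it. As a userspace illustration (not kernel lockdep; the `lock_rank` names and `lock_order_ok()` are hypothetical), a sequence of acquisitions can be checked against that order by giving each lock a rank:

```c
#include <stddef.h>

/* Hypothetical ranks mirroring the documented nesting; a lock may only
 * be acquired while holding locks of lower rank. */
enum lock_rank {
    RANK_PAGE_TABLE_LOCK = 0,  /* mm->page_table_lock or split pte_lock */
    RANK_MEMCG_MOVE_LOCK = 1,  /* folio_memcg_lock (memcg->move_lock) */
    RANK_I_PAGES_LOCK    = 2,  /* mapping->i_pages lock */
    RANK_LRU_LOCK        = 3,  /* lruvec->lru_lock */
};

/* seq[] lists locks in acquisition order, each taken while the earlier
 * ones are still held. Returns 1 if ranks strictly increase (levels may
 * be skipped), 0 if any lock is taken out of order or twice. */
int lock_order_ok(const enum lock_rank *seq, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (seq[i] <= seq[i - 1])
            return 0;
    }
    return 1;
}
```

Skipping a level, such as taking lruvec->lru_lock directly under the page table lock, passes this check; taking a lock from an earlier line while holding a later one is rejected.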
327 .. _cgroup-v1-memory-kernel-extension:
330 -----------------------------------------------
338 it can be disabled system-wide by passing cgroup.memory=nokmem to the kernel
353 -----------------------------------------------
364   skipped while the cache is being created. All objects in a slab page should
366   different memcg during the page allocation by the cache.
377 ----------------------
390     deployments where the total amount of memory per-cgroup is overcommitted.
392     box can still run out of non-reclaimable memory.
415    <cgroups-why-needed>` for the background information)::
417 	# mount -t tmpfs none /sys/fs/cgroup
419 	# mount -t cgroup none /sys/fs/cgroup/memory -o memory
441    We can write "-1" to ``*.limit_in_bytes`` to reset the limit (unlimited).
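Because the value the kernel commits may differ from the value written, a program that sets a limit should re-read the file afterwards. A minimal C sketch, assuming `path` names a memory.limit_in_bytes file; the helper name `set_and_reread_limit()` is hypothetical:

```c
#include <stdio.h>

/* Write a limit string (for example "4194304", or "-1" for unlimited)
 * to a memory.limit_in_bytes-style file, then read back what was
 * committed. Returns 0 on success and stores the value in *committed. */
int set_and_reread_limit(const char *path, const char *value,
                         long long *committed)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* With stdio buffering, the actual write(2), and any error the
     * kernel returns for it, happens at fclose(). */
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;
    long long v;
    int ok = fscanf(f, "%lld", &v);
    fclose(f);
    if (ok != 1)
        return -1;
    *committed = v;
    return 0;
}
```

On a real cgroup, the value read back after writing "-1" is typically a very large byte count rather than -1.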
454 number of factors, such as rounding up to page boundaries or the total
455 availability of memory on the system. The user is required to re-read
477 Page-fault scalability is also important. When measuring with a parallel
478 page fault test, a multi-process test may be better than a multi-thread
484 .. _cgroup-v1-memory-test-troubleshoot:
487 -------------------
496 some of the pages cached in the cgroup (page cache pages).
499 <cgroup-v1-memory-oom-control>` (below) and seeing what happens will be
502 .. _cgroup-v1-memory-test-task-migration:
505 ------------------
509 remain charged to it, the charge is dropped when the page is freed or
513 See :ref:`8. "Move charges at task migration" <cgroup-v1-memory-move-charges>`
516 ---------------------
519 <cgroup-v1-memory-test-troubleshoot>` and :ref:`4.2
520 <cgroup-v1-memory-test-task-migration>`, a cgroup might have some charge
535 ---------------
545   charged file caches. Some out-of-use page caches may remain charged until
549 -------------
553   * per-memory cgroup local status
556     cache           # of bytes of page cache memory.
562                     event happens each time a page is accounted as either mapped
563                     anon page (RSS) or cache page (Page Cache) to the cgroup.
565                     event happens each time a page is unaccounted from the
576     inactive_file   # of bytes of file-backed memory and MADV_FREE anonymous
578     active_file     # of bytes of file-backed memory on active LRU list.
619 	mapped_file is accounted only when the memory cgroup is the owner of the page
623 --------------
634 -----------
646 ------------------
656 -------------
658 This is similar to numa_maps but operates on a per-memcg basis.  This is
665 per-node page counts including "hierarchical_<counter>" which sums up all
700 ---------------------------------------
726 Please note that the soft limit is a best-effort feature; it comes with
728 heavily contended for, memory is allocated based on the soft limit
729 hints/setup. Currently soft limit based reclaim is set up such that
733 -------------
752 .. _cgroup-v1-memory-move-charges:
761 cgroups to allow fine-grained policy adjustments without having to
767 page tables.
770 -------------
782       <cgroup-v1-memory-movable-charges>` for details.
785       Charges are moved only when you move mm->owner, in other words,
800 .. _cgroup-v1-memory-movable-charges:
803 --------------------------------------
807 a page or a swap can be moved only when it is charged to the task's current
810 +---+--------------------------------------------------------------------------+
813 | 0 | A charge of an anonymous page (or swap of it) used by the target task.   |
815 +---+--------------------------------------------------------------------------+
819 |   | will be moved even if the task hasn't done a page fault, i.e. they might |
821 |   | The mapcount of the page is ignored (the page can be moved independent   |
824 +---+--------------------------------------------------------------------------+
827 --------
829 - All charge-moving operations are done under cgroup_mutex. It's not good
841 - create an eventfd using eventfd(2);
842 - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
843 - write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
849 It's applicable to both root and non-root cgroups.
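The three registration steps for a usage threshold could look like this in C. This is a sketch under assumptions: the helper names are hypothetical, the caller supplies the memory cgroup directory, and memory.memsw.usage_in_bytes could be opened instead of memory.usage_in_bytes.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build "<event_fd> <fd of memory.usage_in_bytes> <threshold>". */
int format_threshold_cmd(char *buf, size_t len, int efd, int ufd,
                         unsigned long long threshold)
{
    return snprintf(buf, len, "%d %d %llu", efd, ufd, threshold);
}

/* Register a threshold on cg_dir (e.g. a hypothetical
 * "/sys/fs/cgroup/memory/0"). Returns an eventfd, or -1 on error. */
int register_usage_threshold(const char *cg_dir, unsigned long long threshold)
{
    char path[4096], cmd[64];
    int efd, ufd = -1, cfd = -1, n, ret = -1;

    efd = eventfd(0, 0);                          /* step 1: eventfd(2) */
    if (efd < 0)
        return -1;

    snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cg_dir);
    ufd = open(path, O_RDONLY);                   /* step 2: usage file */
    if (ufd < 0)
        goto out;

    snprintf(path, sizeof(path), "%s/cgroup.event_control", cg_dir);
    cfd = open(path, O_WRONLY);
    if (cfd < 0)
        goto out;

    n = format_threshold_cmd(cmd, sizeof(cmd), efd, ufd, threshold);
    if (n < 0 || (size_t)n >= sizeof(cmd))
        goto out;
    if (write(cfd, cmd, (size_t)n) != (ssize_t)n) /* step 3: register */
        goto out;
    ret = efd;
out:
    /* Assumption: the usage and control fds are only needed while the
     * registration string is written, so they are closed here. */
    if (cfd >= 0)
        close(cfd);
    if (ufd >= 0)
        close(ufd);
    if (ret < 0)
        close(efd);
    return ret;
}

/* Block until the kernel signals the eventfd; returns 0 on success. */
int wait_for_threshold(int efd)
{
    uint64_t count;
    return read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 0 : -1;
}
```

A caller would register once and then loop on `wait_for_threshold()`; closing the eventfd is assumed to drop the registration.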
851 .. _cgroup-v1-memory-oom-control:
866  - create an eventfd using eventfd(2)
867  - open memory.oom_control file
868  - write string like "<event_fd> <fd of memory.oom_control>" to
874 You can disable the OOM-killer by writing "1" to memory.oom_control file, as:
878 If the OOM-killer is disabled, tasks under the cgroup will hang/sleep
879 in the memory cgroup's OOM-waitqueue when they request accountable memory.
895 	- oom_kill_disable 0 or 1
896 	  (if 1, oom-killer is disabled)
897 	- under_oom	   0 or 1
899         - oom_kill         integer counter
909 allocation cost; based on the pressure, applications can implement
923 resources that can be easily reconstructed or re-read from a disk.
926 about to run out of memory (OOM) or even the in-kernel OOM killer is on its
932 events are not pass-through. For example, you have three cgroups: A->B->C. Now
942  - "default": this is the default behavior specified above. This mode is the
946  - "hierarchy": events always propagate up to the root, similar to the default
951  - "local": events are pass-through, i.e. they only receive notifications when
960 specified by a comma-delimited string, e.g. "low,hierarchy" specifies
961 hierarchical, pass-through notification for all ancestor memcgs. Notification
962 with the default, non-pass-through behavior does not specify a mode.
963 "medium,local" specifies pass-through notification for the medium level.
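The "<level[,mode]>" argument can be checked before it is written to cgroup.event_control. A hedged C sketch: `pressure_arg_valid()` is a hypothetical helper that assumes the levels are low, medium and critical and the modes are default, hierarchy and local; it is stricter than necessary and is not a copy of the kernel's parser.

```c
#include <string.h>

/* Check a "<level[,mode]>" argument. Returns 1 if valid, 0 otherwise. */
int pressure_arg_valid(const char *arg)
{
    static const char *levels[] = { "low", "medium", "critical" };
    static const char *modes[]  = { "default", "hierarchy", "local" };
    const char *comma = strchr(arg, ',');
    size_t llen = comma ? (size_t)(comma - arg) : strlen(arg);
    int level_ok = 0;

    /* The level must match one name exactly (no prefixes). */
    for (size_t i = 0; i < 3; i++)
        if (strlen(levels[i]) == llen && strncmp(arg, levels[i], llen) == 0)
            level_ok = 1;
    if (!level_ok)
        return 0;

    /* No mode given: the default, non-pass-through behavior. */
    if (!comma)
        return 1;

    for (size_t i = 0; i < 3; i++)
        if (strcmp(comma + 1, modes[i]) == 0)
            return 1;
    return 0;   /* unknown or empty mode */
}
```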
968 - create an eventfd using eventfd(2);
969 - open memory.pressure_level;
970 - write string as "<event_fd> <fd of memory.pressure_level> <level[,mode]>"
992    (Expect a bunch of notifications, and eventually, the oom-killer will
998 1. Make per-cgroup scanner reclaim non-shared pages first
999 2. Teach controller to account for shared pages
1015 .. [3] Emelianov, Pavel. Resource controllers based on process cgroups
1017 .. [4] Emelianov, Pavel. RSS controller based on process cgroups (v2)
1019 .. [5] Emelianov, Pavel. RSS controller based on process cgroups (v3)
1030     https://lore.kernel.org/r/20070819094658.654.84837.sendpatchset@balbir-laptop
1033    https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop