userfaultfd.rst - OpenGrok cross reference for /kernel/linux/linux-6.6/Documentation/admin-guide/mm/userfaultfd.rst

Lines Matching +full:keep +full:- +full:a +full:- +full:live
8 Userfaults allow the implementation of on-demand paging from userland
12 For example, userfaults allow a proper and more optimal implementation
18 Userspace creates a new userfaultfd, initializes it, and registers one or more
20 region(s) result in a message being delivered to the userfaultfd, notifying
26 1) ``read/POLLIN`` protocol to notify a userland thread of the faults
38 Vmas are not suitable for page- (or hugepage) granular fault tracking
43 passed using unix domain sockets to a manager process, so the same
44 manager process could handle the userfaults of a multitude of
48 is a corner case that would currently return ``-EBUSY``).
53 Creating a userfaultfd
54 ----------------------
56 There are two ways to create a new userfaultfd, each of which provides ways to
58 handle kernel page faults have been a useful tool for exploiting the kernel).
63 - Any user can always create a userfaultfd which traps userspace page faults
64   only. Such a userfaultfd can be created using the userfaultfd(2) syscall
67 - In order to also trap kernel page faults for the address space, either the
73 /dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
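
The two creation paths mentioned above can be sketched roughly as follows. This is a minimal sketch, not the kernel's reference code: the helper names ``uffd_open_user_only`` and ``uffd_open_via_dev`` are hypothetical, and the fallback definition of ``UFFD_USER_MODE_ONLY`` assumes the value used by recent uapi headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1 /* assumption: value from newer uapi headers */
#endif

/* Hypothetical helper: userfaultfd(2) restricted to userspace page faults.
 * Returns the new descriptor, or -1 with errno set. */
static int uffd_open_user_only(void)
{
    return (int)syscall(SYS_userfaultfd,
                        O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
}

/* Hypothetical helper: obtain a userfaultfd through /dev/userfaultfd.
 * Returns the new descriptor, or -1 with errno set. */
static int uffd_open_via_dev(void)
{
#ifdef USERFAULTFD_IOC_NEW
    int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
    if (dev < 0)
        return -1;

    /* The ioctl argument carries the same flags as the syscall. */
    int uffd = ioctl(dev, USERFAULTFD_IOC_NEW, O_CLOEXEC | O_NONBLOCK);
    int saved_errno = errno;

    close(dev);            /* the device fd is no longer needed */
    errno = saved_errno;   /* report the ioctl's error, not close()'s */
    return uffd;
#else
    errno = ENOSYS;        /* headers too old to know the ioctl */
    return -1;
#endif
}
```

Either call may legitimately fail (for example when the syscall is filtered or the device node is absent), so callers should always check the return value and ``errno``.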
83 Initializing a userfaultfd
84 --------------------------
87 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
88 a later API version) which will specify the ``read/POLLIN`` protocol
101 - The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
103   detail below in the `Non-cooperative userfaultfd`_ section.
105 - ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
111 - ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
122 bitmask) to register a memory range in the ``userfaultfd`` by setting the
133 memory from the ``userfaultfd`` registered range). This means a userfault
135 user-faulted page.
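
As an illustration of the ``UFFDIO_API`` handshake and of registering a range, a minimal sketch might look like the following. The helper names are hypothetical, and the choice to require ``UFFDIO_COPY`` support on the registered range is an assumption made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: perform the UFFDIO_API handshake.
 * want_features is a UFFD_FEATURE_* bitmask (0 requests none).
 * On success, *got_features (if non-NULL) receives what the kernel reports. */
static int uffd_handshake(int uffd, unsigned long long want_features,
                          unsigned long long *got_features)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = want_features;
    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;
    if (got_features)
        *got_features = api.features;
    return 0;
}

/* Hypothetical helper: register [addr, addr + len) for missing-page faults.
 * Returns 0 on success, -1 with errno set otherwise. */
static int uffd_register_missing(int uffd, void *addr, size_t len)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (unsigned long)addr;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;

    /* reg.ioctls is the bitmask of ioctls usable on this range. */
    if (!(reg.ioctls & (1ULL << _UFFDIO_COPY))) {
        errno = ENOTSUP;   /* assumption: this sketch insists on UFFDIO_COPY */
        return -1;
    }
    return 0;
}
```

The range passed to the registration helper is expected to be page aligned; the kernel rejects unaligned ranges.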
138 --------------------
142 - ``UFFDIO_COPY`` atomically copies some existing page contents from
145 - ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
147 - ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
150 see a half-populated page, since readers will keep userfaulting until the
154 They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
160 - For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
161   resolved by either providing a new page (``UFFDIO_COPY``), or mapping
163   the zero page for a missing fault. With userfaultfd, userspace can
166 - For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
174 - You can tell which kind of fault occurred by examining
178 - None of the page-delivering ioctls default to the range that you
182 - You get the address of the access that triggered the missing page
183   event out of a struct uffd_msg that you read in the thread from the
185   Keep in mind that unless you used DONTWAKE then the first of any of
188 - Be sure to test for all errors including
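
The notes above can be tied together in a small sketch: classify the fault from ``pagefault.flags``, page-align the reported address, and resolve a missing fault with ``UFFDIO_COPY``. All helper names are hypothetical, and treating ``EEXIST`` as success is a design choice of this sketch (it assumes another thread may have resolved the same page first).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: name the fault kind from pagefault.flags. */
static const char *fault_kind(unsigned long long flags)
{
    if (flags & UFFD_PAGEFAULT_FLAG_WP)
        return "write-protect";
#ifdef UFFD_PAGEFAULT_FLAG_MINOR
    if (flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return "minor";
#endif
    return "missing";
}

/* Round a fault address down to the start of its page. */
static unsigned long long page_base(unsigned long long addr,
                                    unsigned long long page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: read one message and fill the faulting page
 * from src_page. Returns 0 on success, -1 with errno set otherwise. */
static int resolve_one_fault(int uffd, const void *src_page, size_t page_size)
{
    struct uffd_msg msg;
    ssize_t n = read(uffd, &msg, sizeof(msg));

    if (n != (ssize_t)sizeof(msg)) {
        if (n >= 0)
            errno = EIO;       /* short read: not a complete message */
        return -1;
    }
    if (msg.event != UFFD_EVENT_PAGEFAULT) {
        errno = ENOTSUP;       /* this sketch only handles page faults */
        return -1;
    }

    struct uffdio_copy copy;

    memset(&copy, 0, sizeof(copy));
    copy.dst = page_base(msg.arg.pagefault.address, page_size);
    copy.src = (unsigned long)src_page;
    copy.len = page_size;
    copy.mode = 0;             /* no DONTWAKE: the faulting thread is woken */
    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return errno == EEXIST ? 0 : -1;
    return 0;
}
```

Note that the destination range is given explicitly; as stated above, none of the page-delivering ioctls default to the registered range.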
193 ---------------------------
195 This is equivalent to (but faster than) using mprotect and a SIGSEGV
198 Firstly you need to register a range with ``UFFDIO_REGISTER_MODE_WP``.
215 which you supply a page and undo write protect.  Note that there is a
216 difference between writes into a WP area and into a !WP area.  The
219 you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
222 Userfaultfd write-protect mode currently behaves differently on none ptes
226 (e.g. when pages are missing and not populated).  For file-backed memories
227 like shmem and hugetlbfs, none ptes will be write protected just like a
228 present pte.  In other words, there will be a userfaultfd write fault
229 message generated when writing to a missing page on file typed memories,
230 as long as the page range was write-protected before.  Such a message will
234 memory, one can pre-populate the memory with e.g. MADV_POPULATE_READ.  On
243 write-protected (so future writes will also result in a WP fault). These ioctls
244 support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
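
Setting and clearing write protection on an already registered range could be sketched as below. The helper name ``uffd_set_wp`` is hypothetical, and the ``#ifdef`` guard only exists so the sketch still compiles against older uapi headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: write-protect (protect != 0) or un-protect
 * (protect == 0) a range registered with UFFDIO_REGISTER_MODE_WP.
 * Returns 0 on success, -1 with errno set otherwise. */
static int uffd_set_wp(int uffd, void *addr, size_t len, int protect)
{
#ifdef UFFDIO_WRITEPROTECT
    struct uffdio_writeprotect wp;

    memset(&wp, 0, sizeof(wp));
    wp.range.start = (unsigned long)addr;
    wp.range.len = len;
    /* Clearing the WP bit resolves the fault and wakes the writer. */
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
#else
    (void)uffd; (void)addr; (void)len; (void)protect;
    errno = ENOSYS;            /* headers predate the ioctl */
    return -1;
#endif
}
```

A handler would typically call this helper with ``protect`` set to 0 on the faulting page once it has recorded or copied whatever it needs from that page.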
248 ---------------------------
250 In response to a fault (either missing or minor), an action userspace can
251 take to "resolve" it is to issue a ``UFFDIO_POISON``. This will cause any
252 future faulters to either get a SIGBUS, or in KVM's case the guest will
255 This is used to emulate hardware memory poisoning. Imagine a VM running on a
256 machine which experiences a real hardware memory error. Later, we live migrate
259 still poisoned, even though it's on a new physical host which ostensibly
260 doesn't have a memory error in the exact same spot.
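
Marking a range as poisoned might be sketched as follows. The helper name is hypothetical, and the guard is an assumption about build environments: ``UFFDIO_POISON`` only appears in recent uapi headers, so the sketch reports ``ENOSYS`` when it is unavailable at compile time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: resolve faults in [addr, addr + len) by marking
 * the pages poisoned, so later accesses fault with SIGBUS (or, for a
 * KVM guest, surface as a memory error).
 * Returns 0 on success, -1 with errno set otherwise. */
static int uffd_poison_range(int uffd, void *addr, size_t len)
{
#ifdef UFFDIO_POISON
    struct uffdio_poison poison;

    memset(&poison, 0, sizeof(poison));
    poison.range.start = (unsigned long)addr;
    poison.range.len = len;
    poison.mode = 0;           /* default mode: wake waiting faulters */
    return ioctl(uffd, UFFDIO_POISON, &poison);
#else
    (void)uffd; (void)addr; (void)len;
    errno = ENOSYS;            /* headers predate UFFDIO_POISON */
    return -1;
#endif
}
```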
265 QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live
266 migration. Postcopy live migration is one form of memory
267 externalization consisting of a virtual machine running with part or
268 all of its memory residing on a different node in the cloud. The
269 ``userfaultfd`` abstraction is generic enough that not a single line of
270 KVM kernel code had to be modified in order to add postcopy live
276 aren't waiting for userfaults (i.e. network bound) can keep running in
279 It is generally beneficial to run one pass of precopy live migration
280 just before starting postcopy live migration, in order to avoid
283 The implementation of postcopy live migration currently uses one
292 guest (``UFFDIO_ZEROPAGE`` is used if the source page was a zero page).
294 A different postcopy thread in the destination node listens with
295 poll() to the ``userfaultfd`` in parallel. When a ``POLLIN`` event is
296 generated after a userfault triggers, the postcopy thread does a read() from
297 the ``userfaultfd`` and receives the fault address (or ``-EAGAIN`` in case the
298 userfault was already resolved and woken by a ``UFFDIO_COPY|ZEROPAGE`` run
311 requested through a userfault).
314 doesn't need to keep any per-page state bitmap relative to the live
315 migration around and a single per-page bitmap has to be maintained in
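
The poll()/read() pattern used by the listening thread can be sketched like this. It is a minimal illustration rather than QEMU's implementation: the helper name ``wait_for_fault`` and its return convention are hypothetical, and the ``EAGAIN`` path assumes the userfaultfd was opened with ``O_NONBLOCK``.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <linux/userfaultfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for one userfault message.
 * Returns 1 if *msg was filled in, 0 if there is nothing to handle
 * (timeout, or the fault was already resolved and read() gave EAGAIN),
 * and -1 with errno set on error. */
static int wait_for_fault(int uffd, int timeout_ms, struct uffd_msg *msg)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                       /* timed out */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
        errno = EIO;                    /* descriptor is in an error state */
        return -1;
    }

    ssize_t n = read(uffd, msg, sizeof(*msg));

    if (n == (ssize_t)sizeof(*msg))
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 0;                       /* resolved by another thread */
    if (n >= 0)
        errno = EIO;                    /* short read */
    return -1;
}
```

A listening thread would call this helper in a loop and, on a return value of 1, use ``msg->arg.pagefault.address`` to decide which page to request from the source.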
325 Non-cooperative userfaultfd
344 	non-cooperative process moves a virtual memory area to a
364 ``userfaultfd``, and if a page fault occurs in that area it will be
376 asynchronously and the non-cooperative process resumes execution as
380 return ``-ENOSPC`` when the monitored process exits at the time of
381 ``UFFDIO_COPY``, and ``-ENOENT`` when the non-cooperative process has changed
386 single threaded non-cooperative ``userfaultfd`` manager implementations. A
387 synchronous event delivery model can be added later as a new
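
A non-cooperative manager has to tell these ``UFFDIO_COPY`` failures apart. One possible way to structure that, as a sketch, is shown below. The enum and function names are hypothetical. The ``EEXIST`` and ``ESRCH`` cases are not mentioned in the text above; they are included on the assumption, based on the ioctl_userfaultfd(2) manual page, that the first means the page was already mapped and the second is what newer kernels return for an exited process.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of a UFFDIO_COPY result. */
enum copy_outcome {
    COPY_OK,              /* page installed */
    COPY_ALREADY_MAPPED,  /* EEXIST: someone else filled the page */
    COPY_PROCESS_GONE,    /* ENOSPC (or ESRCH): monitored process exited */
    COPY_LAYOUT_CHANGED,  /* ENOENT: address space changed under us */
    COPY_FAILED           /* anything else */
};

/* rc is the ioctl() return value, err the errno observed right after it. */
static enum copy_outcome classify_copy_result(int rc, int err)
{
    if (rc == 0)
        return COPY_OK;
    switch (err) {
    case EEXIST:
        return COPY_ALREADY_MAPPED;
    case ENOSPC:
    case ESRCH:
        return COPY_PROCESS_GONE;
    case ENOENT:
        return COPY_LAYOUT_CHANGED;
    default:
        return COPY_FAILED;
    }
}
```

On ``COPY_LAYOUT_CHANGED`` a manager would normally read the pending events from the ``userfaultfd`` to learn about the change before retrying, and on ``COPY_PROCESS_GONE`` it can drop its state for that process.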