userfaultfd: documentation update
Add documentation about new userfaultfd features and events

Link: http://lkml.kernel.org/r/1487716431-5551-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 96333187ab
commit 5a02026d39
@@ -54,6 +54,26 @@ uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of
respectively all the available features of the read(2) protocol and
the generic ioctl available.

The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
defines which memory types are supported by the userfaultfd and which
events, other than page fault notifications, may be generated.

If the kernel supports registering userfaultfd ranges on hugetlbfs
virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
set if the kernel supports registering userfaultfd ranges on shared
memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
MAP_SHARED, memfd_create, etc.).

A userland application that wants to use userfaultfd with hugetlbfs
or shared memory needs to set the corresponding flag in
uffdio_api.features to enable those features.

If the userland application wants to receive notifications for events
other than page faults, it has to verify that uffdio_api.features has
the appropriate UFFD_FEATURE_EVENT_* bits set. These events are
described in more detail below in the "Non-cooperative userfaultfd"
section.

Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
be invoked (if present in the returned uffdio_api.ioctls bitmask) to
register a memory range in the userfaultfd by setting the
@@ -142,3 +162,72 @@ course the bitmap is updated accordingly. It's also useful to avoid
sending the same page twice (in case the userfault is read by the
postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration
thread).

== Non-cooperative userfaultfd ==

When the userfaultfd is monitored by an external manager, the manager
must be able to track changes in the process's virtual memory
layout. Userfaultfd can notify the manager about such changes using
the same read(2) protocol as for the page fault notifications. The
manager has to explicitly enable these events by setting the
appropriate bits in uffdio_api.features passed to the UFFDIO_API ioctl:

UFFD_FEATURE_EVENT_EXIT - enable notification about exit() of the
non-cooperative process. When the monitored process exits, the uffd
manager will get UFFD_EVENT_EXIT.

UFFD_FEATURE_EVENT_FORK - enable userfaultfd hooks for fork(). When
this feature is enabled, the userfaultfd context of the parent process
is duplicated into the newly created process. The manager receives
UFFD_EVENT_FORK with the file descriptor of the new userfaultfd
context in uffd_msg.fork.

UFFD_FEATURE_EVENT_REMAP - enable notifications about mremap()
calls. When the non-cooperative process moves a virtual memory area to
a different location, the manager will receive UFFD_EVENT_REMAP.
uffd_msg.remap will contain the old and new addresses of the area and
its original length.

UFFD_FEATURE_EVENT_REMOVE - enable notifications about
madvise(MADV_REMOVE) and madvise(MADV_DONTNEED) calls. The event
UFFD_EVENT_REMOVE will be generated upon these calls to madvise.
uffd_msg.remove will contain the start and end addresses of the
removed area.

UFFD_FEATURE_EVENT_UNMAP - enable notifications about memory
unmapping. The manager will get UFFD_EVENT_UNMAP with uffd_msg.remove
containing the start and end addresses of the unmapped area.

Although UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP are
similar, they differ considerably in the action expected from the
userfaultfd manager. In the former case, the virtual memory is
removed, but the area is not; the area remains monitored by the
userfaultfd, and if a page fault occurs in that area it will be
delivered to the manager. The proper resolution for such a page fault
is to zeromap the faulting address. However, in the latter case, when
an area is unmapped, either explicitly (with the munmap() system call)
or implicitly (e.g. during mremap()), the area is removed and in turn
the userfaultfd context for that area disappears too, and the manager
will not get further userland page faults from the removed area.
Still, the notification is required in order to prevent the manager
from using UFFDIO_COPY on the unmapped area.
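
One way a manager might record the REMOVE/UNMAP distinction is a
per-page state table. The sketch below is purely illustrative: the
types and the area_on_remove()/area_on_unmap() helpers are hypothetical,
and it assumes page-aligned event ranges and a single registered area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum page_state {
	PAGE_PENDING,	/* not populated yet: resolve faults with UFFDIO_COPY */
	PAGE_ZERO,	/* removed by madvise(): resolve faults by zeromapping */
	PAGE_GONE,	/* unmapped: no longer monitored, never copy into it */
};

/* Hypothetical bookkeeping for one registered range. */
struct tracked_area {
	uint64_t base;		/* start address of the registered range */
	uint64_t page_size;	/* must be non-zero */
	size_t npages;
	enum page_state *state;	/* one entry per page, caller-allocated */
};

/* Set every tracked page inside [start, end) to new_state. */
static void area_mark(struct tracked_area *a, uint64_t start, uint64_t end,
		      enum page_state new_state)
{
	uint64_t limit;

	assert(a != NULL && a->state != NULL && a->page_size != 0);
	limit = a->base + a->npages * a->page_size;
	if (start < a->base)
		start = a->base;
	if (end > limit)
		end = limit;
	for (uint64_t addr = start; addr < end; addr += a->page_size) {
		size_t idx = (size_t)((addr - a->base) / a->page_size);

		/* an unmapped page stays unmapped for this context */
		if (a->state[idx] != PAGE_GONE)
			a->state[idx] = new_state;
	}
}

/* UFFD_EVENT_REMOVE: the area survives, later faults want zero pages. */
void area_on_remove(struct tracked_area *a, uint64_t start, uint64_t end)
{
	area_mark(a, start, end, PAGE_ZERO);
}

/* UFFD_EVENT_UNMAP: the area is gone, UFFDIO_COPY must be avoided. */
void area_on_unmap(struct tracked_area *a, uint64_t start, uint64_t end)
{
	area_mark(a, start, end, PAGE_GONE);
}
```

Before issuing UFFDIO_COPY for an address, such a manager would look up
the page's state and skip pages marked PAGE_GONE.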

Unlike userland page faults, which have to be synchronous and require
an explicit or implicit wakeup, all the events are delivered
asynchronously and the non-cooperative process resumes execution as
soon as the manager executes read(). The userfaultfd manager should
carefully synchronize calls to UFFDIO_COPY with the event
processing. To aid the synchronization, the UFFDIO_COPY ioctl will
return -ENOSPC when the monitored process exits at the time of
UFFDIO_COPY, and -ENOENT when the non-cooperative process has changed
its virtual memory layout simultaneously with an outstanding
UFFDIO_COPY operation.

The current asynchronous model of the event delivery is optimal for
single-threaded non-cooperative userfaultfd manager implementations. A
synchronous event delivery model can be added later as a new
userfaultfd feature to facilitate multithreading enhancements of the
non-cooperative manager, for example to allow UFFDIO_COPY ioctls to
run in parallel with the event reception. Single-threaded
implementations should continue to use the current async event
delivery model instead.
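
A single-threaded manager might, for instance, drain every queued
message before issuing its next UFFDIO_COPY. The loop below is a minimal
sketch of that idea: uffd_drain() and count_layout_events() are
hypothetical names, and the descriptor is assumed to be non-blocking
(for example a userfaultfd opened with O_NONBLOCK).

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

typedef void (*uffd_msg_handler)(const struct uffd_msg *msg, void *ctx);

/*
 * Read and dispatch every message currently available on fd. Returns
 * the number of messages handled, or -1 on a short read or on a read
 * error other than "nothing left to read".
 */
int uffd_drain(int fd, uffd_msg_handler handle, void *ctx)
{
	struct uffd_msg msg;
	int handled = 0;

	assert(handle != NULL);
	for (;;) {
		ssize_t n = read(fd, &msg, sizeof(msg));

		if (n == (ssize_t)sizeof(msg)) {
			handle(&msg, ctx);
			handled++;
			continue;
		}
		if (n == -1 && errno == EINTR)
			continue;		/* interrupted, try again */
		if (n == 0 || (n == -1 && errno == EAGAIN))
			return handled;		/* nothing more to read */
		return -1;
	}
}

/* Example handler: count messages that are not page faults in *(int *)ctx. */
void count_layout_events(const struct uffd_msg *msg, void *ctx)
{
	if (msg->event != UFFD_EVENT_PAGEFAULT)
		(*(int *)ctx)++;
}
```

Because the loop only relies on read(2), it can be exercised against any
descriptor that yields struct uffd_msg records, which keeps the dispatch
logic testable without a live userfaultfd.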