linux/mm
John Hubbard a43e982082 mm/gup: factor out duplicate code from four routines
Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12.

Overview:

This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017.  :)

A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on.  That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup").  In other words, opt-in by
changing from this:

    get_user_pages() (sets FOLL_GET)
    put_page()

to this:
    pin_user_pages() (sets FOLL_PIN)
    unpin_user_page()

Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
  and some directed testing to exercise some of the changes. And as you
  can see, gup_benchmark is enhanced to exercise this. Basically, I've
  been able to runtime test the core get_user_pages() and
  pin_user_pages() and related routines, but not so much on several of
  the call sites--but those are generally just a couple of lines
  changed, each.

  Not much of the kernel is actually using this, which on one hand
  reduces risk quite a lot. But on the other hand, testing coverage
  is low. So I'd love it if, in particular, the Infiniband and PowerPC
  folks could do a smoke test of this series for me.

  Runtime testing for the call sites so far is pretty light:

    * io_uring: Some directed tests from liburing exercise this, and
                they pass.
    * process_vm_access.c: A small directed test passes.
    * gup_benchmark: the enhanced version hits the new gup.c code, and
                     passes.
    * infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
    * VFIO: compiles (I'm vowing to set up a run time test soon, but it's
                      not ready just yet)
    * powerpc: it compiles...
    * drm/via: compiles...
    * goldfish: compiles...
    * net/xdp: compiles...
    * media/v4l2: compiles...

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

This patch (of 22):

There are four locations in gup.c that have a fair amount of code
duplication.  This means that changing one requires making the same
changes in four places, not to mention reading the same code four times,
and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the
overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the
error cases, by doing a mass subtraction of the refcount, surrounded by
get_page()/put_page().

Also, further simplify (slightly), by waiting until the the successful
end of each routine, to increment *nr.

Link: http://lkml.kernel.org/r/20200107224558.2362728-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:37 -08:00
..
kasan x86/kasan: Print original address on #GP 2019-12-31 13:15:38 +01:00
backing-dev.c memcg: fix a crash in wb_workfn when a device disappears 2020-01-31 10:30:36 -08:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma_debug.c mm/cma_debug.c: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops 2019-12-01 12:59:09 -08:00
cma.c mm/cma.c: switch to bitmap_zalloc() for cma bitmap allocation 2019-12-01 12:59:09 -08:00
cma.h
compaction.c mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() 2019-10-14 15:04:01 -07:00
debug_page_ref.c
debug.c mm/debug.c: always print flags in dump_page() 2020-01-31 10:30:36 -08:00
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2017-12-11 14:54:44 +01:00
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm/filemap.c: clean up filemap_write_and_wait() 2020-01-31 10:30:37 -08:00
frame_vector.c mm: untag user pointers in get_vaddr_frames 2019-09-25 17:51:41 -07:00
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup_benchmark.c mm/gup: fix memory leak in __gup_benchmark_ioctl 2020-01-04 13:55:09 -08:00
gup.c mm/gup: factor out duplicate code from four routines 2020-01-31 10:30:37 -08:00
highmem.c mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions 2019-12-10 10:12:55 +01:00
hmm.c mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap 2019-11-23 19:56:45 -04:00
huge_memory.c mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment 2020-01-13 18:19:01 -08:00
hugetlb_cgroup.c mm: hugetlb controller for cgroups v2 2019-12-16 12:41:40 -08:00
hugetlb.c mm/hugetlb: defer freeing of huge pages if in non-task context 2020-01-04 13:55:09 -08:00
hwpoison-inject.c mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops 2019-12-01 12:59:09 -08:00
init-mm.c mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch 2019-10-19 06:32:32 -04:00
internal.h mm, pcpu: make zone pcp updates and reset internal to the mm 2019-12-01 12:59:06 -08:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig mm/Kconfig: fix trivial help text punctuation 2019-12-01 12:59:10 -08:00
Kconfig.debug mm, page_owner, debug_pagealloc: save and dump freeing stack trace 2019-09-24 15:54:08 -07:00
khugepaged.c mm/thp: flush file for !is_shmem PageDirty() case in collapse_file() 2019-12-01 12:59:09 -08:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t 2020-01-31 10:30:36 -08:00
ksm.c * PPC secure guest support 2019-12-04 11:08:30 -08:00
list_lru.c mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages 2019-07-12 11:05:44 -07:00
maccess.c uaccess: Add strict non-pagefault kernel-space read function 2019-11-02 12:39:12 -07:00
madvise.c mm: make do_madvise() available internally 2020-01-20 17:04:02 -07:00
Makefile mm: Add write-protect and clean utilities for address space ranges 2019-11-06 13:03:36 +01:00
mapping_dirty_helpers.c mm: Add write-protect and clean utilities for address space ranges 2019-11-06 13:03:36 +01:00
memblock.c mm: support memblock alloc on the exact node for sparse_buffer_init() 2019-12-01 12:59:08 -08:00
memcontrol.c mm: thp: don't need care deferred split queue in memcg charge move path 2020-01-31 10:30:36 -08:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory_hotplug.c mm/memory_hotplug: fix remove_memory() lockdep splat 2020-01-31 10:30:36 -08:00
memory-failure.c mm/memory-failure.c: use page_shift() in add_to_kill() 2019-12-01 12:59:04 -08:00
memory.c Linux 5.5-rc3 2019-12-25 10:41:37 +01:00
mempolicy.c mm/mempolicy.c: fix out of bounds write in mpol_parse_str() 2020-01-31 10:30:36 -08:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-04 13:55:08 -08:00
memtest.c
migrate.c mm: move_pages: report the number of non-attempted pages 2020-01-31 10:30:36 -08:00
mincore.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mlock.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c The timekeeping and timers departement provides: 2020-01-27 16:47:05 -08:00
mmu_context.c
mmu_gather.c mm: remove quicklist page table caches 2019-09-24 15:54:09 -07:00
mmu_notifier.c mm/mmu_notifiers: Use 'interval_sub' as the variable for mmu_interval_notifier 2020-01-14 11:54:47 -04:00
mmzone.c
mprotect.c autonuma: reduce cache footprint when scanning page tables 2019-12-01 12:59:09 -08:00
mremap.c mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_area 2019-12-01 06:29:19 -08:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c mm/mmap.c: rb_parent is not necessary in __vma_link_list() 2019-12-01 06:29:19 -08:00
oom_kill.c mm/oom: fix pgtables units mismatch in Killed process message 2020-01-04 13:55:09 -08:00
page_alloc.c mm, debug_pagealloc: don't rely on static keys too early 2020-01-13 18:19:02 -08:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c mm, page_owner: fix off-by-one error in __set_page_owner_handle() 2019-10-14 15:04:00 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm/page_io.c: annotate refault stalls from swap_readpage 2019-12-01 12:59:11 -08:00
page_isolation.c mm/page_isolation.c: convert SKIP_HWPOISON to MEMORY_OFFLINE 2019-12-01 12:59:04 -08:00
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-19 06:32:31 -04:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_vma_mapped.c mm: introduce page_size() 2019-09-24 15:54:08 -07:00
page-writeback.c mm/page-writeback.c: improve arithmetic divisions 2020-01-13 18:19:02 -08:00
pagewalk.c mm: Add a walk_page_mapping() function to the pagewalk code 2019-11-06 13:02:43 +01:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c bitmap: genericize percpu bitmap region iterators 2020-01-20 16:40:56 +01:00
pgtable-generic.c asm-generic/mm: stub out p{4,u}d_clear_bad() if __PAGETABLE_P{4,U}D_FOLDED 2019-12-01 06:29:19 -08:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm, thp: do not queue fully unmapped pages for deferred split 2019-12-01 12:59:09 -08:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment 2020-01-13 18:19:01 -08:00
shuffle.c mm: fix -Wmissing-prototypes warnings 2019-10-07 15:47:19 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid 2020-01-13 18:19:02 -08:00
slab.c mm, debug_pagealloc: don't rely on static keys too early 2020-01-13 18:19:02 -08:00
slab.h mm: clean up and clarify lruvec lookup procedure 2019-12-01 12:59:06 -08:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c mm/slub.c: avoid slub allocation while holding list_lock 2020-01-31 10:30:36 -08:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparse.c: reset section's mem_map when fully deactivated 2020-01-31 10:30:36 -08:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
swap.c mm/swap.c: piggyback lru_add_drain_all() calls 2019-12-01 06:29:19 -08:00
swapfile.c mm, swap: disallow swapon() on zoned block devices 2019-12-01 06:29:18 -08:00
truncate.c mm/thp: allow dropping THP from page cache 2019-10-19 06:32:33 -04:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-09-17 15:20:17 -07:00
userfaultfd.c mm: fix typos in comments when calling __SetPageUptodate() 2019-12-01 12:59:10 -08:00
util.c mm/mmap.c: rb_parent is not necessary in __vma_link_list() 2019-12-01 06:29:19 -08:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c Linux 5.5-rc7 2020-01-20 08:05:16 +01:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-07 15:47:19 -07:00
vmscan.c mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG 2019-12-17 20:59:59 -08:00
vmstat.c mm/memcontrol: use vmstat names for printing statistics 2019-12-04 19:44:11 -08:00
workingset.c mm: vmscan: detect file thrashing at the reclaim root 2019-12-01 12:59:07 -08:00
z3fold.c mm/z3fold.c: add inter-page compaction 2019-12-01 12:59:07 -08:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c mm/zsmalloc.c: fix the migrated zspage statistics. 2020-01-04 13:55:09 -08:00
zswap.c zswap: do not map same object twice 2019-09-24 15:54:12 -07:00