linux/drivers/infiniband/core
Ira Weiny 932f4a630a mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM
Pach series "Add FOLL_LONGTERM to GUP fast and use it".

HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[1]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
and remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/3/19/939

"longterm" is a relative thing and at this point is probably a misnomer.
This is really flagging a pin which is going to be given to hardware and
can't move.  I've thought of a couple of alternative names but I think we
have to settle on if we are going to use FL_LAYOUT or something else to
solve the "longterm" problem.  Then I think we can change the flag to a
better name.

Secondly, it depends on how often you are registering memory.  I have
spoken with some RDMA users who consider MR in the performance path...
For the overall application performance.  I don't have the numbers as the
tests for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an aside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

This patch (of 7):

This patch starts a series which aims to support FOLL_LONGTERM in
get_user_pages_fast().  Some callers who would like to do a longterm (user
controlled pin) of pages with the fast variant of GUP for performance
purposes.

Rather than have a separate get_user_pages_longterm() call, introduce
FOLL_LONGTERM and change the longterm callers to use it.

This patch does not change any functionality.  In the short term
"longterm" or user controlled pins are unsafe for Filesystems and FS DAX
in particular has been blocked.  However, callers of get_user_pages_fast()
were not "protected".

FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
requires vmas to determine if DAX is in use.

NOTE: In merging with the CMA changes we opt to change the
get_user_pages() call in check_and_migrate_cma_pages() to a call of
__get_user_pages_locked() on the newly migrated pages.  This makes the
code read better in that we are calling __get_user_pages_locked() on the
pages before and after a potential migration.

As a side affect some of the interfaces are cleaned up but this is not the
primary purpose of the series.

In review[1] it was asked:

<quote>
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
>
> What do I miss?

A couple of points.

First "longterm" is a relative thing and at this point is probably a
misnomer.  This is really flagging a pin which is going to be given to
hardware and can't move.  I've thought of a couple of alternative names
but I think we have to settle on if we are going to use FL_LAYOUT or
something else to solve the "longterm" problem.  Then I think we can
change the flag to a better name.

Second, It depends on how often you are registering memory.  I have spoken
with some RDMA users who consider MR in the performance path...  For the
overall application performance.  I don't have the numbers as the tests
for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an asside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

</quote>

[1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
..
addr.c 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
agent.c RDMA: Mark if destroy address handle is in a sleepable context 2018-12-19 16:28:03 -07:00
agent.h
cache.c IB/core, ipoib: Do not overreact to SM LID change event 2019-05-07 16:06:03 -03:00
cgroup.c IB/core: Simplify rdma cgroup registration 2019-01-18 13:43:10 -07:00
cm_msgs.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
cm.c IB/cm: Reduce dependency on gid attribute ndev check 2019-05-03 11:10:02 -03:00
cma_configfs.c RDMA/cma: Move cma module specific functions to cma_priv.h 2018-11-22 11:57:33 -07:00
cma_priv.h IB/cma: Define option to set ack timeout and pack tos_set 2019-02-08 16:14:21 -07:00
cma.c RDMA/cma: Use rdma_read_gid_attr_ndev_rcu to access netdev 2019-05-03 11:10:03 -03:00
core_priv.h RDMA/core: Do not invoke init_port on compat devices 2019-05-03 10:41:23 -03:00
cq.c IB: Pass only ib_udata in function prototypes 2019-04-01 15:00:47 -03:00
device.c RDMA/device: Don't fire uevent before device is fully initialized 2019-05-07 13:02:43 -03:00
fmr_pool.c RDMA: Start use ib_device_ops 2018-12-12 07:40:16 -07:00
iwcm.c RDMA: Get rid of iw_cm_verbs 2019-05-03 10:56:56 -03:00
iwcm.h iw_cm: free cm_id resources on the last deref 2016-08-02 13:15:18 -04:00
iwpm_msg.c RDMA/iwpm: move kdoc comments to functions 2019-02-05 15:40:41 -07:00
iwpm_util.c netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
iwpm_util.h RDMA/IWPM: Support no port mapping requirements 2019-02-04 16:26:02 -07:00
mad_priv.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
mad_rmpp.c RDMA: Mark if destroy address handle is in a sleepable context 2018-12-19 16:28:03 -07:00
mad_rmpp.h
mad.c IB/MAD: Add SMP details to MAD tracing 2019-03-27 15:52:01 -03:00
Makefile IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to ib_uverbs 2019-01-10 17:06:44 -07:00
mr_pool.c IB/core: add a simple MR pool 2016-05-13 13:37:18 -04:00
multicast.c IB/core, ipoib: Do not overreact to SM LID change event 2019-05-07 16:06:03 -03:00
netlink.c RDMA/cma: Remove CM_ID statistics provided by rdma-cm module 2019-02-05 15:30:33 -07:00
nldev.c 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
opa_smi.h RDMA: Start use ib_device_ops 2018-12-12 07:40:16 -07:00
packer.c
rdma_core.c uverbs: Convert idr to XArray 2019-04-25 12:27:11 -03:00
rdma_core.h IB: Pass uverbs_attr_bundle down uobject destroy path 2019-04-01 14:55:36 -03:00
restrack.c XArray updates for 5.1-rc1 2019-03-11 20:06:18 -07:00
restrack.h RDMA/restrack: Prepare restrack_root to addition of extra fields per-type 2019-02-19 10:13:38 -07:00
roce_gid_mgmt.c IB/core: Fix oops in netdev_next_upper_dev_rcu() 2018-12-12 12:14:49 -05:00
rw.c IB/core: Remove ib_sg_dma_address() and ib_sg_dma_len() 2019-02-04 14:34:07 -07:00
sa_query.c 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
sa.h RDMA/core: Annotate timeout as unsigned long 2018-10-16 13:34:01 -04:00
security.c RDMA/device: Consolidate ib_device per_port data into one place 2019-02-19 10:13:39 -07:00
smi.c
smi.h RDMA: Start use ib_device_ops 2018-12-12 07:40:16 -07:00
sysfs.c RDMA: Add EFA related definitions 2019-05-06 13:47:50 -03:00
ucm.c 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
ucma.c *: convert stream-like files from nonseekable_open -> stream_open 2019-05-06 17:46:41 +03:00
ud_header.c
umem_odp.c RDMA/umem: Remove hugetlb flag 2019-05-06 13:08:11 -03:00
umem.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
user_mad.c 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
uverbs_cmd.c IB/core: Set qp->real_qp before it may be accessed 2019-05-03 10:17:45 -03:00
uverbs_ioctl.c RDMA/uverbs: Initialize udata struct on destroy flows 2019-05-02 17:07:02 -03:00
uverbs_main.c 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
uverbs_marshall.c IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *' 2018-06-25 14:19:57 -06:00
uverbs_std_types_counters.c IB: When attrs.udata/ufile is available use that instead of uobject 2019-04-08 13:05:25 -03:00
uverbs_std_types_cq.c IB: When attrs.udata/ufile is available use that instead of uobject 2019-04-08 13:05:25 -03:00
uverbs_std_types_device.c IB/uverbs: Fix ioctl query port to consider device disassociation 2019-01-25 11:58:06 -07:00
uverbs_std_types_dm.c IB: When attrs.udata/ufile is available use that instead of uobject 2019-04-08 13:05:25 -03:00
uverbs_std_types_flow_action.c IB: When attrs.udata/ufile is available use that instead of uobject 2019-04-08 13:05:25 -03:00
uverbs_std_types_mr.c IB: Pass uverbs_attr_bundle down ib_x destroy path 2019-04-01 14:57:35 -03:00
uverbs_std_types.c IB: Remove 'uobject->context' dependency in object destroy APIs 2019-04-01 14:59:35 -03:00
uverbs_uapi.c IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD 2019-01-29 13:32:43 -07:00
uverbs.h uverbs: Convert idr to XArray 2019-04-25 12:27:11 -03:00
verbs.c RDMA: Add EFA related definitions 2019-05-06 13:47:50 -03:00