linux

Author	SHA1	Message	Date
Weihang Li	fcc57a7b2b	RDMA/core: Use refcount_t instead of atomic_t on refcount of iwpm_admin_data The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Increase refcount_t from 0 to 1 is regarded as there is a risk about use-after-free. So it should be set to 1 directly during initialization. Link: https://lore.kernel.org/r/1622194663-2383-3-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-08 14:36:24 -03:00
Weihang Li	60dff56d77	RDMA/core: Use refcount_t instead of atomic_t on refcount of iwcm_id_private The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-08 14:35:44 -03:00
Mark Zhang	76039ac909	IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock During cm_dev deregistration in cm_remove_one(), the cm_device and cm_ports will be freed, after that they should not be accessed. The mad_agent needs to be protected as well. This patch adds a cm_device kref to protect cm_dev and cm_ports, and a mad_agent_lock spinlock to protect mad_agent. Link: https://lore.kernel.org/r/501ba7a2ff203dccd0e6755d3f93329772adce52.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:58 -03:00
Mark Zhang	7345201c39	IB/cm: Improve the calling of cm_init_av_for_lap and cm_init_av_by_path The cm_init_av_for_lap() and cm_init_av_by_path() function calls have the following issues: 1. Both of them might sleep and should not be called under spinlock. 2. The access of cm_id_priv->av should be under cm_id_priv->lock, which means it can't be initialized directly. This patch splits the calling of 2 functions into two parts: first one initializes an AV outside of the spinlock, the second one copies AV to cm_id_priv->av under spinlock. Fixes: `e1444b5a16` ("IB/cm: Fix automatic path migration support") Link: https://lore.kernel.org/r/038fb8ad932869b4548b0c7708cab7f76af06f18.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:58 -03:00
Mark Zhang	70076a414e	IB/cm: Simplify ib_cancel_mad() and ib_modify_mad() calls The mad_agent parameter is redundant since the struct ib_mad_send_buf already has a pointer of it. Link: https://lore.kernel.org/r/0987c784b25f7bfa72f78691f50cff066de587e1.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:58 -03:00
Mark Zhang	3595c398f6	Revert "IB/cm: Mark stale CM id's whenever the mad agent was unregistered" This reverts commit `9db0ff53cb`, which wasn't a full fix and still causes to the following panic: panic @ time 1605623870.843, thread 0xfffffeb63b552000: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe811a94e000 time = 1605623870 cpuid = 9, TSC = 0xb7937acc1b6 Panic occurred in module kernel loaded at 0xffffffff80200000:Stack: -------------------------------------------------- kernel:vm_fault+0x19da kernel:vm_fault_trap+0x6e kernel:trap_pfault+0x1f1 kernel:trap+0x31e kernel:cm_destroy_id+0x38c kernel:rdma_destroy_id+0x127 kernel:sdp_shutdown_task+0x3ae kernel:taskqueue_run_locked+0x10b kernel:taskqueue_thread_loop+0x87 kernel:fork_exit+0x83 Link: https://lore.kernel.org/r/4346449a7cdacc7a4eedc89cb1b42d8434ec9814.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:58 -03:00
Jason Gunthorpe	efafae6717	IB/cm: Tidy remaining cm_msg free paths Now that all the free paths are explicit cm_free_msg() will only be called for msgs's allocated with cm_alloc_msg(), so we can assume the context is set. Place it after the allocation function it is paired with for clarity. Also remove a bogus NULL assignment in one place after a cancel. This does nothing other than disable completions to become events, but changing the state already did that. Link: https://lore.kernel.org/r/082fd3552be0d1a2c19b1c4cefb5f3f0e3e68e82.1622629024.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:57 -03:00
Jason Gunthorpe	c1cf6d9f74	IB/cm: Call the correct message free functions in cm_send_handler() There are now three destroy functions for the cm_msg, and all places except the general send completion handler use the correct function. Fix cm_send_handler() to detect which kind of message is being completed and destroy it using the correct function with the correct locking. Link: https://lore.kernel.org/r/62a507195b8db85bb11228d0c6e7fa944204bf12.1622629024.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:57 -03:00
Jason Gunthorpe	4b4e586ebe	IB/cm: Split cm_alloc_msg() This is being used with two quite different flows, one attaches the message to the priv and the other does not. Ensure the message attach is consistently done under the spinlock and ensure that the free on error always detaches the message from the cm_id_priv, also always under lock. This makes read/write to the cm_id_priv->msg consistently locked and consistently NULL'd when the message is freed, even in all error paths. Link: https://lore.kernel.org/r/f692b8c89eecb34fd82244f317e478bea6c97688.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:57 -03:00
Jason Gunthorpe	96376a4095	IB/cm: Pair cm_alloc_response_msg() with a cm_free_response_msg() This is not a functional change, but it helps make the purpose of all the cm_free_msg() calls clearer. In this case a response msg has a NULL context[0], and is never placed in cm_id_priv->msg. Link: https://lore.kernel.org/r/5cd53163be7df0a94f0d4ef7294546bc674fb74a.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:41:57 -03:00
Leon Romanovsky	f974428872	RDMA/core: Sanitize WQ state received from the userspace The mlx4 and mlx5 implemented differently the WQ input checks. Instead of duplicating mlx4 logic in the mlx5, let's prepare the input in the central place. The mlx5 implementation didn't check for validity of state input. It is not real bug because our FW checked that, but still worth to fix. Fixes: `f213c05272` ("IB/uverbs: Add WQ support") Link: https://lore.kernel.org/r/ac41ad6a81b095b1a8ad453dcf62cf8d3c5da779.1621413310.git.leonro@nvidia.com Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-06-02 15:20:11 -03:00
YueHaibing	c5b8eaf8af	RDMA/core: Use the DEVICE_ATTR_RO macro Use the DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(), which makes the code a bit shorter and easier to read. Link: https://lore.kernel.org/r/20210526132949.20184-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-05-28 20:39:51 -03:00
Håkon Bugge	d58c23c925	IB/core: Only update PKEY and GID caches on respective events Both the PKEY and GID tables in an HCA can hold in the order of hundreds entries. Reading them is expensive. Partly because the API for retrieving them only returns a single entry at a time. Further, on certain implementations, e.g., CX-3, the VFs are paravirtualized in this respect and have to rely on the PF driver to perform the read. This again demands VF to PF communication. IB Core's cache is refreshed on all events. Hence, filter the refresh of the PKEY and GID caches based on the event received being IB_EVENT_PKEY_CHANGE and IB_EVENT_GID_CHANGE respectively. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/1621964949-28484-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-05-28 20:26:16 -03:00
Leon Romanovsky	16149eddd3	RDMA/core: Remove never used ib_modify_wq function call The function ib_modify_wq() is not used, so remove it. Link: https://lore.kernel.org/r/c5e48d517b9163fe4f9ffd224050b83fdb3571c6.1620552935.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-05-11 14:43:58 -03:00
Xiaofei Tan	e3d65124ce	RDMA/ucma: Cleanup to reduce duplicate code The lable "err1" does the same thing as the branch of copy_to_user() failed in the function ucma_create_id(). Just jump to the label directly to reduce duplicate code. Link: https://lore.kernel.org/r/1620291106-3675-1-git-send-email-tanxiaofei@huawei.com Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-05-11 13:17:26 -03:00
Linus Torvalds	f34b2cf178	RDMA merge window pull request This is significantly bug fixes and general cleanups. The noteworthy new features are fairly small: - XRC support for HNS and improves RQ operations - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and qib - Quite a few general cleanups on spelling, error handling, static checker detections, etc - Increase the number of device ports supported beyond 255. High port count software switches now exist - Several bug fixes for rtrs - mlx5 Device Memory support for host controlled atomics - Report SRQ tables through to rdma-tool -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmCMMHEACgkQOG33FX4g mxri3Q//RAgIExCGHebQ9xkptZHVyTLLJMpiMl2cqk3ZVRdDZ7QdiQjIqY2KqlUK nxBj7EXJeX6rV5a1xqCcOO1gBetB28TSwnCNE2ZqrXP5B59ISW8D052IWza3UkUz WmHLARxHQlyKBWA4+ZAgfoUGL0NmWA8QPf56t/RK/3/OsuYnGzcnWmmFbt8XKFcH NtO3KC45mKWDqqG0A0XRrLbEQz/ElO3OuPBqlBKgB3ZgGPzgsOUTOGkm1tCcZ89L /pvZGB7SklKZdCX8TxdpVGd9h0zHl8pqh1yEzvTA1ypNAYSUId2mvZXluU8J5yJl FLk7E1IxE5050FNEc7T5uZdUVntulYiqL2558coRI34l5w26pKGjIMxw/nTB8hg8 4ZfBtKVemIG6yzW5Up6iBpK7qWYpvLWVShwYAWhbNsjN7JGzJuh1gJnjbmYgyz2P RTMU9wjFPLL2wZxg4LDHACVJNBb82j6KKuE+kZWpk11ro7INw9+7YwRuTo7/ezxC BwXKu8wF4igwSigV55jM+WnGXLhxdC3qmx/2cbtWyLM/PzdRL96tM0RWW5v8/Nv7 teFhkt+f3RVqcfYH5K1qCXy3UFrxG6bxFSvcHHSBx2bdIrqhuTY5FqszAYImeW2j iHoyIsuSuGu79HQgOzAQZsEyksWi6OYDvA9Q9VBoPP4bJ3DOAa4= =vsXA -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This is significantly bug fixes and general cleanups. The noteworthy new features are fairly small: - XRC support for HNS and improves RQ operations - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and qib - Quite a few general cleanups on spelling, error handling, static checker detections, etc - Increase the number of device ports supported beyond 255. High port count software switches now exist - Several bug fixes for rtrs - mlx5 Device Memory support for host controlled atomics - Report SRQ tables through to rdma-tool" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits) IB/qib: Remove redundant assignment to ret RDMA/nldev: Add copy-on-fork attribute to get sys command RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res RDMA/siw: Fix a use after free in siw_alloc_mr IB/hfi1: Remove redundant variable rcd RDMA/nldev: Add QP numbers to SRQ information RDMA/nldev: Return SRQ information RDMA/restrack: Add support to get resource tracking for SRQ RDMA/nldev: Return context information RDMA/core: Add CM to restrack after successful attachment to a device RDMA/cma: Skip device which doesn't support CM RDMA/rxe: Fix a bug in rxe_fill_ip_info() RDMA/mlx5: Expose private query port RDMA/mlx4: Remove an unused variable RDMA/mlx5: Fix type assignment for ICM DM IB/mlx5: Set right RoCE l3 type and roce version while deleting GID RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails RDMA/cxgb4: add missing qpid increment IB/ipoib: Remove unnecessary struct declaration RDMA/bnxt_re: Get rid of custom module reference counting ...	2021-05-01 09:15:05 -07:00
Joao Martins	1d4b0166e3	RDMA/umem: batch page unpin in __ib_umem_release() Use the newly added unpin_user_page_range_dirty_lock() for more quickly unpinning a consecutive range of pages represented as compound pages. This will also calculate number of pages to unpin (for the tail pages which matching head page) and thus batch the refcount update. Running a test program which calls memory range reg/unreg on a region 1G in size and measures cost of both operations together (in a guest using rxe) with THP and hugetlbfs: Before: 590 rounds in 5.003 sec: 8480.335 usec / round 6898 rounds in 60.001 sec: 8698.367 usec / round After: 2688 rounds in 5.002 sec: 1860.786 usec / round 32517 rounds in 60.001 sec: 1845.225 usec / round Link: https://lkml.kernel.org/r/20210212130843.13865-5-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Doug Ledford <dledford@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-04-30 11:20:37 -07:00
Gal Pressman	6cc9e215eb	RDMA/nldev: Add copy-on-fork attribute to get sys command The new attribute indicates that the kernel copies DMA pages on fork, hence libibverbs' fork support through madvise and MADV_DONTFORK is not needed. The introduced attribute is always reported as supported since the kernel has the patch that added the copy-on-fork behavior. This allows the userspace library to identify older vs newer kernel versions. Extra care should be taken when backporting this patch as it relies on the fact that the copy-on-fork patch is merged, hence no check for support is added. Don't backport this patch unless you also have the following series: commit `70e806e4e6` ("mm: Do early cow for pinned pages during fork() for ptes") and commit `4eae4efa2c` ("hugetlb: do early cow when page pinned on src mm"). Fixes: `70e806e4e6` ("mm: Do early cow for pinned pages during fork() for ptes") Fixes: `4eae4efa2c` ("hugetlb: do early cow when page pinned on src mm") Link: https://lore.kernel.org/r/20210418121025.66849-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-27 15:33:20 -03:00
Neta Ostrovsky	c6c11ad3ab	RDMA/nldev: Add QP numbers to SRQ information Add QP numbers that are associated with the SRQ to the SRQ information. The QPs are displayed in a range form. Sample output: $ rdma res show srq dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib] dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib] dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon $ rdma res show srq lqpn 126-141 dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon $ rdma res show srq lqpn 127 dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon Link: https://lore.kernel.org/r/79a4bd4caec2248fd9583cccc26786af8e4414fc.1618753110.git.leonro@nvidia.com Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-22 10:30:27 -03:00
Neta Ostrovsky	391c6bd5ac	RDMA/nldev: Return SRQ information Extend the RDMA nldev return a SRQ information, like SRQ number, SRQ type, PD number, CQ number and process ID that created that SRQ. Sample output: $ rdma res show srq dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib] dev ibp8s0f0 srqn 4 type BASIC pdn 9 pid 3581 comm ibv_srq_pingpon dev ibp8s0f0 srqn 5 type BASIC pdn 10 pid 3584 comm ibv_srq_pingpon dev ibp8s0f0 srqn 6 type BASIC pdn 11 pid 3590 comm ibv_srq_pingpon dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib] dev ibp8s0f1 srqn 1 type BASIC pdn 4 pid 3586 comm ibv_srq_pingpon Link: https://lore.kernel.org/r/322f9210b95812799190dd4a0fb92f3a3bba0333.1618753110.git.leonro@nvidia.com Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-22 10:30:27 -03:00
Neta Ostrovsky	48f8a70e89	RDMA/restrack: Add support to get resource tracking for SRQ In order to track SRQ resources, a new restrack object is initialized and added to the resource tracking database. Link: https://lore.kernel.org/r/0db71c409f24f2f6b019bf8797a8fed96fe7079c.1618753110.git.leonro@nvidia.com Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-22 10:30:27 -03:00
Neta Ostrovsky	12ce208f40	RDMA/nldev: Return context information Extend the RDMA nldev return a context information, like ctx number and process ID that created that context. This functionality is helpful to find orphan contexts that are not closed for some reason. Sample output: $ rdma res show ctx dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong $ rdma res show ctx dev ibp8s0f1 dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong Link: https://lore.kernel.org/r/5c956acfeac4e9d532988575f3da7d64cb449374.1618753110.git.leonro@nvidia.com Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-22 10:30:27 -03:00
Shay Drory	cb5cd0ea4e	RDMA/core: Add CM to restrack after successful attachment to a device The device attach triggers addition of CM_ID to the restrack DB. However, when error occurs, we releasing this device, but defer CM_ID release. This causes to the situation where restrack sees CM_ID that is not valid anymore. As a solution, add the CM_ID to the resource tracking DB only after the attachment is finished. Found by syzcaller: infiniband syz0: added syz_tun rdma_rxe: ignoring netdev event = 10 for syz_tun infiniband syz0: set down infiniband syz0: ib_query_port failed (-19) restrack: ------------[ cut here ]------------ infiniband syz0: BUG: RESTRACK detected leak of resources restrack: User CM_ID object allocated by syz-executor716 is not freed restrack: ------------[ cut here ]------------ Fixes: `b09c4d7012` ("RDMA/restrack: Improve readability in task name management") Link: https://lore.kernel.org/r/ab93e56ba831eac65c322b3256796fa1589ec0bb.1618753862.git.leonro@nvidia.com Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-21 20:53:14 -03:00
Parav Pandit	4d51c3d9de	RDMA/cma: Skip device which doesn't support CM A switchdev RDMA device do not support IB CM. When such device is added to the RDMA CM's device list, when application invokes rdma_listen(), cma attempts to listen to such device, however it has IB CM attribute disabled. Due to this, rdma_listen() call fails to listen for other non switchdev devices as well. A below error message can be seen. infiniband mlx5_0: RDMA CMA: cma_listen_on_dev, error -38 A failing call flow is below. cma_listen_on_all() cma_listen_on_dev() _cma_attach_to_dev() rdma_listen() <- fails on a specific switchdev device This is because rdma_listen() is hardwired to only work with iwarp or IB CM compatible devices. Hence, when a IB device doesn't support IB CM or IW CM, avoid adding such device to the cma list so rdma_listen() can't even be called. Link: https://lore.kernel.org/r/f9cac00d52864ea7c61295e43fb64cf4db4fdae6.1618753862.git.leonro@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-21 20:27:52 -03:00
Håkon Bugge	65d4801ae4	RDMA/core: Unify RoCE check and re-factor code In cm_req_handler(), unify the check for RoCE and re-factor to avoid one test. Link: https://lore.kernel.org/r/1617705423-15570-1-git-send-email-haakon.bugge@oracle.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Fixes: `8f97486024` ("IB/cm: Reduce dependency on gid attribute ndev check") Fixes: `194f64a3ca` ("RDMA/core: Fix corrupted SL on passive side") Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-19 12:56:53 -03:00
Håkon Bugge	3aeffc46af	IB/cma: Introduce rdma_set_min_rnr_timer() Introduce the ability for kernel ULPs to adjust the minimum RNR Retry timer. The INIT -> RTR transition executed by RDMA CM will be used for this adjustment. This avoids an additional ib_modify_qp() call. rdma_set_min_rnr_timer() must be called before the call to rdma_connect() on the active side and before the call to rdma_accept() on the passive side. The default value of RNR Retry timer is zero, which translates to 655 ms. When the receiver is not ready to accept a send messages, it encodes the RNR Retry timer value in the NAK. The requestor will then wait at least the specified time value before retrying the send. The 5-bit value to be supplied to the rdma_set_min_rnr_timer() is documented in IBTA Table 45: "Encoding for RNR NAK Timer Field". Link: https://lore.kernel.org/r/1617216194-12890-2-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 19:51:48 -03:00
Wenpeng Liang	26caea5fda	RDMA/core: Correct format of block comments Block comments should not use a trailing / on a separate line and every line of a block comment should start with an ''. Link: https://lore.kernel.org/r/1617783353-48249-7-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:56:51 -03:00
Wenpeng Liang	b6eb7011f5	RDMA/core: Correct format of braces Do following cleanups about braces: - Add the necessary braces to maintain context alignment. - Fix the open '{' that is not on the same line as "switch". - Remove braces that are not necessary for single statement blocks. - Fix "else" that doesn't follow close brace '}'. Link: https://lore.kernel.org/r/1617783353-48249-6-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:56:51 -03:00
Wenpeng Liang	f681967ae7	RDMA/core: Remove redundant spaces Space is not required after '(', before ')', before ',' and between '*' and symbol name of a definition. Link: https://lore.kernel.org/r/1617783353-48249-5-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:56:48 -03:00
Wenpeng Liang	9516b8f9ec	RDMA/core: Add necessary spaces Space is required before '(' of switch statements and around '='. Link: https://lore.kernel.org/r/1617783353-48249-4-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:52:22 -03:00
Wenpeng Liang	9279c35b63	RDMA/core: Remove the redundant return statements The return statements at the end of a void function is meaningless. Link: https://lore.kernel.org/r/1617783353-48249-3-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:52:22 -03:00
Wenpeng Liang	ab27f45fdf	RDMA/core: Print the function name by __func__ instead of an fixed string It's better to use __func__ than a fixed string to print a function's name. Link: https://lore.kernel.org/r/1617783353-48249-2-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:52:22 -03:00
Leon Romanovsky	d1c803a9cc	RDMA/addr: Be strict with gid size The nla_len() is less than or equal to 16. If it's less than 16 then end of the "gid" buffer is uninitialized. Fixes: `ae43f82867` ("IB/core: Add IP to GID netlink offload") Link: https://lore.kernel.org/r/20210405074434.264221-1-leon@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-08 16:14:56 -03:00
Yixian Liu	7d8f346504	RDMA/core: Make the wc status prompt message clearer Local invalidate is also a kind of memory management operation, not only memory bind operation. Furthermore, as invalidate operations include local and remote, add prefix to the prompt message to make it clearer. Link: https://lore.kernel.org/r/1617698772-13871-1-git-send-email-liweihang@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-07 20:27:03 -03:00
Håkon Bugge	194f64a3ca	RDMA/core: Fix corrupted SL on passive side On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary Subnet Local is zero. In cm_req_handler(), the cm_process_routed_req() function is called. Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed: IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl); This corrupts SL in req_msg if it was different from zero. In other words, a request to setup a connection using an SL != zero, will not be honored, and a connection using SL zero will be created instead. Fixed by not calling cm_process_routed_req() on RoCE systems, the cm_process_route_req() is only for IB anyhow. Fixes: `3971c9f6db` ("IB/cm: Add interim support for routed paths") Link: https://lore.kernel.org/r/1616420132-31005-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-01 14:47:24 -03:00
Yangyang Li	016b26af13	RDMA/core: Correct misspellings of two words in comments Correct the following spelling errors: 1. shold -> should 2. uncontext -> ucontext Link: https://lore.kernel.org/r/1616147749-49106-1-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-26 11:58:26 -03:00
Patrisious Haddad	49695e95ce	RDMA/uverbs: Refactor rdma_counter_set_auto_mode and __counter_set_mode Success is returned in the following flows: * New mode is the same as the current one. * Switched to new mode and there are no bound counters yet. Link: https://lore.kernel.org/r/20210318110502.673676-1-leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-26 11:52:09 -03:00
Mark Bloch	1fb7f8973f	RDMA: Support more than 255 rdma ports Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic. This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed. With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue. When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported. The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that relies in verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other ULPs aren't effected by this change and their sysfs/interfaces that are exposes to userspace can remain unchanged. While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt), Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-26 09:31:21 -03:00
Gal Pressman	871159515c	RDMA/cma: Remove unused leftovers in cma code Commit `ee1c60b1bf` ("IB/SA: Modify SA to implicitly cache Class Port info") removed the class_port_info_context struct usage, remove a couple of leftovers. Link: https://lore.kernel.org/r/20210314143427.76101-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-22 09:31:28 -03:00
Yishai Hadas	2904bb37b3	IB/core: Split uverbs_get_const/default to consider target type Change uverbs_get_const/uverbs_get_const_default to work properly with both signed/unsigned parameters. Current APIs mix s64 and u64 which leads to incorrect check when u64 value was supplied and its upper bit was set. In that case uverbs_get_const() / uverbs_get_const_default() lower bound check may fail unexpectedly, target is unsigned (lower bound is 0) but value became negative as of the s64 usage. Split to have two different APIs, no change to callers as the required API will be called internally according to the target type. Link: https://lore.kernel.org/r/20210304130501.1102577-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:20:36 -04:00
Yishai Hadas	3f32dc0f46	IB/core: Drop WARN_ON() from ib_umem_find_best_pgsz() The WARN_ON() issued as part of ib_umem_find_best_pgsz() blocked cases when only page sizes larger than PAGE_SIZE were set, drop it to enable those cases. In addition, there is no need to have a specific check for zero pgsz_bitmap, the function will do its job and return 0 at the end if nothing match will be found. Link: https://lore.kernel.org/r/20210304130501.1102577-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:20:36 -04:00
Jason Gunthorpe	e6fb246cca	RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr() Now that the SRCU stuff has been removed the entire MR destroy logic can be made a lot simpler. Currently there are many different ways to destroy a MR and it makes it really hard to do this task correctly. Route all destruction through mlx5_ib_dereg_mr() and make it work for all situations. Since it turns out all the different MR types do basically the same thing this removes a lot of knowledge of MR internals from ODP and leaves ODP just exporting an operation to clean up children. This fixes a few weird corner cases bugs and firmly uses the correct ordering of the MR destruction: - Stop parallel access to the mkey via the ODP xarray - Stop DMA - Release the umem - Clean up ODP children - Free/Recycle the MR Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 20:03:25 -04:00
Gal Pressman	f675ba125b	RDMA/core: Remove unused req_ncomp_notif device operation The request_ncomp_notif device operation and function are unused, remove them. Link: https://lore.kernel.org/r/20210311150921.23726-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-11 13:46:40 -04:00
Bernard Metzler	e35ecb466e	RDMA/iwcm: Allow AFONLY binding for IPv6 addresses Binding IPv6 address/port to AF_INET6 domain only is provided via rdma_set_afonly(), but was not signalled to the provider. Applications like NFS/RDMA bind the same port to both IPv4 and IPv6 addresses simultaneously and thus rely on it working correctly. Link: https://lore.kernel.org/r/20210219143441.1068-1-bmt@zurich.ibm.com Tested-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-10 15:30:45 -04:00
Leon Romanovsky	cca7f12b93	RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc Fix the following W=1 compilation warning: drivers/infiniband/core/uverbs_ioctl.c:108: warning: expecting prototype for uverbs_alloc(). Prototype was for _uverbs_alloc() instead Fixes: `461bb2eee4` ("IB/uverbs: Add a simple allocator to uverbs_attr_bundle") Link: https://lore.kernel.org/r/20210302074214.1054299-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-03 13:22:10 -04:00
Saeed Mahameed	221384df61	RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep ib_send_cm_sidr_rep() { spin_lock_irqsave() cm_send_sidr_rep_locked() { ... spin_lock_irq() .... spin_unlock_irq() <--- this will enable interrupts } spin_unlock_irqrestore() } spin_unlock_irqrestore() expects interrupts to be disabled but the internal spin_unlock_irq() will always enable hard interrupts. Fix this by replacing the internal spin_{lock,unlock}_irq() with irqsave/restore variants. It fixes the following kernel trace: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20 Call Trace: _raw_spin_unlock_irqrestore+0x4e/0x50 ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm] cma_send_sidr_rep+0xa1/0x160 [rdma_cm] rdma_accept+0x25e/0x350 [rdma_cm] ucma_accept+0x132/0x1cc [rdma_ucm] ucma_write+0xbf/0x140 [rdma_ucm] vfs_write+0xc1/0x340 ksys_write+0xb3/0xe0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `87c4c774cb` ("RDMA/cm: Protect access to remote_sidr_table") Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.org Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-03-01 14:43:16 -04:00
Avihai Horon	fe454dc31e	RDMA/ucma: Fix use-after-free bug in ucma_create_uevent ucma_process_join() allocates struct ucma_multicast mc and frees it if an error occurs during its run. Specifically, if an error occurs in copy_to_user(), a use-after-free might happen in the following scenario: 1. mc struct is allocated. 2. rdma_join_multicast() is called and succeeds. During its run, cma_iboe_join_multicast() enqueues a work that will later use the aforementioned mc struct. 3. copy_to_user() is called and fails. 4. mc struct is deallocated. 5. The work that was enqueued by cma_iboe_join_multicast() is run and calls ucma_create_uevent() which tries to access mc struct (which is freed by now). Fix this bug by cancelling the work enqueued by cma_iboe_join_multicast(). Since cma_work_handler() frees struct cma_work, we don't use it in cma_iboe_join_multicast() so we can safely cancel the work later. The following syzkaller report revealed it: BUG: KASAN: use-after-free in ucma_create_uevent+0x2dd/0x;3f0 drivers/infiniband/core/ucma.c:272 Read of size 8 at addr ffff88810b3ad110 by task kworker/u8:1/108 CPU: 1 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: rdma_cm cma_work_handler Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xbe/0xf9 lib/dump_stack.c:118 print_address_description.constprop.0+0x3e/0×60 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0×37 mm/kasan/report.c:562 ucma_create_uevent+0x2dd/0×3f0 drivers/infiniband/core/ucma.c:272 ucma_event_handler+0xb7/0×3c0 drivers/infiniband/core/ucma.c:349 cma_cm_event_handler+0x5d/0×1c0 drivers/infiniband/core/cma.c:1977 cma_work_handler+0xfa/0×190 drivers/infiniband/core/cma.c:2718 process_one_work+0x54c/0×930 kernel/workqueue.c:2272 worker_thread+0x82/0×830 kernel/workqueue.c:2418 kthread+0x1ca/0×220 kernel/kthread.c:292 ret_from_fork+0x1f/0×30 arch/x86/entry/entry_64.S:296 Allocated by task 359: kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:461 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:434 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:664 [inline] ucma_process_join+0x16e/0×3f0 drivers/infiniband/core/ucma.c:1453 ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538 ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724 vfs_write fs/read_write.c:603 [inline] vfs_write+0x191/0×4c0 fs/read_write.c:585 ksys_write+0x1a1/0×1e0 fs/read_write.c:658 do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 359: kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48 kasan_set_track+0x1c/0×30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0×30 mm/kasan/generic.c:355 __kasan_slab_free+0x112/0×160 mm/kasan/common.c:422 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0xb3/0×3e0 mm/slub.c:4124 ucma_process_join+0x22d/0×3f0 drivers/infiniband/core/ucma.c:1497 ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538 ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724 vfs_write fs/read_write.c:603 [inline] vfs_write+0x191/0×4c0 fs/read_write.c:585 ksys_write+0x1a1/0×1e0 fs/read_write.c:658 do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88810b3ad100 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 16 bytes inside of 192-byte region [ffff88810b3ad100, ffff88810b3ad1c0) Fixes: `b5de0c60cc` ("RDMA/cma: Fix use after free race in roce multicast join") Link: https://lore.kernel.org/r/20210211090517.1278415-1-leon@kernel.org Reported-by: Amit Matityahu <mitm@nvidia.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-16 14:42:59 -04:00
Leon Romanovsky	168e4cd949	RDMA/core: Fix kernel doc warnings for ib_port_immutable_read() drivers/infiniband/core/device.c:859: warning: Function parameter or member 'dev' not described in 'ib_port_immutable_read' drivers/infiniband/core/device.c:859: warning: Function parameter or member 'port' not described in 'ib_port_immutable_read' Fixes: `7416790e22` ("RDMA/core: Introduce and use API to read port immutable data") Link: https://lore.kernel.org/r/20210210151421.1108809-1-leon@kernel.org Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-16 14:42:58 -04:00
Christoph Lameter	633d610212	RDMA/ipoib: Remove racy Subnet Manager sendonly join checks When a system receives a REREG event from the SM, then the SM information in the kernel is marked as invalid and a request is sent to the SM to update the information. The SM information is invalid in that time period. However, receiving a REREG also occurs simultaneously in user space applications that are now trying to rejoin the multicast groups. Some of those may be sendonly multicast groups which are then failing. If the SM information is invalid then ib_sa_sendonly_fullmem_support() returns false. That is wrong because it just means that we do not know yet if the potentially new SM supports sendonly joins. Sendonly join was introduced in 2015 and all the Subnet managers have supported it ever since. So there is no point in checking if a subnet manager supports it. Should an old opensm get a request for a sendonly join then the request will fail. The code that is removed here accomodated that situation and fell back to a full join. Falling back to a full join is problematic in itself. The reason to use the sendonly join was to reduce the traffic on the Infiniband fabric otherwise one could have just stayed with the regular join. So this patch may cause users of very old opensms to discover that lots of traffic needlessly crosses their IB fabrics. Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101281845160.13303@www.lameter.com Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-16 14:42:58 -04:00
Parav Pandit	7416790e22	RDMA/core: Introduce and use API to read port immutable data Currently mlx5 driver caches port GID table length for 2 ports. It is also cached by IB core as port immutable data. When mlx5 representor ports are present, which are usually more than 2, invalid access to port_caps array can happen while validating the GID table length which is only for 2 ports. To avoid this, take help of the IB cores port immutable data by exposing an API to read the port immutable fields. Remove mlx5 driver's internal cache, thereby reduce code and data. Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-05 12:06:01 -04:00

1 2 3 4 5 ...

2877 Commits