Fixes the following W=1 kernel build warning(s):
drivers/infiniband/hw/i40iw/i40iw_hmc.c:64: warning: Function parameter or member 'idx' not described in 'i40iw_find_sd_index_limit'
drivers/infiniband/hw/i40iw/i40iw_hmc.c:64: warning: Excess function parameter 'index' description in 'i40iw_find_sd_index_limit'
drivers/infiniband/hw/i40iw/i40iw_hmc.c:94: warning: Function parameter or member 'pd_idx' not described in 'i40iw_find_pd_index_limit'
drivers/infiniband/hw/i40iw/i40iw_hmc.c:94: warning: Excess function parameter 'pd_index' description in 'i40iw_find_pd_index_limit'
Link: https://lore.kernel.org/r/20210118223929.512175-2-lee.jones@linaro.org
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In commit e28bf1f03b ("RDMA: Convert various random sprintf sysfs _show
uses to sysfs_emit") I mistakenly used len = sysfs_emit_at to overwrite
the last trailing space of potentially multiple entry output.
Instead use a more common style by removing the trailing space from the
output formats and adding a prefixing space to the continuation formats and
converting the final terminating output newline from the defective
len = sysfs_emit_at(buf, len, "\n");
to the now appropriate and typical
len += sysfs_emit_at(buf, len, "\n");
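To make the difference concrete, here is a userspace sketch. emit_at() is a hypothetical stand-in for sysfs_emit_at(). Like the kernel helper, it writes at an offset into a page-sized buffer and returns only the number of characters it wrote, not the running total. show_list() and show_list_buggy() are illustrative names, not the RDMA sysfs code.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096 /* stand-in for PAGE_SIZE */

/* Hypothetical stand-in for sysfs_emit_at(): formats at offset 'at' and
 * returns the number of characters actually written (never a total). */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	int len;

	if (at < 0 || at >= BUF_SIZE)
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf + at, BUF_SIZE - at, fmt, args);
	va_end(args);

	/* vsnprintf() reports the untruncated length; clamp to what fit. */
	if (len < 0)
		return 0;
	if (len >= BUF_SIZE - at)
		len = BUF_SIZE - at - 1;
	return len;
}

/* Fixed style: no trailing space in the entry format, a leading space on
 * continuation entries, and the final newline accumulated with "+=". */
static int show_list(char *buf, const int *vals, int n)
{
	int len = 0;

	for (int i = 0; i < n; i++)
		len += emit_at(buf, len, i ? " %d" : "%d", vals[i]);
	len += emit_at(buf, len, "\n");
	return len;
}

/* Defective style: "len =" throws away the running total, so the caller
 * is told only the length of the newline. */
static int show_list_buggy(char *buf, const int *vals, int n)
{
	int len = 0;

	for (int i = 0; i < n; i++)
		len += emit_at(buf, len, "%d ", vals[i]);
	len = emit_at(buf, len, "\n");
	return len;
}
```

With three entries, the fixed version returns the full length of "1 2 3\n", while the defective one reports a length of 1 and also leaves the trailing space in place.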
Fixes: e28bf1f03b ("RDMA: Convert various random sprintf sysfs _show uses to sysfs_emit")
Link: https://lore.kernel.org/r/5eb794b9c9bca0494d94b2b209f1627fa4e7b555.camel@perches.com
Reported-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This reverts commit fbdd0049d9.
Due to the commit in the Fixes tag, netdevice events were received only in
the net namespace of the mlx5_core_dev. As a result, netdevice events that
arrive in any other net namespace are missed.
This results in empty GID table due to RDMA device being detached from its
net device.
Hence, revert to receiving netdevice events in all net namespaces to
restore RDMA functionality in net namespaces other than init_net. The
deadlock will have to be addressed in another patch.
Fixes: fbdd0049d9 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
GFP_KERNEL may cause ida_alloc_range() to sleep, but the spinlock covering
this function does not allow sleeping, so the spinlock needs to be changed
to a mutex.
GFP_ATOMIC is not a suitable alternative for QP allocation, as atomic
allocations have a non-negligible chance of failing.
Fixes: 71586dd200 ("RDMA/hns: Create QP with selected QPN for bank load balance")
Link: https://lore.kernel.org/r/1611048513-28663-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The PVRDMA device HW interface defines network_hdr_type according to an
old definition of the internal kernel rdma_network_type enum that has
since changed, resulting in the wrong rdma_network_type being reported.
Fix this by explicitly defining the enum used by the PVRDMA device and
adding a function to convert the pvrdma_network_type to rdma_network_type
enum.
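A minimal sketch of the explicit conversion, assuming enum layouts that mirror the description above: the device ABI keeps the old layout where RoCE v1 shares the IB value, while the newer kernel enum gives RoCE v1 its own value. Both enums are local re-declarations for illustration, and pvrdma_network_type_to_rdma() is a hypothetical name, not necessarily the helper added by the patch.

```c
/* Local re-declaration for this sketch, assumed to match the kernel's
 * rdma_network_type after 1c15b4f2a4: RoCE v1 has its own value. */
enum rdma_network_type {
	RDMA_NETWORK_IB,
	RDMA_NETWORK_ROCE_V1,
	RDMA_NETWORK_IPV4,
	RDMA_NETWORK_IPV6
};

/* Device ABI values, frozen to the old layout in which RoCE v1 shared
 * the IB value. */
enum pvrdma_network_type {
	PVRDMA_NETWORK_IB,
	PVRDMA_NETWORK_ROCE_V1 = PVRDMA_NETWORK_IB,
	PVRDMA_NETWORK_IPV4,
	PVRDMA_NETWORK_IPV6
};

/* Translate explicitly instead of reusing the raw device value. */
static enum rdma_network_type
pvrdma_network_type_to_rdma(enum pvrdma_network_type type)
{
	switch (type) {
	case PVRDMA_NETWORK_ROCE_V1: /* same value as PVRDMA_NETWORK_IB */
		return RDMA_NETWORK_ROCE_V1;
	case PVRDMA_NETWORK_IPV4:
		return RDMA_NETWORK_IPV4;
	case PVRDMA_NETWORK_IPV6:
		return RDMA_NETWORK_IPV6;
	default:
		/* Fallback chosen for this sketch only. */
		return RDMA_NETWORK_IPV6;
	}
}
```

Under these assumed layouts, the raw device value for IPv4 (1) equals the new kernel value for RoCE v1, which is exactly the kind of misreport an explicit translation avoids.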
Cc: stable@vger.kernel.org # 5.10+
Fixes: 1c15b4f2a4 ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type")
Link: https://lore.kernel.org/r/1611026189-17943-1-git-send-email-bryantan@vmware.com
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Leon Romanovsky says:
====================
Be more strict with DEVX get/set operations for the obj_id.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'devx_set_get':
RDMA/mlx5: Use strict get/set operations for obj_id
RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
net/mlx5: Expose ifc bits for query modify header
In order to improve performance by balancing the load between different
banks of cache, the CQC cache is designed to choose one of 4 banks
according to the lower 2 bits of the CQN. The hns driver needs to count the
number of CQs on each bank and then assign the CQ being created to the bank
with the minimum load first.
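A simplified userspace sketch of the least-loaded bank selection described above. The table layout, MAX_CQN, and the function names are hypothetical; a real driver would also hold a lock around the counters, which this single-threaded sketch omits. Returning -1 when the chosen bank is full is a simplification of this sketch.

```c
#include <stdint.h>

#define CQ_BANK_NUM 4   /* bank is selected by CQN bits [1:0] */
#define MAX_CQN     64  /* hypothetical table size for the sketch */

struct cq_bank_table {
	unsigned int inuse[CQ_BANK_NUM]; /* CQs currently on each bank */
	uint8_t used[MAX_CQN];           /* 1 if the CQN is allocated */
};

/* Pick the bank with the fewest CQs; ties go to the lowest index. */
static unsigned int pick_cq_bank(const struct cq_bank_table *t)
{
	unsigned int bank = 0;

	for (unsigned int i = 1; i < CQ_BANK_NUM; i++)
		if (t->inuse[i] < t->inuse[bank])
			bank = i;
	return bank;
}

/* Allocate a free CQN whose low 2 bits equal the chosen bank.
 * Returns the CQN, or -1 if that bank has no free CQN left. */
static int alloc_cqn(struct cq_bank_table *t)
{
	unsigned int bank = pick_cq_bank(t);

	for (unsigned int cqn = bank; cqn < MAX_CQN; cqn += CQ_BANK_NUM) {
		if (!t->used[cqn]) {
			t->used[cqn] = 1;
			t->inuse[bank]++;
			return (int)cqn;
		}
	}
	return -1;
}

/* Release a CQN and drop the load count of its bank. */
static void free_cqn(struct cq_bank_table *t, int cqn)
{
	t->used[cqn] = 0;
	t->inuse[cqn & (CQ_BANK_NUM - 1)]--;
}
```

Starting from an empty table, consecutive allocations spread across banks 0, 1, 2, 3 before any bank gets a second CQ, and freeing a CQ makes its bank the preferred target again.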
Link: https://lore.kernel.org/r/1610008589-35770-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In ocrdma_dealloc_ucontext_pd() uctx->cntxt_pd is assigned to the variable
pd and then after uctx->cntxt_pd is freed, the variable pd is passed to
function _ocrdma_dealloc_pd() which dereferences pd directly or through
its call to ocrdma_mbx_dealloc_pd().
Reorder the code so that the free happens only after the last use of the
variable pd.
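The ordering rule can be shown with a small userspace sketch. The structs and dealloc_pd_hw() are hypothetical stand-ins for the ocrdma objects and _ocrdma_dealloc_pd(); only the "use first, free last" ordering is meant to match the fix.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the driver objects. */
struct pd_model {
	unsigned int id;
};

struct uctx_model {
	struct pd_model *cntxt_pd;
};

static unsigned int last_dealloc_id; /* records what dealloc observed */

/* Stand-in for _ocrdma_dealloc_pd(): it dereferences pd, so pd must
 * still be valid memory when this runs. */
static void dealloc_pd_hw(struct pd_model *pd)
{
	last_dealloc_id = pd->id;
}

static void dealloc_ucontext_pd(struct uctx_model *uctx)
{
	struct pd_model *pd = uctx->cntxt_pd;

	/* Use pd first... */
	dealloc_pd_hw(pd);
	/* ...and only then release its memory and clear the pointer. */
	free(pd);
	uctx->cntxt_pd = NULL;
}
```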
Cc: stable@vger.kernel.org
Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Link: https://lore.kernel.org/r/20201230024653.1516495-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This change fixes the checkpatch warning described in
commit cbacb5ab0a ("docs: printk-formats: Stop encouraging use of
unnecessary %h[xudi] and %hh[xudi]")
Standard integer promotion is already done and %hx and %hhx are useless so
do not encourage the use of %hh[xudi] or %h[xudi].
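A userspace illustration of the integer promotion point, using snprintf() rather than printk() (the default argument promotions apply to any variadic call). The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if "%x" and "%hhx" produce the same text for b. An unsigned
 * char passed to a variadic function is promoted to int, so the "hh"
 * length modifier adds nothing here. */
static int same_as_hhx(unsigned char b)
{
	char plain[8], with_hh[8];

	snprintf(plain, sizeof(plain), "%x", b);
	snprintf(with_hh, sizeof(with_hh), "%hhx", b);
	return strcmp(plain, with_hh) == 0;
}

/* Preferred style: plain "%x" (here with width and prefix) for a u8. */
static int format_byte(char *out, size_t size, unsigned char b)
{
	return snprintf(out, size, "0x%02x", b);
}
```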
Link: https://lore.kernel.org/r/20201223193041.122850-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull rdma updates from Jason Gunthorpe:
"A smaller set of patches, nothing stands out as being particularly
major this cycle. The biggest item would be the new HIP09 HW support
from HNS, otherwise it was pretty quiet for new work here:
- Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw,
cxgb4, mlx4 and mlx5
- Bug fixes and polishing for the new rtrs ULP
- Cleanup of uverbs checking for allowed driver operations
- Use sysfs_emit all over the place
- Lots of bug fixes and clarity improvements for hns
- hip09 support for hns
- NDR and 50/100Gb signaling rates
- Remove dma_virt_ops and go back to using the IB DMA wrappers
- mlx5 optimizations for contiguous DMA regions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (147 commits)
RDMA/cma: Don't overwrite sgid_attr after device is released
RDMA/mlx5: Fix MR cache memory leak
RDMA/rxe: Use acquire/release for memory ordering
RDMA/hns: Simplify AEQE process for different types of queue
RDMA/hns: Fix inaccurate prints
RDMA/hns: Fix incorrect symbol types
RDMA/hns: Clear redundant variable initialization
RDMA/hns: Fix coding style issues
RDMA/hns: Remove unnecessary access right set during INIT2INIT
RDMA/hns: WARN_ON if get a reserved sl from users
RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
RDMA/hns: Do shift on traffic class when using RoCEv2
RDMA/hns: Normalization the judgment of some features
RDMA/hns: Limit the length of data copied between kernel and userspace
RDMA/mlx4: Remove bogus dev_base_lock usage
RDMA/uverbs: Fix incorrect variable type
RDMA/core: Do not indicate device ready when device enablement fails
RDMA/core: Clean up cq pool mechanism
RDMA/core: Update kernel documentation for ib_create_named_qp()
MAINTAINERS: SOFT-ROCE: Change Zhu Yanjun's email address
...
If the MR cache entry invalidation fails, we detach this entry from the
cache, so we must free its memory as well.
Allocation backtrace for the leak:
[<00000000d8e423b0>] alloc_cache_mr+0x23/0xc0 [mlx5_ib]
[<000000001f21304c>] create_cache_mr+0x3f/0xf0 [mlx5_ib]
[<000000009d6b45dc>] mlx5_ib_alloc_implicit_mr+0x41/0x210 [mlx5_ib]
[<00000000879d0d68>] mlx5_ib_reg_user_mr+0x9e/0x6e0 [mlx5_ib]
[<00000000be74bf89>] create_qp+0x2fc/0xf00 [ib_uverbs]
[<000000001a532d22>] ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ+0x1d9/0x230 [ib_uverbs]
[<0000000070f46001>] rdma_alloc_commit_uobject+0xb5/0x120 [ib_uverbs]
[<000000006d8a0b38>] uverbs_alloc+0x2b/0xf0 [ib_uverbs]
[<00000000075217c9>] ksysioctl+0x234/0x7d0
[<00000000eb5c120b>] __x64_sys_ioctl+0x16/0x20
[<00000000db135b48>] do_syscall_64+0x59/0x2e0
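A simplified userspace model of the fixed error path. The structs and mr_cache_free() are hypothetical; the driver-specific detach and mkey destruction steps are reduced to a counter. The point is that once a failed entry is taken out of the cache, nothing else references it, so this path must free it.

```c
#include <stdlib.h>

/* Hypothetical, simplified cached MR. */
struct cache_mr {
	struct cache_mr *next;
	int invalidate_fails; /* nonzero simulates a failed invalidation */
};

struct mr_cache {
	struct cache_mr *head;  /* entries available for reuse */
	unsigned int stored;
	unsigned int destroyed;
};

static int mr_cache_invalidate(struct cache_mr *mr)
{
	return mr->invalidate_fails ? -1 : 0;
}

/* Return an MR to the cache. If invalidation fails the MR cannot be
 * reused: it is detached and destroyed, and its memory must be freed
 * here, since leaving out free() is exactly the leak. */
static void mr_cache_free(struct mr_cache *c, struct cache_mr *mr)
{
	if (mr_cache_invalidate(mr)) {
		/* driver-specific detach/destroy steps elided */
		c->destroyed++;
		free(mr);
		return;
	}
	mr->next = c->head;
	c->head = mr;
	c->stored++;
}
```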
Fixes: 1769c4c575 ("RDMA/mlx5: Always remove MRs from the cache before destroying them")
Link: https://lore.kernel.org/r/20201213132940.345554-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
xdp_return_frame_bulk() needs to pass an xdp_buff
to __xdp_return().
strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.
Conflicts:
net/netfilter/nf_tables_api.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For ib_copy_from_user(), the length of udata may not be the same as that
of cmd. For ib_copy_to_user(), the length of udata may not be the same as
that of resp. So limit the length to prevent out-of-bounds read and write
operations from ib_copy_from_user() and ib_copy_to_user().
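A userspace sketch of the length clamping. struct udata_model is a hypothetical stand-in for ib_udata, and memcpy() stands in for the user copy. The idea is the same as passing min(udata->inlen, sizeof(cmd)) (or the outlen equivalent for the response) as the copy length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of udata: user buffers plus their lengths. */
struct udata_model {
	const void *inbuf;
	size_t inlen;
	void *outbuf;
	size_t outlen;
};

static size_t size_min(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Copy the request into cmd without reading past either buffer. */
static size_t copy_cmd(void *cmd, size_t cmd_size,
		       const struct udata_model *u)
{
	size_t n = size_min(cmd_size, u->inlen);

	memcpy(cmd, u->inbuf, n);
	return n;
}

/* Copy the response out without writing past the user's buffer. */
static size_t copy_resp(struct udata_model *u, const void *resp,
			size_t resp_size)
{
	size_t n = size_min(resp_size, u->outlen);

	memcpy(u->outbuf, resp, n);
	return n;
}
```

An older userspace that passes a shorter command only fills the leading part of cmd; the rest keeps whatever the kernel initialized it to.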
Fixes: de77503a59 ("RDMA/hns: RDMA/hns: Assign rq head pointer when enable rq record db")
Fixes: 633fb4d9fd ("RDMA/hns: Use structs to describe the uABI instead of opencoding")
Fixes: ae85bf92ef ("RDMA/hns: Optimize qp param setup flow")
Fixes: 6fd610c573 ("RDMA/hns: Support 0 hop addressing for SRQ buffer")
Fixes: 9d9d4ff788 ("RDMA/hns: Update the kernel header file of hns")
Link: https://lore.kernel.org/r/1607650657-35992-2-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
It is not clear what this lock protects. If the authors wanted to ensure
that "dev" does not disappear, that is impossible, given the following
code path:
mlx4_ib_netdev_event (under RTNL mutex)
-> mlx4_ib_scan_netdevs
-> mlx4_ib_update_qps
Also, the dev_base_lock does not protect dev->dev_addr either.
So it serves no purpose here. Remove it.
Link: https://lore.kernel.org/r/20201208193928.1500893-1-vladimir.oltean@nxp.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Saeed Mahameed says:
====================
mlx5-next auxbus support
This pull request is targeting net-next and rdma-next branches.
This series provides mlx5 support for auxiliary bus devices.
It starts with a merge commit of tag 'auxbus-5.11-rc1' from
gregkh/driver-core into mlx5-next, then the mlx5 patches that will convert
mlx5 ulp devices (netdev, rdma, vdpa) to use the proper auxbus
infrastructure instead of the internal mlx5 device and interface management
implementation, which Leon is deleting at the end of this patchset.
Link: https://lore.kernel.org/alsa-devel/20201026111849.1035786-1-leon@kernel.org/
Thanks to everyone for the joint effort!
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
RDMA/mlx5: Remove IB representors dead code
net/mlx5: Simplify eswitch mode check
net/mlx5: Delete custom device management logic
RDMA/mlx5: Convert mlx5_ib to use auxiliary bus
net/mlx5e: Connect ethernet part to auxiliary bus
vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus
net/mlx5: Register mlx5 devices to auxiliary virtual bus
vdpa/mlx5: Make hardware definitions visible to all mlx5 devices
net/mlx5_core: Clean driver version and name
net/mlx5: Properly convey driver version to firmware
driver core: auxiliary bus: minor coding style tweaks
driver core: auxiliary bus: make remove function return void
driver core: auxiliary bus: move slab.h from include file
Add auxiliary bus support
====================
Link: https://lore.kernel.org/r/20201207053349.402772-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the DM MR registration flow doesn't set the mlx5_ib_dev pointer,
which can cause a NULL pointer dereference if userspace dumps the MR via the
rdma tool.
Assign the IB device together with the other fields and remove the
redundant reference to mlx5_ib_dev from mlx5_ib_mr.
Cc: stable@vger.kernel.org
Fixes: 6c29f57ea4 ("IB/mlx5: Device memory mr registration support")
Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This is all a giant train wreck of error handling: in many cases the MR is
left in some corrupted state where continuing on is going to lead to
chaos, or various unwinds are missed or done in the wrong order.
rereg had three possible completely different actions, depending on flags
and various details about the MR. Split the three actions into three
functions, and call the right action from the start.
For each action carefully design the error handling to fit the action:
- UMR access/PD update is a simple UMR; if it fails the MR isn't changed,
so do nothing
- PAS update over UMR is multiple UMR operations. To keep everything sane
revoke access to the MKey while it is being changed and restore it once
the MR is correct.
- Recreating the mkey should completely build a parallel MR with a fully
loaded PAS then swap and destroy the old one. If it fails the original
should be left untouched. This is handled in the core code. Directly
call the normal MR creation functions, possibly re-using the existing
umem.
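Choosing the action once, up front, can be sketched as a small dispatcher. This is a hypothetical simplification: the flag bits are local copies modelled on the kernel's IB_MR_REREG_* flags, and the two umr_can_* fields stand in for the many MR details the real mlx5 code examines.

```c
/* Local flags modelled on the kernel's IB_MR_REREG_* bits. */
enum {
	REREG_TRANS  = 1 << 0, /* new address range / umem */
	REREG_PD     = 1 << 1, /* new protection domain */
	REREG_ACCESS = 1 << 2, /* new access flags */
};

enum rereg_action {
	REREG_UMR_ACCESS_PD, /* one UMR; on failure the MR is unchanged */
	REREG_UMR_PAS,       /* several UMRs with access revoked meanwhile */
	REREG_NEW_MKEY,      /* build a parallel MR, then swap and destroy */
};

/* Hypothetical summary of what the existing mkey can absorb via UMR. */
struct rereg_req {
	unsigned int flags;
	int umr_can_update_access; /* e.g. the access change is UMR-capable */
	int umr_can_update_pas;    /* e.g. the new PAS fits the existing mkey */
};

/* Decide which of the three actions to run before touching the MR. */
static enum rereg_action choose_rereg_action(const struct rereg_req *r)
{
	if (!(r->flags & REREG_TRANS))
		return r->umr_can_update_access ? REREG_UMR_ACCESS_PD
						: REREG_NEW_MKEY;
	return r->umr_can_update_pas ? REREG_UMR_PAS : REREG_NEW_MKEY;
}
```

Each returned action then owns its own error handling, instead of one shared path that has to undo partial work from any of the three.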
Add support for working with ODP MRs. The READ/WRITE access flags can be
changed by UMR and we can trivially convert to/from ODP MRs using the
logic to build a completely new MR.
This new logic also fixes various problems with MRs continuing to work
while their PAS lists are no longer valid, e.g. during a page size change.
Link: https://lore.kernel.org/r/20201130075839.278575-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
mlx5 has an ugly flow where it tries to allocate a new MR and replace the
existing MR in the same memory during rereg. This is very complicated and
buggy. Instead of trying to replace in-place inside the driver, provide
support from uverbs to change the entire HW object assigned to a handle
during rereg_mr.
Since destroying an MR is allowed to fail (i.e. if an MW is pointing at it)
and this can't be detected in advance, the algorithm creates a completely new
uobject to hold the new MR and swaps the IDR entries of the two objects.
The old MR in the temporary IDR entry is then destroyed; if that fails,
rereg_mr still succeeds and destruction is deferred to FD release. This
complexity is why this cannot live in a driver safely.
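The swap-then-destroy idea can be modelled with a plain array in place of the IDR. Everything here is hypothetical: struct handle_table, the busy flag simulating a destroy failure, and the fixed-size deferred list standing in for cleanup at FD release. The caller is assumed to have fully built the new object before the swap.

```c
#include <stdlib.h>

#define MAX_DEFERRED 8

/* Hypothetical MR object; 'busy' simulates destroy failing, e.g.
 * because an MW still points at the MR. */
struct mr_obj {
	int busy;
};

/* Hypothetical handle table standing in for the uverbs IDR. */
struct handle_table {
	struct mr_obj *slot[4];
	struct mr_obj *deferred[MAX_DEFERRED]; /* destroyed at FD release */
	int ndeferred;
};

static int destroy_obj(struct mr_obj *obj)
{
	if (obj->busy)
		return -1;
	free(obj);
	return 0;
}

/* Swap new_obj in behind the unchanged user handle, then try to destroy
 * the old object. A failed destroy does not fail the rereg. */
static int rereg_swap(struct handle_table *t, int handle,
		      struct mr_obj *new_obj)
{
	struct mr_obj *old = t->slot[handle];

	if (t->ndeferred >= MAX_DEFERRED)
		return -1; /* sketch limit: refuse before changing anything */

	t->slot[handle] = new_obj;
	if (destroy_obj(old) != 0)
		/* rereg still succeeds; cleanup waits for FD release */
		t->deferred[t->ndeferred++] = old;
	return 0;
}
```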
Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Provide the mlx5_core device instead of the "priv" pointer while checking
eswitch mode.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The conversion to the auxiliary bus solves a long-standing issue with the
existing mlx5_ib<->mlx5_core coupling, which required both modules to be in
the initramfs if either of them was needed for boot.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>