For the user MR path, instead of calling this after getting the umem, call
it as part of creating the struct mlx5_ib_mr and distill its output to a
single page_shift stored inside the mr.
This avoids passing around the tuple of its output. Based on the umem and
page_shift, the output arguments can be computed using:
count == ib_umem_num_pages(mr->umem)
shift == mr->page_shift
ncont == ib_umem_num_dma_blocks(mr->umem, 1 << mr->page_shift)
order == order_base_2(ncont)
And since mr->page_shift == umem_odp->page_shift then ncont ==
ib_umem_num_dma_blocks() == ib_umem_odp_num_pages() for ODP umems.
Link: https://lore.kernel.org/r/20201026131936.1335664-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The is only ever set to non-zero if the MR is from the cache, and if it is
cached then the order is in cached_ent->order.
Make it clearer that use_umr_mtt_update() only returns true for cached MRs
and remove the redundant data.
Link: https://lore.kernel.org/r/20201026131936.1335664-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Once a mkey is created it can be modified using UMR. This is desirable for
performance reasons. However, different hardware has restrictions on what
modifications are possible using UMR. Make sense of these checks:
- mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be
altered. Most cases create MRs using 0 access flags (now made clear by
consistent use of set_mkc_access_pd_addr_fields()), but the old logic
here was tormented. Make it clear that this is checking if the current
access_flags can be modified using UMR to different access_flags. It is
always OK to use UMR to change flags that all HW supports.
- mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to
enable and update the PAS/XLT. Enabling requires updating the entity
size, so UMR ends up completely disabled on this old hardware. Make it
clear why it is disabled. FRWR, ODP and cache always requires
mlx5_ib_can_load_pas_with_umr().
- mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be
resized to hold a new PAS list. This only works for cached MR's because
we don't store the PAS list size in other cases.
To be very clear, arrange things so any pre-created MR's in the cache
check the newly requested access_flags before allowing the MR to leave the
cache. If UMR cannot set the required access_flags the cache fails to
create the MR.
This in turn means relaxed ordering and atomic are now correctly blocked
early for implicit ODP on older HW.
Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Like any other verbs objects, CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract with mixed IB verbs objects with
DEVX. Such mix causes to the situation where FW and kernel are fully
interdependent on the reference counting of each side.
Kernel verbs and drivers that don't have DEVX flows shouldn't fail.
Fixes: e39afe3d6d ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In similar way to other IB objects, restore the ability to return error on
SRQ destroy. Strictly speaking, this change is not necessary, and provided
here to ensure a symmetrical interface like other destroy functions.
Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Like any other IB verbs objects, AH are refcounted by ib_core. The release
of those objects are controlled by ib_core with promise that AH destroy
can't fail.
Being SW object for now, this change makes dealloc_ah() to behave like any
other destroy IB flows.
Fixes: d345691471 ("RDMA: Handle AH allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Meir Lichtinger says:
====================
ConnectX-7 supports setting relaxed ordering read/write mkey attribute by
UMR, indicated by new HCA capabilities, so extend mlx5_ib driver to
configure UMR control segment
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'mlx5_uar':
RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
RDMA/mlx5: Use MLX5_SET macro instead of local structure
RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access
flag. ConnectX-7 supports setting relaxed ordering read/write mkey
attribute by UMR, indicated by new HCA capabilities.
With ConnectX-7 driver uses UMR when user set relaxed ordering access
flag, in contrast to previous silicon models. Specifically it includes
setting relvant flags of mkey context mask in UMR control segment, and
relaxed ordering write and read flags in UMR mkey context segment.
Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There are number of counters types supported in mlx5_ib: HW counters,
congestion counters, Q-counters and flow counters. Almost all supporting
code was placed in main.c that made almost impossible to maintain the code
anymore. Let's create separate code namespace for the counters to easy
future generalization effort.
Link: https://lore.kernel.org/r/20200702081809.423482-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The mlx5 VF driver doesn't set QP tx port affinity because it doesn't know
if the lag is active or not, since the "lag_active" works only for PF
interfaces. In this case for VF interfaces only one lag is used which
brings performance issue.
Add a lag_tx_port_affinity CAP bit; When it is enabled and
"num_lag_ports > 1", then driver always set QP tx affinity, regardless
of lag state.
Link: https://lore.kernel.org/r/20200527055014.355093-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The patch sets the lag tx affinity of the data QPs and the GSI QPs
according to the LAG xmit slave.
For GSI QPs, in case the link layer is Ethenet (RoCE) we create two GSI
QPs, one for each physical port. When the driver selects the GSI QP, it
will consider the port affinity result. For connected QPs, the driver
sets the affinity of the xmit slave.
The above, ensures that RC QP and it's corresponding GSI QP will transmit
from the same physical port.
Link: https://lore.kernel.org/r/20200430192146.12863-17-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Instead of keeping special field to separate kernel/user create/destroy
flows, rely on existence of udata pointer. All allocation flows are
using kzalloc() and leave uninitialized pointers as NULL which makes
MLX5_QP_EMPTY and MLX5_QP_KERNEL flows to be the same.
Link: https://lore.kernel.org/r/20200427154636.381474-25-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
QP type is stored in the IB/core QP struct, but it doesn't have all the
needed information, like internal QP type used in the driver itself.
Update mlx5_ib to have cached QP type which includes both IBTA and
Mellanox specific one.
Such change allows us to make even further cleanup of QP creation flow.
Link: https://lore.kernel.org/r/20200427154636.381474-21-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Remove mlx5_ib implementation of Q counter allocation logic
together with cleaning boolean which controlled validity of the
counter. It is not needed, because counter_id == 0 means that
counter is not valid.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>