In the tasklets (completer, responder, and requester) check the return
value from rxe_get() to detect failures to get a reference. This only
occurs if the qp's reference count has dropped to zero, which indicates
that it should no longer be used.
The ref is never 0 today because the tasklets are flushed before the ref
is dropped. The next patch changes this so that the ref is dropped then
the tasklets are flushed.
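As a rough userspace illustration of the check described above, the sketch
below models a get-unless-zero reference count with C11 atomics. It is not
the rxe code: struct demo_obj, demo_get(), demo_put() and
demo_tasklet_body() are hypothetical stand-ins, and only the idea that a
get fails once the count has reached zero is taken from the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted object such as a qp. */
struct demo_obj {
	atomic_int refcnt;
};

/* Take a reference unless the count has already dropped to zero. */
static bool demo_get(struct demo_obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool demo_put(struct demo_obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* catch underflow in this sketch */
	return old == 1;
}

/* Hypothetical tasklet body: bail out if the object is already dead. */
static int demo_tasklet_body(struct demo_obj *o)
{
	if (!demo_get(o))
		return -1;	/* the object must no longer be used */

	/* ... work that relies on the object staying alive ... */

	demo_put(o);
	return 0;
}
```

With a live object demo_tasklet_body() returns 0 and leaves the count
unchanged; once the last reference has been dropped it returns -1 without
touching the object further.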
Link: https://lore.kernel.org/r/20220421014042.26985-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the #define IB_SRQ_INIT_MASK is used to distinguish the
rxe_create_srq verb from the rxe_modify_srq verb so that some code can be
shared between these two subroutines.
This commit splits rxe_srq_chk_attr into two subroutines: rxe_srq_chk_init
and rxe_srq_chk_attr which handle the create_srq and modify_srq verbs
separately.
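The shape of such a split can be sketched in plain C as below. This is
only an illustration under assumptions: the demo_* types, the two mask
bits and the specific limits checked are hypothetical and are not taken
from the driver. The point shown is that the create path validates every
field, while the modify path validates only the fields selected by a mask.

```c
#include <errno.h>

/* Hypothetical mask bits selecting which fields a modify call changes. */
#define DEMO_SRQ_MAX_WR	(1u << 0)
#define DEMO_SRQ_LIMIT	(1u << 1)

/* Hypothetical, simplified attribute set and device limits. */
struct demo_srq_attr {
	unsigned int max_wr;
	unsigned int max_sge;
	unsigned int srq_limit;
};

struct demo_dev_limits {
	unsigned int max_srq_wr;
	unsigned int max_srq_sge;
};

/* Create path: every field is supplied, so every field is checked. */
static int demo_srq_chk_init(const struct demo_dev_limits *lim,
			     const struct demo_srq_attr *attr)
{
	if (attr->max_wr == 0 || attr->max_wr > lim->max_srq_wr)
		return -EINVAL;
	if (attr->max_sge > lim->max_srq_sge)
		return -EINVAL;
	if (attr->srq_limit > attr->max_wr)
		return -EINVAL;
	return 0;
}

/* Modify path: only the fields selected by 'mask' are checked. */
static int demo_srq_chk_attr(const struct demo_dev_limits *lim,
			     unsigned int cur_max_wr,
			     const struct demo_srq_attr *attr,
			     unsigned int mask)
{
	unsigned int max_wr = cur_max_wr;

	if (mask & DEMO_SRQ_MAX_WR) {
		if (attr->max_wr == 0 || attr->max_wr > lim->max_srq_wr)
			return -EINVAL;
		max_wr = attr->max_wr;
	}
	if (mask & DEMO_SRQ_LIMIT) {
		/* Compare against the new max_wr if it is also changing. */
		if (attr->srq_limit > max_wr)
			return -EINVAL;
	}
	return 0;
}
```

Neither function needs a special mask value to tell the two verbs apart,
which is the simplification the split is after.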
Link: https://lore.kernel.org/r/20220421014042.26985-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while
rxe_recv.c uses _bh spinlocks for the same lock.
As there is no case where the mcg_lock can be taken from an IRQ, change
these all to bh locks so we don't have confusing mismatched lock types on
the same spinlock.
Fixes: 6090a0c4c7 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
These routines were not intended to be called under a spinlock and will
throw debugging warnings:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50
CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:warn_bogus_irq_restore+0x2f/0x50
Call Trace:
<TASK>
_raw_spin_unlock_irqrestore+0x75/0x80
rxe_attach_mcast+0x304/0x480 [rdma_rxe]
ib_attach_mcast+0x88/0xa0 [ib_core]
ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs]
ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs]
ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs]
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Move them out of the spinlock. It is OK if there are some races between
setting up the MC reception at the ethernet layer and rbtree lookups.
Fixes: 6090a0c4c7 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In finish_packet() in rxe_req.c a variable was incorrectly called paylen
instead of payload. Elsewhere in the rxe source payload is always used for
the RoCE payload length and paylen is always used for the UDP payload
length. This will cause unnecessary confusion.
Replace paylen by payload in finish_packet().
Link: https://lore.kernel.org/r/20220420172316.5465-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The rping benchmark fails on long runs. The root cause of this failure has
been traced to a failure to compute a nonzero value of mr in rare
situations.
Fix this failure by correctly handling the computation of mr in
read_reply() in rxe_resp.c in the replay flow.
Fixes: 8a1a0be894 ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/r/20220418174103.3040-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The referenced commit generates a reference counting error if the rkey has
the same index but the wrong key. In this case the reference taken by
rxe_pool_get_index() is not dropped.
Drop the reference if the keys don't match in rxe_recheck_mr(). Check
that the mw and mr are still valid.
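The pattern of the fix can be sketched in userspace C as follows. This is
an illustration only: the demo_* names are hypothetical, the table lookup
stands in for the pool index lookup, the split of the rkey into an index
in the upper bits and an 8-bit key below is an assumption of the sketch,
and a plain int replaces the real reference count for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mr held in an indexed pool. */
struct demo_mr {
	int refcnt;	/* plain int for brevity, not safe for concurrency */
	uint32_t rkey;	/* assumed layout: index above an 8-bit key */
	bool valid;
};

/* Look up by index and take a reference on success. */
static struct demo_mr *demo_pool_get_index(struct demo_mr *tbl, size_t n,
					   uint32_t index)
{
	if (index >= n)
		return NULL;
	tbl[index].refcnt++;
	return &tbl[index];
}

static void demo_mr_put(struct demo_mr *mr)
{
	mr->refcnt--;
}

/* Return the mr with a reference held, or NULL with no reference held. */
static struct demo_mr *demo_recheck_mr(struct demo_mr *tbl, size_t n,
				       uint32_t rkey)
{
	struct demo_mr *mr = demo_pool_get_index(tbl, n, rkey >> 8);

	if (!mr)
		return NULL;

	/* Same index but wrong key, or no longer valid: drop the ref. */
	if (mr->rkey != rkey || !mr->valid) {
		demo_mr_put(mr);
		return NULL;
	}
	return mr;
}
```

Every NULL return leaves the count exactly where it started, so a caller
only has to drop a reference when it actually got an mr back.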
Fixes: 8a1a0be894 ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/r/20220411030647.20011-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently rxe_requester() doesn't generate a completion when processing an
unsupported/invalid opcode. If the rxe driver doesn't support a new opcode
(e.g. RDMA Atomic Write) but the RDMA library does, an application
using the new opcode can reproduce this issue. Fix the issue by jumping to
the error path with "goto err;".
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220410113513.27537-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the rdma_rxe driver claims to support both 2A and 2B type memory
windows. But the IBA requires:
010-37.2.31: If an HCA supports the Base Memory Management
extensions, the HCA shall support either Type 2A or Type 2B MWs,
but not both.
This commit removes the device capability bit for type 2A memory windows
and adds a clarifying comment to rxe_mw.c.
Link: https://lore.kernel.org/r/20220407184321.14207-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.
Add some short comments describing what each of the kernel flags is
connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the rxe driver uses red-black trees to add indices to the rxe
object pools. Linux xarrays provide a better way to implement the same
functionality for indices. This patch replaces red-black trees by xarrays
for pool objects. Since xarrays already have a spinlock, use that in place
of the pool rwlock. Make sure that all changes in the xarray (index) and
kref (ref count) occur atomically.
Link: https://lore.kernel.org/r/20220304000808.225811-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There is only one remaining object type that allocates its own memory,
that is mr. So the sense of RXE_POOL_NO_ALLOC is changed to
RXE_POOL_ALLOC. Add checks to rxe_alloc() and rxe_add_to_pool() to make
sure the correct call is used for the setting of this flag.
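One way to picture the added checks is the userspace sketch below. It
rests on an assumed reading of the flag: a pool with the ALLOC flag set
allocates its objects itself, while a pool without it only accepts
objects allocated by the caller. The demo_* names are hypothetical and
the real functions may report a mismatch differently (this sketch simply
returns an error).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flag: the pool allocates the object memory itself. */
#define DEMO_POOL_ALLOC	(1u << 0)

struct demo_pool {
	unsigned int flags;
	size_t elem_size;
	unsigned int num_elem;
};

/* Only valid for pools that allocate their own objects. */
static void *demo_alloc(struct demo_pool *pool)
{
	void *obj;

	if (!(pool->flags & DEMO_POOL_ALLOC))
		return NULL;	/* wrong call for this pool type */

	obj = calloc(1, pool->elem_size);
	if (obj)
		pool->num_elem++;
	return obj;
}

/* Only valid for pools whose objects are allocated by the caller. */
static int demo_add_to_pool(struct demo_pool *pool, void *obj)
{
	if (pool->flags & DEMO_POOL_ALLOC)
		return -EINVAL;	/* wrong call for this pool type */
	if (!obj)
		return -EINVAL;

	pool->num_elem++;
	return 0;
}
```

Memory returned by demo_alloc() is owned by the caller of the sketch and
must be released with free().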
Link: https://lore.kernel.org/r/20220304000808.225811-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently rxe saves a copy of MR in responder resources for RDMA reads.
Since the responder resources are never freed, just overwritten if more
are needed, the reference to this MR may not be dropped until the QP is
destroyed. This patch uses the rkey instead of the MR and on subsequent
packets of a multipacket read reply message it looks up the MR from the
rkey for each packet. This makes it possible for a user to deregister an
MR or unbind a MW on the fly and get correct behaviour.
Link: https://lore.kernel.org/r/20220304000808.225811-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The commit referenced below can take a reference to the AH which is never
dropped. This only happens in the UD request path. This patch optionally
passes that AH back to the caller so that it can hold the reference while
the AV is being accessed and then drop it. Code to do this is added to
rxe_req.c. The AV is also passed to rxe_prepare in rxe_net.c as an
optimization.
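The hand-off of the reference can be sketched in userspace C as below.
This is an illustration under assumptions, not the driver code: the
demo_* names, the table lookup and the plain int count are hypothetical.
It shows a lookup that optionally passes the AH back so the caller can
keep it pinned while reading the AV, then drop it.

```c
#include <stddef.h>

/* Hypothetical, simplified AV and the AH that contains it. */
struct demo_av {
	int port;
};

struct demo_ah {
	int refcnt;	/* plain int for brevity, not safe for concurrency */
	struct demo_av av;
};

/* Look up an AH by index and take a reference on success. */
static struct demo_ah *demo_ah_get(struct demo_ah *tbl, size_t n, size_t index)
{
	if (index >= n)
		return NULL;
	tbl[index].refcnt++;
	return &tbl[index];
}

static void demo_ah_put(struct demo_ah *ah)
{
	ah->refcnt--;
}

/*
 * Return the AV for a request. If 'ahp' is non-NULL the reference taken
 * by the lookup is handed to the caller, who must drop it; otherwise it
 * is dropped here and the AV is not pinned after the return.
 */
static struct demo_av *demo_get_av(struct demo_ah *tbl, size_t n,
				   size_t index, struct demo_ah **ahp)
{
	struct demo_ah *ah = demo_ah_get(tbl, n, index);

	if (ahp)
		*ahp = ah;
	if (!ah)
		return NULL;
	if (!ahp)
		demo_ah_put(ah);
	return &ah->av;
}

/* Hypothetical request path: hold the AH only while the AV is read. */
static int demo_send_ud(struct demo_ah *tbl, size_t n, size_t index)
{
	struct demo_ah *ah = NULL;
	struct demo_av *av = demo_get_av(tbl, n, index, &ah);
	int port;

	if (!av)
		return -1;

	port = av->port;	/* safe: the reference is still held */
	demo_ah_put(ah);	/* balance the reference from the lookup */
	return port;
}
```

After demo_send_ud() returns, the count is back at its starting value on
both the success and the failure path.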
Fixes: e2fe06c908 ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs")
Link: https://lore.kernel.org/r/20220304000808.225811-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
A previous patch replaced all irqsave locks in rxe with bh locks. This
ran into problems because rdmacm has a bad habit of calling rdma verbs
APIs with irqs disabled. Calling spin_unlock_bh() with irqs disabled is
not allowed, which causes programs that use rdmacm to fail. This patch
reverts the changes to locks that had this problem or got dragged into
the same mess. After this patch blktests/check -q srp runs correctly.
Link: https://lore.kernel.org/r/20220215194448.44369-1-rpearsonhpe@gmail.com
Fixes: 21adfa7a3c ("RDMA/rxe: Replace irqsave locks with bh locks")
Reported-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>