From Maor Gottlieb
====================
Fix the use of nents and orig_nents in the sg table append helpers. The
nents should be used by the DMA layer to store the number of DMA mapped
sges, the orig_nents is the number of CPU sges.
Since the sg append logic doesn't always create a SGL with exactly
orig_nents entries store a total_nents as well to allow the table to be
properly free'd and reorganize the freeing logic to share across all the
use cases.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* 'sg_nents':
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
1) New index member of struct rxe_queue was introduced but not zeroed so
the initial value of index may be random.
2) The current index is not masked off to index_mask.
In this case producer_addr() and consumer_addr() will get an invalid
address by the random index and then accessing the invalid address
triggers the following panic:
"BUG: unable to handle page fault for address: ffff9ae2c07a1414"
Fix the issue by using kzalloc() to zero out index member.
Fixes: 5bcf5a59c4 ("RDMA/rxe: Protext kernel index from user space")
Link: https://lore.kernel.org/r/20210820111509.172500-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_mcast_add_grp_elem() in rxe_mcast.c calls rxe_alloc() while holding
spinlocks which in turn calls kzalloc(size, GFP_KERNEL) which is
incorrect. This patch replaces rxe_alloc() by rxe_alloc_locked() which
uses GFP_ATOMIC. This bug was caused by the below mentioned commit and
failing to handle the need for the atomic allocate.
Fixes: 4276fd0ddd ("RDMA/rxe: Remove RXE_POOL_ATOMIC")
Link: https://lore.kernel.org/r/20210813210625.4484-1-rpearsonhpe@gmail.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
An earlier patch removed setting of tot_len in IPv4 headers because it was
also set in ip_local_out. However, this change resulted in an incorrect
ICRC being computed because the tot_len field is not masked out. This
patch restores that line. This fixes the bug reported by Zhu Yanjun. This
bug affects anyone using rxe which is currently broken.
Fixes: 230bb836ee ("RDMA/rxe: Fix redundant call to ip_send_check")
Link: https://lore.kernel.org/r/20210729220039.18549-3-rpearsonhpe@gmail.com
Reported-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-and-tested-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The memcpy() that copies a WQE from a SRQ the QP uses an incorrect size.
The size should have been the size of the rxe_send_wqe struct not the size
of a pointer to it. The result is that IO operations using a SRQ on the
responder side will fail.
Fixes: ec0fa2445c ("RDMA/rxe: Fix over copying in get_srq_wqe")
Link: https://lore.kernel.org/r/20210729220039.18549-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the ICRC is generated as a u32 type and then forced to a __be32
and stored into the ICRC field in the packet. The actual type of the ICRC
is __be32. This patch replaces u32 by __be32 and eliminates the casts.
The computation is exactly the same as the original but the types are more
consistent.
Link: https://lore.kernel.org/r/20210707040040.15434-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently get_srq_wqe() in rxe_resp.c copies the maximum possible number
of bytes from the wqe into the QPs copy of the SRQ wqe. This is usually
extra work and risks reading past the end of the SRQ circular buffer if
the SRQ is configured with less than the maximum possible number of SGEs.
Check the number of SGEs is not too large.
Compute the actual number of bytes in the WR and copy only those.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For IPV4 packets sent on the wire the rxe driver calls ip_local_out()
which immediately calls __ip_local_out() which sets iph->tot_len and calls
ip_send_check(). This code is duplicated in prepare4(). On the loopback
path the IP header checksum and tot_len fields are not used so they do not
need to be set.
Remove this redundant code.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linux 5.13-rc7
Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by compining both patches.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_mr_init_user() always returns the fixed -EINVAL when ib_umem_get()
fails so it's hard for user to know which actual error happens in
ib_umem_get(). For example, ib_umem_get() will return -EOPNOTSUPP when
trying to pin pages on a DAX file.
Return actual error as mlx4/mlx5 does.
Link: https://lore.kernel.org/r/20210621071456.4259-1-ice_yangxiao@163.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add support for bind MW work requests from user space. Since rdma/core
does not support bind mw in ib_send_wr there is no way to support bind mw
in kernel space.
Added bind_mw local operation in rxe_req.c. Added bind_mw WR operation in
rxe_opcode.c. Added bind_mw WC in rxe_comp.c. Added additional fields to
rxe_mw in rxe_verbs.h. Added rxe_do_dealloc_mw() subroutine to cleanup an
mw when rxe_dealloc_mw is called. Added code to implement bind_mw
operation in rxe_mw.c
Link: https://lore.kernel.org/r/20210608042552.33275-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Simplify rxe_requester() by moving the local operations to a subroutine.
Add an error return for illegal send WR opcode. Moved next_index ahead of
rxe_run_task which fixed a small bug where work completions were delayed
until after the next wqe which was not the intended behavior. Let errors
return their own WC status. Previously all errors were reported as
protection errors which was incorrect. Changed the return of errors from
rxe_do_local_ops() to err: which causes an immediate completion. Without
this an error on a last WR may get lost. Changed fill_packet() to
finish_packet() which is more accurate.
Fixes: 8700e2e7c485 ("The software RoCE driver")
Link: https://lore.kernel.org/r/20210608042552.33275-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the rxe driver has a rxe_mw struct object but nothing about
memory windows is enabled. This patch turns on memory windows and some
minor cleanup.
Set device attribute in rxe.c so max_mw = MAX_MW. Change parameters in
rxe_param.h so that MAX_MW is the same as MAX_MR. Reduce the number of
MRs and MWs to 4K from 256K. Add device capability bits for 2a and 2b
memory windows. Removed RXE_MR_TYPE_MW from the rxe_mr_type enum.
Link: https://lore.kernel.org/r/20210608042552.33275-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the rdma_rxe driver attempts to protect atomic responder
resources by taking a reference to the qp which is only freed when the
resource is recycled for a new read or atomic operation. This means that
in normal circumstances there is almost always an extra qp reference once
an atomic operation has been executed which prevents cleaning up the qp
and associated pd and cqs when the qp is destroyed.
This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
destroyed while a peer is retrying an atomic op it will cause the
operation to fail which is acceptable.
Link: https://lore.kernel.org/r/20210604230558.4812-1-rpearsonhpe@gmail.com
Reported-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Fixes: 86af617641 ("IB/rxe: remove unnecessary skb_clone")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
To avoid the following failure when trying to load the rdma_rxe module
while IPv6 is disabled, add a check for EAFNOSUPPORT and ignore the
failure, also delete the needless debug print from rxe_setup_udp_tunnel().
$ modprobe rdma_rxe
modprobe: ERROR: could not insert 'rdma_rxe': Operation not permitted
Fixes: dfdd6158ca ("IB/rxe: Fix kernel panic in udp_setup_tunnel")
Link: https://lore.kernel.org/r/20210603090112.36341-1-kamalheib1@gmail.com
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to prevent user space from modifying the index that belongs to
the kernel for shared queues let the kernel use a local copy of the index
and copy any new values of that index to the shared rxe_queue_bus struct.
This adds more switch statements which decreases the performance of the
queue API. Move the type into the parameter list for these functions so
that the compiler can optimize out the switch statements when the explicit
type is known. Modify all the calls in the driver on performance paths to
pass in the explicit queue type.
Link: https://lore.kernel.org/r/20210527194748.662636-4-rpearsonhpe@gmail.com
Link: https://lore.kernel.org/linux-rdma/20210526165239.GP1002214@@nvidia.com/
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Modify the queue APIs to protect all user space index loads with
smp_load_acquire() and all user space index stores with
smp_store_release(). Base this on the types of the queues which can be one
of ..KERNEL, ..FROM_USER, ..TO_USER. Kernel space indices are protected by
locks which also provide memory barriers.
Link: https://lore.kernel.org/r/20210527194748.662636-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
To create optimal code only want to use smp_load_acquire() and
smp_store_release() for user indices in rxe_queue APIs since kernel
indices are protected by locks which also act as memory barriers. By
adding a type to the queues we can determine which indices need to be
protected.
Link: https://lore.kernel.org/r/20210527194748.662636-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In the original rxe implementation it was intended to use a common object
to represent MRs and MWs but they are different enough to separate these
into two objects.
This allows replacing the mem name with mr for MRs which is more
consistent with the style for the other objects and less likely to be
confusing. This is a long patch that mostly changes mem to mr where it
makes sense and adds a new rxe_mw struct.
Link: https://lore.kernel.org/r/20210325212425.2792-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>