Commit Graph

12884 Commits

Author SHA1 Message Date
Mark Bloch
c446d9da64 RDMA/mlx5: Add shared FDB support
Shared FDB allows to create a single RDMA device that holds representors
from both eswitches. As shared FDB is only active when both uplink
representors are enslaved there is a single RDMA port that represents
both uplinks.

The number of ports is the number of vports on both eswitches minus one
as we only need 1 port for both uplinks.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch
979bf468fc {net, RDMA}/mlx5: Extend send to vport rules
In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.

While the Ethernet representor will always reside on its native PF
the IB representor will not. Extend send to vport rule creation to
support such flows. Need to account for source vport that sends the
traffic (on which the representors resides) and the target eswitch
the traffic which reach.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch
6aeb16a134 RDMA/mlx5: Fill port info based on the relevant eswitch
In shared FDB a single RDMA device can have representors that are
connected to two different eswitches. Use the right eswitch when
preparing the response to userspace.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Leon Romanovsky
d2b10794fc RDMA/core: Create clean QP creations interface for uverbs
Unify create QP creation interface to make clean approach to create
XRC_TGT and regular QPs.

Link: https://lore.kernel.org/r/5cd50e7d8ad9112545a1a61dea62799a5cb3224a.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:19 -03:00
Leon Romanovsky
5507f67d08 RDMA/core: Properly increment and decrement QP usecnts
The QP usecnts were incremented through QP attributes structure while
decreased through QP itself. Rely on the ib_creat_qp_user() code that
initialized all QP parameters prior returning to the user and increment
exactly like destroy does.

Link: https://lore.kernel.org/r/25d256a3bb1fc480b77d7fe439817b993de48610.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:19 -03:00
Leon Romanovsky
00a79d6b99 RDMA/core: Configure selinux QP during creation
All QP creation flows called ib_create_qp_security(), but differently.
This caused to the need to provide exclusion conditions for the XRC_TGT,
because such QP already had selinux configuration call.

In order to fix it, move ib_create_qp_security() to the general QP
creation routine.

Link: https://lore.kernel.org/r/4d7cd6f5828aca37fb62283e6b126b73ab86b18c.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:18 -03:00
Leon Romanovsky
8da9fe4e4f RDMA/core: Reorganize create QP low-level functions
The low-level create QP function grew to be larger than any sensible
inline function should be. The inline attribute is not really needed for
that function and can be implemented as exported symbol.

Link: https://lore.kernel.org/r/2c08709d86f876c3dfb77684357b2a939e570ca4.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:18 -03:00
Leon Romanovsky
20e2bcc4c2 RDMA/core: Remove protection from wrong in-kernel API usage
The ib_create_named_qp() is kernel verb that is not used for user supplied
attributes. In such case, it is ULP responsibility to provide valid QP
attributes.

In-kernel API shouldn't check it, exactly like other functions that don't
check device capabilities.

Link: https://lore.kernel.org/r/b9b9e981d1af148b750750196e686199dbbf61f8.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:18 -03:00
Leon Romanovsky
8fc3beebf6 RDMA/core: Delete duplicated and unreachable code
The ib_create_named_qp() is kernel verb and no kernel users exist that use
XRC_INI QP. Hence such QP path is not reachable. In addition, delete
duplicated assignments of QP attributes from the initialization structure.

Link: https://lore.kernel.org/r/1b4c0d1def5f8f6d26839e14d19da950cc4a0b05.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:18 -03:00
Leon Romanovsky
5f6bb7e322 RDMA/mlx5: Delete not-available udata check
XRC_TGT QPs are created through kernel verbs and don't have udata at all.

Fixes: 6eefa839c4 ("RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata")
Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/b68228597e730675020aa5162745390a2d39d3a2.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:17 -03:00
Leon Romanovsky
20da44dfe8 RDMA/mlx5: Drop in-driver verbs object creations
There is no real value in bypassing IB/core APIs for creating standard
objects with standard types. The open-coded variant didn't have any
restrack task management calls and caused to such objects to be not
present when running rdmatoool.

Link: https://lore.kernel.org/r/f745590e5fb7d56f90fdb25f64ee3983ba17e1e4.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:28 -03:00
Leon Romanovsky
514aee660d RDMA: Globally allocate and release QP memory
Convert QP object to follow IB/core general allocation scheme.  That
change allows us to make sure that restrack properly kref the memory.

Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
44da3730e0 RDMA/rdmavt: Decouple QP and SGE lists allocations
The rdmavt QP has fields that are both needed for the control and data
path. Such mixed declaration caused to the very specific allocation flow
with kzalloc_node and SGE list embedded into the struct rvt_qp.

This patch separates QP creation to two: regular memory allocation for the
control path and specific code for the SGE list, while the access to the
later is performed through derefenced pointer.

Such pointer and its context are expected to be in the cache, so
performance difference is expected to be negligible, if any exists.

Link: https://lore.kernel.org/r/f66c1e20ccefba0db3c69c58ca9c897f062b4d1c.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
0dc0da15ed RDMA/mlx5: Rework custom driver QP type creation
Starting from commit 2b1f747071 ("RDMA/core: Allow drivers to disable
restrack DB") the restrack is able to handle non-standard QP types either.

That change allows us to rewrite custom QP calls to their IB/core
counterparts, so we will use general QP creation flow even for the driver
QP types.

Link: https://lore.kernel.org/r/51682ab82298748941f38bd23ee3bf77ef1cab7b.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
8c9e7f0325 RDMA/mlx5: Delete device resource mutex that didn't protect anything
The dev->devr.mutex was intended to protect GSI QP pointer change in the
struct mlx5_ib_port_resources when it is accessed from the
pkey_change_work. However that pointer isn't changed during the runtime
and once IB/core adds MAD, it stays stable.

Link: https://lore.kernel.org/r/6e338c561033df20d92e1371fc6a7a0d93aad945.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
b0791dbf12 RDMA/mlx5: Cancel pkey work before destroying device resources
In the driver release flow, we are ensuring that notifier is disabled and
no new works can be added to pkey_change_handler. It means that we can
cancel that handler before destroying resources to make sure that our
unwind routine is symmetrical to the allocation one.

Link: https://lore.kernel.org/r/f2b1ea1bad952e4e7a48a6f731de9e0344986b29.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:26 -03:00
Leon Romanovsky
f9193d2663 RDMA/efa: Remove double QP type assignment
The QP type is set by the IB/core and shouldn't be set in the driver.

Fixes: 40909f664d ("RDMA/efa: Add EFA verbs implementation")
Link: https://lore.kernel.org/r/838c40134c1590167b888ca06ad51071139ff2ae.1627040189.git.leonro@nvidia.com
Acked-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:26 -03:00
Leon Romanovsky
e66e49592b RDMA/hns: Don't overwrite supplied QP attributes
QP attributes that were supplied by IB/core already have all parameters
set when they are passed to the driver. The drivers are not supposed to
change anything in struct ib_qp_init_attr.

Fixes: 66d86e529d ("RDMA/hns: Add UD support for HIP09")
Link: https://lore.kernel.org/r/5987138875e8ade9aa339d4db6e1bd9694ed4591.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:26 -03:00
Leon Romanovsky
4ffd3b800e RDMA/hns: Don't skip IB creation flow for regular RC QP
The call to internal QP creation function skips QP creation checks and
misses the addition of such device QPs to the restrack DB.

As a preparation to general allocation scheme, convert hns to use proper
API.

Link: https://lore.kernel.org/r/7b236c15f7d5abb368958297ac6962d8459cb824.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:26 -03:00
Yangyang Li
8b436a99cd RDMA/hns: Fix the double unlock problem of poll_sem
If hns_roce_cmd_use_events() fails then it means that the poll_sem is not
obtained, but the poll_sem is released in hns_roce_cmd_use_polling(), this
will cause an unlock problem.

This is the static checker warning:
	drivers/infiniband/hw/hns/hns_roce_main.c:926 hns_roce_init()
	error: double unlocked '&hr_dev->cmd.poll_sem' (orig line 879)

Event mode and polling mode are mutually exclusive and resources are
separated, so there is no need to process polling mode resources in event
mode.

The initial mode of cmd is polling mode, so even if cmd fails to switch to
event mode, it is not necessary to switch to polling mode.

Fixes: a389d016c0 ("RDMA/hns: Enable all CMDQ context")
Fixes: 3d50503b3b ("RDMA/hns: Optimize cmd init and mode selection for hip08")
Link: https://lore.kernel.org/r/1627887374-20019-1-git-send-email-liangwenpeng@huawei.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:42:44 -03:00
Bob Pearson
ef4b96a577 RDMA/rxe: Restore setting tot_len in the IPv4 header
An earlier patch removed setting of tot_len in IPv4 headers because it was
also set in ip_local_out. However, this change resulted in an incorrect
ICRC being computed because the tot_len field is not masked out. This
patch restores that line. This fixes the bug reported by Zhu Yanjun.  This
bug affects anyone using rxe which is currently broken.

Fixes: 230bb836ee ("RDMA/rxe: Fix redundant call to ip_send_check")
Link: https://lore.kernel.org/r/20210729220039.18549-3-rpearsonhpe@gmail.com
Reported-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-and-tested-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-02 12:45:22 -03:00
Bob Pearson
e2a05339fa RDMA/rxe: Use the correct size of wqe when processing SRQ
The memcpy() that copies a WQE from a SRQ the QP uses an incorrect size.
The size should have been the size of the rxe_send_wqe struct not the size
of a pointer to it. The result is that IO operations using a SRQ on the
responder side will fail.

Fixes: ec0fa2445c ("RDMA/rxe: Fix over copying in get_srq_wqe")
Link: https://lore.kernel.org/r/20210729220039.18549-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-02 12:45:22 -03:00
Mike Marciniszyn
db4657afd1 RDMA/cma: Revert INIT-INIT patch
The net/sunrpc/xprtrdma module creates its QP using rdma_create_qp() and
immediately post receives, implicitly assuming the QP is in the INIT state
and thus valid for ib_post_recv().

The patch noted in Fixes: removed the RESET->INIT modifiy from
rdma_create_qp(), breaking NFS rdma for verbs providers that fail the
ib_post_recv() for a bad state.

This situation was proven using kprobes in rvt_post_recv() and
rvt_modify_qp(). The traces showed that the rvt_post_recv() failed before
ANY modify QP and that the current state was RESET.

Fix by reverting the patch below.

Fixes: dc70f7c3ed ("RDMA/cma: Remove unnecessary INIT->INIT transition")
Link: https://lore.kernel.org/r/1627583182-81330-1-git-send-email-mike.marciniszyn@cornelisnetworks.com
Cc: Haakon Bugge <haakon.bugge@oracle.com>
Cc: Chuck Lever III <chuck.lever@oracle.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-02 12:45:22 -03:00
Aharon Landau
d6793ca97b RDMA/mlx5: Delay emptying a cache entry when a new MR is added to it recently
Fixing a typo that causes a cache entry to shrink immediately after adding
to it new MRs if the entry size exceeds the high limit.  In doing so, the
cache misses its purpose to prevent the creation of new mkeys on the
runtime by using the cached ones.

Fixes: b9358bdbc7 ("RDMA/mlx5: Fix locking in MR cache work queue")
Link: https://lore.kernel.org/r/fcb546986be346684a016f5ca23a0567399145fa.1627370131.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-02 12:45:22 -03:00
Jakub Kicinski
d2e11fd2b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicting commits, all resolutions pretty trivial:

drivers/bus/mhi/pci_generic.c
  5c2c853159 ("bus: mhi: pci-generic: configurable network interface MRU")
  56f6f4c4eb ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean")

drivers/nfc/s3fwrn5/firmware.c
  a0302ff590 ("nfc: s3fwrn5: remove unnecessary label")
  46573e3ab0 ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
  801e541c79 ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")

MAINTAINERS
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")
  8a7b46fa79 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-31 09:14:46 -07:00
Prabhakar Kushwaha
0050a57638 RDMA/qedr: Improve error logs for rdma_alloc_tid error return
Use -EINVAL return type to identify whether error is returned because of
"Out of MR resources" or any other error types.

Link: https://lore.kernel.org/r/20210729151732.30995-2-pkushwaha@marvell.com
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 11:27:34 -03:00
Cai Huoqing
991c4274dc RDMA/hfi1: Fix typo in comments
Remove the repeated word 'the' from comments

Link: https://lore.kernel.org/r/20210729082346.1882-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 10:06:08 -03:00
Leon Romanovsky
bbafcbc2b1 RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure that requests are valid
The core netlink code alread guarentees that no netlink callback can be
running outside the rdma_nl_register/unregister() region and this
registration happens during module init/exit. Thus it is already prevented
that iwpm_valid_client() can ever fail. Remove it.

Link: https://lore.kernel.org/r/a9f05a78f9996bf6ea47099b5e02671bf742f5ab.1627048781.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 10:01:41 -03:00
Leon Romanovsky
bdb0e4e3ff RDMA/iwpm: Remove not-needed reference counting
iwpm_init() and iwpm_exit() are called only once during iw_cm module
load. This makes whole reference count implementation not needed at all.

Link: https://lore.kernel.org/r/1778ded873ba58c9fadc5bb25038de1cec843bec.1627048781.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 10:01:40 -03:00
Leon Romanovsky
e677b72a06 RDMA/iwcm: Release resources if iw_cm module initialization fails
The failure during iw_cm module initialization partially left the system
with unreleased memory and other resources. Rewrite the module init/exit
routines in such way that netlink commands will be opened only after
successful initialization.

Fixes: b493d91d33 ("iwcm: common code for port mapper")
Link: https://lore.kernel.org/r/b01239f99cb1a3e6d2b0694c242d89e6410bcd93.1627048781.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 10:01:40 -03:00
Xiyu Yang
a0293eb249 RDMA/hfi1: Convert from atomic_t to refcount_t on hfi1_devdata->user_refcount
refcount_t type and corresponding API can protect refcounters from
accidental underflow and overflow and further use-after-free situations.

Link: https://lore.kernel.org/r/1626674454-56075-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Tested-by: Josh Fisher <josh.fisher@cornelisnetworks.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 09:34:26 -03:00
Mike Marciniszyn
62004871e1 IB/hfi1: Adjust pkey entry in index 0
It is possible for the primary IPoIB network device associated with any
RDMA device to fail to join certain multicast groups preventing IPv6
neighbor discovery and possibly other network ULPs from working
correctly. The IPv4 broadcast group is not affected as the IPoIB network
device handles joining that multicast group directly.

This is because the primary IPoIB network device uses the pkey at ndex 0
in the associated RDMA device's pkey table. Anytime the pkey value of
index 0 changes, the primary IPoIB network device automatically modifies
it's broadcast address (i.e. /sys/class/net/[ib0]/broadcast), since the
broadcast address includes the pkey value, and then bounces carrier. This
includes initial pkey assignment, such as when the pkey at index 0
transitions from the opa default of invalid (0x0000) to some value such as
the OPA default pkey for Virtual Fabric 0: 0x8001 or when the fabric
manager is restarted with a configuration change causing the pkey at index
0 to change. Many network ULPs are not sensitive to the carrier bounce and
are not expecting the broadcast address to change including the linux IPv6
stack.  This problem does not affect IPoIB child network devices as their
pkey value is constant for all time.

To mitigate this issue, change the default pkey in at index 0 to 0x8001 to
cover the predominant case and avoid issues as ipoib comes up and the FM
sweeps.

At some point, ipoib multicast support should automatically fix
non-broadcast addresses as it does with the primary broadcast address.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20210715160445.142451.47651.stgit@awfm-01.cornelisnetworks.com
Suggested-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 09:27:55 -03:00
Mike Marciniszyn
e9901043b2 IB/hfi1: Indicate DMA wait when txq is queued for wakeup
There is no counter for dmawait in AIP, which hampers debugging
performance issues.

Add the counter increment when the txq is queued.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210715160440.142451.8278.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 09:27:54 -03:00
Arnd Bergmann
a76053707d dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.

Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.

This is a purely cosmetic change intended to help readers find
their way through the implementation.

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 20:11:45 +01:00
Tal Gilboa
616d576934 IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
is_apu_thread_cq() used to detect CQs which are attached to APU
threads. This was extended to support other elements as well,
so the function was renamed to is_apu_cq().

c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't
reflected when the APU support was first introduced.

Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa
Signed-off-by: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-07-25 11:39:04 +03:00
Bart Van Assche
87662a472a scsi: iser: Use scsi_get_sector() instead of scsi_get_lba()
Use scsi_get_sector() instead of scsi_get_lba() since the name of the
latter is confusing. This patch does not change any functionality.

Link: https://lore.kernel.org/r/20210513223757.3938-3-bvanassche@acm.org
Link: https://lore.kernel.org/r/20210609033929.3815-12-martin.petersen@oracle.com
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Message-Id: <20210609033929.3815-12-martin.petersen@oracle.com>
2021-07-20 22:10:42 -04:00
Jason Gunthorpe
07d0f314ba Merge branch 'mlx5_dcs' into rdma.git for-next
Leon Romanovsky says:

====================
Add ConnectX DCS offload support

This patchset from Lior adds support of DCI stream channel (DCS) support.

DCS is an offload to SW load balancing of DC initiator work requests.

A single DC QP initiator (DCI) can be connected to only one target at the
time and can't start new connection until the previous work request is
completed.

This limitation causes to delays when the initiator process needs to
transfer data to multiple targets at the same time.
====================

* branch 'mlx5_dcs':
  RDMA/mlx5: Add DCS offload support
  RDMA/mlx5: Separate DCI QP creation logic
  net/mlx5: Add DCS caps & fields support
2021-07-20 15:11:38 -03:00
Lior Nahmanson
11656f593a RDMA/mlx5: Add DCS offload support
DCS is an offload to SW load balancing of DC initiator work requests.

A single DCI can be connected to only one target at the time and can't
start new connection until the previous work request is completed.  This
limitation will cause to delay when the initiator process needs to
transfer data to multiple targets at the same time.  The SW solution is to
use a process that handling and spreading the work request on many DCIs
according to destinations.

This feature is an offload to this process and coming to reduce the load
from the CPU and improve the performance.

Link: https://lore.kernel.org/r/491c2c2afdb5b07de7f03eab3f93cf0704549dbc.1624258894.git.leonro@nvidia.com
Reviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-20 15:04:14 -03:00
Lior Nahmanson
2013b4d525 RDMA/mlx5: Separate DCI QP creation logic
This patch isolates DCI QP creation logic to separate function, so this
change will reduce complexity when adding new features to DCI QP without
interfering with other QP types.

The code was copied from create_user_qp() while taking only DCI relevant bits.

Link: https://lore.kernel.org/r/b4530bdd999349c59691224f016ff1efb5dc3b92.1624258894.git.leonro@nvidia.com
Reviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-20 15:04:14 -03:00
Bob Pearson
923232bbea RDMA/rxe: Fix types in rxe_icrc.c
Currently the ICRC is generated as a u32 type and then forced to a __be32
and stored into the ICRC field in the packet. The actual type of the ICRC
is __be32. This patch replaces u32 by __be32 and eliminates the casts.
The computation is exactly the same as the original but the types are more
consistent.

Link: https://lore.kernel.org/r/20210707040040.15434-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:35 -03:00
Bob Pearson
e4f5c82fef RDMA/rxe: Add kernel-doc comments to rxe_icrc.c
This patch adds kernel-doc style comments to rxe_icrc.c

Link: https://lore.kernel.org/r/20210707040040.15434-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:34 -03:00
Bob Pearson
add2b3b80e RDMA/rxe: Move crc32 init code to rxe_icrc.c
This patch collects the code from rxe_register_device() that sets up the
crc32 calculation into a subroutine rxe_icrc_init() in rxe_icrc.c.

Link: https://lore.kernel.org/r/20210707040040.15434-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:34 -03:00
Bob Pearson
6388751057 RDMA/rxe: Fixup rxe_icrc_hdr
rxe_icrc_hdr() in rxe_icrc.c is no longer shared. This patch makes it
static and changes the parameter list to match the other routines there.

Link: https://lore.kernel.org/r/20210707040040.15434-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:34 -03:00
Bob Pearson
b6c6cc4acd RDMA/rxe: Move rxe_crc32 to a subroutine
Move rxe_crc32() from rxe.h to rxe_icrc.c as a static local function.

Link: https://lore.kernel.org/r/20210707040040.15434-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:34 -03:00
Bob Pearson
1117f26ea7 RDMA/rxe: Move ICRC generation to a subroutine
Isolate ICRC generation into a single subroutine named rxe_generate_icrc()
in rxe_icrc.c. Remove scattered crc generation code from elsewhere.

Link: https://lore.kernel.org/r/20210707040040.15434-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:34 -03:00
Bob Pearson
13050a0b32 RDMA/rxe: Fixup rxe_send and rxe_loopback
Fixup rxe_send() and rxe_loopback() in rxe_net.c to have the same calling
sequence. This patch makes them static and have the same parameter list
and return value.

Link: https://lore.kernel.org/r/20210707040040.15434-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:33 -03:00
Bob Pearson
36fbb03d05 RDMA/rxe: Move rxe_xmit_packet to a subroutine
rxe_xmit_packet() was an overlong inline subroutine. This patch moves it
into rxe_net.c as an ordinary subroutine.

Link: https://lore.kernel.org/r/20210707040040.15434-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:33 -03:00
Bob Pearson
fe87fb17c6 RDMA/rxe: Move ICRC checking to a subroutine
Move the code in rxe_recv() that checks the ICRC on incoming packets to a
subroutine rxe_check_icrc() and move that to rxe_icrc.c.

Link: https://lore.kernel.org/r/20210707040040.15434-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:33 -03:00
Anand Khoje
21bfee9c0c IB/core: Read subnet_prefix in ib_query_port via cache.
ib_query_port() calls device->ops.query_port() to get the port
attributes. The method of querying is device driver specific.  The same
function calls device->ops.query_gid() to get the GID and extract the
subnet_prefix (gid_prefix).

The GID and subnet_prefix are stored in a cache. But they do not get
read from the cache if the device is an Infiniband device. The
following change takes advantage of the cached subnet_prefix.
Testing with RDBMS has shown a significant improvement in performance
with this change.

Link: https://lore.kernel.org/r/20210712122625.1147-4-anand.a.khoje@oracle.com
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 10:57:28 -03:00
Anand Khoje
36721a6d4c IB/core: Shifting initialization of device->cache_lock
The lock cache_lock of struct ib_device is initialized in function
ib_cache_setup_one(). This is much later than the device initialization in
_ib_alloc_device().

This change shifts initialization of cache_lock in _ib_alloc_device().

Link: https://lore.kernel.org/r/20210712122625.1147-3-anand.a.khoje@oracle.com
Suggested-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 10:57:28 -03:00