This patch adds support for SRQ's created in user space and update
qedr_affiliated_event to deal with general SRQ events.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Implement the SRQ specific verbs and update the poll_cq verb to deal with
SRQ completions.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Today, we are using idr mechanism for QP's only.
This patch prepares the qedr_idr stuctures and the idr routines for
both QP's and SRQ's.
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mlx5_ib_create_qp_resp was never initialized and only the first 4 bytes
were written.
Fixes: 41d902cb7c ("RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp")
Cc: <stable@vger.kernel.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Several handlers need temporary allocations for the life of the method,
switch them to use the uverbs_alloc allocator.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
There is no reason for drivers to do this, the core code should take of
everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.
The core uses this to build up the runtime uapi for each device.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
This will allow FW to not send more data to TP (which would then need to
be buffered). Pass the negotiated TCP window scale to FW in the FLOWC WR.
Also refactor send_flowc() a bit to clean it up.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.
The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
- priv_destructor needs to switch to point to the ULP's destructor
which will then call the rdma_ndev's in the right order
- We need to be careful around the error unwind of register_netdev
as it sometimes calls priv_destructor on failure
- ULPs need to use ndo_init/uninit to ensure proper ordering
of failures around register_netdev
Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.
The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
To optimize NVME-oF READ IOPs, use a specialized WQE that combines
the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target
driver.
This reduces uP overhead per NVME-oF IO, and results in over 10%
improvement in NVME-oF 4K READ IOPs.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In c4iw_create_qp() there are several struct members which potentially
aren't inintialized like uresp.rq_key. I've fixed this code before in
in commit ae1fe07f3f ("RDMA/cxgb4: Fix stack info leak in
c4iw_create_qp()") so this time I'm just going to take a big hammer
approach and memset the whole struct to zero. Hopefully, it will stay
fixed this time.
In c4iw_create_srq() we don't clear uresp.reserved.
Fixes: 6a0b6174d3 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
According to IB protocol, there are some cases that work requests must
return the flush error completion status through the completion queue. Due
to hardware limitation, the driver needs to assist the flush process.
This patch adds the support of flush cqe for hip08 in the cases that
needed, such as poll cqe, post send, post recv and aeqe handle.
The patch also considered the compatibility between kernel and user space.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.
- Retrieve the ib_dev from the uobject->context->device. This is
safe under ioctl as the core has already done rdma_alloc_begin_uobject
and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is
not supported, all drivers should generate this and all users should check
for it when detecting not supported functionality.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that sparse reports the following warning:
drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that sparse complains about casts to restricted __be32.
Fixes: a3cdaa69e4 ("cxgb4: Adds CPL support for Shared Receive Queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that the following warning is reported when building with
W=1:
drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable]
u8 status;
^~~~~~
Fixes: 6a0b6174d3 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This was missed in a few places, and was just using 0.
Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch mainly uses CMD_CSQ_DESC_NUM instead of magic number in order
to improve readability.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Set for ret was missing in the error path here, resulting in incorrect
error code for modify_qp.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch mainly fills the correct value into the vlan id field of qp
context as well as update the vlan field name according to the latest
hardware user manual.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the
related fields of the av into the qp context.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The rdma core is taking care of return the right error code when the
rdma device callbacks aren't supported.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The proper return code is "-EOPNOTSUPP" when the create_srq() callback
is not supported.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In the current implementation, the driver tries to allocate contiguous
memory, and if it fails, it falls back to 4K fragmented allocation.
Once the memory is fragmented, the first allocation might take a lot
of time, and even fail, which can cause connection failures.
This patch changes the logic to always allocate with 4K granularity,
since it's more robust and more likely to succeed.
This patch was tested with Lustre and no performance degradation
was observed.
Note: This commit eliminates the "shrinking WQE" feature. This feature
depended on using vmap to create a virtually contiguous send WQ.
vmap use was abandoned due to problems with several processors (see the
commit cited below). As a result, shrinking WQE was available only with
physically contiguous send WQs. Allocating such send WQs caused the
problems described above.
Therefore, as a side effect of eliminating the use of large physically
contiguous send WQs, the shrinking WQE feature became unavailable.
Warning example:
worker/20:1: page allocation failure: order:8, mode:0x80d0
CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<ffffffff81686d81>] dump_stack+0x19/0x1b
[<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[<ffffffff81184fae>] __get_free_pages+0xe/0x50
[<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]
Fixes: 73898db043 ("net/mlx4: Avoid wrong virtual mappings")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This clearly indicates that the input is a bitwise combination of values
in an enum, and identifies which enum contains the definition of the bits.
Special accessors are provided that handle the mandatory validation of the
allowed bits and enforce the correct type for bitwise flags.
If we had introduced this at the start then the kabi would have uniformly
used u64 data to pass flags, however today there is a mixture of u64 and
u32 flags. All places are converted to accept both sizes and the accessor
fixes it. This allows all existing flags to grow to u64 in future without
any hassle.
Finally all flags are, by definition, optional. If flags are not passed
the accessor does not fail, but provides a value of zero.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since the next patch will constify the wr pointer, do not modify the data
that pointer points at.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The driver implements the modify_cq callback, but did not set the bit to
expose it to userspace.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Because the data structure of hip08 is little endian, it needs to fix the
immediate field of wqe and cqe into __le32.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In order to avoid using usleep function in lock function, we use delay
function instead of it. Besides, it also use brackets for standardized
the computed order.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When hop_num is more than three, it need to return -EINVAL. This patch
fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When create loop qp fail, it will return the correct result when
modify_qp() fails.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When init cmq fail in initial flow of RoCE, it should return the errno of
cmq_init function, not of the rest call.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When a CX5 device is configured in dual-port RoCE mode, after creating
many VFs against port 1, creating the same number of VFs against port 2
will flood kernel/syslog with something like
"mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
affiliated."
So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
repeatedly attempts to bind the new mpi structure to every device on the
list until it finds an unbound device.
Change the log level from warn to dbg to avoid log flooding as the warning
should be harmless.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:
drivers/infiniband/hw/usnic/usnic_fwd.c:95:2: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
strncpy(ufdev->name, netdev_name(ufdev->netdev),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(ufdev->name) - 1);
~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This driver doesn't provide any kernel services, it only provides
an interface via uverbs, so it should depend on, not select, uverbs
support.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch implements the srq specific verbs such as create/destroy/modify
and post_srq_recv. And adds srq specific structures and defines to t4.h
and uapi.
Also updates the cq poll logic to deal with completions that are
associated with the SRQ's.
This patch also handles kernel mode SRQ_LIMIT events as well as flushed
SRQ buffers
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds kernel mode t4_srq structures and support functions,
uapi structures and defines, as well as firmware work request structures.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:
In function 'ocrdma_mbx_get_ctrl_attribs',
inlined from 'ocrdma_init_hw' at drivers/infiniband/hw/ocrdma/ocrdma_hw.c:3224:11:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:1368:3: warning: 'strncpy' output may be truncated copying 31 bytes from a string of length 31 [-Wstringop-truncation]
strncpy(dev->model_number,
^~~~~~~~~~~~~~~~~~~~~~~~~~
hba_attribs->controller_model_number, 31);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.
Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
while holding a READ or WRITE lock on the uobject.
This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL
This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.
As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This approach matches the standard flow of the typical write method that
relies on the HW object to store the device and the uobject to access the
ucontext. Avoids the use of the devx_ufile2uctx in several places will
make revising the semantics of ib_uverbs_get_ucontext() in the next patch
simpler.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Expose the mlx5 flow steering parsing trees, exposing the functionality to
user space.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support to set a destination that is a flow table, this can come from
the DEVX destination.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support to set a public flow steering rule when its destination is a
TIR by using raw specification data.
The logic follows the verbs API but instead of using ib_spec(s) the raw,
device specific, description is used.
This allows supporting specialty matchers without having to define new
matches in the verbs struct based language.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce driver create and destroy flow methods on the uverbs flow
object.
This allows the driver to get its specific device attributes to match the
underlay specification while still using the generic ib_flow object for
cleanup and code sharing.
The IB object's attributes are set via the ib_set_flow() helper function.
The specific implementation for the given specification is added in
downstream patches.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce flow steering matcher object and its create and destroy methods.
This matcher object holds some mlx5 specific driver properties that
matches the underlay device specification when an mlx5 flow steering group
is created.
It will be used in downstream patches to be part of mlx5 specific create
flow method.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>