Patches for 4.16 that are dependent on patches sent to 4.15-rc.
These are small clean ups for the vmw_pvrdma and i40iw drivers.
* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
i40iw: Change accelerated flag to bool
refcount_t is the preferred type for refcounts. Change the
QP and CQ refcnt fields to use refcount_t.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Convert the sizeof(void *) in two kcalloc calls to be more
specific for the arrays that are being allocated.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Be more consistent in setting and checking is_kernel
flag for QPs and CQs.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct i40iw_cm_node to bool.
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ibmr.device is being set only after ib_alloc_mr() is
(successfully) complete. Therefore, in case mlx5_core_create_mkey()
return with error, the error flow calls mlx5_free_priv_descs()
which uses ibmr.device (which doesn't exist yet), causing
a NULL dereference oops.
To fix this, the IB device should be set in the mr struct earlier
stage (e.g. prior to calling mlx5_core_create_mkey()).
Fixes: 8a187ee52b ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
User-space applications can do mmap and munmap directly at
any time.
Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.
Cc: <stable@vger.kernel.org> # v4.7
Fixes: 7c2344c3bb ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Change the slid arg to ingress_pkey_table_fail() to a full
32Bits and do not convert to 16Bits in caller. This is so we
can keep everything 32bit in the kernel and only change to
16bit at the uapi boundary.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The accepting QP ORD value should be adjusted not to
exceed the peer QP IRD value (RFC 6581). This is
skipped for loopback. After the ORD is validated
by i40iw_record_ird_ord(), adjust the ORD value of
the loopback accepting QP to prevent overrunning the
IRD space of the peer QP. Also move the ORD accounting
for 0-byte RDMA read to i40iw_record_ird_ord().
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Casting to u16 before validating IRD/ORD connection
parameters could cause recording wrong IRD/ORD values
in the cm_node. Validate the IRD/ORD parameters as
they are passed by the application before recording
them.
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The LLP_DOUBT_REACHABILITY Asynchronous Event (AE) is an early
warning of a connection issue. It is followed by LLP_TOO_MANY_RETRIES
AE, if the retransmit threshold is reached and recovery is not possible
for the connection.
Currently we terminate the connection on receiving the
LLP_DOUBT_REACHABILITY AE. Ignore this AE and
terminate the connection only on LLP_TOO_MANY_RETRIES AE.
This improves the user experience on cable disconnect/reconnect
scenario while running iWARP traffic. On cable disconnect,
the QP traffic is paused and the user has a larger and more
reasonable timeout within which if the cable is reconnected,
traffic can continue.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Partial FPDU processing is broken as the sequence number
for the first partial FPDU is wrong due to incorrect
Q2 buffer offset. The offset should be 64 rather than 16.
Fixes: 786c6adb3a ("i40iw: add puda code")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
On IP address change event, all connected QPs are torn down
irrespective of whether IP address is involved in a connection.
Only teardown connections those source or destination address
matches the netdev interface IP address being changed, and if
they are on the same VLAN as the netdev.
Fixes: e5e74b61b1 ("i40iw: Add IP addr handling on netdev events")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Register a netdevice notifier for netdev UP/DOWN
notification events and report the appropriate ib event.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lower Inbound RDMA Read Queue (Q1) object count by a factor of 2
as it is incorrectly doubled. Also, round up Q1 and Transmit FIFO (XF)
object count to power of 2 to satisfy hardware requirement.
Fixes: 86dbcd0f12 ("i40iw: add file to handle cqp calls")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Consolidate all power of 2 round calculations to
use kernel utility function roundup_pow_of_two().
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Increase I40IW_MAX_IRD_SIZE to 64 which is the device limit.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct nes_cm_node to bool.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
During driver init, various registers are saved to allow restoration
after an FLR or gen3 bump. Some of these registers are not available
in some circumstances (i.e. Virtual machines).
This bug makes the driver unusable when the PCI device is passed into
a VM, it fails during probe.
Delete unnecessary register read/write, and only access register if
the capability exists.
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: a618b7e40a ("IB/hfi1: Move saving PCI values to a separate function")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds eq support for hip08. The eq table can
be multi-hop addressed.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Considering the compatibility of supporting hip08's eq
process and possible changes of data structure, this patch
refactors the eq code structure of hip06.
We move all the eq process code for hip06 from hns_roce_eq.c
into hns_roce_hw_v1.c, and also for hns_roce_eq.h. With
these changes, it will be convenient to add the eq support
for later hardware version.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.
Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The use of wait queues in vmw_pvrdma for handling concurrent
access to a resource leaves a race condition which can cause a use
after free bug.
Fix this by using the pattern from other drivers, complete() protected by
dec_and_test to ensure complete() is called only once.
Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
refcount_dec generates a warning when the operation
causes the refcount to hit zero. Avoid this by using
refcount_dec_and_test.
Fixes: 8b10ba783c ("RDMA/vmw_pvrdma: Add shared receive queue support")
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If a wr chain was posted and needed to be flushed, only the first
wr in the chain was completed with FLUSHED status. The rest were
never completed. This caused isert to hang on shutdown due to the
missing completions which left iscsi IO commands referenced, stalling
the shutdown.
Fixes: 4fe7c2962e ("iw_cxgb4: refactor sq/rq drain logic")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The flush/drain logic was not retaining the original wr opcode in
its completion. This can cause problems if the application uses
the completion opcode to make decisions.
Use bit 10 of the CQE header word to indicate the CQE is a special
drain completion, and save the original WR opcode in the cqe header
opcode field.
Fixes: 4fe7c2962e ("iw_cxgb4: refactor sq/rq drain logic")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Debugfs file reset_stats is created with S_IRUSR permissions,
but ocrdma_dbgfs_ops_read() doesn't support OCRDMA_RESET_STATS,
whereas ocrdma_dbgfs_ops_write() supports only OCRDMA_RESET_STATS.
The patch fixes misstype with permissions.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Smatch complains about this code:
drivers/infiniband/hw/mlx4/qp.c:1827 _mlx4_set_path()
error: buffer overflow 'dev->dev->caps.gid_table_len' 3 <= 255
The mlx4_ib_gid_index_to_real_index() does check that "port" is within
bounds, but we don't check the return value for errors. It seems simple
enough to add a check for that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The story is that Smatch marks skb->data as untrusted so it generates
a warning message here:
drivers/infiniband/hw/cxgb4/cm.c:4100 process_work()
error: buffer overflow 'work_handlers' 241 <= 255
In other places which handle this such as t4_uld_rx_handler() there is
some checking to make sure that the function pointer is not NULL. I
have added bounds checking and a check for NULL here as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The pointer reg_workq is local to the source and does not need to be
in global scope, so make it static.
Cleans up sparse warning:
drivers/infiniband/hw/cxgb4/device.c:69:25: warning: symbol 'reg_workq'
was not declared. Should it be static?
Fixes: 1c8f1da5d8 ("iw_cxgb4: Fix possible circular dependency locking warning")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The debugfs file prints the difference between host timestamps as a
seconds/nanoseconds tuple, along with a 64-bit nanoseconds hardware
timestamp. The host time is read using getnstimeofday() which is
deprecated because of the y2038 overflow, and it suffers from time jumps
during settimeofday() and leap seconds.
Converting to ktime_get_ts64() would solve those two, but I'm going
a little further here by changing to ktime_get() and printing 64-bit
nanoseconds on both host and hw timestamps. This simplifies the code
further and makes the output easier to understand.
The format of the debugfs file obviously changes here, but this should
only be read by humans and not scripts, so I assume it's fine.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The bnxt_qplib_disable_nq() call is redundant as it occurs
after 'goto fail' and hence it called twice. Remove it.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A warning that I thought I had fixed before occasionally comes
back in rare randconfig builds (I found 7 instances in the last
100000 builds, originally it was much more frequent):
drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_ib_reg_user_mr':
drivers/infiniband/hw/mlx5/mr.c:1229:5: error: 'order' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (order <= mr_cache_max_order(dev)) {
^
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'ncont' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'page_shift' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1260:2: error: 'npages' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I've looked at all those findings again and noticed that they are all
with CONFIG_INFINIBAND_USER_MEM=n, which means ib_umem_get() returns
an error unconditionally and we never initialize or use those variables.
This triggers a condition in gcc iff mr_umem_get() is partially but not
entirely inlined, which in turn depends on the exact combination of
optimization settings. This is a known problem with gcc, with no
easy solution in the compiler, so this adds another workaround that
should be more reliable than my previous attempt.
Returning an error from mlx5_ib_reg_user_mr() earlier means that we
can completely bypass the logic that caused the warning, the compiler
can now see that the variable is never accessed.
Fixes: 14ab8896f5 ("IB/mlx5: avoid bogus -Wmaybe-uninitialized warning")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
add_sd_cnt in info structure passed to i40iw_create_hmc_obj_type
must be 0 and since it is modified during the call, it must be
reset in the loop. This avoids unnecessarily reprogramming the
SDs multiple times with the same values.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use sqsize instead of I40IW_CQP_SW_SQSIZE_2048 to initialize
cqp_requests elements in the for-loop as sqsize is used
to allocate memory for cqp_requests.
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a stale entry in nes_cm_tcp_context that has apparently
never been used in mainline linux.
I'm trying to kill off all users of timeval as part of the
y2038-safety work, so let's just remove this one.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a stale entry in i40iw_cm_tcp_context that apparently
was copied from the 'nes' driver but never used in i40iw.
I'm trying to kill off all users of timeval as part of the
y2038-safety work, so let's just remove this one.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to re-calculate the number of pages since it is already
done in ib_umem_get.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Tested-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Perhaps the function is better written without
the empty bail: label and without setting ret
and just using return.
Combining the int/bool conversion of any and the
direct returns makes the resulting code clearer.
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The variable all is being set but is never read after this
hence it can be removed from the for loop initialization.
Cleans up clang warning:
drivers/infiniband/hw/qib/qib_file_ops.c:640:7: warning: Value
stored to 'any' is never read
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This failure exists with qib:
ver_rc_compare_swap:
mismatch, sequence 2, expected 123456789abcdef, got 0
The request builder was using the incorrect inlines to
build the request header resulting in incorrect data
in the atomic header.
Fix by using the appropriate inlines to create the request.
Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 261a435184 ("IB/qib,IB/hfi: Use core common header file")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently, if a port is queried that has an invalid
Maximum Transmission Unit, driver reports default MTU of 2048.
This in incorrect.
Use default value of 4096 if invalid.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
16B packets require that the path bits are masked with the LMC.
This mask is done correctly in all 16B header creation but was
left out for the RC Acknowledge.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Only insert our special drain CQEs to support ib_drain_sq/rq() after
the wq is flushed. Otherwise, existing but not yet polled CQEs can be
returned out of order to the user application. This can happen when the
QP has exited RTS but not yet flushed the QP, which can happen during
a normal close (vs abortive close).
In addition never count the drain CQEs when determining how many CQEs
need to be synthesized during the flush operation. This latter issue
should never happen if the QP is properly flushed before inserting the
drain CQE, but I wanted to avoid corrupting the CQ state. So we handle
it and log a warning once.
Fixes: 4fe7c2962e ("iw_cxgb4: refactor sq/rq drain logic")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>