Leon Romanovsky says:
====================
This is short series for mlx5 from Meir that adds a DevX UID to the UAR.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* mellanox/mlx5-next:
IB/mlx5: Enable UAR to have DevX UID
net/mlx5: Add uid field to UAR allocation structures
UID field was added to alloc_uar and dealloc_uar PRM command, to specify
DevX UID for UAR. This change enables firmware validating user access to
its own UAR resources.
For the kernel allocated UARs the UID will stay 0 as of today.
Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
This patch removes kmem cache allocation and deallocation in favor of
having the ipoib_txreq in the ring.
The consumer is now the packet sending side allocating tx descriptors from
ring and the producer is the napi interrupt handling freeing tx
descriptors.
The locks are now eliminated because the napi tx lock insures a single
consumer and the napi handling insures a single producer.
The napi poll is converted to memory poll looking for items that have been
marked completed.
Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132826.131370.4397.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
gcc 8.3 and 5.4 throw this:
In function 'modify_qp_init_to_rtr',
././include/linux/compiler_types.h:322:38: error: call to '__compiletime_assert_1859' declared with attribute error: FIELD_PREP: value too large for the field
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
[..]
drivers/infiniband/hw/hns/hns_roce_common.h:91:52: note: in expansion of macro 'FIELD_PREP'
*((__le32 *)ptr + (field_h) / 32) |= cpu_to_le32(FIELD_PREP( \
^~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write'
#define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val)
^~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:4412:2: note: in expansion of macro 'hr_reg_write'
hr_reg_write(context, QPC_LP_PKTN_INI, lp_pktn_ini);
Because gcc has miscalculated the constantness of lp_pktn_ini:
mtu = ib_mtu_enum_to_int(ib_mtu);
if (WARN_ON(mtu < 0)) [..]
lp_pktn_ini = ilog2(MAX_LP_MSG_LEN / mtu);
Since mtu is limited to {256,512,1024,2048,4096} lp_pktn_ini is between 4
and 8 which is compatible with the 4 bit field in the FIELD_PREP.
Work around this broken compiler by adding a 'can never be true'
constraint on lp_pktn_ini's value which clears out the problem.
Fixes: f0cb411aad ("RDMA/hns: Use new interface to modify QP context")
Link: https://lore.kernel.org/r/0-v1-c773ecb137bc+11f-hns_gcc8_jgg@nvidia.com
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Due to duplicate reset flags, CQP commands are processed during reset.
This leads CQP failures such as below:
irdma0: [Delete Local MAC Entry Cmd Error][op_code=49] status=-27 waiting=1 completion_err=0 maj=0x0 min=0x0
Remove the redundant flag and set the correct reset flag so CPQ is paused
during reset
Fixes: 8498a30e1b ("RDMA/irdma: Register auxiliary driver and implement private channel OPs")
Link: https://lore.kernel.org/r/20210916191222.824-2-shiraz.saleem@intel.com
Reported-by: LiLiang <liali@redhat.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When the FRMR is allocated with single page, driver is attempting to
create a level 0 HWQ and not allocating any page because the nopte field
is set. This causes the crash during post_send as the pbl is not
populated.
To avoid this crash, check for the nopte bit during HWQ creation with
single page and create a level 1 page table and populate the pbl address
correctly.
Link: https://lore.kernel.org/r/1631709163-2287-9-git-send-email-selvin.xavier@broadcom.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Driver has 1ms delay between the polling for atomic command completion.
Polling immediately after issuing command usually doesn't report any
completions. So all commands in the blocking path needs two iterations. So
effectively 1ms spend on each command. HW requires much lesser time for
each command. So reduce the delay to 1us and increase the iteration count
to wait for the same time.
Link: https://lore.kernel.org/r/1631709163-2287-5-git-send-email-selvin.xavier@broadcom.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
clang becomes confused due to the comparison to NULL in a integer constant
expression context:
>> drivers/infiniband/hw/qib/qib_sysfs.c:413:1: error: static_assert expression is not an integral constant expression
QIB_DIAGC_ATTR(rc_resends);
^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/qib/qib_sysfs.c:406:16: note: expanded from macro 'QIB_DIAGC_ATTR'
static_assert(&((struct qib_ibport *)0)->rvp.n_##N != (u64 *)NULL); \
Nathan found __same_type that solves this problem nicely, so use it instead.
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull rdma fixes from Jason Gunthorpe:
"I don't usually send a second PR in the merge window, but the fix to
mlx5 is significant enough that it should start going through the
process ASAP. Along with it comes some of the usual -rc stuff that
would normally wait for a -rc2 or so.
Summary:
Important error case regression fixes in mlx5:
- Wrong size used when computing the error path smaller allocation
request leads to corruption
- Confusing but ultimately harmless alignment mis-calculation
Static checker warning fixes:
- NULL pointer subtraction in qib
- kcalloc in bnxt_re
- Missing static on global variable in hfi1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/hfi1: make hist static
RDMA/bnxt_re: Prefer kcalloc over open coded arithmetic
IB/qib: Fix null pointer subtraction compiler warning
RDMA/mlx5: Fix xlt_chunk_align calculation
RDMA/mlx5: Fix number of allocated XLT entries
As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.
In this case this is not actually dynamic sizes: both sides of the
multiplication are constant values. However it is best to refactor this
anyway, just to keep the open-coded math idiom out of code.
So, use the purpose specific kcalloc() function instead of the argument
size * count in the kzalloc() function.
Also, remove the unnecessary initialization of the sqp_tbl variable since
it is set a few lines later.
[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Link: https://lore.kernel.org/r/20210905081812.17113-1-len.baker@gmx.com
Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> drivers/infiniband/hw/qib/qib_sysfs.c:411:1: warning: performing pointer subtraction with a null pointer has undefined behavior
+[-Wnull-pointer-subtraction]
QIB_DIAGC_ATTR(rc_resends);
^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/qib/qib_sysfs.c:408:51: note: expanded from macro 'QIB_DIAGC_ATTR'
.counter = &((struct qib_ibport *)0)->rvp.n_##N - (u64 *)0, \
Use offsetof and accomplish the type check using static_assert.
Fixes: 4a7aaf88c8 ("RDMA/qib: Use attributes for the port sysfs")
Link: https://lore.kernel.org/r/0-v1-43ae3c759177+65-qib_type_jgg@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The XLT chunk alignment depends on ent_size not sizeof(ent_size) aka
sizeof(size_t). The incoming ent_size is either 8 or 16, so the
miscalculation when 16 is required is only an over-alignment and
functional harmless.
Fixes: 8010d74b99 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/20210908081849.7948-2-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In commit 8010d74b99 ("RDMA/mlx5: Split the WR setup out of
mlx5_ib_update_xlt()") the allocation logic was split out of
mlx5_ib_update_xlt() and the logic was changed to enable better OOM
handling. Sadly this change introduced a miscalculation of the number of
entries that were actually allocated when under memory pressure where it
can actually become 0 which on s390 lets dma_map_single() fail.
It can also lead to corruption of the free pages list when the wrong
number of entries is used in the calculation of sg->length which is used
as argument for free_pages().
Fix this by using the allocation size instead of misusing get_order(size).
Cc: stable@vger.kernel.org
Fixes: 8010d74b99 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/20210908081849.7948-1-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull rdma updates from Jason Gunthorpe:
"This is quite a small cycle, no major series stands out. The HNS and
rxe drivers saw the most activity this cycle, with rxe being broken
for a good chunk of time. The significant deleted line count is due to
a SPDX cleanup series.
Summary:
- Various cleanup and small features for rtrs
- kmap_local_page() conversions
- Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
- Cache the IB subnet prefix
- Rework how CRC is calcuated in rxe
- Clean reference counting in iwpm's netlink
- Pull object allocation and lifecycle for user QPs to the uverbs
core code
- Several small hns features and continued general code cleanups
- Fix the scatterlist confusion of orig_nents/nents introduced in an
earlier patch creating the append operation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits)
RDMA/mlx5: Relax DCS QP creation checks
RDMA/hns: Delete unnecessary blank lines.
RDMA/hns: Encapsulate the qp db as a function
RDMA/hns: Adjust the order in which irq are requested and enabled
RDMA/hns: Remove RST2RST error prints for hw v1
RDMA/hns: Remove dqpn filling when modify qp from Init to Init
RDMA/hns: Fix QP's resp incomplete assignment
RDMA/hns: Fix query destination qpn
RDMA/hfi1: Convert to SPDX identifier
IB/rdmavt: Convert to SPDX identifier
RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
RDMA/hns: Bugfix for the missing assignment for dip_idx
RDMA/hns: Bugfix for data type of dip_idx
RDMA/hns: Fix incorrect lsn field
RDMA/irdma: Remove the repeated declaration
RDMA/core/sa_query: Retry SA queries
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
RDMA/hns: Delete unused hns bitmap interface
...
From Maor Gottlieb
====================
Fix the use of nents and orig_nents in the sg table append helpers. The
nents should be used by the DMA layer to store the number of DMA mapped
sges, the orig_nents is the number of CPU sges.
Since the sg append logic doesn't always create a SGL with exactly
orig_nents entries store a total_nents as well to allow the table to be
properly free'd and reorganize the freeing logic to share across all the
use cases.
====================
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* 'sg_nents':
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
dip_idx and dgid should be a one-to-one mapping relationship, but when
qp_num loops back to the start number, it may happen that two different
dgid are assiociated to the same dip_idx incorrectly.
One solution is to store the qp_num that is not assigned to dip_idx in an
array. When a dip_idx needs to be allocated to a new dgid, an spare qp_num
is extracted and assigned to dip_idx.
Fixes: f91696f2f0 ("RDMA/hns: Support congestion control type selection according to the FW")
Link: https://lore.kernel.org/r/1629884592-23424-4-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>