Commit Graph

13104 Commits

Sergey Gorenko
39b169ea0d IB/iser: Fix RNR errors
Some users complain about RNR errors on the target, when heavy
high-priority tasks run on the initiator. After the investigation, we
found out that the receive WRs were exhausted, because the initiator could
not post them on time.

Receive work requests are posted in chunks to reduce the number of hits to
the HCA. The WRs are posted in the receive completion handler when the
number of free receive buffers reaches the threshold. But on a highly loaded
host, receive CQE processing can be delayed and all receive WRs can be
exhausted. In this case, the target will get an RNR error.

To avoid this, post each receive WR as soon as possible rather than in a
batch. This increases the number of hits to the HCA, but it is also the
common implementation in most Linux ULPs (e.g. NVMe-oF/RDMA). As a rule of
thumb, performance improvements and heuristics belong in the RDMA core
layer or the vendors' low-level drivers, and it is about time to align iSER
as well.

Link: https://lore.kernel.org/r/20211215135721.3662-3-mgurtovoy@nvidia.com
Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 19:36:20 -04:00
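
A minimal sketch of the per-completion repost described above, assuming a simplified helper rather than the actual ib_iser code (names are illustrative):

  #include <rdma/ib_verbs.h>

  /* Sketch: repost a single receive WR right away from the completion
   * handler instead of waiting for a batch threshold, so the receive
   * queue cannot run dry on a heavily loaded host.
   */
  static int example_repost_recv(struct ib_qp *qp, struct ib_sge *sge, u64 wr_id)
  {
          const struct ib_recv_wr *bad_wr;
          struct ib_recv_wr wr = {
                  .wr_id   = wr_id,
                  .sg_list = sge,
                  .num_sge = 1,
          };

          /* One doorbell per WR: more hits to the HCA, but no RNR on the target. */
          return ib_post_recv(qp, &wr, &bad_wr);
  }
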
Max Gurtovoy
b28801a089 IB/iser: Remove deprecated pi_guard module param
No need for this dead code. This commit doesn't change any functionality
since one can still run "modprobe ib_iser pi_guard=<type>".

Link: https://lore.kernel.org/r/20211215135721.3662-2-mgurtovoy@nvidia.com
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 19:36:20 -04:00
Maher Sanalla
fbdb0ba705 IB/mlx5: Expose NDR speed through MAD
Under a MAD port query, report NDR speed when NDR is supported in the port
capability mask.

Link: https://lore.kernel.org/r/a2ab630d2a634547db9b581faa9d65da2edb9d05.1639554831.git.leonro@nvidia.com
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 16:40:35 -04:00
Leon Romanovsky
b35a0f4dd5 RDMA/core: Don't infoleak GRH fields
If the dst->is_global field is not set, the GRH fields are not cleared
and the following infoleak is reported.

=====================================================
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33
 instrument_copy_to_user include/linux/instrumented.h:121 [inline]
 _copy_to_user+0x1c9/0x270 lib/usercopy.c:33
 copy_to_user include/linux/uaccess.h:209 [inline]
 ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242
 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732
 vfs_write+0x8ce/0x2030 fs/read_write.c:588
 ksys_write+0x28b/0x510 fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __ia32_sys_write+0xdb/0x120 fs/read_write.c:652
 do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline]
 __do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180
 do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205
 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Local variable resp created at:
 ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214
 ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732

Bytes 40-59 of 144 are uninitialized
Memory access of size 144 starts at ffff888167523b00
Data copied to user address 0000000020000100

CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
=====================================================

Fixes: 4ba66093bd ("IB/core: Check for global flag when using ah_attr")
Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com
Reported-by: syzbot+6d532fa8f9463da290bc@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 16:30:19 -04:00
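
A hedged illustration of the class of fix for this kind of infoleak: zero the whole response on the stack before filling in only the valid fields (the struct and helper below are placeholders, not the actual ucma code):

  #include <linux/types.h>
  #include <linux/errno.h>
  #include <linux/uaccess.h>

  struct example_resp {
          u32 qp_state;
          u8  grh[40];    /* stand-in for the GRH fields */
  };

  static int example_copy_resp(void __user *dst, u32 qp_state)
  {
          struct example_resp resp = {};  /* every field cleared up front */

          resp.qp_state = qp_state;
          /* Fields left untouched (e.g. GRH when is_global is unset) can no
           * longer leak kernel stack contents to userspace.
           */
          return copy_to_user(dst, &resp, sizeof(resp)) ? -EFAULT : 0;
  }
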
Kamal Heib
e375b9c929 RDMA/cxgb4: Set queue pair state when being queried
The API for ib_query_qp requires the driver to set cur_qp_state on return,
add the missing set.

Fixes: 67bbc05512 ("RDMA/cxgb4: Add query_qp support")
Link: https://lore.kernel.org/r/20211220152530.60399-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 15:51:51 -04:00
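
A minimal sketch of the missing assignment in a .query_qp handler (simplified placeholder, not the actual cxgb4 code):

  #include <rdma/ib_verbs.h>

  static int example_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
                              int attr_mask, struct ib_qp_init_attr *init_attr)
  {
          /* ... fill attr from driver/hardware state ... */
          attr->qp_state = IB_QPS_RTS;            /* example value */
          attr->cur_qp_state = attr->qp_state;    /* the previously missing set */
          return 0;
  }
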
Chengchang Tang
38d2208824 RDMA/hns: Remove support for HIP06
HIP06 is no longer supported. In order to reduce unnecessary maintenance,
the code of HIP06 is removed.

Link: https://lore.kernel.org/r/20211220130558.61585-1-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 15:50:56 -04:00
Leon Romanovsky
36783dec8d RDMA/rxe: Delete deprecated module parameters interface
Starting from commit 66920e1b25 ("rdma_rxe: Use netlink messages
to add/delete links") from 2019, the RXE module parameters are marked
as deprecated in favour of rdmatool. So remove the kernel code too.

Link: https://lore.kernel.org/r/c8376d7517aebe7cc851f0baaeef7b13707cf767.1641372460.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 15:18:47 -04:00
Leon Romanovsky
d82e2b27ad RDMA/mad: Delete duplicated init_query_mad functions
Several drivers used the same function to initialize a query MAD,
so move that function to a global header file.

Link: https://lore.kernel.org/r/af6f35c590ff5ef56d0137351b8b295af0f7c13c.1641369858.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 15:18:36 -04:00
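
The duplicated helper is essentially a four-line initializer; roughly what the shared, header-level version looks like (sketched from the per-driver copies, so treat details as illustrative):

  #include <rdma/ib_mad.h>
  #include <rdma/ib_smi.h>

  /* One shared copy instead of a private one per driver. */
  static inline void init_query_mad(struct ib_smp *mad)
  {
          mad->base_version  = 1;
          mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
          mad->class_version = 1;
          mad->method        = IB_MGMT_METHOD_GET;
  }
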
Li Zhijian
d8b0afd29c RDMA/rxe: Fix indentations and operators style
* Fix these up to always have the '+' and '|' on the continuing line,
  which is the normal kernel style.
* Fix the indentation correspondingly.

NOTE: this patch also removes the two redundant plus signs in
IB_OPCODE_RD_FETCH_ADD and IB_OPCODE_RD_COMPARE_SWAP.

Link: https://lore.kernel.org/r/20220105042605.14343-1-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 15:17:13 -04:00
Greg Kroah-Hartman
01097139e7 RDMA: Use default_groups in kobj_type
There are currently two ways to create a set of sysfs files for a kobj_type:
through the default_attrs field and the default_groups field.  Move the
IB code to use the default_groups field, which has been the preferred way
since commit aa30f47cf6 ("kobject: Add support for default attribute groups
to kobj_type"), so that we can soon get rid of the obsolete default_attrs
field.

Link: https://lore.kernel.org/r/20220103152259.531034-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 15:15:15 -04:00
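
A hedged before/after sketch of the conversion (the attribute and kobj_type below are placeholders, not any specific IB object):

  #include <linux/kobject.h>
  #include <linux/sysfs.h>

  static ssize_t foo_show(struct kobject *kobj, struct kobj_attribute *attr,
                          char *buf)
  {
          return sysfs_emit(buf, "example\n");
  }
  static struct kobj_attribute example_attr_foo = __ATTR_RO(foo);

  static struct attribute *example_attrs[] = {
          &example_attr_foo.attr,
          NULL,
  };
  /* Generates example_groups[] from example_attrs[]. */
  ATTRIBUTE_GROUPS(example);

  static struct kobj_type example_ktype = {
          .sysfs_ops      = &kobj_sysfs_ops,
          /* was: .default_attrs = example_attrs, */
          .default_groups = example_groups,
  };
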
Jiasheng Jiang
7694a7de22 RDMA/uverbs: Check for null return of kmalloc_array
Because the allocation can fail, data might be a NULL pointer, which
would cause a NULL pointer dereference later. Check for it and return
-ENOMEM.

Fixes: 6884c6c4bd ("RDMA/verbs: Store the write/write_ex uapi entry points in the uverbs_api")
Link: https://lore.kernel.org/r/20211231093315.1917667-1-jiasheng@iscas.ac.cn
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 14:16:53 -04:00
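
A minimal sketch of the added check (variable names simplified):

  #include <linux/slab.h>

  /* kmalloc_array() can fail; bail out with -ENOMEM instead of letting a
   * NULL pointer be dereferenced later.
   */
  data = kmalloc_array(num_entries, sizeof(*data), GFP_KERNEL);
  if (!data)
          return -ENOMEM;
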
Dust Li
a7ad9ddeb5 RDMA/mlx5: Print wc status on CQE error and dump needed
mlx5_handle_error_cqe() only dumps the content of the CQE, which is raw hex
data and not straightforward to debug.  Print the WC status message when we
get a CQE error and a dump is needed.

Here is an example of how the dmesg log looks with this change:

 infiniband mlx5_0: mlx5_handle_error_cqe:333:(pid 0): WC error: 10, message: remote access error
 infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000030: 00 00 00 00 00 00 88 13 08 03 61 b3 1e a1 42 d3

Link: https://lore.kernel.org/r/20211227123806.47530-1-dust.li@linux.alibaba.com
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 14:08:51 -04:00
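
The readable message in the log above can be produced with the core helper ib_wc_status_msg(); a hedged sketch of printing it next to the raw dump (the device pointer and status variable are illustrative):

  #include <rdma/ib_verbs.h>

  /* Print a human-readable WC status before dumping the raw CQE bytes. */
  dev_warn(&ibdev->dev, "WC error: %d, message: %s\n",
           wc_status, ib_wc_status_msg(wc_status));
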
Chengguang Xu
8d1cfb884e RDMA/rxe: Fix a typo in opcode name
There is a redundant ']' in the name of opcode IB_OPCODE_RC_SEND_MIDDLE,
so just fix it.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20211218112320.3558770-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 13:57:36 -04:00
Zhu Yanjun
8803836fe7 RDMA/rxe: Remove the unused xmit_errors member
The member variable xmit_errors can be replaced with

 rxe_counter_inc(rxe, RXE_CNT_SEND_ERR)

Link: https://lore.kernel.org/r/20211216054842.1099428-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 13:56:50 -04:00
Minghao Chi
47920e4d2c RDMA/rxe: Remove redundant err variable
Return the value directly instead of storing it in a redundant
variable.

Link: https://lore.kernel.org/20211215075258.442930-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Reviewed-by: Devesh Sharma <Devesh.s.sharma@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 13:47:43 -04:00
Minghao Chi
37c995ed19 RDMA/ocrdma: Remove unneeded variable
Return the status directly from the called function.

Link: https://lore.kernel.org/r/20211215055421.441375-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 13:47:43 -04:00
Maor Gottlieb
4163cb3d19 Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"
This patch is not the full fix and still causes call traces during
mlx5_ib_dereg_mr().

This reverts commit f0ae4afe3d.

Fixes: f0ae4afe3d ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")
Link: https://lore.kernel.org/r/20211222101312.1358616-1-maorg@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 09:04:30 -04:00
Li Zhijian
8ff5f5d9d8 RDMA/rxe: Prevent double freeing rxe_map_set()
The same rxe_map_set could be freed twice:

rxe_reg_user_mr()
  -> rxe_mr_init_user()
    -> rxe_mr_free_map_set() # 1st

  -> rxe_drop_ref()
   ...
    -> rxe_mr_cleanup()
      -> rxe_mr_free_map_set() # 2nd

Follow the normal convention and put resource cleanup either in the error
unwind of the allocator or in the overall free function. Leave the object
unchanged with a NULL cur_map_set on failure and remove the unnecessary
free in rxe_mr_init_user().

Link: https://lore.kernel.org/r/20211228014406.1033444-1-lizhijian@cn.fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-04 10:29:34 -04:00
David S. Miller
e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Jakub Kicinski
aec53e60e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
  commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/

net/smc/smc_wr.c
  commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
  commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
  bitmap_zero()/memset() is removed by the fix

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-30 12:12:12 -08:00
Jakub Kicinski
b6459415b3 net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after
it's touched). We can drop the include of filter.h from it and
add a forward declaration of struct sk_filter instead.
This decreases the number of rebuilt objects when bpf.h
is touched from ~5k to ~1k.

There are a lot of missing includes that this was masking, primarily
in networking this time.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-29 08:48:14 -08:00
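
The mechanism is the usual forward-declaration trick; a hedged sketch of the header-level change (struct names other than sk_filter are placeholders):

  /* net/sock.h no longer needs the full definition from linux/filter.h,
   * because it only stores a pointer to the type.
   */
  struct sk_filter;

  struct example_sock {
          struct sk_filter *filter;       /* pointer-only use: a declaration suffices */
  };
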
David E. Box
27963d3da4 RDMA/irdma: Use auxiliary_device driver data helpers
Use auxiliary_get_drvdata and auxiliary_set_drvdata helpers.

Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20211221235852.323752-2-david.e.box@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22 13:59:01 +01:00
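
A hedged sketch of the helpers in an auxiliary driver's probe/remove pair (the private struct and function names are placeholders):

  #include <linux/auxiliary_bus.h>
  #include <linux/slab.h>

  struct example_priv {
          int dummy;
  };

  static int example_probe(struct auxiliary_device *aux_dev,
                           const struct auxiliary_device_id *id)
  {
          struct example_priv *priv = kzalloc(sizeof(*priv), GFP_KERNEL);

          if (!priv)
                  return -ENOMEM;
          auxiliary_set_drvdata(aux_dev, priv);   /* was dev_set_drvdata(&aux_dev->dev, priv) */
          return 0;
  }

  static void example_remove(struct auxiliary_device *aux_dev)
  {
          struct example_priv *priv = auxiliary_get_drvdata(aux_dev);

          kfree(priv);
  }
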
Jakub Kicinski
7cd2802d74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 16:13:19 -08:00
David S. Miller
823f7a5497 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next branch 2021-12-15

Hi Dave, Jakub, Jason

This pulls mlx5-next branch into net-next and rdma branches.
All patches already reviewed on both rdma and netdev mailing lists.

Please pull and let me know if there's any problem.

1) Add multiple FDB steering priorities [1]
2) Introduce HW bits needed to configure MAC list size of VF/SF.
   Required for ("net/mlx5: Memory optimizations") upcoming series [2].

[1] https://lore.kernel.org/netdev/20211201193621.9129-1-saeed@kernel.org/
[2] https://lore.kernel.org/lkml/20211208141722.13646-1-shayd@nvidia.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 10:22:20 +00:00
Jason Gunthorpe
c8f476da84 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
Currently, the driver ignores the user's priority for flow steering rules
in the FDB namespace. Change it and create the rule at the right priority.

This will allow creating FDB steering rules at up to 16 different
priorities.
====================

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

* mellanox/mlx5-next:
  RDMA/mlx5: Add support to multiple priorities for FDB rules
  net/mlx5: Create more priorities for FDB bypass namespace
  net/mlx5: Refactor mlx5_get_flow_namespace
  net/mlx5: Separate FDB namespace
2021-12-15 08:15:53 -04:00
Kees Cook
59aa7fcfe2 IB/mthca: Use memset_startat() for clearing mpt_entry
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Use memset_startat() so memset() doesn't get confused about writing beyond
the destination member that is intended to be the starting point of
zeroing through the end of the struct.

Link: https://lore.kernel.org/r/20211213223331.135412-15-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:21:23 -04:00
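
A hedged sketch of the memset_startat() pattern (the struct below is a placeholder, not mthca's mpt_entry):

  #include <linux/string.h>
  #include <linux/types.h>

  struct example_entry {
          u32 flags;
          u32 page_size;
          u64 start;      /* zeroing should begin here ...              */
          u64 length;
          u64 key;        /* ... and run through the end of the struct  */
  };

  static void example_clear_tail(struct example_entry *entry)
  {
          /* Equivalent to an open-coded memset() from &entry->start to the
           * end of *entry, but states the intent explicitly so FORTIFY_SOURCE
           * does not see a write crossing member boundaries.
           */
          memset_startat(entry, 0, start);
  }
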
Kees Cook
c2ed5611af iw_cxgb4: Use memset_startat() for cpl_t5_pass_accept_rpl
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Use memset_startat() so memset() doesn't get confused about writing beyond
the destination member that is intended to be the starting point of
zeroing through the end of the struct. Additionally, since everything
appears to perform a roundup (including allocation), just change the size
of the struct itself and add a build-time check to validate the expected
size.

Link: https://lore.kernel.org/r/20211213223331.135412-13-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:21:22 -04:00
Kees Cook
e517f76a3c RDMA/mlx5: Use memset_after() to zero struct mlx5_ib_mr
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Use memset_after() to zero the end of struct mlx5_ib_mr that should be
initialized.

Link: https://lore.kernel.org/r/20211213223331.135412-10-keescook@chromium.org
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:21:22 -04:00
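
The complementary helper used here, sketched with a placeholder struct (not the real mlx5_ib_mr layout):

  #include <linux/string.h>
  #include <linux/types.h>

  struct example_mr {
          u32 lkey;       /* preserved                                   */
          u32 access;     /* preserved: last member to keep              */
          u64 iova;       /* everything from here on gets zeroed         */
          u64 length;
  };

  static void example_reset_mr(struct example_mr *mr)
  {
          /* Zero every member that follows 'access'. */
          memset_after(mr, 0, access);
  }
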
Jason Gunthorpe
4922f09209 Merge tag 'v5.16-rc5' into rdma.git for-next
Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:18:48 -04:00
Jiacheng Shi
12d3bbdd6b RDMA/hns: Replace kfree() with kvfree()
Variables allocated by kvmalloc_array() should not be freed with kfree(),
because they may have been allocated by vmalloc(). So replace kfree() with
kvfree() here.

Fixes: 6fd610c573 ("RDMA/hns: Support 0 hop addressing for SRQ buffer")
Link: https://lore.kernel.org/r/20211210094234.5829-1-billsjc@sjtu.edu.cn
Signed-off-by: Jiacheng Shi <billsjc@sjtu.edu.cn>
Acked-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:13:29 -04:00
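
A minimal sketch of why the pairing matters (names are illustrative):

  #include <linux/types.h>
  #include <linux/mm.h>

  static u64 *example_alloc_buf(unsigned int nent)
  {
          /* May be backed by kmalloc() or, for large sizes, by vmalloc(). */
          return kvmalloc_array(nent, sizeof(u64), GFP_KERNEL);
  }

  static void example_free_buf(u64 *buf)
  {
          kvfree(buf);    /* handles both backings; plain kfree() would not */
  }
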
Avihai Horon
20679094a0 RDMA/cma: Let cma_resolve_ib_dev() continue search even after empty entry
Currently, when cma_resolve_ib_dev() searches for a matching GID it will
stop searching after encountering the first empty GID table entry. This
behavior is wrong since neither the IB nor the RoCE spec enforces tightly packed
GID tables.

For example, when the matching valid GID entry exists at index N, and if a
GID entry is empty at index N-1, cma_resolve_ib_dev() will fail to find
the matching valid entry.

Fix it by making cma_resolve_ib_dev() continue searching even after
encountering missing entries.

Fixes: f17df3b0de ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()")
Link: https://lore.kernel.org/r/b7346307e3bb396c43d67d924348c6c496493991.1639055490.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:09:50 -04:00
Avihai Horon
483d805191 RDMA/core: Let ib_find_gid() continue search even after empty entry
Currently, ib_find_gid() will stop searching after encountering the first
empty GID table entry. This behavior is wrong since neither the IB nor the
RoCE spec enforces tightly packed GID tables.

For example, when a valid GID entry exists at index N, and if a GID entry
is empty at index N-1, ib_find_gid() will fail to find the valid entry.

Fix it by making ib_find_gid() continue searching even after encountering
missing entries.

Fixes: 5eb620c81c ("IB/core: Add helpers for uncached GID and P_Key searches")
Link: https://lore.kernel.org/r/e55d331b96cecfc2cf19803d16e7109ea966882d.1639055490.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:09:50 -04:00
Avihai Horon
109f2d39a6 RDMA/core: Modify rdma_query_gid() to return accurate error codes
Modify rdma_query_gid() to return -ENOENT for empty entries. This makes
error reporting more accurate and will be used in the next patches.

Link: https://lore.kernel.org/r/1f2b65dfb4d995e74b621e3e21e7c7445d187956.1639055490.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:09:50 -04:00
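
A hedged sketch of the search loop these three patches converge on: treat an empty entry (now reported as -ENOENT) as "keep scanning" rather than "stop" (the function is illustrative, not the actual cma/core code):

  #include <linux/string.h>
  #include <rdma/ib_verbs.h>

  static int example_find_gid_index(struct ib_device *device, u32 port,
                                    const union ib_gid *gid, int tbl_len)
  {
          union ib_gid tmp;
          int i, ret;

          for (i = 0; i < tbl_len; i++) {
                  ret = rdma_query_gid(device, port, i, &tmp);
                  if (ret == -ENOENT)
                          continue;       /* empty slot: tables need not be packed */
                  if (ret)
                          return ret;     /* real error */
                  if (!memcmp(&tmp, gid, sizeof(tmp)))
                          return i;       /* matching valid entry */
          }
          return -ENOENT;
  }
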
José Expósito
bee90911e0 IB/qib: Fix memory leak in qib_user_sdma_queue_pkts()
The wrong goto label was used for the error case, which missed the cleanup
of the pkt allocation.

Fixes: d39bf40e55 ("IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fields")
Link: https://lore.kernel.org/r/20211208175238.29983-1-jose.exposito89@gmail.com
Addresses-Coverity-ID: 1493352 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:05:13 -04:00
Yixing Liu
0045e0d3f4 RDMA/hns: Support direct wqe of userspace
The current write wqe mechanism is to write to DDR first and then notify
the hardware through a doorbell to read the data. Direct wqe is a mechanism
to fill the wqe directly into the hardware. In the case of light load, the
wqe is written into the PCIe BAR space of the hardware, which removes one
memory access operation and therefore reduces the latency. SIMD
instructions allow the CPU to write 512 bits at a time to device memory, so
they can be used for posting a direct wqe.

Add a direct wqe enable switch and address mapping.

Link: https://lore.kernel.org/r/20211207124901.42123-2-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 19:59:07 -04:00
Yangyang Li
4ad8181426 RDMA/hns: Fix RNR retransmission issue for HIP08
Due to the discrete nature of the HIP08 timer unit, a requester might
finish the timeout period sooner, in elapsed real time, than its responder
does, even when both sides share the identical RNR timeout length included
in the RNR Nak packet and the responder indeed starts the timing prior to
the requester. Furthermore, if a 'providential' resend packet arrived
before the responder's timeout period expired, the responder is certainly
entitled to drop the packet silently in light of the IB protocol.

To address this problem, our team made good use of certain hardware facts:

1) The timing resolution with regard to the transmission arrangements is 1
   microsecond, e.g. if the cq_period field is set to 3, it is
   interpreted as 3 microseconds by the hardware

2) A QPC field informs the hardware how many timing units (ticks)
   constitute a full microsecond, which, by default, is 1000

3) It takes 14ns for the processor to handle a packet in the buffer, so
   the RNR timeout length of 10ns would ensure our processing mechanism is
   disabled during the entire timeout period and the packet won't be
   dropped silently

To achieve (3), we permanently set the QPC field mentioned in (2) to zero,
which nominally indicates that every time tick is equivalent to a microsecond
in wall-clock time; now, an RNR timeout period with a face value of 10 only
lasts 10 ticks, which is 10ns in wall-clock time.

It's worth noting that we adapt the driver by magnifying certain
configuration parameters (cq_period, eq_period and ack_timeout) by 1000,
given that the user assumes the configured timing unit to be microseconds.

Also, this particular improvisation is only deployed on HIP08 since other
hardware has already solved this issue.

Fixes: cfc85f3e4b ("RDMA/hns: Add profile support for hip08 driver")
Link: https://lore.kernel.org/r/20211209140655.49493-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 19:45:04 -04:00
Maor Gottlieb
a973f86b41 RDMA/mlx5: Add support to multiple priorities for FDB rules
Currently, the driver ignores the user's priority for flow steering
rules in the FDB namespace. Change it and create the rule at the right
priority.
This will allow creating FDB steering rules at up to 16 different
priorities.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-13 16:03:00 -08:00
Maor Gottlieb
22c3f2f56b net/mlx5: Separate FDB namespace
This patch doesn't add additional namespaces, but just separates the
naming to be used by each FDB user: bypass and kernel.
Downstream patches will actually split this up and allow more than a
single priority for the bypass users.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-13 16:02:59 -08:00
Nitesh Narayan Lal
fb5bd85471 RDMA/irdma: Use irq_update_affinity_hint()
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.

To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://lore.kernel.org/r/20210903152430.244937-7-nitesh@redhat.com
2021-12-10 20:47:38 +01:00
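
A hedged before/after sketch of the interface swap (the cpumask source is illustrative):

  #include <linux/interrupt.h>

  /* Before: also applied the mask as the IRQ affinity (undocumented side effect). */
  irq_set_affinity_hint(irq, &msix_vec->mask);

  /* After: only publishes the hint for userspace (e.g. irqbalance) to read,
   * leaving the actual affinity untouched.
   */
  irq_update_affinity_hint(irq, &msix_vec->mask);
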
Kamal Heib
b1a4da64bf RDMA/qedr: Fix reporting max_{send/recv}_wr attrs
Fix the wrongly reported max_send_wr and max_recv_wr attributes for user
QPs by making sure to save their values on QP creation, so that when query
QP is called the attributes are reported correctly.

Fixes: cecbcddf64 ("qedr: Add support for QP verbs")
Link: https://lore.kernel.org/r/20211206201314.124947-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 14:59:26 -04:00
Zhu Yanjun
3fe6d228a0 RDMA/rxe: Remove the unnecessary variable
The variable pkey is assigned from a macro. Then this variable is passed
to a function bth_init directly, and pkey is not used again. So remove it
and use the macro directly.

Fixes: 76251e15ea ("RDMA/rxe: Remove pkey table")
Link: https://lore.kernel.org/r/20211207194057.713289-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:56:22 -04:00
Tatyana Nikolova
10467ce09f RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ
Completion events (CEs) are lost if the application is allowed to arm the
CQ more than two times when no new CE for this CQ has been generated by
the HW.

Check if arming has been done for the CQ and, if not, arm the CQ for any
event; otherwise, promote to arming the CQ for any event only when the last
arm event was solicited.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20211201231509.1930-2-shiraz.saleem@intel.com
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:53:01 -04:00
Shiraz Saleem
25b5d6fd6d RDMA/irdma: Report correct WC errors
Return IBV_WC_REM_OP_ERR for responder QP errors instead of
IBV_WC_REM_ACCESS_ERR.

Return IBV_WC_LOC_QP_OP_ERR for errors detected on the SQ with bad opcodes.

Fixes: 44d9e52977 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20211201231509.1930-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:53:01 -04:00
Christophe JAILLET
117697cc93 RDMA/irdma: Fix a potential memory allocation issue in 'irdma_prm_add_pble_mem()'
'pchunk->bitmapbuf' is a bitmap. Its size (in number of bits) is stored in
'pchunk->sizeofbitmap'.

When it is allocated, the size (in bytes) is computed by:
   size_in_bits >> 3

There are 2 issues (numbers below assume that longs are 64 bits):
   - there is no guarantee here that 'pchunk->bitmapmem.size' is modulo
     BITS_PER_LONG but bitmaps are stored as longs
     (sizeofbitmap=8 bits will only allocate 1 byte, instead of 8 (1 long))

   - the number of bytes is computed with a shift, not a round up, so we
     may allocate less memory than needed
     (sizeofbitmap=65 bits will only allocate 8 bytes (i.e. 1 long), when 2
     longs are needed = 16 bytes)

Fix both issues by using 'bitmap_zalloc()' and remove the useless
'bitmapmem' from 'struct irdma_chunk'.

While at it, remove some useless NULL test before calling
kfree/bitmap_free.

Fixes: 915cc7ac0f ("RDMA/irdma: Add miscellaneous utility definitions")
Link: https://lore.kernel.org/r/5e670b640508e14b1869c3e8e4fb970d78cbe997.1638692171.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:45:48 -04:00
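
A hedged before/after sketch of the allocation fix, reusing the field names from the commit message:

  #include <linux/bitmap.h>

  /* Before (roughly): the byte count came from a plain shift, which both
   * truncates (65 bits -> 8 bytes instead of 16) and ignores that bitmaps
   * are stored as arrays of longs, e.g. something like
   *
   *     pchunk->bitmapbuf = kzalloc(pchunk->sizeofbitmap >> 3, GFP_KERNEL);
   *
   * After: let the bitmap API size, round up and zero the buffer.
   */
  pchunk->bitmapbuf = bitmap_zalloc(pchunk->sizeofbitmap, GFP_KERNEL);
  if (!pchunk->bitmapbuf)
          return -ENOMEM;

  /* ... and on teardown: */
  bitmap_free(pchunk->bitmapbuf);
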
Shiraz Saleem
1e11a39a82 RDMA/irdma: Fix a use-after-free in add_pble_prm
When irdma_hmc_sd_one fails, 'chunk' is freed while its still on the PBLE
info list.

Add the chunk entry to the PBLE info list only after successful setting of
the SD in irdma_hmc_sd_one.

Fixes: e8c4dbc2fc ("RDMA/irdma: Add PBLE resource manager")
Link: https://lore.kernel.org/r/20211207152135.2192-1-shiraz.saleem@intel.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:43:52 -04:00
Mike Marciniszyn
60a8b5a161 IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr
This buffer is currently allocated in hfi1_init():

	if (reinit)
		ret = init_after_reset(dd);
	else
		ret = loadtime_init(dd);
	if (ret)
		goto done;

	/* allocate dummy tail memory for all receive contexts */
	dd->rcvhdrtail_dummy_kvaddr = dma_alloc_coherent(&dd->pcidev->dev,
							 sizeof(u64),
							 &dd->rcvhdrtail_dummy_dma,
							 GFP_KERNEL);

	if (!dd->rcvhdrtail_dummy_kvaddr) {
		dd_dev_err(dd, "cannot allocate dummy tail memory\n");
		ret = -ENOMEM;
		goto done;
	}

The reinit-triggered path will overwrite the old allocation and leak it.

Fix by moving the allocation to hfi1_alloc_devdata() and the deallocation
to hfi1_free_devdata().

Link: https://lore.kernel.org/r/20211129192008.101968.91302.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 46b010d3ee ("staging/rdma/hfi1: Workaround to prevent corruption during packet delivery")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
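
A hedged sketch of the lifetime change: allocate the dummy tail buffer once when the devdata is created and free it once when the devdata is destroyed, so a reinit cannot overwrite (and leak) the previous allocation (placement comments are illustrative):

  #include <linux/dma-mapping.h>

  /* In hfi1_alloc_devdata() (illustrative placement): */
  dd->rcvhdrtail_dummy_kvaddr =
          dma_alloc_coherent(&dd->pcidev->dev, sizeof(u64),
                             &dd->rcvhdrtail_dummy_dma, GFP_KERNEL);
  if (!dd->rcvhdrtail_dummy_kvaddr)
          goto bail;      /* illustrative error path */

  /* In hfi1_free_devdata() (illustrative placement): */
  dma_free_coherent(&dd->pcidev->dev, sizeof(u64),
                    dd->rcvhdrtail_dummy_kvaddr, dd->rcvhdrtail_dummy_dma);
  dd->rcvhdrtail_dummy_kvaddr = NULL;
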
Mike Marciniszyn
f6a3cfec3c IB/hfi1: Fix early init panic
The following trace can be observed with an init failure, such as a
firmware load failure:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 0 P4D 0
  Oops: 0010 [#1] SMP PTI
  CPU: 0 PID: 537 Comm: kworker/0:3 Tainted: G           OE    --------- -  - 4.18.0-240.el8.x86_64 #1
  Workqueue: events work_for_cpu_fn
  RIP: 0010:0x0
  Code: Bad RIP value.
  RSP: 0000:ffffae5f878a3c98 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff95e48e025c00 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff95e48e025c00
  RBP: ffff95e4bf3660a4 R08: 0000000000000000 R09: ffffffff86d5e100
  R10: ffff95e49e1de600 R11: 0000000000000001 R12: ffff95e4bf366180
  R13: ffff95e48e025c00 R14: ffff95e4bf366028 R15: ffff95e4bf366000
  FS:  0000000000000000(0000) GS:ffff95e4df200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 0000000f86a0a003 CR4: 00000000001606f0
  Call Trace:
   receive_context_interrupt+0x1f/0x40 [hfi1]
   __free_irq+0x201/0x300
   free_irq+0x2e/0x60
   pci_free_irq+0x18/0x30
   msix_free_irq.part.2+0x46/0x80 [hfi1]
   msix_clean_up_interrupts+0x2b/0x70 [hfi1]
   hfi1_init_dd+0x640/0x1a90 [hfi1]
   do_init_one.isra.19+0x34d/0x680 [hfi1]
   local_pci_probe+0x41/0x90
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1a7/0x360
   worker_thread+0x1cf/0x390
   ? create_worker+0x1a0/0x1a0
   kthread+0x112/0x130
   ? kthread_flush_work_fn+0x10/0x10
   ret_from_fork+0x35/0x40

The free_irq() results in a callback to the registered interrupt handler,
and rcd->do_interrupt is NULL because the receive context data structures
are not fully initialized.

Fix by ensuring that do_interrupt is always assigned and by adding a
guard in the slow path handler to detect and handle a partially
initialized receive context and noop the receive.

Link: https://lore.kernel.org/r/20211129192003.101968.33612.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: b0ba3c18d6 ("IB/hfi1: Move normal functions from hfi1_devdata to const array")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
Mike Marciniszyn
b6d57e24ce IB/hfi1: Ensure use of smp_processor_id() is preempt disabled
The following BUG has just surfaced with our 5.16 testing:

  BUG: using smp_processor_id() in preemptible [00000000] code: mpicheck/1581081
  caller is sdma_select_user_engine+0x72/0x210 [hfi1]
  CPU: 0 PID: 1581081 Comm: mpicheck Tainted: G S                5.16.0-rc1+ #1
  Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
  Call Trace:
   <TASK>
   dump_stack_lvl+0x33/0x42
   check_preemption_disabled+0xbf/0xe0
   sdma_select_user_engine+0x72/0x210 [hfi1]
   ? _raw_spin_unlock_irqrestore+0x1f/0x31
   ? hfi1_mmu_rb_insert+0x6b/0x200 [hfi1]
   hfi1_user_sdma_process_request+0xa02/0x1120 [hfi1]
   ? hfi1_write_iter+0xb8/0x200 [hfi1]
   hfi1_write_iter+0xb8/0x200 [hfi1]
   do_iter_readv_writev+0x163/0x1c0
   do_iter_write+0x80/0x1c0
   vfs_writev+0x88/0x1a0
   ? recalibrate_cpu_khz+0x10/0x10
   ? ktime_get+0x3e/0xa0
   ? __fget_files+0x66/0xa0
   do_writev+0x65/0x100
   do_syscall_64+0x3a/0x80

Fix this long-standing bug by moving the smp_processor_id() to after the
rcu_read_lock().

The rcu_read_lock() implicitly disables preemption.

Link: https://lore.kernel.org/r/20211129191958.101968.87329.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 0cb2aa690c ("IB/hfi1: Add sysfs interface for affinity setup")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
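
A minimal sketch of the reordering (the surrounding engine lookup is elided):

  #include <linux/rcupdate.h>
  #include <linux/smp.h>

  rcu_read_lock();                /* per the commit above, this disables preemption */
  cpu = smp_processor_id();       /* now safe: the CPU cannot change under us */
  /* ... select the user SDMA engine for 'cpu' ... */
  rcu_read_unlock();
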
Mike Marciniszyn
9292f8f9a2 IB/hfi1: Correct guard on eager buffer deallocation
The code tests the DMA address, which can legitimately be 0.

The code should test the kernel logical address to avoid leaking eager
buffer allocations that happen to map to a dma address of 0.

Fixes: 60368186fd ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled")
Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:53 -04:00
Kamal Heib
0a0575a12e RDMA/bnxt_re: Fix endianness warning for req.pkey
Fix the following sparse warning:

drivers/infiniband/hw/bnxt_re/qplib_fp.c:1260:26: sparse: warning: incorrect type in assignment (different base types)

Fixes: 0e938533d9 ("RDMA/bnxt_re: Remove dynamic pkey table")
Link: https://lore.kernel.org/r/20211205204537.14184-1-kamalheib1@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-06 19:56:30 -04:00