Commit Graph

12884 Commits

Author SHA1 Message Date
Christophe JAILLET
967a578af0 RDMA/cxgb4: Use bitmap_set() when applicable
The 'alloc->table' bitmap has just been allocated, so this is safe to use
the faster and non-atomic 'bitmap_set()' function. There is no need to
hand-write it.

Link: https://lore.kernel.org/r/fd978b837935ed04863ffecfd495c4601a986df6.1637789139.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:29:06 -04:00
Christophe JAILLET
d4fdc383c0 RDMA/cxgb4: Use bitmap_zalloc() when applicable
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid
some open-coded arithmetic in allocator arguments.

Using the 'zalloc' version of the allocator also saves a now useless
'bitmap_zero()' call.

Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.

While at it, remove an extra space in a statement just a few lines above.

Link: https://lore.kernel.org/r/e396c4aa16cd8945d43877570a8f6d926cea555a.1637789139.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:29:06 -04:00
Christophe JAILLET
675e2694fc IB/mthca: Use non-atomic bitmap functions when possible in 'mthca_mr.c'
In 'mthca_buddy_init()', the 'buddy->bits[n]' bitmap has just been
allocated, so no concurrent accesses can occur.

The other accesses to the 'buddy->bits[n]' bitmap are protected by the
'buddy->lock' spinlock, so no concurrent accesses can occur.

So prefer the non-atomic '__[set|clear]_bit()' functions to save a few
cycles.

Link: https://lore.kernel.org/r/a19b88ccdbc03972fd97306b998731814283041f.1637785902.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:29:06 -04:00
Christophe JAILLET
19453f34cf IB/mthca: Use non-atomic bitmap functions when possible in 'mthca_allocator.c'
The accesses to the 'alloc->table' bitmap are protected by the
'alloc->lock' spinlock, so no concurrent accesses can happen.

So prefer the non-atomic '__[set|clear]_bit()' functions to save a few
cycles.

Link: https://lore.kernel.org/r/5f909ca1284fa4d2cf13952b08b9e303b656c968.1637785902.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:29:06 -04:00
Christophe JAILLET
a277f38321 IB/mthca: Use bitmap_set() when applicable
The 'alloc->table' bitmap has just been allocated, so this is safe to use
the faster and non-atomic 'bitmap_set()' function. There is no need to
hand-write it.

Link: https://lore.kernel.org/r/f1bd33f6ea6c8ad519a222db6e9aa17c55610557.1637785902.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:29:05 -04:00
Christophe JAILLET
12d1e2f3c5 IB/mthca: Use bitmap_zalloc() when applicable
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid
some open-coded arithmetic in allocator arguments.

Using the 'zalloc' version of the allocator also saves a now useless
'bitmap_zero()' call.

Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.

Link: https://lore.kernel.org/r/ea9031e28f453bc179033740f66f0c19293fcf0b.1637785902.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:29:05 -04:00
Yangyang Li
b0969f8389 RDMA/hns: Do not destroy QP resources in the hw resetting phase
When hns_roce_v2_destroy_qp() is called, the brief calling process of the
driver is as follows:

 ......
 hns_roce_v2_destroy_qp
 hns_roce_v2_qp_modify
	   hns_roce_cmd_mbox
 hns_roce_qp_destroy

If hns_roce_cmd_mbox() detects that the hardware is being reset during the
execution of the hns_roce_cmd_mbox(), the driver will not be able to get
the return value from the hardware (the firmware cannot respond to the
driver's mailbox during the hardware reset phase).

The driver needs to wait for the hardware reset to complete before
continuing to execute hns_roce_qp_destroy(), otherwise it may happen that
the driver releases the resources but the hardware is still accessing. In
order to fix this problem, HNS RoCE needs to add a piece of code to wait
for the hardware reset to complete.

The original interface get_hw_reset_stat() is the instantaneous state of
the hardware reset, which cannot accurately reflect whether the hardware
reset is completed, so it needs to be replaced with the ae_dev_reset_cnt
interface.

The sign that the hardware reset is complete is that the return value of
the ae_dev_reset_cnt interface is greater than the original value
reset_cnt recorded by the driver.

Fixes: 6a04aed6af ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset")
Link: https://lore.kernel.org/r/20211123142402.26936-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:20:24 -04:00
Yangyang Li
52414e27d6 RDMA/hns: Do not halt commands during reset until later
is_reset is used to indicate whether the hardware starts to reset. When
hns_roce_hw_v2_reset_notify_down() is called, the hardware has not yet
started to reset. If is_reset is set at this time, all mailbox operations
of resource destroy actions will be intercepted by driver. When the driver
cleans up resources, but the hardware is still accessed, the following
errors will appear:

  arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000003f
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50e0800
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
  arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000043e
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50a0800
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
  arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000020880000436
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50a0880
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
  arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000043a
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50e0840
  hns3 0000:35:00.0: INT status: CMDQ(0x0) HW errors(0x0) other(0x0)
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
  hns3 0000:35:00.0: received unknown or unhandled event of vector0
  arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
  arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
  {34}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7

is_reset will be set correctly in check_aedev_reset_status(), so the
setting in hns_roce_hw_v2_reset_notify_down() should be deleted.

Fixes: 726be12f5c ("RDMA/hns: Set reset flag when hw resetting")
Link: https://lore.kernel.org/r/20211123084809.37318-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:20:07 -04:00
Alaa Hleihel
f0ae4afe3d RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow
For the case of IB_MR_TYPE_DM the mr does doesn't have a umem, even though
it is a user MR. This causes function mlx5_free_priv_descs() to think that
it is a kernel MR, leading to wrongly accessing mr->descs that will get
wrong values in the union which leads to attempt to release resources that
were not allocated in the first place.

For example:
 DMA-API: mlx5_core 0000:08:00.1: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=0 bytes]
 WARNING: CPU: 8 PID: 1021 at kernel/dma/debug.c:961 check_unmap+0x54f/0x8b0
 RIP: 0010:check_unmap+0x54f/0x8b0
 Call Trace:
  debug_dma_unmap_page+0x57/0x60
  mlx5_free_priv_descs+0x57/0x70 [mlx5_ib]
  mlx5_ib_dereg_mr+0x1fb/0x3d0 [mlx5_ib]
  ib_dereg_mr_user+0x60/0x140 [ib_core]
  uverbs_destroy_uobject+0x59/0x210 [ib_uverbs]
  uobj_destroy+0x3f/0x80 [ib_uverbs]
  ib_uverbs_cmd_verbs+0x435/0xd10 [ib_uverbs]
  ? uverbs_finalize_object+0x50/0x50 [ib_uverbs]
  ? lock_acquire+0xc4/0x2e0
  ? lock_acquired+0x12/0x380
  ? lock_acquire+0xc4/0x2e0
  ? lock_acquire+0xc4/0x2e0
  ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
  ? lock_release+0x28a/0x400
  ib_uverbs_ioctl+0xc0/0x140 [ib_uverbs]
  ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
  __x64_sys_ioctl+0x7f/0xb0
  do_syscall_64+0x38/0x90

Fix it by reorganizing the dereg flow and mlx5_ib_mr structure:
 - Move the ib_umem field into the user MRs structure in the union as it's
   applicable only there.
 - Function mlx5_ib_dereg_mr() will now call mlx5_free_priv_descs() only
   in case there isn't udata, which indicates that this isn't a user MR.

Fixes: f18ec42231 ("RDMA/mlx5: Use a union inside mlx5_ib_mr")
Link: https://lore.kernel.org/r/66bb1dd253c1fd7ceaa9fc411061eefa457b86fb.1637581144.git.leonro@nvidia.com
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:16:39 -04:00
Pavel Skripkin
84b01721e8 RDMA: Fix use-after-free in rxe_queue_cleanup
On error handling path in rxe_qp_from_init() qp->sq.queue is freed and
then rxe_create_qp() will drop last reference to this object. qp clean up
function will try to free this queue one time and it causes UAF bug.

Fix it by zeroing queue pointer after freeing queue in rxe_qp_from_init().

Fixes: 514aee660d ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/20211121202239.3129-1-paskripkin@gmail.com
Reported-by: syzbot+aab53008a5adf26abe91@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:15:59 -04:00
Shiraz Saleem
774a90c1e1 RDMA/irdma: Set protocol based on PF rdma_mode flag
Set the RDMA protocol to use at driver bind time based on the ice PF's
rdma_mode flag.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-22 08:42:15 -08:00
Xinhao Liu
9c3631d170 RDMA/hns: Remove magic number
Don't use unintelligible constants.

Link: https://lore.kernel.org/r/20211119140208.40416-10-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 14:13:44 -04:00
Yixing Liu
3183559376 RDMA/hns: Remove macros that are no longer used
These macros are no longer used, so remove them.

Link: https://lore.kernel.org/r/20211119140208.40416-9-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 14:13:44 -04:00
Xinhao Liu
6cb6a6cbcd RDMA/hns: Correctly initialize the members of Array[][]
Each member of Array[][] should be initialized on a separate line.

Link: https://lore.kernel.org/r/20211119140208.40416-7-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 14:13:44 -04:00
Xinhao Liu
d147583ec8 RDMA/hns: Correct the type of variables participating in the shift operation
The type of the variable participating in the shift operation should be an
unsigned type instead of a signed type.

Link: https://lore.kernel.org/r/20211119140208.40416-5-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 14:13:18 -04:00
Xinhao Liu
3aecfc3802 RDMA/hns: Replace tab with space in the right-side comments
There should be a space between the code and the comment on the right.

Link: https://lore.kernel.org/r/20211119140208.40416-4-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 14:13:18 -04:00
Xinhao Liu
ea393549a3 RDMA/hns: Correct the print format to be consistent with the variable type
The print format should be consistent with the variable type.

Link: https://lore.kernel.org/r/20211119140208.40416-3-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 14:13:18 -04:00
Xinhao Liu
994baacc6b RDMA/hns: Correct the hex print format
The hex printf format should be "0xff" instead of "ff".

Link: https://lore.kernel.org/r/20211119140208.40416-2-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 14:13:17 -04:00
Bob Pearson
88f9335fa7 RDMA/rxe: Remove some #defines from rxe_pool.h
RXE_POOL_ALIGN is only used in rxe_pool.c so move RXE_POOL_ALIGN to
rxe_pool.c from rxe_pool.h.  RXE_POOL_CACHE_FLAGS is never used so it is
deleted from rxe_pool.h

Link: https://lore.kernel.org/r/20211103050241.61293-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:47:08 -04:00
Bob Pearson
38ee25a311 RDMA/rxe: Remove #include "rxe_loc.h" from rxe_pool.c
rxe_loc.h is already included in rxe.h so do not include it in rxe_pool.c

Link: https://lore.kernel.org/r/20211103050241.61293-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:47:08 -04:00
Bob Pearson
b92d766c87 RDMA/rxe: Save object pointer in pool element
In rxe_pool.c currently there are many cases where it is necessary to
compute the offset from a pool element struct to the object containing it
in a type independent way where the offset is different for each type.  By
saving a pointer to the object when they are created extra work can be
saved.

Link: https://lore.kernel.org/r/20211103050241.61293-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:29:15 -04:00
Bob Pearson
c95acedbff RDMA/rxe: Copy setup parameters into rxe_pool
In rxe_pool.c copy remaining pool setup parameters from rxe_pool_info into
rxe_pool. This saves looking up rxe_pool_info in the performance path.

Link: https://lore.kernel.org/r/20211103050241.61293-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:29:15 -04:00
Bob Pearson
02827b6708 RDMA/rxe: Cleanup rxe_pool_entry
Currently three different names are used to describe rxe pool elements.
They are referred to as entries, elems or pelems. This patch chooses one
'elem' and changes the other ones.

Link: https://lore.kernel.org/r/20211103050241.61293-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:29:14 -04:00
Bob Pearson
21adfa7a3c RDMA/rxe: Replace irqsave locks with bh locks
Most of the locks in the rxe driver are _irqsave/restore locks but in fact
there are no interrupt threads that run rxe code or share data with
rxe. There are softirq threads and data sharing so the appropriate lock
type is _bh. This patch replaces all irqsave type locks with bh type
locks.

Link: https://lore.kernel.org/r/20211103050241.61293-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:29:14 -04:00
Dan Carpenter
10f2d1cbf8 RDMA/usnic: Clean up usnic_ib_alloc_pd()
Remove the unnecessary "umem_pd" variable.  And usnic_uiom_alloc_pd()
never returns NULL so remove the NULL check.

Link: https://lore.kernel.org/r/20211118113924.GH1147@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 12:02:30 -04:00
Kamal Heib
46c87b4277 RDMA/cxgb4: Use helper function to set GUIDs
Use the addrconf_addr_eui48() helper function to set the GUIDs, Also make
sure the GUIDs are valid EUI-64 identifiers.

Link: https://lore.kernel.org/r/20211118100456.45423-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 11:44:21 -04:00
Kamal Heib
2a67fcfa0d RDMA/hns: Validate the pkey index
Before query pkey, make sure that the queried index is valid.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20211117145954.123893-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-17 16:52:18 -04:00
Kamal Heib
679f2b7552 RDMA/ocrdma: Use helper function to set GUIDs
Use addrconf_addr_eui48() helper function to set the GUIDs and remove the
driver specific version.

Link: https://lore.kernel.org/r/20211117090205.96523-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-17 16:52:07 -04:00
Leon Romanovsky
d821f7c13c RDMA/nldev: Check stat attribute before accessing it
The access to non-existent netlink attribute causes to the following
kernel panic. Fix it by checking existence before trying to read it.

  general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  CPU: 0 PID: 6744 Comm: syz-executor.0 Not tainted 5.15.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:nla_get_u32 include/net/netlink.h:1554 [inline]
  RIP: 0010:nldev_stat_set_mode_doit drivers/infiniband/core/nldev.c:1909 [inline]
  RIP: 0010:nldev_stat_set_doit+0x578/0x10d0 drivers/infiniband/core/nldev.c:2040
  Code: fa 4c 8b a4 24 f8 02 00 00 48 b8 00 00 00 00 00 fc ff df c7 84 24 80 00 00 00 00 00 00 00 49 8d 7c 24 04 48 89
  fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 02
  RSP: 0018:ffffc90004acf2e8 EFLAGS: 00010247
  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90002b94000
  RDX: 0000000000000000 RSI: ffffffff8684c5ff RDI: 0000000000000004
  RBP: ffff88807cda4000 R08: 0000000000000000 R09: ffff888023fb8027
  R10: ffffffff8684c5d7 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000001 R14: ffff888041024280 R15: ffff888031ade780
  FS:  00007eff9dddd700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2ef24000 CR3: 0000000036902000 CR4: 00000000003506f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
   rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
   rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
   netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
   netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
   netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
   sock_sendmsg_nosec net/socket.c:704 [inline]
   sock_sendmsg+0xcf/0x120 net/socket.c:724
   ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
   ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
   __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 822cf785ac ("RDMA/nldev: Split nldev_stat_set_mode_doit out of nldev_stat_set_doit")
Link: https://lore.kernel.org/r/b21967c366f076ff1988862f9c8a1aa0244c599f.1637151999.git.leonro@nvidia.com
Reported-by: syzbot+9111d2255a9710e87562@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-17 16:45:16 -04:00
Jack Wang
378c67413d RDMA/mlx4: Do not fail the registration on port stats
If the FW doesn't support MLX4_DEV_CAP_FLAG2_DIAG_PER_PORT, mlx4 driver
will fail the ib_setup_port_attrs, which is called from
ib_register_device()/enable_device_and_get(), in the end leads to device
not detected[1][2]

To fix it, add a new mlx4_ib_hw_stats_ops1, w/o alloc_hw_port_stats if FW
does not support MLX4_DEV_CAP_FLAG2_DIAG_PER_PORT.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2014094
[2] https://lore.kernel.org/linux-rdma/CAMGffEn2wvEnmzc0xe=xYiCLqpphiHDBxCxqAELrBofbUAMQxw@mail.gmail.com

Fixes: 4b5f4d3fb4 ("RDMA: Split the alloc_hw_stats() ops to port and device variants")
Link: https://lore.kernel.org/r/20211115101519.27210-1-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-17 16:45:16 -04:00
Christophe JAILLET
a917dfb66c RDMA/bnxt_re: Scan the whole bitmap when checking if "disabling RCFW with pending cmd-bit"
The 'cmdq->cmdq_bitmap' bitmap is 'rcfw->cmdq_depth' bits long.  The size
stored in 'cmdq->bmap_size' is the size of the bitmap in bytes.

Remove this erroneous 'bmap_size' and use 'rcfw->cmdq_depth' directly in
'bnxt_qplib_disable_rcfw_channel()'. Otherwise some error messages may be
missing.

Other uses of 'cmdq_bitmap' already take into account 'rcfw->cmdq_depth'
directly.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/47ed717c3070a1d0f53e7b4c768a4fd11caf365d.1636707421.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16 13:54:24 -04:00
Changcheng Deng
dd566d586f RDMA/bnxt_re: Remove unneeded variable
Fix the following coccicheck review:
./drivers/infiniband/hw/bnxt_re/main.c: 896: 5-7: Unneeded variable

Remove unneeded variable used to store return value.

Link: https://lore.kernel.org/r/20211109113227.132596-1-deng.changcheng@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16 13:54:24 -04:00
Kamal Heib
fc9d19e18a RDMA/irdma: Use helper function to set GUIDs
Use the addrconf_addr_eui48() helper function to set the GUIDs for both
RoCE and iWARP modes, Also make sure the GUIDs are valid EUI-64
identifiers.

Link: https://lore.kernel.org/r/20211107212227.44610-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16 13:54:24 -04:00
Dennis Dalessandro
da86dc175b IB/hfi1: Properly allocate rdma counter desc memory
When optional counter support was added the allocation of the memory
holding the counter descriptors was not cleared properly. This caused
WARN_ON()s in the IB/sysfs code to be hit.

This is because the uninitialized memory made some of the counters wrongly
look like optional counters. Use kzalloc.

While here change the sizeof() calls to use the pointer rather than the
name of the type.

  WARNING: CPU: 0 PID: 32644 at drivers/infiniband/core/sysfs.c:1064 ib_setup_port_attrs+0x7e1/0x890 [ib_core]
  CPU: 0 PID: 32644 Comm: kworker/0:2 Tainted: G S      W 5.15.0+ #36
  Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
  Workqueue: events work_for_cpu_fn
  RIP: 0010:ib_setup_port_attrs+0x7e1/0x890 [ib_core]
  RSP: 0018:ffffc90006ea3c40 EFLAGS: 00010202
  RAX: 0000000000000068 RBX: ffff888106ad8000 RCX: 0000000000000138
  RDX: ffff888126c84c00 RSI: ffff888103c41000 RDI: 0000000000000124
  RBP: ffff88810f63a801 R08: ffff888126c8a000 R09: 0000000000000001
  R10: ffffffffa09acf20 R11: 0000000000000065 R12: ffff88810f63a800
  R13: ffff88810f63a800 R14: ffff88810f63a8e0 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff888667a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00005590102cb078 CR3: 000000000240a003 CR4: 00000000001706f0
  Call Trace:
   ib_register_device.cold.44+0x23e/0x2d0 [ib_core]
   rvt_register_device+0xfa/0x230 [rdmavt]
   hfi1_register_ib_device+0x623/0x690 [hfi1]
   init_one.cold.36+0x2d1/0x49b [hfi1]
   local_pci_probe+0x45/0x80
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1b1/0x360
   worker_thread+0x1d4/0x3a0
   kthread+0x11a/0x140
   ret_from_fork+0x22/0x30

Fixes: 5e2ddd1e59 ("RDMA/counter: Add optional counter support")
Link: https://lore.kernel.org/r/20211115200913.124104.47770.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16 13:18:24 -04:00
Leon Romanovsky
6cd7397d01 RDMA/core: Set send and receive CQ before forwarding to the driver
Preset both receive and send CQ pointers prior to call to the drivers and
overwrite it later again till the mlx4 is going to be changed do not
overwrite ibqp properties.

This change is needed for mlx5, because in case of QP creation failure, it
will go to the path of QP destroy which relies on proper CQ pointers.

 BUG: KASAN: use-after-free in create_qp.cold+0x164/0x16e [mlx5_ib]
 Write of size 8 at addr ffff8880064c55c0 by task a.out/246

 CPU: 0 PID: 246 Comm: a.out Not tainted 5.15.0+ #291
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack_lvl+0x45/0x59
  print_address_description.constprop.0+0x1f/0x140
  kasan_report.cold+0x83/0xdf
  create_qp.cold+0x164/0x16e [mlx5_ib]
  mlx5_ib_create_qp+0x358/0x28a0 [mlx5_ib]
  create_qp.part.0+0x45b/0x6a0 [ib_core]
  ib_create_qp_user+0x97/0x150 [ib_core]
  ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x92c/0x1250 [ib_uverbs]
  ib_uverbs_cmd_verbs+0x1c38/0x3150 [ib_uverbs]
  ib_uverbs_ioctl+0x169/0x260 [ib_uverbs]
  __x64_sys_ioctl+0x866/0x14d0
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 Allocated by task 246:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0xa4/0xd0
  create_qp.part.0+0x92/0x6a0 [ib_core]
  ib_create_qp_user+0x97/0x150 [ib_core]
  ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x92c/0x1250 [ib_uverbs]
  ib_uverbs_cmd_verbs+0x1c38/0x3150 [ib_uverbs]
  ib_uverbs_ioctl+0x169/0x260 [ib_uverbs]
  __x64_sys_ioctl+0x866/0x14d0
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 Freed by task 246:
  kasan_save_stack+0x1b/0x40
  kasan_set_track+0x1c/0x30
  kasan_set_free_info+0x20/0x30
  __kasan_slab_free+0x10c/0x150
  slab_free_freelist_hook+0xb4/0x1b0
  kfree+0xe7/0x2a0
  create_qp.part.0+0x52b/0x6a0 [ib_core]
  ib_create_qp_user+0x97/0x150 [ib_core]
  ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x92c/0x1250 [ib_uverbs]
  ib_uverbs_cmd_verbs+0x1c38/0x3150 [ib_uverbs]
  ib_uverbs_ioctl+0x169/0x260 [ib_uverbs]
  __x64_sys_ioctl+0x866/0x14d0
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 514aee660d ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/2dbb2e2cbb1efb188a500e5634be1d71956424ce.1636631035.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16 13:16:50 -04:00
Linus Torvalds
fe91c4725a Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This consists of the usual driver updates (ufs, smartpqi, lpfc,
  target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug
  fixes.

  Notable core changes are the removal of scsi->tag which caused some
  churn in obsolete drivers and a sweep through all drivers to call
  scsi_done() directly instead of scsi->done() which removes a pointer
  indirection from the hot path and a move to register core sysfs files
  earlier, which means they're available to KOBJ_ADD processing, which
  necessitates switching all drivers to using attribute groups"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
  scsi: lpfc: Update lpfc version to 14.0.0.3
  scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
  scsi: lpfc: Fix link down processing to address NULL pointer dereference
  scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted
  scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine
  scsi: lpfc: Correct sysfs reporting of loop support after SFP status change
  scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset
  scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup()
  scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
  scsi: ufs: mediatek: Avoid sched_clock() misuse
  scsi: mpt3sas: Make mpt3sas_dev_attrs static
  scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions
  scsi: target: core: Stop using bdevname()
  scsi: aha1542: Use memcpy_{from,to}_bvec()
  scsi: sr: Add error handling support for add_disk()
  scsi: sd: Add error handling support for add_disk()
  scsi: target: Perform ALUA group changes in one step
  scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path
  scsi: target: Fix alua_tg_pt_gps_count tracking
  scsi: target: Fix ordered tag handling
  ...
2021-11-05 08:42:02 -07:00
Linus Torvalds
5c904c66ed Merge tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
 "Here is the big set of char and misc and other tiny driver subsystem
  updates for 5.16-rc1.

  Loads of things in here, all of which have been in linux-next for a
  while with no reported problems (except for one called out below.)

  Included are:

   - habanana labs driver updates, including dma_buf usage, reviewed and
     acked by the dma_buf maintainers

   - iio driver update (going through this tree not staging as they
     really do not belong going through that tree anymore)

   - counter driver updates

   - hwmon driver updates that the counter drivers needed, acked by the
     hwmon maintainer

   - xillybus driver updates

   - binder driver updates

   - extcon driver updates

   - dma_buf module namespaces added (will cause a build error in arm64
     for allmodconfig, but that change is on its way through the drm
     tree)

   - lkdtm driver updates

   - pvpanic driver updates

   - phy driver updates

   - virt acrn and nitr_enclaves driver updates

   - smaller char and misc driver updates"

* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (386 commits)
  comedi: dt9812: fix DMA buffers on stack
  comedi: ni_usb6501: fix NULL-deref in command paths
  arm64: errata: Enable TRBE workaround for write to out-of-range address
  arm64: errata: Enable workaround for TRBE overwrite in FILL mode
  coresight: trbe: Work around write to out of range
  coresight: trbe: Make sure we have enough space
  coresight: trbe: Add a helper to determine the minimum buffer size
  coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  coresight: trbe: Add infrastructure for Errata handling
  coresight: trbe: Allow driver to choose a different alignment
  coresight: trbe: Decouple buffer base from the hardware base
  coresight: trbe: Add a helper to pad a given buffer area
  coresight: trbe: Add a helper to calculate the trace generated
  coresight: trbe: Defer the probe on offline CPUs
  coresight: trbe: Fix incorrect access of the sink specific data
  coresight: etm4x: Add ETM PID for Kryo-5XX
  coresight: trbe: Prohibit trace before disabling TRBE
  coresight: trbe: End the AUX handle on truncation
  coresight: trbe: Do not truncate buffer on IRQ
  coresight: trbe: Fix handling of spurious interrupts
  ...
2021-11-04 08:21:47 -07:00
Linus Torvalds
25edbc383b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "A typical collection of patches this cycle, mostly fixing with a few
  new features:

   - Fixes from static tools. clang warnings, dead code, unused
     variable, coccinelle sweeps, etc

   - Driver bug fixes and minor improvements in rxe, bnxt_re, hfi1,
     mlx5, irdma, qedr

   - rtrs ULP bug fixes an improvments

   - Additional counters for bnxt_re

   - Support verbs CQ notifications in EFA

   - Continued reworking and fixing of rxe

   - netlink control to enable/disable optional device counters

   - rxe now can use AH objects for its UD path, fixing various bugs in
     the process

   - Add DMABUF support to EFA"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (103 commits)
  RDMA/core: Require the driver to set the IOVA correctly during rereg_mr
  RDMA/bnxt_re: Remove unsupported bnxt_re_modify_ah callback
  RDMA/irdma: optimize rx path by removing unnecessary copy
  RDMA/qed: Use helper function to set GUIDs
  RDMA/hns: Use the core code to manage the fixed mmap entries
  IB/opa_vnic: Rebranding of OPA VNIC driver to Cornelis Networks
  IB/qib: Rebranding of qib driver to Cornelis Networks
  IB/hfi1: Rebranding of hfi1 driver to Cornelis Networks
  RDMA/bnxt_re: Use helper function to set GUIDs
  RDMA/bnxt_re: Fix kernel panic when trying to access bnxt_re_stat_descs
  RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
  RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility
  RDMA/hns: Fix initial arm_st of CQ
  RDMA/rxe: Make rxe_type_info static const
  RDMA/rxe: Use 'bitmap_zalloc()' when applicable
  RDMA/rxe: Save a few bytes from struct rxe_pool
  RDMA/irdma: Remove the unused variable local_qp
  RDMA/core: Fix missed initialization of rdma_hw_stats::lock
  RDMA/efa: Add support for dmabuf memory regions
  RDMA/umem: Allow pinned dmabuf umem usage
  ...
2021-11-03 08:05:59 -07:00
Aharon Landau
f1a090f09f RDMA/core: Require the driver to set the IOVA correctly during rereg_mr
If the driver returns a new MR during rereg it has to fill it with the
IOVA from the proper source. If IB_MR_REREG_TRANS is set then the IOVA is
cmd.hca_va, otherwise the IOVA comes from the old MR. mlx5 for example has
two calls inside rereg_mr:

		return create_real_mr(new_pd, umem, mr->ibmr.iova,
				      new_access_flags);
and
		return create_real_mr(new_pd, new_umem, iova, new_access_flags);

Unconditionally overwriting the iova in the newly allocated MR will
corrupt the iova if the first path is used.

Remove the redundant initializations from ib_uverbs_rereg_mr().

Fixes: 6e0954b11c ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr")
Link: https://lore.kernel.org/r/4b0a31bbc372842613286a10d7a8cbb0ee6069c7.1635400472.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-03 09:37:52 -03:00
Kamal Heib
dd83f482d2 RDMA/bnxt_re: Remove unsupported bnxt_re_modify_ah callback
There is no need to return always zero for function which is not
supported, especially since 0 is the wrong return code.

Link: https://lore.kernel.org/r/20211102073054.410838-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-03 09:06:36 -03:00
Jason Gunthorpe
6a463bc9d9 Merge branch 'for-rc' into rdma.git for-next
Patches held over for a possible rc8.

* for-rc:
  RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
  RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility
  RDMA/hns: Fix initial arm_st of CQ

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 15:08:19 -03:00
Jason Gunthorpe
a2a2a69d14 Merge tag 'v5.15' into rdma.git for-next
Pull in the accepted for-rc patches as the next merge needs a newer base.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 14:49:20 -03:00
Zhu Yanjun
9ed8110c9b RDMA/irdma: optimize rx path by removing unnecessary copy
In the function irdma_post_recv, the function irdma_copy_sg_list is
not needed since the struct irdma_sge and ib_sge have the similar
member variables. The struct irdma_sge can be replaced with the
struct ib_sge totally.

This can increase the rx performance of irdma.

Link: https://lore.kernel.org/r/20211030104226.253346-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 14:39:33 -03:00
Chengchang Tang
6d202d9f70 RDMA/hns: Use the core code to manage the fixed mmap entries
Add a new implementation for mmap by using the new mmap entry API. This
makes way for further use of the dynamic mmap allocator in this driver.

Link: https://lore.kernel.org/r/20211028105640.1056-1-liangwenpeng@huawei.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 14:07:31 -03:00
Scott Breyer
4892298c3a IB/opa_vnic: Rebranding of OPA VNIC driver to Cornelis Networks
Changes instances of Intel to Cornelis in identifying strings

Link: https://lore.kernel.org/r/20211028124611.26694.71239.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:30:43 -03:00
Scott Breyer
840f4ed2d4 IB/qib: Rebranding of qib driver to Cornelis Networks
Changes instances of Intel to Cornelis in identifying strings

Link: https://lore.kernel.org/r/20211028124606.26694.71567.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:30:43 -03:00
Scott Breyer
ddf65f28dd IB/hfi1: Rebranding of hfi1 driver to Cornelis Networks
Changes instances of Intel to Cornelis in identifying strings

Link: https://lore.kernel.org/r/20211028124601.26694.35662.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:30:43 -03:00
Kamal Heib
493620b1c9 RDMA/bnxt_re: Use helper function to set GUIDs
Use addrconf_addr_eui48() helper function to set the GUIDs and remove the
driver specific version.

Link: https://lore.kernel.org/r/20211028094359.160407-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:24:09 -03:00
Kamal Heib
04567caf96 RDMA/bnxt_re: Fix kernel panic when trying to access bnxt_re_stat_descs
For some reason when introducing the fixed commit the "active_pds" and
"active_ahs" descriptors got dropped, which lead to the following panic
when trying to access the first entry in the descriptors.

 bnxt_re: Broadcom NetXtreme-C/E RoCE Driver
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 CPU: 2 PID: 594 Comm: kworker/u32:1 Not tainted 5.15.0-rc6+ #2
 Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.12.1 12/07/2020
 Workqueue: bnxt_re bnxt_re_task [bnxt_re]
 RIP: 0010:strlen+0x0/0x20
 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 31
 RSP: 0018:ffffb25fc47dfbb0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000008100
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 00000000fffffff4 R09: 0000000000000000
 R10: ffff8a05c71fc028 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff8a05c3dee800
 FS:  0000000000000000(0000) GS:ffff8a092fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000048d3da001 CR4: 00000000001706e0
 Call Trace:
  kernfs_name_hash+0x12/0x80
  kernfs_find_ns+0x35/0xd0
  kernfs_remove_by_name_ns+0x32/0x90
  remove_files+0x2b/0x60
  create_files+0x1d3/0x1f0
  internal_create_group+0x17b/0x1f0
  internal_create_groups.part.0+0x3d/0xa0
  setup_port+0x180/0x3b0 [ib_core]
  ? __cond_resched+0x16/0x40
  ? kmem_cache_alloc_trace+0x278/0x3d0
  ib_setup_port_attrs+0x99/0x240 [ib_core]
  ib_register_device+0xcc/0x160 [ib_core]
  bnxt_re_task+0xba/0x170 [bnxt_re]
  process_one_work+0x1eb/0x390
  worker_thread+0x53/0x3d0
  ? process_one_work+0x390/0x390
  kthread+0x10f/0x130
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x22/0x30

Fixes: 13f30b0fa0 ("RDMA/counter: Add a descriptor in struct rdma_hw_stats")
Link: https://lore.kernel.org/r/20211027205448.127821-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 12:10:33 -03:00
Alok Prasad
4f960393a0 RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
This patch fixes a crash caused by querying the QP via netlink, and
corrects the state of GSI qp. GSI qp's have a NULL qed_qp.

The call trace is generated by:
 $ rdma res show

 BUG: kernel NULL pointer dereference, address: 0000000000000034
 Hardware name: Dell Inc. PowerEdge R720/0M1GCR, BIOS 1.2.6 05/10/2012
 RIP: 0010:qed_rdma_query_qp+0x33/0x1a0 [qed]
 RSP: 0018:ffffba560a08f580 EFLAGS: 00010206
 RAX: 0000000200000000 RBX: ffffba560a08f5b8 RCX: 0000000000000000
 RDX: ffffba560a08f5b8 RSI: 0000000000000000 RDI: ffff9807ee458090
 RBP: ffffba560a08f5a0 R08: 0000000000000000 R09: ffff9807890e7048
 R10: ffffba560a08f658 R11: 0000000000000000 R12: 0000000000000000
 R13: ffff9807ee458090 R14: ffff9807f0afb000 R15: ffffba560a08f7ec
 FS:  00007fbbf8bfe740(0000) GS:ffff980aafa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000034 CR3: 00000001720ba001 CR4: 00000000000606f0
 Call Trace:
  qedr_query_qp+0x82/0x360 [qedr]
  ib_query_qp+0x34/0x40 [ib_core]
  ? ib_query_qp+0x34/0x40 [ib_core]
  fill_res_qp_entry_query.isra.26+0x47/0x1d0 [ib_core]
  ? __nla_put+0x20/0x30
  ? nla_put+0x33/0x40
  fill_res_qp_entry+0xe3/0x120 [ib_core]
  res_get_common_dumpit+0x3f8/0x5d0 [ib_core]
  ? fill_res_cm_id_entry+0x1f0/0x1f0 [ib_core]
  nldev_res_get_qp_dumpit+0x1a/0x20 [ib_core]
  netlink_dump+0x156/0x2f0
  __netlink_dump_start+0x1ab/0x260
  rdma_nl_rcv+0x1de/0x330 [ib_core]
  ? nldev_res_get_cm_id_dumpit+0x20/0x20 [ib_core]
  netlink_unicast+0x1b8/0x270
  netlink_sendmsg+0x33e/0x470
  sock_sendmsg+0x63/0x70
  __sys_sendto+0x13f/0x180
  ? setup_sgl.isra.12+0x70/0xc0
  __x64_sys_sendto+0x28/0x30
  do_syscall_64+0x3a/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Fixes: cecbcddf64 ("qedr: Add support for QP verbs")
Link: https://lore.kernel.org/r/20211027184329.18454-1-palok@marvell.com
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 11:54:16 -03:00