Commit Graph

1773 Commits

Author SHA1 Message Date
Leon Romanovsky
103140eccb RDMA/restrack: Move restrack_clean to be symmetrical to restrack_init
The fact that resource tracking commit 02d8883f52 ("RDMA/restrack: Add
general infrastructure to track RDMA resources") was added immediately
after commit 16c1975f10 ("IB/mlx5: Create profile infrastructure to add
and remove stages") caused us to miss the fact that PD and CQ are created
after ib_register_device, but released after ib_unregister_device() and
not before as it is expected from normal flow.

Fix introduced in commit 42cea83f95 ("IB/mlx5: Fix cleanup order on
unload") revealed this fact, so this patch is needed to avoid from
restrack warnings

It fixes resource tracking warnings during shutdown.

[   43.473906] CPU: 5 PID: 3016 Comm: modprobe Not tainted 4.16.0-rc5-for-linust-perf-2018-03-19_07-01-58-14 #1
[   43.473907] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[   43.473919] RIP: 0010:rdma_restrack_clean+0x25/0x30 [ib_core]
[   43.473921] RSP: 0018:ffffc9000267be48 EFLAGS: 00010282
[   43.473924] RAX: 0000000000000000 RBX: ffff88033c690070 RCX: 0000000180080006
[   43.473925] RDX: ffff88035ce922e0 RSI: ffffea000cf1a200 RDI: ffff88033c6907c8
[   43.473926] RBP: ffff88033c690070 R08: ffff88033c689000 R09: 0000000180080006
[   43.473927] R10: 000000003c68a001 R11: ffff88033c689000 R12: ffff88033c690000
[   43.473929] R13: ffff88033c69005c R14: 0000000000000000 R15: 0000000000000000
[   43.473932] FS:  00007f5928359740(0000) GS:ffff88036c540000(0000) knlGS:0000000000000000
[   43.473933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.473935] CR2: 00007ffffc760cc8 CR3: 000000035620c000 CR4: 00000000000006e0
[   43.473940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.473941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   43.473942] Call Trace:
[   43.473969]  ib_unregister_device+0xf5/0x190 [ib_core]
[   43.474000]  __mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
[   43.474098]  mlx5_remove_device+0xf5/0x120 [mlx5_core]
[   43.474132]  mlx5_unregister_interface+0x37/0x90 [mlx5_core]
[   43.474142]  mlx5_ib_cleanup+0xc/0x16a [mlx5_ib]
[   43.474152]  SyS_delete_module+0x159/0x260
[   43.474159]  do_syscall_64+0x61/0x110
[   43.474165]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   43.474168] RIP: 0033:0x7f59278466b7
[   43.474170] RSP: 002b:00007ffffc763e38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[   43.474172] RAX: ffffffffffffffda RBX: 000000000130d590 RCX: 00007f59278466b7
[   43.474173] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000000000130d5f8
[   43.474175] RBP: 0000000000000000 R08: 00007f5927b0b060 R09: 00007f59278b6a40
[   43.474176] R10: 00007ffffc763bc0 R11: 0000000000000202 R12: 0000000000000000
[   43.474177] R13: 0000000000000001 R14: 000000000130d5f8 R15: 0000000000000000
[   43.474179] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 83 c7 28 31 c0
eb 0c 48 83 c0 08 48 3d 00 08 00 00 74 0f 48 8d 14 07 48 8b 12 48 85 d2
74 e8 <0f> 0b c3 f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 47 28
[   43.474221] ---[ end trace e89771e2250ffc23 ]---

Fixes: 42cea83f95 ("IB/mlx5: Fix cleanup order on unload")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:22:22 -06:00
Leon Romanovsky
e8980d67d6 RDMA/ucma: Ensure that CM_ID exists prior to access it
Prior to access UCMA commands, the context should be initialized
and connected to CM_ID with ucma_create_id(). In case user skips
this step, he can provide non-valid ctx without CM_ID and cause
to multiple NULL dereferences.

Also there are situations where the create_id can be raced with
other user access, ensure that the context is only shared to
other threads once it is fully initialized to avoid the races.

[  109.088108] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[  109.090315] IP: ucma_connect+0x138/0x1d0
[  109.092595] PGD 80000001dc02d067 P4D 80000001dc02d067 PUD 1da9ef067 PMD 0
[  109.095384] Oops: 0000 [#1] SMP KASAN PTI
[  109.097834] CPU: 0 PID: 663 Comm: uclose Tainted: G    B 4.16.0-rc1-00062-g2975d5de6428 #45
[  109.100816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[  109.105943] RIP: 0010:ucma_connect+0x138/0x1d0
[  109.108850] RSP: 0018:ffff8801c8567a80 EFLAGS: 00010246
[  109.111484] RAX: 0000000000000000 RBX: 1ffff100390acf50 RCX: ffffffff9d7812e2
[  109.114496] RDX: 1ffffffff3f507a5 RSI: 0000000000000297 RDI: 0000000000000297
[  109.117490] RBP: ffff8801daa15600 R08: 0000000000000000 R09: ffffed00390aceeb
[  109.120429] R10: 0000000000000001 R11: ffffed00390aceea R12: 0000000000000000
[  109.123318] R13: 0000000000000120 R14: ffff8801de6459c0 R15: 0000000000000118
[  109.126221] FS:  00007fabb68d6700(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[  109.129468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.132523] CR2: 0000000000000020 CR3: 00000001d45d8003 CR4: 00000000003606b0
[  109.135573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  109.138716] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  109.142057] Call Trace:
[  109.144160]  ? ucma_listen+0x110/0x110
[  109.146386]  ? wake_up_q+0x59/0x90
[  109.148853]  ? futex_wake+0x10b/0x2a0
[  109.151297]  ? save_stack+0x89/0xb0
[  109.153489]  ? _copy_from_user+0x5e/0x90
[  109.155500]  ucma_write+0x174/0x1f0
[  109.157933]  ? ucma_resolve_route+0xf0/0xf0
[  109.160389]  ? __mod_node_page_state+0x1d/0x80
[  109.162706]  __vfs_write+0xc4/0x350
[  109.164911]  ? kernel_read+0xa0/0xa0
[  109.167121]  ? path_openat+0x1b10/0x1b10
[  109.169355]  ? fsnotify+0x899/0x8f0
[  109.171567]  ? fsnotify_unmount_inodes+0x170/0x170
[  109.174145]  ? __fget+0xa8/0xf0
[  109.177110]  vfs_write+0xf7/0x280
[  109.179532]  SyS_write+0xa1/0x120
[  109.181885]  ? SyS_read+0x120/0x120
[  109.184482]  ? compat_start_thread+0x60/0x60
[  109.187124]  ? SyS_read+0x120/0x120
[  109.189548]  do_syscall_64+0xeb/0x250
[  109.192178]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[  109.194725] RIP: 0033:0x7fabb61ebe99
[  109.197040] RSP: 002b:00007fabb68d5e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  109.200294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fabb61ebe99
[  109.203399] RDX: 0000000000000120 RSI: 00000000200001c0 RDI: 0000000000000004
[  109.206548] RBP: 00007fabb68d5ec0 R08: 0000000000000000 R09: 0000000000000000
[  109.209902] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fabb68d5fc0
[  109.213327] R13: 0000000000000000 R14: 00007fff40ab2430 R15: 00007fabb68d69c0
[  109.216613] Code: 88 44 24 2c 0f b6 84 24 6e 01 00 00 88 44 24 2d 0f
b6 84 24 69 01 00 00 88 44 24 2e 8b 44 24 60 89 44 24 30 e8 da f6 06 ff
31 c0 <66> 41 83 7c 24 20 1b 75 04 8b 44 24 64 48 8d 74 24 20 4c 89 e7
[  109.223602] RIP: ucma_connect+0x138/0x1d0 RSP: ffff8801c8567a80
[  109.226256] CR2: 0000000000000020

Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+36712f50b0552615bf59@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-20 11:07:21 -06:00
Matan Barak
185899ee8d IB/uverbs: Enable ioctl() uAPI by default for new verbs
Enable the ioctl() uAPI for IB by default if the standard write()
uAPI (INFINIBAND_USER_ACCESS) is enabled. Verbs that are
also available under the old write() uAPI are put inside a new
INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI Kconfig.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
41b2a71fc8 IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file
Currently, all objects are declared in uverbs_std_types. This could lead
to a huge file once we implement all objects, methods and handlers.
Moving each object to its own file to keep the files smaller and more
readable. uverbs_std_types.c will only contain the parsing tree
definition and objects without any methods.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
dfb1395573 IB/uverbs: Expose parsing tree of all common objects to providers
The ioctl() based uverbs is based on merging feature trees. This teaches
the generic parser how to parse methods according to the provider's
support. In order to support merging with the common objects, exporting
the common-object-tree to the provider drivers.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
c66db31113 IB/uverbs: Safely extend existing attributes
Previously, we've used UVERBS_ATTR_SPEC_F_MIN_SZ for extending existing
attributes. The behavior of this flag was the kernel accepts anything
bigger than the minimum size it specified. This is unsafe, since in
order to safely extend an attribute, we need to make sure unknown size
is zeroed. Replacing UVERBS_ATTR_SPEC_F_MIN_SZ with
UVERBS_ATTR_SPEC_F_MIN_SZ_OR_ZERO, which essentially checks that the
unknown size is zero. In addition, attributes are now decorated with
UVERBS_ATTR_TYPE and UVERBS_ATTR_STRUCT, so we can provide the minimum
and known length.

Users of this flag needs to use copy_from_or_zero functions/macros.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
1f07e08fab IB/uverbs: Enable compact representation of uverbs_attr_spec
Downstream patches extend uverbs_attr_spec with new fields.
In order to save space, we move the type and flags fields to
the various attribute flavors contained in the union.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
0ede73bc01 IB/uverbs: Extend uverbs_ioctl header with driver_id
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
1f7ff9d5d3 IB/uverbs: Move to new headers and make naming consistent
Use macros to make names consistent in ioctl() uAPI:
The ioctl() uAPI works with object-method hierarchy. The method part
also states which handler should be executed when this method is called
from user-space. Therefore, we need to tie method, method's id, method's
handler and the object owning this method together.
Previously, this was done through explicit developer chosen names.
This makes grepping the code harder. Changing the method's name,
method's handler and object's name to be automatically generated based
on the ids.

The headers are split in a way so they be included and used by
user-space. One header strictly contains structures that are used
directly by user-space applications, where another header is used for
internal library (i.e. libibverbs) to form the ioctl() commands.
Other header simply contains the required general command structure.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Leon Romanovsky
ed65a4dc22 RDMA/ucma: Fix use-after-free access in ucma_close
The error in ucma_create_id() left ctx in the list of contexts belong
to ucma file descriptor. The attempt to close this file descriptor causes
to use-after-free accesses while iterating over such list.

Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+dcfd344365a56fbebd0f@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:01:35 -06:00
Parav Pandit
6d5b2047fe IB/core: Use rdma_is_port_valid()
Use rdma_is_port_valid() which performs port validity check instead of
open coding the same check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:41:40 -06:00
Honggang Li
311d0da974 IB/core: Set speed string to SDR for invalid active rates
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set default active_width and
active_speed to IB_WIDTH_4X and IB_SPEED_QDR.

Now, the active_width and active_speed are zeros if the RoCE port
is in DOWN state. The speed string should be set to " SDR" instead of
a blank string when active_speed is zero.

Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:39:47 -06:00
Leon Romanovsky
7d9a935e16 RDMA/restrack: Don't rely on uninitialized variable in restrack_add flow
The restrack code relies on the fact that object structures are zeroed at
the allocation stage, the mlx4 CQ wasn't allocated with kzalloc and it
caused to the following crash.

[  137.392209] general protection fault: 0000 [#1] SMP KASAN PTI
[  137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G        W        4.16.0-rc1-00099-g00313983cda6 #11
[  137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[  137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0
[  137.397762] RSP: 0018:ffff8801b54e7968 EFLAGS: 00010206
[  137.399008] RAX: 0000000000000000 RBX: ffff8801d8bcbae8 RCX: ffffffffb82314df
[  137.400055] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: 70696b533d454741
[  137.401103] RBP: ffff8801d90c07a0 R08: ffff8801d8bcbb00 R09: 0000000000000000
[  137.402470] R10: 0000000000000001 R11: ffffed0036a9cf52 R12: ffff8801d90c0ad0
[  137.403318] R13: ffff8801d853fb20 R14: ffff8801d8bcbb28 R15: 0000000000000014
[  137.404736] FS:  00007fb415d43740(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[  137.406074] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.407101] CR2: 00007fb41557df20 CR3: 00000001b580c001 CR4: 00000000003606b0
[  137.408308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  137.409352] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  137.410385] Call Trace:
[  137.411058]  ib_destroy_cq+0x23/0x60
[  137.411460]  uverbs_free_cq+0x37/0xa0
[  137.412040]  remove_commit_idr_uobject+0x38/0xf0
[  137.413042]  _rdma_remove_commit_uobject+0x5c/0x160
[  137.413782]  ? lookup_get_idr_uobject+0x39/0x50
[  137.414737]  rdma_remove_commit_uobject+0x3b/0x70
[  137.415742]  ib_uverbs_destroy_cq+0x114/0x1d0
[  137.416260]  ? ib_uverbs_req_notify_cq+0x160/0x160
[  137.417073]  ? kernel_text_address+0x5c/0x90
[  137.417805]  ? __kernel_text_address+0xe/0x30
[  137.418766]  ? unwind_get_return_address+0x2f/0x50
[  137.419558]  ib_uverbs_write+0x453/0x6a0
[  137.420220]  ? show_ibdev+0x90/0x90
[  137.420653]  ? __kasan_slab_free+0x136/0x180
[  137.421155]  ? kmem_cache_free+0x78/0x1e0
[  137.422192]  ? remove_vma+0x83/0x90
[  137.422614]  ? do_munmap+0x447/0x6c0
[  137.423045]  ? vm_munmap+0xb0/0x100
[  137.423481]  ? SyS_munmap+0x1d/0x30
[  137.424120]  ? do_syscall_64+0xeb/0x250
[  137.424984]  ? entry_SYSCALL_64_after_hwframe+0x21/0x86
[  137.425611]  ? lru_add_drain_all+0x270/0x270
[  137.426116]  ? lru_add_drain_cpu+0xa3/0x170
[  137.426616]  ? lru_add_drain+0x11/0x20
[  137.427058]  ? free_pages_and_swap_cache+0xa6/0x120
[  137.427672]  ? tlb_flush_mmu_free+0x78/0x90
[  137.428168]  ? arch_tlb_finish_mmu+0x6d/0xb0
[  137.428680]  __vfs_write+0xc4/0x350
[  137.430917]  ? kernel_read+0xa0/0xa0
[  137.432758]  ? remove_vma+0x90/0x90
[  137.434781]  ? __kasan_slab_free+0x14b/0x180
[  137.437486]  ? remove_vma+0x83/0x90
[  137.439836]  ? kmem_cache_free+0x78/0x1e0
[  137.442195]  ? percpu_counter_add_batch+0x1d/0x90
[  137.444389]  vfs_write+0xf7/0x280
[  137.446030]  SyS_write+0xa1/0x120
[  137.447867]  ? SyS_read+0x120/0x120
[  137.449670]  ? mm_fault_error+0x180/0x180
[  137.451539]  ? _cond_resched+0x16/0x50
[  137.453697]  ? SyS_read+0x120/0x120
[  137.455883]  do_syscall_64+0xeb/0x250
[  137.457686]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[  137.459595] RIP: 0033:0x7fb415637b94
[  137.461315] RSP: 002b:00007ffdebea7d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  137.463879] RAX: ffffffffffffffda RBX: 00005565022d1bd0 RCX: 00007fb415637b94
[  137.466519] RDX: 0000000000000018 RSI: 00007ffdebea7da0 RDI: 0000000000000003
[  137.469543] RBP: 00007ffdebea7d98 R08: 0000000000000000 R09: 00005565022d40c0
[  137.472479] R10: 00000000000009cf R11: 0000000000000246 R12: 00005565022d2520
[  137.475125] R13: 00000000000003e8 R14: 0000000000000000 R15: 00007ffdebea7fd0
[  137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89 df e8 0d 0b 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06 <f0> ff 4f 48 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee
[  137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP: ffff8801b54e7968
[  137.486436] ---[ end trace 81835a1ea6722eed ]---
[  137.488566] Kernel panic - not syncing: Fatal exception
[  137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-16 16:35:25 -06:00
Leon Romanovsky
2975d5de64 RDMA/ucma: Check AF family prior resolving address
Garbage supplied by user will cause to UCMA module provide zero
memory size for memcpy(), because it wasn't checked, it will
produce unpredictable results in rdma_resolve_addr().

[   42.873814] BUG: KASAN: null-ptr-deref in rdma_resolve_addr+0xc8/0xfb0
[   42.874816] Write of size 28 at addr 00000000000000a0 by task resaddr/1044
[   42.876765]
[   42.876960] CPU: 1 PID: 1044 Comm: resaddr Not tainted 4.16.0-rc1-00057-gaa56a5293d7e #34
[   42.877840] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[   42.879691] Call Trace:
[   42.880236]  dump_stack+0x5c/0x77
[   42.880664]  kasan_report+0x163/0x380
[   42.881354]  ? rdma_resolve_addr+0xc8/0xfb0
[   42.881864]  memcpy+0x34/0x50
[   42.882692]  rdma_resolve_addr+0xc8/0xfb0
[   42.883366]  ? deref_stack_reg+0x88/0xd0
[   42.883856]  ? vsnprintf+0x31a/0x770
[   42.884686]  ? rdma_bind_addr+0xc40/0xc40
[   42.885327]  ? num_to_str+0x130/0x130
[   42.885773]  ? deref_stack_reg+0x88/0xd0
[   42.886217]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
[   42.887698]  ? unwind_get_return_address_ptr+0x50/0x50
[   42.888302]  ? replace_slot+0x147/0x170
[   42.889176]  ? delete_node+0x12c/0x340
[   42.890223]  ? __radix_tree_lookup+0xa9/0x160
[   42.891196]  ? ucma_resolve_ip+0xb7/0x110
[   42.891917]  ucma_resolve_ip+0xb7/0x110
[   42.893003]  ? ucma_resolve_addr+0x190/0x190
[   42.893531]  ? _copy_from_user+0x5e/0x90
[   42.894204]  ucma_write+0x174/0x1f0
[   42.895162]  ? ucma_resolve_route+0xf0/0xf0
[   42.896309]  ? dequeue_task_fair+0x67e/0xd90
[   42.897192]  ? put_prev_entity+0x7d/0x170
[   42.897870]  ? ring_buffer_record_is_on+0xd/0x20
[   42.898439]  ? tracing_record_taskinfo_skip+0x20/0x50
[   42.899686]  __vfs_write+0xc4/0x350
[   42.900142]  ? kernel_read+0xa0/0xa0
[   42.900602]  ? firmware_map_remove+0xdf/0xdf
[   42.901135]  ? do_task_dead+0x5d/0x60
[   42.901598]  ? do_exit+0xcc6/0x1220
[   42.902789]  ? __fget+0xa8/0xf0
[   42.903190]  vfs_write+0xf7/0x280
[   42.903600]  SyS_write+0xa1/0x120
[   42.904206]  ? SyS_read+0x120/0x120
[   42.905710]  ? compat_start_thread+0x60/0x60
[   42.906423]  ? SyS_read+0x120/0x120
[   42.908716]  do_syscall_64+0xeb/0x250
[   42.910760]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   42.912735] RIP: 0033:0x7f138b0afe99
[   42.914734] RSP: 002b:00007f138b799e98 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
[   42.917134] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f138b0afe99
[   42.919487] RDX: 000000000000002e RSI: 0000000020000c40 RDI: 0000000000000004
[   42.922393] RBP: 00007f138b799ec0 R08: 00007f138b79a700 R09: 0000000000000000
[   42.925266] R10: 00007f138b79a700 R11: 0000000000000287 R12: 00007f138b799fc0
[   42.927570] R13: 0000000000000000 R14: 00007ffdbae757c0 R15: 00007f138b79a9c0
[   42.930047]
[   42.932681] Disabling lock debugging due to kernel taint
[   42.934795] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
[   42.936939] IP: memcpy_erms+0x6/0x10
[   42.938864] PGD 80000001bea92067 P4D 80000001bea92067 PUD 1bea96067 PMD 0
[   42.941576] Oops: 0002 [#1] SMP KASAN PTI
[   42.943952] CPU: 1 PID: 1044 Comm: resaddr Tainted: G    B 4.16.0-rc1-00057-gaa56a5293d7e #34
[   42.946964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[   42.952336] RIP: 0010:memcpy_erms+0x6/0x10
[   42.954707] RSP: 0018:ffff8801c8b479c8 EFLAGS: 00010286
[   42.957227] RAX: 00000000000000a0 RBX: ffff8801c8b47ba0 RCX: 000000000000001c
[   42.960543] RDX: 000000000000001c RSI: ffff8801c8b47bbc RDI: 00000000000000a0
[   42.963867] RBP: ffff8801c8b47b60 R08: 0000000000000000 R09: ffffed0039168ed1
[   42.967303] R10: 0000000000000001 R11: ffffed0039168ed0 R12: ffff8801c8b47bbc
[   42.970685] R13: 00000000000000a0 R14: 1ffff10039168f4a R15: 0000000000000000
[   42.973631] FS:  00007f138b79a700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000
[   42.976831] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.979239] CR2: 00000000000000a0 CR3: 00000001be908002 CR4: 00000000003606a0
[   42.982060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   42.984877] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   42.988033] Call Trace:
[   42.990487]  rdma_resolve_addr+0xc8/0xfb0
[   42.993202]  ? deref_stack_reg+0x88/0xd0
[   42.996055]  ? vsnprintf+0x31a/0x770
[   42.998707]  ? rdma_bind_addr+0xc40/0xc40
[   43.000985]  ? num_to_str+0x130/0x130
[   43.003410]  ? deref_stack_reg+0x88/0xd0
[   43.006302]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
[   43.008780]  ? unwind_get_return_address_ptr+0x50/0x50
[   43.011178]  ? replace_slot+0x147/0x170
[   43.013517]  ? delete_node+0x12c/0x340
[   43.016019]  ? __radix_tree_lookup+0xa9/0x160
[   43.018755]  ? ucma_resolve_ip+0xb7/0x110
[   43.021270]  ucma_resolve_ip+0xb7/0x110
[   43.023968]  ? ucma_resolve_addr+0x190/0x190
[   43.026312]  ? _copy_from_user+0x5e/0x90
[   43.029384]  ucma_write+0x174/0x1f0
[   43.031861]  ? ucma_resolve_route+0xf0/0xf0
[   43.034782]  ? dequeue_task_fair+0x67e/0xd90
[   43.037483]  ? put_prev_entity+0x7d/0x170
[   43.040215]  ? ring_buffer_record_is_on+0xd/0x20
[   43.042990]  ? tracing_record_taskinfo_skip+0x20/0x50
[   43.045595]  __vfs_write+0xc4/0x350
[   43.048624]  ? kernel_read+0xa0/0xa0
[   43.051604]  ? firmware_map_remove+0xdf/0xdf
[   43.055379]  ? do_task_dead+0x5d/0x60
[   43.058000]  ? do_exit+0xcc6/0x1220
[   43.060783]  ? __fget+0xa8/0xf0
[   43.063133]  vfs_write+0xf7/0x280
[   43.065677]  SyS_write+0xa1/0x120
[   43.068647]  ? SyS_read+0x120/0x120
[   43.071179]  ? compat_start_thread+0x60/0x60
[   43.074025]  ? SyS_read+0x120/0x120
[   43.076705]  do_syscall_64+0xeb/0x250
[   43.079006]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   43.081606] RIP: 0033:0x7f138b0afe99
[   43.083679] RSP: 002b:00007f138b799e98 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
[   43.086802] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f138b0afe99
[   43.089989] RDX: 000000000000002e RSI: 0000000020000c40 RDI: 0000000000000004
[   43.092866] RBP: 00007f138b799ec0 R08: 00007f138b79a700 R09: 0000000000000000
[   43.096233] R10: 00007f138b79a700 R11: 0000000000000287 R12: 00007f138b799fc0
[   43.098913] R13: 0000000000000000 R14: 00007ffdbae757c0 R15: 00007f138b79a9c0
[   43.101809] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[   43.107950] RIP: memcpy_erms+0x6/0x10 RSP: ffff8801c8b479c8

Reported-by: <syzbot+1d8c43206853b369d00c@syzkaller.appspotmail.com>
Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 20:12:38 -06:00
Parav Pandit
e41a7c4194 IB/core: Move rdma_addr_find_l2_eth_by_grh to core_priv.h
Before commit [1], rdma_addr_find_l2_eth_by_grh() was an exported function
and therefore declaration in include/rdma/ib_addr.h was fine.

But now that its scope is limited to ib_core module, its better to have it
in core_priv.h.

[1] commit 1060f86534 ("IB/{core/cm}: Fix generating a return AH for
RoCEE")

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:33:39 -06:00
Parav Pandit
cb12a8e2fa IB/cm: Introduce and use helper function to get cm_port from path
Introduce and use helper function get_cm_port_from_path() to get
cm_port based on the the path record entry.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:17:49 -06:00
Parav Pandit
0a51415935 IB/core: Refactor ib_init_ah_attr_from_path() for RoCE
Resolving route for RoCE for a path record is needed only for the
received CM requests.
Therefore,
(a) ib_init_ah_attr_from_path() is refactored first to isolate the
code of resolving route.
(b) Setting dlid, path bits is not needed for RoCE.

Additionally ah attribute initialization is done from the path record
entry, so it is better to refer to path record entry type for
different link layer instead of ah attribute type while initializing
ah attribute itself.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:17:49 -06:00
Parav Pandit
a22af59ea9 IB/cm: Add and use a helper function to add cm_id's to the port list
Add and use helper function add_cm_id_to_port_list() to attach
cm_id to port list.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:17:48 -06:00
Parav Pandit
a9c06aeba9 IB/core: Remove rdma_resolve_ip_route() as exported symbol
rdma_resolve_ip_route() is used only by ib_core module. Therefore it is
removed as an exported symbol.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:17:45 -06:00
Parav Pandit
5ac08a3413 IB/cma: Use rdma_protocol_roce() and remove cma_protocol_roce_dev_port()
rdma_protocol_roce() API from the ib_core already provides a way to
detect whether a given device+port is RoCE or not.
Therefore, make use of it and avoid implementing it again in rdmacm
module.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 14:40:38 -06:00
Parav Pandit
563c4ba3bd IB/core: Honor port_num while resolving GID for IB link layer
ah_attr contains the port number to which cm_id is bound. However, while
searching for GID table for matching GID entry, the port number is
ignored.

This could cause the wrong GID to be used when the ah_attr is converted to
an AH.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 14:40:37 -06:00
Parav Pandit
6d337179f2 IB/core: Honor return status of ib_init_ah_from_mcmember()
The return status of ib_init_ah_from_mcmember() is ignored by
cma_ib_mc_handler().  Honor it and return error event if ah attribute
initialization failed.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 14:40:37 -06:00
Parav Pandit
b26c4a1138 IB/{core, ipoib}: Simplify ib_find_gid() for unused ndev
ib_find_gid() is only used by IPoIB driver. For IB link layer, GID table
entries are not based on netdevice. Netdevice parameter is unused here.
Therefore, it is removed.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 14:40:37 -06:00
Parav Pandit
6612b4983f IB/core: Fix comments of GID query functions
Exported symbol's comments should be with function definition and not in
the header file. Therefore comments of ib_find_cached_gid() and
ib_find_cached_gid_by_port() functions are moved closer to their
definitions.

The function name in then comment is different than the actual function
name, fix it to be same as ib_cache_gid_find_by_filter().

Also current comment section of ib_find_cached_gid_by_port() contains the
desciption of ib_find_cached_gid(), fix that as well.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 14:40:36 -06:00
Doug Ledford
2d873449a2 Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-next
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch.  This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.

Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
	(IB/mlx5: Fix cleanup order on unload) added to for-rc and
	commit b5ca15ad7e (IB/mlx5: Add proper representors support)
	add as part of the devel cycle both needed to modify the
	init/de-init functions used by mlx5.  To support the new
	representors, the new functions added by the cleanup patch
	needed to be made non-static, and the init/de-init list
	added by the representors patch needed to be modified to
	match the init/de-init list changes made by the cleanup
	patch.
Updates:
	drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
	prototypes added by representors patch to reflect new function
	names as changed by cleanup patch
	drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
	stage list to match new order from cleanup patch

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 19:28:58 -04:00
Andrew Morton
6ee687735e drivers/infiniband/core/verbs.c: fix build with gcc-4.4.4
gcc-4.4.4 has issues with initialization of anonymous unions.

drivers/infiniband/core/verbs.c: In function '__ib_drain_sq':
drivers/infiniband/core/verbs.c:2204: error: unknown field 'wr_cqe' specified in initializer
drivers/infiniband/core/verbs.c:2204: warning: initialization makes integer from pointer without a cast

Work around this.

Fixes: a1ae7d0345 ("RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access")
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 18:14:08 -04:00
Leon Romanovsky
0c81ffc60d RDMA/ucma: Don't allow join attempts for unsupported AF family
Users can provide garbage while calling to ucma_join_ip_multicast(),
it will indirectly cause to rdma_addr_size() return 0, making the
call to ucma_process_join(), which had the right checks, but it is
better to check the input as early as possible.

The following crash from syzkaller revealed it.

kernel BUG at lib/string.c:1052!
invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051
RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286
RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000
RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12
RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998
R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00
FS:  0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 memcpy include/linux/string.h:344 [inline]
 ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421
 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633
 __vfs_write+0xef/0x970 fs/read_write.c:480
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
 do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f9ec99
RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100
RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de
55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90
90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56
RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0

Fixes: 5bc2b7b397 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast")
Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:31:11 -04:00
Leon Romanovsky
7688f2c3bb RDMA/ucma: Fix access to non-initialized CM_ID object
The attempt to join multicast group without ensuring that CMA device
exists will lead to the following crash reported by syzkaller.

[   64.076794] BUG: KASAN: null-ptr-deref in rdma_join_multicast+0x26e/0x12c0
[   64.076797] Read of size 8 at addr 00000000000000b0 by task join/691
[   64.076797]
[   64.076800] CPU: 1 PID: 691 Comm: join Not tainted 4.16.0-rc1-00219-gb97853b65b93 #23
[   64.076802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[   64.076803] Call Trace:
[   64.076809]  dump_stack+0x5c/0x77
[   64.076817]  kasan_report+0x163/0x380
[   64.085859]  ? rdma_join_multicast+0x26e/0x12c0
[   64.086634]  rdma_join_multicast+0x26e/0x12c0
[   64.087370]  ? rdma_disconnect+0xf0/0xf0
[   64.088579]  ? __radix_tree_replace+0xc3/0x110
[   64.089132]  ? node_tag_clear+0x81/0xb0
[   64.089606]  ? idr_alloc_u32+0x12e/0x1a0
[   64.090517]  ? __fprop_inc_percpu_max+0x150/0x150
[   64.091768]  ? tracing_record_taskinfo+0x10/0xc0
[   64.092340]  ? idr_alloc+0x76/0xc0
[   64.092951]  ? idr_alloc_u32+0x1a0/0x1a0
[   64.093632]  ? ucma_process_join+0x23d/0x460
[   64.094510]  ucma_process_join+0x23d/0x460
[   64.095199]  ? ucma_migrate_id+0x440/0x440
[   64.095696]  ? futex_wake+0x10b/0x2a0
[   64.096159]  ucma_join_multicast+0x88/0xe0
[   64.096660]  ? ucma_process_join+0x460/0x460
[   64.097540]  ? _copy_from_user+0x5e/0x90
[   64.098017]  ucma_write+0x174/0x1f0
[   64.098640]  ? ucma_resolve_route+0xf0/0xf0
[   64.099343]  ? rb_erase_cached+0x6c7/0x7f0
[   64.099839]  __vfs_write+0xc4/0x350
[   64.100622]  ? perf_syscall_enter+0xe4/0x5f0
[   64.101335]  ? kernel_read+0xa0/0xa0
[   64.103525]  ? perf_sched_cb_inc+0xc0/0xc0
[   64.105510]  ? syscall_exit_register+0x2a0/0x2a0
[   64.107359]  ? __switch_to+0x351/0x640
[   64.109285]  ? fsnotify+0x899/0x8f0
[   64.111610]  ? fsnotify_unmount_inodes+0x170/0x170
[   64.113876]  ? __fsnotify_update_child_dentry_flags+0x30/0x30
[   64.115813]  ? ring_buffer_record_is_on+0xd/0x20
[   64.117824]  ? __fget+0xa8/0xf0
[   64.119869]  vfs_write+0xf7/0x280
[   64.122001]  SyS_write+0xa1/0x120
[   64.124213]  ? SyS_read+0x120/0x120
[   64.126644]  ? SyS_read+0x120/0x120
[   64.128563]  do_syscall_64+0xeb/0x250
[   64.130732]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   64.132984] RIP: 0033:0x7f5c994ade99
[   64.135699] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   64.138740] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[   64.141056] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[   64.143536] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[   64.146017] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[   64.148608] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[   64.151060]
[   64.153703] Disabling lock debugging due to kernel taint
[   64.156032] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[   64.159066] IP: rdma_join_multicast+0x26e/0x12c0
[   64.161451] PGD 80000001d0298067 P4D 80000001d0298067 PUD 1dea39067 PMD 0
[   64.164442] Oops: 0000 [#1] SMP KASAN PTI
[   64.166817] CPU: 1 PID: 691 Comm: join Tainted: G    B 4.16.0-rc1-00219-gb97853b65b93 #23
[   64.170004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[   64.174985] RIP: 0010:rdma_join_multicast+0x26e/0x12c0
[   64.177246] RSP: 0018:ffff8801c8207860 EFLAGS: 00010282
[   64.179901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff94789522
[   64.183344] RDX: 1ffffffff2d50fa5 RSI: 0000000000000297 RDI: 0000000000000297
[   64.186237] RBP: ffff8801c8207a50 R08: 0000000000000000 R09: ffffed0039040ea7
[   64.189328] R10: 0000000000000001 R11: ffffed0039040ea6 R12: 0000000000000000
[   64.192634] R13: 0000000000000000 R14: ffff8801e2022800 R15: ffff8801d4ac2400
[   64.196105] FS:  00007f5c99b98700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000
[   64.199211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   64.202046] CR2: 00000000000000b0 CR3: 00000001d1c48004 CR4: 00000000003606a0
[   64.205032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   64.208221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   64.211554] Call Trace:
[   64.213464]  ? rdma_disconnect+0xf0/0xf0
[   64.216124]  ? __radix_tree_replace+0xc3/0x110
[   64.219337]  ? node_tag_clear+0x81/0xb0
[   64.222140]  ? idr_alloc_u32+0x12e/0x1a0
[   64.224422]  ? __fprop_inc_percpu_max+0x150/0x150
[   64.226588]  ? tracing_record_taskinfo+0x10/0xc0
[   64.229763]  ? idr_alloc+0x76/0xc0
[   64.232186]  ? idr_alloc_u32+0x1a0/0x1a0
[   64.234505]  ? ucma_process_join+0x23d/0x460
[   64.237024]  ucma_process_join+0x23d/0x460
[   64.240076]  ? ucma_migrate_id+0x440/0x440
[   64.243284]  ? futex_wake+0x10b/0x2a0
[   64.245302]  ucma_join_multicast+0x88/0xe0
[   64.247783]  ? ucma_process_join+0x460/0x460
[   64.250841]  ? _copy_from_user+0x5e/0x90
[   64.253878]  ucma_write+0x174/0x1f0
[   64.257008]  ? ucma_resolve_route+0xf0/0xf0
[   64.259877]  ? rb_erase_cached+0x6c7/0x7f0
[   64.262746]  __vfs_write+0xc4/0x350
[   64.265537]  ? perf_syscall_enter+0xe4/0x5f0
[   64.267792]  ? kernel_read+0xa0/0xa0
[   64.270358]  ? perf_sched_cb_inc+0xc0/0xc0
[   64.272575]  ? syscall_exit_register+0x2a0/0x2a0
[   64.275367]  ? __switch_to+0x351/0x640
[   64.277700]  ? fsnotify+0x899/0x8f0
[   64.280530]  ? fsnotify_unmount_inodes+0x170/0x170
[   64.283156]  ? __fsnotify_update_child_dentry_flags+0x30/0x30
[   64.286182]  ? ring_buffer_record_is_on+0xd/0x20
[   64.288749]  ? __fget+0xa8/0xf0
[   64.291136]  vfs_write+0xf7/0x280
[   64.292972]  SyS_write+0xa1/0x120
[   64.294965]  ? SyS_read+0x120/0x120
[   64.297474]  ? SyS_read+0x120/0x120
[   64.299751]  do_syscall_64+0xeb/0x250
[   64.301826]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   64.304352] RIP: 0033:0x7f5c994ade99
[   64.306711] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   64.309577] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[   64.312334] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[   64.315783] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[   64.318365] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[   64.320980] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[   64.323515] Code: e8 e8 79 08 ff 4c 89 ff 45 0f b6 a7 b8 01 00 00 e8 68 7c 08 ff 49 8b 1f 4d 89 e5 49 c1 e4 04 48 8
[   64.330753] RIP: rdma_join_multicast+0x26e/0x12c0 RSP: ffff8801c8207860
[   64.332979] CR2: 00000000000000b0
[   64.335550] ---[ end trace 0c00c17a408849c1 ]---

Reported-by: <syzbot+e6aba77967bd72cbc9d6@syzkaller.appspotmail.com>
Fixes: c8f6a362bf ("RDMA/cma: Add multicast communication support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:22:55 -04:00
Tatyana Nikolova
9dea9a2ff6 RDMA/core: Do not use invalid destination in determining port reuse
cma_port_is_unique() allows local port reuse if the quad (source
address and port, destination address and port) for this connection
is unique. However, if the destination info is zero or unspecified, it
can't make a correct decision but still allows port reuse. For example,
sometimes rdma_bind_addr() is called with unspecified destination and
reusing the port can lead to creating a connection with a duplicate quad,
after the destination is resolved. The issue manifests when MPI scale-up
tests hang after the duplicate quad is used.

Set the destination address family and add checks for zero destination
address and port to prevent source port reuse based on invalid destination.

Fixes: 19b752a19d ("IB/cma: Allow port reuse for rdma_id")
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:57:05 -04:00
Leon Romanovsky
19b1f54099 RDMA/verbs: Simplify modify QP check
All callers to ib_modify_qp_is_ok() provides enum ib_qp_state
makes the checks of out-of-scope redundant. Let's remove them
together with updating function signature to return boolean result.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:34:25 -04:00
Leon Romanovsky
88de869bbe RDMA/uverbs: Ensure validity of current QP state value
The QP state is internal enum which is checked at the driver
level by calling to ib_modify_qp_is_ok(). Move this check closer
to user and leave kernel users to be checked by compiler.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:34:25 -04:00
Steve Wise
29cf1351d4 RDMA/nldev: provide detailed PD information
Implement the RDMA nldev netlink interface for dumping detailed PD
information.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
fccec5b89a RDMA/nldev: provide detailed MR information
Implement the RDMA nldev netlink interface for dumping detailed
MR information.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
a34fc0893e RDMA/nldev: provide detailed CQ information
Implement the RDMA nldev netlink interface for dumping detailed
CQ information.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
00313983cd RDMA/nldev: provide detailed CM_ID information
Implement RDMA nldev netlink interface to get detailed CM_ID information.

Because cm_id's are attached to rdma devices in various work queue
contexts, the pid and task information at restrak_add() time is sometimes
not useful.  For example, an nvme/f host connection cm_id ends up being
bound to a device in a work queue context and the resulting pid at attach
time no longer exists after connection setup.  So instead we mark all
cm_id's created via the rdma_ucm as "user", and all others as "kernel".
This required tweaking the restrack code a little.  It also required
wrapping some rdma_cm functions to allow passing the module name string.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
a3b641af72 RDMA/CM: move rdma_id_private to cma_priv.h
Move struct rdma_id_private to a new header cma_priv.h so the resource
tracking services in core/nldev.c can read useful information about cm_ids.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
d12ff62482 RDMA/nldev: common resource dumpit function
Create a common dumpit function that can be used by all common resource
types.  This reduces code replication and simplifies the code as we add
more resource types.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
88831a2cfe RDMA/restrack: clean up res_to_dev()
Simplify res_to_dev() to make it easier to read/maintain.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Leon Romanovsky
a5880b8443 RDMA/ucma: Check that user doesn't overflow QP state
The QP state is limited and declared in enum ib_qp_state,
but ucma user was able to supply any possible (u32) value.

Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com
Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:25:40 -05:00
Leon Romanovsky
6a21dfc0d0 RDMA/ucma: Limit possible option size
Users of ucma are supposed to provide size of option level,
in most paths it is supposed to be equal to u8 or u16, but
it is not the case for the IB path record, where it can be
multiple of struct ib_path_rec_data.

This patch takes simplest possible approach and prevents providing
values more than possible to allocate.

Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com
Fixes: 7ce86409ad ("RDMA/ucma: Allow user space to set service type")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:18:03 -05:00
Parav Pandit
bb7f8f199c IB/core: Fix possible crash to access NULL netdev
resolved_dev returned might be NULL as ifindex is transient number.
Ignoring NULL check of resolved_dev might crash the kernel.
Therefore perform NULL check before accessing resolved_dev.

Additionally rdma_resolve_ip_route() invokes addr_resolve() which
performs check and address translation for loopback ifindex.
Therefore, checking it again in rdma_resolve_ip_route() is not helpful.
Therefore, the code is simplified to avoid IFF_LOOPBACK check.

Fixes: 200298326b ("IB/core: Validate route when we init ah")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:15:40 -05:00
Max Gurtovoy
d3b9e8ad42 RDMA/core: Reduce poll batch for direct cq polling
Fix warning limit for kernel stack consumption:

drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]

Using smaller ib_wc array on the stack brings us comfortably below that
limit again.

Fixes: 246d8b184c ("IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sergey Gorenko <sergeygo@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
Bart Van Assche
a1ae7d0345 RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access
This patch fixes the following KASAN complaint:

==================================================================
BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x77d/0x9b0 [rdma_rxe]
Read of size 8 at addr ffff880061aef860 by task 01/1080

CPU: 2 PID: 1080 Comm: 01 Not tainted 4.16.0-rc3-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc7
print_address_description+0x65/0x270
kasan_report+0x231/0x350
rxe_post_send+0x77d/0x9b0 [rdma_rxe]
__ib_drain_sq+0x1ad/0x250 [ib_core]
ib_drain_qp+0x9/0x30 [ib_core]
srp_destroy_qp+0x51/0x70 [ib_srp]
srp_free_ch_ib+0xfc/0x380 [ib_srp]
srp_create_target+0x1071/0x19e0 [ib_srp]
kernfs_fop_write+0x180/0x210
__vfs_write+0xb1/0x2e0
vfs_write+0xf6/0x250
SyS_write+0x99/0x110
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

The buggy address belongs to the page:
page:ffffea000186bbc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea000186bbe0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880061aef700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880061aef780: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
>ffff880061aef800: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 f2 f2 f2 f2
                                                      ^
ffff880061aef880: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 f2 f2
ffff880061aef900: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 765d67748b ("IB: new common API for draining queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
David Ahern
b75cc8f90f net/ipv6: Pass skb to route lookup
IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 13:04:22 -05:00
Markus Elfring
c7ec83772a RDMA/iwpm: Delete an error message for a failed memory allocation in iwpm_create_nlmsg()
Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:39 -07:00
Jason Gunthorpe
8efe991e8b IB/uverbs: Tidy uverbs_uobject_add
Maintaining the uobjects list is mandatory, hoist it into the common
rdma_alloc_commit_uobject() function and inline it as there is now
only one caller.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:55:03 -07:00
Muneendra Kumar M
4cd482c12b IB/core : Add null pointer check in addr_resolve
dev_get_by_index is being called in addr_resolve
function which returns NULL and NULL pointer access
leads to kernel crash.

Following call trace is observed while running
rdma_lat test application

[  146.173149] BUG: unable to handle kernel NULL pointer dereference
at 00000000000004a0
[  146.173198] IP: addr_resolve+0x9e/0x3e0 [ib_core]
[  146.173221] PGD 0 P4D 0
[  146.173869] Oops: 0000 [#1] SMP PTI
[  146.182859] CPU: 8 PID: 127 Comm: kworker/8:1 Tainted: G  O 4.15.0-rc6+ #18
[  146.183758] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01KN179,
 BIOS-[TCE132H-2.50]- 10/11/2017
[  146.184691] Workqueue: ib_cm cm_work_handler [ib_cm]
[  146.185632] RIP: 0010:addr_resolve+0x9e/0x3e0 [ib_core]
[  146.186584] RSP: 0018:ffffc9000362faa0 EFLAGS: 00010246
[  146.187521] RAX: 000000000000001b RBX: ffffc9000362fc08 RCX:
0000000000000006
[  146.188472] RDX: 0000000000000000 RSI: 0000000000000096 RDI
: ffff88087fc16990
[  146.189427] RBP: ffffc9000362fb18 R08: 00000000ffffff9d R09:
00000000000004ac
[  146.190392] R10: 00000000000001e7 R11: 0000000000000001 R12:
ffff88086af2e090
[  146.191361] R13: 0000000000000000 R14: 0000000000000001 R15:
00000000ffffff9d
[  146.192327] FS:  0000000000000000(0000) GS:ffff88087fc00000(0000)
knlGS:0000000000000000
[  146.193301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  146.194274] CR2: 00000000000004a0 CR3: 000000000220a002 CR4:
00000000003606e0
[  146.195258] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  146.196256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  146.197231] Call Trace:
[  146.198209]  ? rdma_addr_register_client+0x30/0x30 [ib_core]
[  146.199199]  rdma_resolve_ip+0x1af/0x280 [ib_core]
[  146.200196]  rdma_addr_find_l2_eth_by_grh+0x154/0x2b0 [ib_core]

The below patch adds the missing NULL pointer check
returned by dev_get_by_index before accessing the netdev to
avoid kernel crash.

We observed the below crash when we try to do the below test.

 server                       client
 ---------                    ---------
 |1.1.1.1|<----rxe-channel--->|1.1.1.2|
 ---------                    ---------

On server: rdma_lat -c -n 2 -s 1024
On client:rdma_lat 1.1.1.1 -c -n 2 -s 1024

Fixes: 200298326b ("IB/core: Validate route when we init ah")
Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:33 -07:00
Parav Pandit
2fb4f4eadd IB/core: Fix missing RDMA cgroups release in case of failure to register device
During IB device registration process, if query_device() fails or if
ib_core fails to registers sysfs entries, rdma cgroup cleanup is
skipped.

Cc: <stable@vger.kernel.org> # v4.2+
Fixes: 4be3a4fa51 ("IB/core: Fix kernel crash during fail to initialize device")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Kirill Tkhai
25354866e0 net: Convert cma_pernet_operations
These pernet_operations just create and destroy IDR.
So, we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 11:01:36 -05:00
Leon Romanovsky
87915bf82e RDMA/verbs: Return proper error code for not supported system call
The proper return error is -EOPNOTSUPP and not -ENOSYS, so update
all places in verbs.c to match this semantics.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:31:18 -05:00
Leon Romanovsky
372e15c5db RDMA/uverbs: Reduce number of command header flags checks
Simplify the code by directly checking the availability of extended
command flog instead of doing multiple shift operations.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:31:18 -05:00
Leon Romanovsky
cd35cf4b40 RDMA/uverbs: Replace user's types with kernel's types
The internal to kernel variable declarations don't need to be
declared with user types. This patch converts such occurrences
appeared in ib_uverbs_write().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:31:18 -05:00
Leon Romanovsky
6284380a97 RDMA/uverbs: Refactor the header validation logic
Move all header validation logic to be performed before SRCU read lock.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:50 -05:00
Leon Romanovsky
e21719fbbd RDMa/uverbs: Copy ex_hdr outside of SRCU read lock
The SRCU read lock protects the IB device pointer
and doesn't need to be called before copying user
provided header.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:50 -05:00
Leon Romanovsky
491d5c6a30 RDMA/uverbs: Move uncontext check before SRCU read lock
There is no need to take SRCU lock before checking
file->ucontext, so move it do it before it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:50 -05:00
Leon Romanovsky
eb455e329b RDMA/uverbs: Properly check command supported mask
The check based on index is not sufficient because

  IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ

and IB_USER_VERBS_CMD_CREATE_CQ <= IB_USER_VERBS_CMD_OPEN_QP,
so if we execute IB_USER_VERBS_EX_CMD_CREATE_CQ this code checks
ib_dev->uverbs_cmd_mask not ib_dev->uverbs_ex_cmd_mask.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:50 -05:00
Leon Romanovsky
77833b8a48 RDMA/uverbs: Refactor command header processing
Move all command header processing into separate function
and perform those checks before acquiring SRCU read lock.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:37 -05:00
Leon Romanovsky
f2630ce2fb RDMA/uverbs: Unify return values of not supported command
The non-existing command is supposed to return -EOPNOTSUPP, but the
current code returns different errors for different flows for the
same failure. This patch unifies those flows.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:36 -05:00
Leon Romanovsky
a9ed5b38aa RDMA/uverbs: Return not supported error code for unsupported commands
Command that doesn't exist means that it is not supported,
so update code to return -EOPNOTSUPP in case of failure.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:11 -05:00
Leon Romanovsky
43ae95130d RDMA/uverbs: Fail as early as possible if not enough header data was provided
Fail as early as possible if not enough header data
was provided.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:11 -05:00
Leon Romanovsky
a6c4a66ae9 RDMA/uverbs: Refactor flags checks and update return value
Since commit f21519b23c ("IB/core: extended command: an
improved infrastructure for uverbs commands"), the uverbs
supports extra flags as an input to the command interface.

However actually, there is only one flag available and used,
so it is better to refactor the code, so the resolution and
report to the users is done as early as possible.

As part of this change, we changed the return value of failure case
from ENOSYS to be EINVAL to be consistent with the rest flags checks.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:11 -05:00
Leon Romanovsky
08f0e16163 RDMA/uverbs: Update sizeof users
Update sizeof() users to be consistent with coding style.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:11 -05:00
Leon Romanovsky
b5bc598186 RDMA/uverbs: Convert command mask validity check function to be bool
The function validate_command_mask() returns only two results: success
or failure, so convert it to return bool instead of 0 and -1.

Reported-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:29:10 -05:00
Leon Romanovsky
f45765872e RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type
Attempt to modify XRC_TGT QP type from the user space (ibv_xsrq_pingpong
invocation) will trigger the following kernel panic. It is caused by the
fact that such QPs missed uobject initialization.

[   17.408845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[   17.412645] IP: rdma_lookup_put_uobject+0x9/0x50
[   17.416567] PGD 0 P4D 0
[   17.419262] Oops: 0000 [#1] SMP PTI
[   17.422915] CPU: 0 PID: 455 Comm: ibv_xsrq_pingpo Not tainted 4.16.0-rc1+ #86
[   17.424765] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   17.427399] RIP: 0010:rdma_lookup_put_uobject+0x9/0x50
[   17.428445] RSP: 0018:ffffb8c7401e7c90 EFLAGS: 00010246
[   17.429543] RAX: 0000000000000000 RBX: ffffb8c7401e7cf8 RCX: 0000000000000000
[   17.432426] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[   17.437448] RBP: 0000000000000000 R08: 00000000000218f0 R09: ffffffff8ebc4cac
[   17.440223] R10: fffff6038052cd80 R11: ffff967694b36400 R12: ffff96769391f800
[   17.442184] R13: ffffb8c7401e7cd8 R14: 0000000000000000 R15: ffff967699f60000
[   17.443971] FS:  00007fc29207d700(0000) GS:ffff96769fc00000(0000) knlGS:0000000000000000
[   17.446623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.448059] CR2: 0000000000000048 CR3: 000000001397a000 CR4: 00000000000006b0
[   17.449677] Call Trace:
[   17.450247]  modify_qp.isra.20+0x219/0x2f0
[   17.451151]  ib_uverbs_modify_qp+0x90/0xe0
[   17.452126]  ib_uverbs_write+0x1d2/0x3c0
[   17.453897]  ? __handle_mm_fault+0x93c/0xe40
[   17.454938]  __vfs_write+0x36/0x180
[   17.455875]  vfs_write+0xad/0x1e0
[   17.456766]  SyS_write+0x52/0xc0
[   17.457632]  do_syscall_64+0x75/0x180
[   17.458631]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   17.460004] RIP: 0033:0x7fc29198f5a0
[   17.460982] RSP: 002b:00007ffccc71f018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   17.463043] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007fc29198f5a0
[   17.464581] RDX: 0000000000000078 RSI: 00007ffccc71f050 RDI: 0000000000000003
[   17.466148] RBP: 0000000000000000 R08: 0000000000000078 R09: 00007ffccc71f050
[   17.467750] R10: 000055b6cf87c248 R11: 0000000000000246 R12: 00007ffccc71f300
[   17.469541] R13: 000055b6cf8733a0 R14: 0000000000000000 R15: 0000000000000000
[   17.471151] Code: 00 00 0f 1f 44 00 00 48 8b 47 48 48 8b 00 48 8b 40 10 e9 0b 8b 68 00 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 89 f5 <48> 8b 47 48 48 89 fb 40 0f b6 f6 48 8b 00 48 8b 40 20 e8 e0 8a
[   17.475185] RIP: rdma_lookup_put_uobject+0x9/0x50 RSP: ffffb8c7401e7c90
[   17.476841] CR2: 0000000000000048
[   17.477764] ---[ end trace 1dbcc5354071a712 ]---
[   17.478880] Kernel panic - not syncing: Fatal exception
[   17.480277] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: 2f08ee363f ("RDMA/restrack: don't use uaccess_kernel()")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-21 13:52:19 -05:00
Steve Wise
2f08ee363f RDMA/restrack: don't use uaccess_kernel()
uaccess_kernel() isn't sufficient to determine if an rdma resource is
user-mode or not.  For example, resources allocated in the add_one()
function of an ib_client get falsely labeled as user mode, when they
are kernel mode allocations.  EG: mad qps.

The result is that these qps are skipped over during a nldev query
because of an erroneous namespace mismatch.

So now we determine if the resource is user-mode by looking at the object
struct's uobject or similar pointer to know if it was allocated for user
mode applications.

Fixes: 02d8883f52 ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-16 10:18:11 -07:00
Leon Romanovsky
2188558621 RDMA/verbs: Check existence of function prior to accessing it
Update all the flows to ensure that function pointer exists prior
to accessing it.

This is much safer than checking the uverbs_ex_mask variable, especially
since we know that test isn't working properly and will be removed
in -next.

This prevents a user triggereable oops.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-16 09:18:55 -07:00
Leon Romanovsky
5d4c05c3ee RDMA/uverbs: Sanitize user entered port numbers prior to access it
==================================================================
BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs+0x6f2/0x8c0
Read of size 4 at addr ffff88006476a198 by task syzkaller697701/265

CPU: 0 PID: 265 Comm: syzkaller697701 Not tainted 4.15.0+ #90
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ? show_regs_print_info+0x17/0x17
 ? lock_contended+0x11a0/0x11a0
 print_address_description+0x83/0x3e0
 kasan_report+0x18c/0x4b0
 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0
 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0
 ? lookup_get_idr_uobject+0x120/0x200
 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0
 copy_ah_attr_from_uverbs+0x6f2/0x8c0
 ? modify_qp+0xd0e/0x1350
 modify_qp+0xd0e/0x1350
 ib_uverbs_modify_qp+0xf9/0x170
 ? ib_uverbs_query_qp+0xa70/0xa70
 ib_uverbs_write+0x7f9/0xef0
 ? attach_entity_load_avg+0x8b0/0x8b0
 ? ib_uverbs_query_qp+0xa70/0xa70
 ? uverbs_devnode+0x110/0x110
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? sched_clock_cpu+0x18/0x200
 ? _raw_spin_unlock_irq+0x29/0x40
 ? _raw_spin_unlock_irq+0x29/0x40
 ? _raw_spin_unlock_irq+0x29/0x40
 ? time_hardirqs_on+0x27/0x670
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? _raw_spin_unlock_irq+0x29/0x40
 ? finish_task_switch+0x1bd/0x7a0
 ? finish_task_switch+0x194/0x7a0
 ? prandom_u32_state+0xe/0x180
 ? rcu_read_unlock+0x80/0x80
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x433c29
RSP: 002b:00007ffcf2be82a8 EFLAGS: 00000217

Allocated by task 62:
 kasan_kmalloc+0xa0/0xd0
 kmem_cache_alloc+0x141/0x480
 dup_fd+0x101/0xcc0
 copy_process.part.62+0x166f/0x4390
 _do_fork+0x1cb/0xe90
 kernel_thread+0x34/0x40
 call_usermodehelper_exec_work+0x112/0x260
 process_one_work+0x929/0x1aa0
 worker_thread+0x5c6/0x12a0
 kthread+0x346/0x510
 ret_from_fork+0x3a/0x50

Freed by task 259:
 kasan_slab_free+0x71/0xc0
 kmem_cache_free+0xf3/0x4c0
 put_files_struct+0x225/0x2c0
 exit_files+0x88/0xc0
 do_exit+0x67c/0x1520
 do_group_exit+0xe8/0x380
 SyS_exit_group+0x1e/0x20
 entry_SYSCALL_64_fastpath+0x1e/0x8b

The buggy address belongs to the object at ffff88006476a000
 which belongs to the cache files_cache of size 832
The buggy address is located 408 bytes inside of
 832-byte region [ffff88006476a000, ffff88006476a340)
The buggy address belongs to the page:
page:ffffea000191da80 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 0000000100080008
raw: 0000000000000000 0000000100000001 ffff88006bcf7a80 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88006476a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88006476a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88006476a180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff88006476a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88006476a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:27 -07:00
Leon Romanovsky
1ff5325c3c RDMA/uverbs: Fix circular locking dependency
Avoid circular locking dependency by calling
to uobj_alloc_commit() outside of xrcd_tree_mutex lock.

======================================================
WARNING: possible circular locking dependency detected
4.15.0+ #87 Not tainted
------------------------------------------------------
syzkaller401056/269 is trying to acquire lock:
 (&uverbs_dev->xrcd_tree_mutex){+.+.}, at: [<000000006c12d2cd>] uverbs_free_xrcd+0xd2/0x360

but task is already holding lock:
 (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&ucontext->uobjects_lock){+.+.}:
       __mutex_lock+0x111/0x1720
       rdma_alloc_commit_uobject+0x22c/0x600
       ib_uverbs_open_xrcd+0x61a/0xdd0
       ib_uverbs_write+0x7f9/0xef0
       __vfs_write+0x10d/0x700
       vfs_write+0x1b0/0x550
       SyS_write+0xc7/0x1a0
       entry_SYSCALL_64_fastpath+0x1e/0x8b

-> #0 (&uverbs_dev->xrcd_tree_mutex){+.+.}:
       lock_acquire+0x19d/0x440
       __mutex_lock+0x111/0x1720
       uverbs_free_xrcd+0xd2/0x360
       remove_commit_idr_uobject+0x6d/0x110
       uverbs_cleanup_ucontext+0x2f0/0x730
       ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120
       ib_uverbs_close+0xf2/0x570
       __fput+0x2cd/0x8d0
       task_work_run+0xec/0x1d0
       do_exit+0x6a1/0x1520
       do_group_exit+0xe8/0x380
       SyS_exit_group+0x1e/0x20
       entry_SYSCALL_64_fastpath+0x1e/0x8b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ucontext->uobjects_lock);
                               lock(&uverbs_dev->xrcd_tree_mutex);
                               lock(&ucontext->uobjects_lock);
  lock(&uverbs_dev->xrcd_tree_mutex);

 *** DEADLOCK ***

3 locks held by syzkaller401056/269:
 #0:  (&file->cleanup_mutex){+.+.}, at: [<00000000c9f0c252>] ib_uverbs_close+0xac/0x570
 #1:  (&ucontext->cleanup_rwsem){++++}, at: [<00000000b6994d49>] uverbs_cleanup_ucontext+0xf6/0x730
 #2:  (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730

stack backtrace:
CPU: 0 PID: 269 Comm: syzkaller401056 Not tainted 4.15.0+ #87
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ? uverbs_cleanup_ucontext+0x168/0x730
 ? console_unlock+0x502/0xbd0
 print_circular_bug.isra.24+0x35e/0x396
 ? print_circular_bug_header+0x12e/0x12e
 ? find_usage_backwards+0x30/0x30
 ? entry_SYSCALL_64_fastpath+0x1e/0x8b
 validate_chain.isra.28+0x25d1/0x40c0
 ? check_usage+0xb70/0xb70
 ? graph_lock+0x160/0x160
 ? find_usage_backwards+0x30/0x30
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? __lock_acquire+0x93d/0x1630
 __lock_acquire+0x93d/0x1630
 lock_acquire+0x19d/0x440
 ? uverbs_free_xrcd+0xd2/0x360
 __mutex_lock+0x111/0x1720
 ? uverbs_free_xrcd+0xd2/0x360
 ? uverbs_free_xrcd+0xd2/0x360
 ? __mutex_lock+0x828/0x1720
 ? mutex_lock_io_nested+0x1550/0x1550
 ? uverbs_cleanup_ucontext+0x168/0x730
 ? __lock_acquire+0x9a9/0x1630
 ? mutex_lock_io_nested+0x1550/0x1550
 ? uverbs_cleanup_ucontext+0xf6/0x730
 ? lock_contended+0x11a0/0x11a0
 ? uverbs_free_xrcd+0xd2/0x360
 uverbs_free_xrcd+0xd2/0x360
 remove_commit_idr_uobject+0x6d/0x110
 uverbs_cleanup_ucontext+0x2f0/0x730
 ? sched_clock_cpu+0x18/0x200
 ? uverbs_close_fd+0x1c0/0x1c0
 ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120
 ib_uverbs_close+0xf2/0x570
 ? ib_uverbs_remove_one+0xb50/0xb50
 ? ib_uverbs_remove_one+0xb50/0xb50
 __fput+0x2cd/0x8d0
 task_work_run+0xec/0x1d0
 do_exit+0x6a1/0x1520
 ? fsnotify_first_mark+0x220/0x220
 ? exit_notify+0x9f0/0x9f0
 ? entry_SYSCALL_64_fastpath+0x5/0x8b
 ? entry_SYSCALL_64_fastpath+0x5/0x8b
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 ? time_hardirqs_on+0x27/0x670
 ? time_hardirqs_off+0x27/0x490
 ? syscall_return_slowpath+0x6c/0x460
 ? entry_SYSCALL_64_fastpath+0x5/0x8b
 do_group_exit+0xe8/0x380
 SyS_exit_group+0x1e/0x20
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x431ce9

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: fd3c7904db ("IB/core: Change idr objects to use the new schema")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:27 -07:00
Leon Romanovsky
5c2e1c4f92 RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcd
There is no matching lock for this mutex. Git history suggests this is
just a missed remnant from an earlier version of the function before
this locking was moved into uverbs_free_xrcd.

Originally this lock was protecting the xrcd_table_delete()

=====================================
WARNING: bad unlock balance detected!
4.15.0+ #87 Not tainted
-------------------------------------
syzkaller223405/269 is trying to release lock (&uverbs_dev->xrcd_tree_mutex) at:
[<00000000b8703372>] ib_uverbs_close_xrcd+0x195/0x1f0
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syzkaller223405/269:
 #0:  (&uverbs_dev->disassociate_srcu){....}, at: [<000000005af3b960>] ib_uverbs_write+0x265/0xef0

stack backtrace:
CPU: 0 PID: 269 Comm: syzkaller223405 Not tainted 4.15.0+ #87
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ? ib_uverbs_write+0x265/0xef0
 ? console_unlock+0x502/0xbd0
 ? ib_uverbs_close_xrcd+0x195/0x1f0
 print_unlock_imbalance_bug+0x131/0x160
 lock_release+0x59d/0x1100
 ? ib_uverbs_close_xrcd+0x195/0x1f0
 ? lock_acquire+0x440/0x440
 ? lock_acquire+0x440/0x440
 __mutex_unlock_slowpath+0x88/0x670
 ? wait_for_completion+0x4c0/0x4c0
 ? rdma_lookup_get_uobject+0x145/0x2f0
 ib_uverbs_close_xrcd+0x195/0x1f0
 ? ib_uverbs_open_xrcd+0xdd0/0xdd0
 ib_uverbs_write+0x7f9/0xef0
 ? cyc2ns_read_end+0x10/0x10
 ? ib_uverbs_open_xrcd+0xdd0/0xdd0
 ? uverbs_devnode+0x110/0x110
 ? cyc2ns_read_end+0x10/0x10
 ? cyc2ns_read_end+0x10/0x10
 ? sched_clock_cpu+0x18/0x200
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? __fget+0x358/0x5d0
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x4335c9

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: fd3c7904db ("IB/core: Change idr objects to use the new schema")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:27 -07:00
Leon Romanovsky
0cba0efcc7 RDMA/restrack: Increment CQ restrack object before committing
Once the uobj is committed it is immediately possible another thread
could destroy it, which worst case, can result in a use-after-free
of the restrack objects.

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: 08f294a152 ("RDMA/core: Add resource tracking for create and destroy CQs")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:26 -07:00
Leon Romanovsky
3f802b162d RDMA/uverbs: Protect from command mask overflow
The command number is not bounds checked against the command mask before it
is shifted, resulting in an ubsan hit. This does not cause malfunction since
the command number is eventually bounds checked, but we can make this ubsan
clean by moving the bounds check to before the mask check.

================================================================================
UBSAN: Undefined behaviour in
drivers/infiniband/core/uverbs_main.c:647:21
shift exponent 207 is too large for 64-bit type 'long long unsigned int'
CPU: 0 PID: 446 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #61
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0xde/0x164
? dma_virt_map_sg+0x22c/0x22c
ubsan_epilogue+0xe/0x81
__ubsan_handle_shift_out_of_bounds+0x293/0x2f7
? debug_check_no_locks_freed+0x340/0x340
? __ubsan_handle_load_invalid_value+0x19b/0x19b
? lock_acquire+0x440/0x440
? lock_acquire+0x19d/0x440
? __might_fault+0xf4/0x240
? ib_uverbs_write+0x68d/0xe20
ib_uverbs_write+0x68d/0xe20
? __lock_acquire+0xcf7/0x3940
? uverbs_devnode+0x110/0x110
? cyc2ns_read_end+0x10/0x10
? sched_clock_cpu+0x18/0x200
? sched_clock_cpu+0x18/0x200
__vfs_write+0x10d/0x700
? uverbs_devnode+0x110/0x110
? kernel_read+0x170/0x170
? __fget+0x35b/0x5d0
? security_file_permission+0x93/0x260
vfs_write+0x1b0/0x550
SyS_write+0xc7/0x1a0
? SyS_read+0x1a0/0x1a0
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x448e29
RSP: 002b:00007f033f567c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f033f5686bc RCX: 0000000000448e29
RDX: 0000000000000060 RSI: 0000000020001000 RDI: 0000000000000012
RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000056a0 R14: 00000000006e8740 R15: 0000000000000000
================================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.5
Fixes: 2dbd5186a3 ("IB/core: IB/core: Allow legacy verbs through extended interfaces")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:26 -07:00
Jason Gunthorpe
ec6f8401c4 IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy
If remove_commit fails then the lock is left locked while the uobj still
exists. Eventually the kernel will deadlock.

lockdep detects this and says:

 test/4221 is leaving the kernel with locks still held!
 1 lock held by test/4221:
  #0:  (&ucontext->cleanup_rwsem){.+.+}, at: [<000000001e5c7523>] rdma_explicit_destroy+0x37/0x120 [ib_uverbs]

Fixes: 4da70da23e ("IB/core: Explicitly destroy an object while keeping uobject")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:26 -07:00
Jason Gunthorpe
104f268d43 IB/uverbs: Improve lockdep_check
This is really being used as an assert that the expected usecnt
is being held and implicitly that the usecnt is valid. Rename it to
assert_uverbs_usecnt and tighten the checks to only accept valid
values of usecnt (eg 0 and < -1 are invalid).

The tigher checkes make the assertion cover more cases and is more
likely to find bugs via syzkaller/etc.

Fixes: 3832125624 ("IB/core: Add support for idr types")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:47 -07:00
Leon Romanovsky
6623e3e3cd RDMA/uverbs: Protect from races between lookup and destroy of uobjects
The race is between lookup_get_idr_uobject and
uverbs_idr_remove_uobj -> uverbs_uobject_put.

We deliberately do not call sychronize_rcu after the idr_remove in
uverbs_idr_remove_uobj for performance reasons, instead we call
kfree_rcu() during uverbs_uobject_put.

However, this means we can obtain pointers to uobj's that have
already been released and must protect against krefing them
using kref_get_unless_zero.

==================================================================
BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
Read of size 4 at addr ffff88005fda1ac8 by task syz-executor2/441

CPU: 1 PID: 441 Comm: syz-executor2 Not tainted 4.15.0-rc2+ #56
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xd4
print_address_description+0x73/0x290
kasan_report+0x25c/0x370
? copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
? uverbs_try_lock_object+0x68/0xc0
? modify_qp.isra.7+0xdc4/0x10e0
modify_qp.isra.7+0xdc4/0x10e0
ib_uverbs_modify_qp+0xfe/0x170
? ib_uverbs_query_qp+0x970/0x970
? __lock_acquire+0xa11/0x1da0
ib_uverbs_write+0x55a/0xad0
? ib_uverbs_query_qp+0x970/0x970
? ib_uverbs_query_qp+0x970/0x970
? ib_uverbs_open+0x760/0x760
? futex_wake+0x147/0x410
? sched_clock_cpu+0x18/0x180
? check_prev_add+0x1680/0x1680
? do_futex+0x3b6/0xa30
? sched_clock_cpu+0x18/0x180
__vfs_write+0xf7/0x5c0
? ib_uverbs_open+0x760/0x760
? kernel_read+0x110/0x110
? lock_acquire+0x370/0x370
? __fget+0x264/0x3b0
vfs_write+0x18a/0x460
SyS_write+0xc7/0x1a0
? SyS_read+0x1a0/0x1a0
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x448e29
RSP: 002b:00007f443fee0c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f443fee16bc RCX: 0000000000448e29
RDX: 0000000000000078 RSI: 00000000209f8000 RDI: 0000000000000012
RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000008e98 R14: 00000000006ebf38 R15: 0000000000000000

Allocated by task 1:
kmem_cache_alloc_trace+0x16c/0x2f0
mlx5_alloc_cmd_msg+0x12e/0x670
cmd_exec+0x419/0x1810
mlx5_cmd_exec+0x40/0x70
mlx5_core_mad_ifc+0x187/0x220
mlx5_MAD_IFC+0xd7/0x1b0
mlx5_query_mad_ifc_gids+0x1f3/0x650
mlx5_ib_query_gid+0xa4/0xc0
ib_query_gid+0x152/0x1a0
ib_query_port+0x21e/0x290
mlx5_port_immutable+0x30f/0x490
ib_register_device+0x5dd/0x1130
mlx5_ib_add+0x3e7/0x700
mlx5_add_device+0x124/0x510
mlx5_register_interface+0x11f/0x1c0
mlx5_ib_init+0x56/0x61
do_one_initcall+0xa3/0x250
kernel_init_freeable+0x309/0x3b8
kernel_init+0x14/0x180
ret_from_fork+0x24/0x30

Freed by task 1:
kfree+0xeb/0x2f0
mlx5_free_cmd_msg+0xcd/0x140
cmd_exec+0xeba/0x1810
mlx5_cmd_exec+0x40/0x70
mlx5_core_mad_ifc+0x187/0x220
mlx5_MAD_IFC+0xd7/0x1b0
mlx5_query_mad_ifc_gids+0x1f3/0x650
mlx5_ib_query_gid+0xa4/0xc0
ib_query_gid+0x152/0x1a0
ib_query_port+0x21e/0x290
mlx5_port_immutable+0x30f/0x490
ib_register_device+0x5dd/0x1130
mlx5_ib_add+0x3e7/0x700
mlx5_add_device+0x124/0x510
mlx5_register_interface+0x11f/0x1c0
mlx5_ib_init+0x56/0x61
do_one_initcall+0xa3/0x250
kernel_init_freeable+0x309/0x3b8
kernel_init+0x14/0x180
ret_from_fork+0x24/0x30

The buggy address belongs to the object at ffff88005fda1ab0
which belongs to the cache kmalloc-32 of size 32
The buggy address is located 24 bytes inside of
32-byte region [ffff88005fda1ab0, ffff88005fda1ad0)
The buggy address belongs to the page:
page:00000000d5655c19 count:1 mapcount:0 mapping: (null)
index:0xffff88005fda1fc0
flags: 0x4000000000000100(slab)
raw: 4000000000000100 0000000000000000 ffff88005fda1fc0 0000000180550008
raw: ffffea00017f6780 0000000400000004 ffff88006c803980 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88005fda1980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
ffff88005fda1a00: fb fb fc fc fb fb fb fb fc fc 00 00 00 00 fc fc
ffff88005fda1a80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb
ffff88005fda1b00: fc fc 00 00 00 00 fc fc fb fb fb fb fc fc fb fb
ffff88005fda1b80: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
==================================================================@

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: 3832125624 ("IB/core: Add support for idr types")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:47 -07:00
Jason Gunthorpe
d9dc7a3500 IB/uverbs: Hold the uobj write lock after allocate
This clarifies the design intention that time between allocate and
commit has the uobj exclusive to the caller. We already guarantee
this by delaying publishing the uobj pointer via idr_insert,
fd_install, list_add, etc.

Additionally holding the usecnt lock during this period provides
extra clarity and more protection against future mistakes.

Fixes: 3832125624 ("IB/core: Add support for idr types")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:46 -07:00
Matan Barak
4d39a959bc IB/uverbs: Fix possible oops with duplicate ioctl attributes
If the same attribute is listed twice by the user in the ioctl attribute
list then error unwind can cause the kernel to deref garbage.

This happens when an object with WRITE access is sent twice. The second
parse properly fails but corrupts the state required for the error unwind
it triggers.

Fixing this by making duplicates in the attribute list invalid. This is
not something we need to support.

The ioctl interface is currently recommended to be disabled in kConfig.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:46 -07:00
Matan Barak
9dfb2ff400 IB/uverbs: Add ioctl support for 32bit processes
32 bit processes running on a 64 bit kernel call compat_ioctl so that
implementations can revise any structure layout issues. Point compat_ioctl
at our normal ioctl because:

- All our structures are designed to be the same on 32 and 64 bit, ie we
  use __aligned_u64 when required and are careful to manage padding.

- Any pointers are stored in u64's and userspace is expected
  to prepare them properly.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:46 -07:00
Matan Barak
3d89459e2e IB/uverbs: Fix method merging in uverbs_ioctl_merge
Fix a bug in uverbs_ioctl_merge that looked at the object's iterator
number instead of the method's iterator number when merging methods.

While we're at it, make the uverbs_ioctl_merge code a bit more clear
and faster.

Fixes: 118620d368 ('IB/core: Add uverbs merge trees functionality')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:45 -07:00
Jason Gunthorpe
2f36028ce9 IB/uverbs: Use u64_to_user_ptr() not a union
The union approach will get the endianness wrong sometimes if the kernel's
pointer size is 32 bits resulting in EFAULTs when trying to copy to/from
user.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:45 -07:00
Jason Gunthorpe
6c976c30ad IB/uverbs: Use inline data transfer for UHW_IN
The rule for the API is pointers less than 8 bytes are inlined into
the .data field of the attribute. Fix the creation of the driver udata
struct to follow this rule and point to the .data itself when the size
is less than 8 bytes.

Otherwise if the UHW struct is less than 8 bytes the driver will get
EFAULT during copy_from_user.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:44 -07:00
Matan Barak
89d9e8d3f1 IB/uverbs: Always use the attribute size provided by the user
This fixes several bugs around the copy_to/from user path:
 - copy_to used the user provided size of the attribute
   and could copy data beyond the end of the kernel buffer into
   userspace.
 - copy_from didn't know the size of the kernel buffer and
   could have left kernel memory unexpectedly un-initialized.
 - copy_from did not use the user length to determine if the
   attribute data is inlined or not.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:44 -07:00
Leon Romanovsky
415bb699d7 RDMA/restrack: Remove unimplemented XRCD object
Resource tracking of XRCD objects is not implemented in current
version of restrack and hence can be removed.

Fixes: 02d8883f52 ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:44 -07:00
Linus Torvalds
a9a08845e9 vfs: do bulk POLL* -> EPOLL* replacement
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11 14:34:03 -08:00
Linus Torvalds
2246edfaf8 Second pull request for 4.16 merge window
- Clean up some function signatures in rxe for clarity
 - Tidy the RDMA netlink header to remove unimplemented constants
 - bnxt_re driver fixes, one is a regression this window.
 - Minor hns driver fixes
 - Various fixes from Dan Carpenter and his tool
 - Fix IRQ cleanup race in HFI1
 - HF1 performance optimizations and a fix to report counters in the right units
 - Fix for an IPoIB startup sequence race with the external manager
 - Oops fix for the new kabi path
 - Endian cleanups for hns
 - Fix for mlx5 related to the new automatic affinity support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaePL/AAoJELgmozMOVy/dqhsQALUhzDuuJ+/F6supjmyqZG53
 Ak/PoFjTmHToGQfDq/1TRzyKwMx12aB2l6WGZc31FzhvCw4daPWkoEVKReNWUUJ+
 fmESxjLgo8ZRGSqpNxn9Q8agE/I/5JZQoA8bCFCYgdZPKTPNKdtAVBphpdhmrOX4
 ygjABikWf/wBsNF1A8lnX9xkfPO21cPHrFQLTnuOzOT/hc6U+PPklHSQCnS91svh
 1+Pqjtssg54rxYkJqiFq3giSnfwvmAXO8WyVGmRRPFGLpB0nIjq0Sl6ZgLLClz7w
 YJdiBGr7rlnNMgGCjlPU2ZO3lO6J0ytXQzFNqRqvKryXQOv+uVeJgep7WqHTcdQU
 UN30FCKQMgLL/F6NF8wKaKcK4X0VgXQa7gpuH2fVSXF0c3LO3/mmWNjixbGSzT2c
 Wj+EW3eOKlTddhRLhgbMOdwc32tIGhaD85z2F4+FZO+XI9ZQtJaDewWVDjYoumP/
 RlDIFw+KCgSq7+UZL8CoXuh0BuS1nu9TGfkx1HW0DLMF1+yigNiswpUfksV4cISP
 JqE2I3yH0A4UobD/a+f9IhIfk2MjxO0tJWNjU8IA9LXgUFlskQ6MpH/AcE9G8JNv
 tlfLGR3s4PJa/7j/Iy2F84og/b/KH8v7vyj4Eknq/hLq63/BiM5wj0AUBRrGulN6
 HhAMOegxGZ7IKP/y0L7I
 =xwZz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull more rdma updates from Doug Ledford:
 "Items of note:

   - two patches fix a regression in the 4.15 kernel. The 4.14 kernel
     worked fine with NVMe over Fabrics and mlx5 adapters. That broke in
     4.15. The fix is here.

   - one of the patches (the endian notation patch from Lijun) looks
     like a lot of lines of change, but it's mostly mechanical in
     nature. It amounts to the biggest chunk of change in it (it's about
     2/3rds of the overall pull request).

  Summary:

   - Clean up some function signatures in rxe for clarity

   - Tidy the RDMA netlink header to remove unimplemented constants

   - bnxt_re driver fixes, one is a regression this window.

   - Minor hns driver fixes

   - Various fixes from Dan Carpenter and his tool

   - Fix IRQ cleanup race in HFI1

   - HF1 performance optimizations and a fix to report counters in the right units

   - Fix for an IPoIB startup sequence race with the external manager

   - Oops fix for the new kabi path

   - Endian cleanups for hns

   - Fix for mlx5 related to the new automatic affinity support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits)
  net/mlx5: increase async EQ to avoid EQ overrun
  mlx5: fix mlx5_get_vector_affinity to start from completion vector 0
  RDMA/hns: Fix the endian problem for hns
  IB/uverbs: Use the standard kConfig format for experimental
  IB: Update references to libibverbs
  IB/hfi1: Add 16B rcvhdr trace support
  IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
  IB/core: Avoid a potential OOPs for an unused optional parameter
  IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
  IB/ipoib: Fix for potential no-carrier state
  IB/hfi1: Show fault stats in both TX and RX directions
  IB/hfi1: Remove blind constants from 16B update
  IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
  IB/hfi1: Do not override given pcie_pset value
  IB/hfi1: Optimize process_receive_ib()
  IB/hfi1: Remove unnecessary fecn and becn fields
  IB/hfi1: Look up ibport using a pointer in receive path
  IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
  IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
  IB/hfi1: Remove dependence on qp->s_hdrwords
  ...
2018-02-06 11:09:45 -08:00
Michael J. Ruhl
2ff124d597 IB/core: Avoid a potential OOPs for an unused optional parameter
The ev_file is an optional parameter for CQ creation. If the parameter
is not passed, the ev_file pointer will be NULL.  Using that pointer
to set the cq_context will result in an OOPs.

Verify that ev_file is not NULL before using.

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 9ee79fce36 ("IB/core: Add completion queue (cq) object actions")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:32 -07:00
Dan Carpenter
f34727a135 RDMA/nldev: missing error code in nldev_res_get_doit()
We should return -ENOMEM if the allocation fails.  The current code
accidentally returns success.

Fixes: bf3c5a93c5 ("RDMA/nldev: Provide global resource utilization")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
Linus Torvalds
7b1cd95d65 First merge window pull request for 4.16
- Misc small driver fixups to
   bnxt_re/hfi1/qib/hns/ocrdma/rdmavt/vmw_pvrdma/nes
 - Several major feature adds to bnxt_re driver: SRIOV VF RoCE support,
   HugePages support, extended hardware stats support, and SRQ support
 - A notable number of fixes to the i40iw driver from debugging scale up
   testing
 - More work to enable the new hip08 chip in the hns driver
 - Misc small ULP fixups to srp/srpt//ipoib
 - Preparation for srp initiator and target to support the RDMA-CM
   protocol for connections
 - Add RDMA-CM support to srp initiator, srp target is still a WIP
 - Fixes for a couple of places where ipoib could spam the dmesg log
 - Fix encode/decode of FDR/EDR data rates in the core
 - Many patches from Parav with ongoing work to clean up inconsistencies
   and bugs in RoCE support around the rdma_cm
 - mlx5 driver support for the userspace features 'thread domain', 'wallclock
   timestamps' and 'DV Direct Connected transport'. Support for the firmware
   dual port rocee capability
 - Core support for more than 32 rdma devices in the char dev allocation
 - kernel doc updates from Randy Dunlap
 - New netlink uAPI for inspecting RDMA objects similar in spirit to 'ss'
 - One minor change to the kobject code acked by GKH
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJacfljAAoJEDht9xV+IJsaUnwP+QFJvfIDEfRlfU2rTmcfymPs
 Rz9bW1KLgETcJx/XOE2ba2DOaqdFr56TLflsDfEfOSIL8AtzBQqH3vTqEj49bBP7
 4JZAkzWllUS/qoYD2XmvOM0IrIfFXzZtLM/lzLi+5dwK26x3GAB9hHXpKzUrJ1vj
 I1Naq14qOFXoNBndEtZJqtIKOhR/Pnd6YtxAiNCmViZGdqm3DIU3D4VJhU5B7pO9
 j6ovJs16wfJl/gV1iiz9xO49ViVFpwzSIzYE/Q2ZCegcrsF3EEVN2J4vZHkKgDuN
 0/Ar/WOvkPzKBFR8hJ7M4kwp0Fy/69/U49s7kpGNxdhML9sU3+Qfse6JYGj0M9L8
 01gTM0SShyAZMNAvjVFbIKLQPg806OAit4cooMwlObbwJ6b7B8K0uN17/uVIkIqp
 gXqertyl1BLhUtTOby/8Fox/f/oEvaZksKiwcTKSb7D1Y5jGZZUPRknJ5SwAFWQB
 RiTPJ6mY7BUsM9zuYQtRE8x2mpgIezYXFcrAz7iT76WuoZQgo1QLIyYRM1+MlhnC
 wNrp5BtqoVfW2Ps0CbSdxJ9vDtDf3cwLg0RzcCB8+NJJccsRD9IVMDev/TDY5k9U
 M9LxxtW3WuulRWgliU0Q9VaswUQoIao16vBMVL7GwUm+ClLvbRVoPe8jxgtfk+W3
 GAANAI7Kv/vUoV/6CFfP
 =sMXV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull RDMA subsystem updates from Jason Gunthorpe:
 "Overall this cycle did not have any major excitement, and did not
  require any shared branch with netdev.

  Lots of driver updates, particularly of the scale-up and performance
  variety. The largest body of core work was Parav's patches fixing and
  restructing some of the core code to make way for future RDMA
  containerization.

  Summary:

   - misc small driver fixups to
     bnxt_re/hfi1/qib/hns/ocrdma/rdmavt/vmw_pvrdma/nes

   - several major feature adds to bnxt_re driver: SRIOV VF RoCE
     support, HugePages support, extended hardware stats support, and
     SRQ support

   - a notable number of fixes to the i40iw driver from debugging scale
     up testing

   - more work to enable the new hip08 chip in the hns driver

   - misc small ULP fixups to srp/srpt//ipoib

   - preparation for srp initiator and target to support the RDMA-CM
     protocol for connections

   - add RDMA-CM support to srp initiator, srp target is still a WIP

   - fixes for a couple of places where ipoib could spam the dmesg log

   - fix encode/decode of FDR/EDR data rates in the core

   - many patches from Parav with ongoing work to clean up
     inconsistencies and bugs in RoCE support around the rdma_cm

   - mlx5 driver support for the userspace features 'thread domain',
     'wallclock timestamps' and 'DV Direct Connected transport'. Support
     for the firmware dual port rocee capability

   - core support for more than 32 rdma devices in the char dev
     allocation

   - kernel doc updates from Randy Dunlap

   - new netlink uAPI for inspecting RDMA objects similar in spirit to 'ss'

   - one minor change to the kobject code acked by Greg KH"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (259 commits)
  RDMA/nldev: Provide detailed QP information
  RDMA/nldev: Provide global resource utilization
  RDMA/core: Add resource tracking for create and destroy PDs
  RDMA/core: Add resource tracking for create and destroy CQs
  RDMA/core: Add resource tracking for create and destroy QPs
  RDMA/restrack: Add general infrastructure to track RDMA resources
  RDMA/core: Save kernel caller name when creating PD and CQ objects
  RDMA/core: Use the MODNAME instead of the function name for pd callers
  RDMA: Move enum ib_cq_creation_flags to uapi headers
  IB/rxe: Change RDMA_RXE kconfig to use select
  IB/qib: remove qib_keys.c
  IB/mthca: remove mthca_user.h
  RDMA/cm: Fix access to uninitialized variable
  RDMA/cma: Use existing netif_is_bond_master function
  IB/core: Avoid SGID attributes query while converting GID from OPA to IB
  RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
  IB/umad: Fix use of unprotected device pointer
  IB/iser: Combine substrings for three messages
  IB/iser: Delete an unnecessary variable initialisation in iser_send_data_out()
  IB/iser: Delete an error message for a failed memory allocation in iser_send_data_out()
  ...
2018-01-31 12:05:10 -08:00
Linus Torvalds
168fe32a07 Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
 "This introduces a __bitwise type for POLL### bitmap, and propagates
  the annotations through the tree. Most of that stuff is as simple as
  'make ->poll() instances return __poll_t and do the same to local
  variables used to hold the future return value'.

  Some of the obvious brainos found in process are fixed (e.g. POLLIN
  misspelled as POLL_IN). At that point the amount of sparse warnings is
  low and most of them are for genuine bugs - e.g. ->poll() instance
  deciding to return -EINVAL instead of a bitmap. I hadn't touched those
  in this series - it's large enough as it is.

  Another problem it has caught was eventpoll() ABI mess; select.c and
  eventpoll.c assumed that corresponding POLL### and EPOLL### were
  equal. That's true for some, but not all of them - EPOLL### are
  arch-independent, but POLL### are not.

  The last commit in this series separates userland POLL### values from
  the (now arch-independent) kernel-side ones, converting between them
  in the few places where they are copied to/from userland. AFAICS, this
  is the least disruptive fix preserving poll(2) ABI and making epoll()
  work on all architectures.

  As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
  it will trigger only on what would've triggered EPOLLWRBAND on other
  architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
  at all on sparc. With this patch they should work consistently on all
  architectures"

* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  make kernel-side POLL... arch-independent
  eventpoll: no need to mask the result of epi_item_poll() again
  eventpoll: constify struct epoll_event pointers
  debugging printk in sg_poll() uses %x to print POLL... bitmap
  annotate poll(2) guts
  9p: untangle ->poll() mess
  ->si_band gets POLL... bitmap stored into a user-visible long field
  ring_buffer_poll_wait() return value used as return value of ->poll()
  the rest of drivers/*: annotate ->poll() instances
  media: annotate ->poll() instances
  fs: annotate ->poll() instances
  ipc, kernel, mm: annotate ->poll() instances
  net: annotate ->poll() instances
  apparmor: annotate ->poll() instances
  tomoyo: annotate ->poll() instances
  sound: annotate ->poll() instances
  acpi: annotate ->poll() instances
  crypto: annotate ->poll() instances
  block: annotate ->poll() instances
  x86: annotate ->poll() instances
  ...
2018-01-30 17:58:07 -08:00
Leon Romanovsky
b5fa635aab RDMA/nldev: Provide detailed QP information
Implement RDMA nldev netlink interface to get detailed information on each
QP in the system. This includes the owning process or kernel ULP and
detailed information from the qp_attrs.

Currently only the dumpit variant is implemented.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 20:21:41 -07:00
Leon Romanovsky
bf3c5a93c5 RDMA/nldev: Provide global resource utilization
Expose through the netlink interface the global per-device utilization of
the supported object types.

Provide both dumpit and doit callbacks.

As an example of possible output from rdmatool for system with 5
mlx5 cards:

$ rdma res
1: mlx5_0: qp 4 cq 5 pd 3
2: mlx5_1: qp 4 cq 5 pd 3
3: mlx5_2: qp 4 cq 5 pd 3
4: mlx5_3: qp 2 cq 3 pd 2
5: mlx5_4: qp 4 cq 5 pd 3

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 20:21:40 -07:00
Leon Romanovsky
9d5f8c209b RDMA/core: Add resource tracking for create and destroy PDs
Track create and destroy operations of PD objects.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 20:21:40 -07:00
Leon Romanovsky
08f294a152 RDMA/core: Add resource tracking for create and destroy CQs
Track create and destroy operations of CQ objects.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 20:21:40 -07:00
Leon Romanovsky
78a0cd648a RDMA/core: Add resource tracking for create and destroy QPs
Track create and destroy operations of QP objects.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 20:21:39 -07:00
Leon Romanovsky
02d8883f52 RDMA/restrack: Add general infrastructure to track RDMA resources
The RDMA subsystem has very strict set of objects to work with, but it
completely lacks tracking facilities and has no visibility of resource
utilization.

The following patch adds such infrastructure to keep track of RDMA
resources to help with debugging of user space applications. The primary
user of this infrastructure is RDMA nldev netlink (following patches), to
be exposed to userspace via rdmatool, but it is not limited too that.

At this stage, the main three objects (PD, CQ and QP) are added, and more
will be added later.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 20:21:39 -07:00
Leon Romanovsky
f66c8ba4c9 RDMA/core: Save kernel caller name when creating PD and CQ objects
The KBUILD_MODNAME variable contains the module name and it is known for
kernel users during compilation, so let's reuse it to track the owners.

Followup patches will store this for resource tracking.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 14:01:44 -07:00
Leon Romanovsky
925f7ea7a6 RDMA/cm: Fix access to uninitialized variable
The ndev will be initialized and held only for successful
ib_get_cached_gid(), otherwise it is garbage stack memory.
Calling dev_put() in failure path is wrong.

Fixes: 16c72e4028 ("IB/cm: Refactor to avoid setting path record software only fields")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Parav Pandit
3cd96fddcc RDMA/cma: Use existing netif_is_bond_master function
When checking whatever the current netdev is the bond master interface,
use kernel API netif_is_bond_master() instead of hardcoding the check.
No functionality is changed.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Parav Pandit
708ea056b3 IB/core: Avoid SGID attributes query while converting GID from OPA to IB
SGID attributes are not used during OPA to IB GID conversion.
Therefore don't query it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Jack Morgenstein
f23a5350e4 IB/umad: Fix use of unprotected device pointer
The ib_write_umad() is protected by taking the umad file mutex.
However, it accesses file->port->ib_dev -- which is protected only by the
port's mutex (field file_mutex).

The ib_umad_remove_one() calls ib_umad_kill_port() which sets
port->ib_dev to NULL under the port mutex (NOT the file mutex).
It then sets the mad agent to "dead" under the umad file mutex.

This is a race condition -- because there is a window where
port->ib_dev is NULL, while the agent is not "dead".

As a result, we saw stack traces like:

[16490.678059] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[16490.678246] IP: ib_umad_write+0x29c/0xa3a [ib_umad]
[16490.678333] PGD 0 P4D 0
[16490.678404] Oops: 0000 [#1] SMP PTI
[16490.678466] Modules linked in: rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx4_en(OE) ptp pps_core mlx4_ib(OE-) ib_core(OE) mlx4_core(OE) mlx_compat
(OE) memtrack(OE) devlink mst_pciconf(OE) mst_pci(OE) netconsole nfsv3 nfs_acl nfs lockd grace fscache cfg80211 rfkill esp6_offload esp6 esp4_offload esp4 sunrpc kvm_intel kvm ppdev parport_pc irqbypass
parport joydev i2c_piix4 virtio_balloon cirrus drm_kms_helper ttm drm e1000 serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg [last unloaded: mlxfw]
[16490.679202] CPU: 4 PID: 3115 Comm: sminfo Tainted: G           OE   4.14.13-300.fc27.x86_64 #1
[16490.679339] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[16490.679477] task: ffff9cf753890000 task.stack: ffffaf70c26b0000
[16490.679571] RIP: 0010:ib_umad_write+0x29c/0xa3a [ib_umad]
[16490.679664] RSP: 0018:ffffaf70c26b3d90 EFLAGS: 00010202
[16490.679747] RAX: 0000000000000010 RBX: ffff9cf75610fd80 RCX: 0000000000000000
[16490.679856] RDX: 0000000000000001 RSI: 00007ffdf2bfd714 RDI: ffff9cf6bb2a9c00

In the above trace, ib_umad_write is trying to dereference the NULL
file->port->ib_dev pointer.

Fix this by using the agent's device pointer (the device field
in struct ib_mad_agent) -- which IS protected by the umad file mutex.

Cc: <stable@vger.kernel.org> # v4.11
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Jason Gunthorpe
3624a8f025 RDMA/uverbs: Use an unambiguous errno for method not supported
Returning EOPNOTSUPP is problematic because it can also be
returned by the method function, and we use it in quite a few
places in drivers these days.

Instead, dedicate EPROTONOSUPPORT to indicate that the ioctl framework
is enabled but the requested object and method are not supported by
the kernel. No other case will return this code, and it lets userspace
know to fall back to write().

grep says we do not use it today in drivers/infiniband subsystem.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-25 10:57:29 -05:00
Parav Pandit
052eac6eeb RDMA/cma: Update RoCE multicast routines to use net namespace
rdma_dev_addr contains the net namespace pointer, while referring
bound_dev_if of the rdma_dev_addr, refer to the net namespace of
rdma_cm_id stored in rdma_dev_addr.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-22 11:39:50 -07:00
Parav Pandit
66c74d746d RDMA/cma: Update cma_validate_port to honor net namespace
cma_validate_port uses rdma_dev_addr to validate the port of the cm_id.
It needs to honor the net namespace which is setup during cm_id creation
when finding netdevice.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-22 11:39:50 -07:00
Parav Pandit
2493a57bc1 RDMA/cma: Refactor to access multiple fields of rdma_dev_addr
Pass the rdma_cm_id so that multiple fields of the rdma_dev_addr
structure can be accessed, instead of passing each individual fields.

This is needed to access some additional fields in followup patches.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-22 11:39:50 -07:00
Parav Pandit
00db63c128 RDMA/cma: Check existence of netdevice during port validation
If valid netdevice is not found for RoCE, GID table should not be
searched with NULL netdevice.

Doing so causes the search routines to ignore the netdev argument and may
match the wrong GID table entry if the netdev is deleted.

Fixes: abae1b71dd ("IB/cma: cma_validate_port should verify the port and netdevice")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-22 11:39:50 -07:00
Parav Pandit
7a2f64ee4a RDMA/ucma: Use rdma cm API to query GID
Make use of rdma_read_gids() API to read SGID and DGID which returns
correct GIDs for RoCE and other transports.

rdma_addr_get_dgid() for RoCE for client side connections returns MAC
address, instead of DGID.
rdma_addr_get_sgid() for RoCE doesn't return correct SGID for IPv6 and
when more than one IP address is assigned to the netdevice.

Therefore use transport agnostic rdma_read_gids() API provided by rdma_cm
module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-19 13:05:38 -07:00
Parav Pandit
411460ac50 RDMA/cma: Introduce API to read GIDs for multiple transports
This patch introduces an API that allows legacy applications to query
GIDs for a rdma_cm_id which is used during connection establishment.

GIDs are stored and created differently for iWarp, IB and RoCE transports.
Therefore rdma_read_gids() returns GID for all the transports hiding
such internal details to caller.
It is usable for client side and server side connections.

In general continued use of GID based addressing outside of IB is
discouraged, so rdma_read_gids() should not be used by any new ULPs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-19 13:05:38 -07:00
Sagi Grimberg
246d8b184c IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct
polling the completion queue directly does not interfere
with the existing polling logic, hence drop the requirement.
Be aware that running ib_process_cq_direct with non IB_POLL_DIRECT
CQ may trigger concurrent CQ processing.

This can be used for polling mode ULPs.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[maxg: added wcs array argument to __ib_process_cq]
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:20 -05:00
Max Gurtovoy
aaebd377c0 IB/core: postpone WR initialization during queue drain
No need to initialize completion and WR in case we fail
during QP modification.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:20 -05:00
Xiongfeng Wang
979a459c83 IB/cma: use strlcpy() instead of strncpy()
gcc-8 reports

drivers/infiniband/core/cma_configfs.c: In function 'make_cma_dev':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 64 equals destination size [-Wstringop-truncation]

We need to use strlcpy() to make sure the string is nul-terminated.

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Jason Gunthorpe
c966ea12c0 RDMA: Mark imm_data as be32 in the verbs uapi header
This matches what the userspace copy of this header has been doing
for a while. imm_data is an opaque 4 byte array carried over the network,
and invalidate_rkey is in CPU byte order.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
a6753c4d62 IB/core: Limit DMAC resolution to RoCE Connected QPs
Resolving DMAC for RoCE is applicable to only Connected mode QPs.
So resolve DMAC for only for Connected mode QPs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
f2290d6d52 IB/core: Attempt DMAC resolution for only RoCE
Instead of returning 0 (success) for RoCE scenarios where DMAC should
not be resolved, avoid such attempt and make code consistent with
ib_create_user_ah().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
b96ac05a87 IB/core: Limit DMAC resolution to userspace QPs
Currently ah_attr is initialized by the ib_cm layer for rdma_cm
based applications. For RoCE transport ah_attr.roce.dmac is already
initialized by ib_cm, rdma_cm either from wc, path record, route
resolve, explicit path record setting depending on active or passive
side QP. Therefore avoid resolving DMAC for QP of kernel consumers.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
b2bedfb395 IB/core: Perform modify QP on real one
Currently qp->port stores the port number whenever IB_QP_PORT
QP attribute mask is set (during QP state transition to INIT state).
This port number should be stored for the real QP when XRC target QP
is used.

Follow the ib_modify_qp() implementation and hide the access to ->real_qp.

Fixes: a512c2fbef ("IB/core: Introduce modify QP operation with udata")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Randy Dunlap
92f617038e infiniband: fix core/fmr_pool.c kernel-doc notation
Fix kernel-doc warning for ib_fmr_pool_map_phys() and also format it
with function description and text spacing.

../drivers/infiniband/core/fmr_pool.c:404: warning: Excess function parameter 'pool' description in 'ib_fmr_pool_map_phys'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:34 -07:00
Randy Dunlap
1f58621e40 infiniband: fix core/verbs.c kernel-doc notation
Change function parameter name in kernel-doc notation and other comments
to eliminate a kernel-doc warning.

../drivers/infiniband/core/verbs.c:1790: warning: Excess function parameter 'wq_init_attr' description in 'ib_create_wq'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:34 -07:00
Parav Pandit
89838118a5 RDMA/cma: Fix rdma_cm path querying for RoCE
The 'if' logic in ucma_query_path was broken with OPA was introduced
and started to treat RoCE paths as as OPA paths. Invert the logic
of the 'if' so only OPA paths are treated as OPA paths.

Otherwise the path records returned to rdma_cma users are mangled
when in RoCE mode.

Fixes: 5752075144 ("IB/SA: Add OPA path record type")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:34 -07:00
Parav Pandit
8d20a1f0ec RDMA/cma: Fix rdma_cm raw IB path setting for RoCE
rdma_set_ib_path() missed setting path record fields for RoCE
transport when RoCE support was added.

This results in setting incorrect ndev, destination mac address,
incorrect GID type etc errors when user space attempts to set a raw
IB path using the roce IB path compatibility mapping from userspace.

Fixes: 3c86aa70bf ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:33 -07:00
Parav Pandit
fe75889f27 RDMA/{cma, ucma}: Simplify and rename rdma_set_ib_paths
Since 2006 there has been no user of rdmacm based application to make use
of setting multiple path records using rdma_set_ib_paths API.

Therefore code is simplified to allow setting one path record entry.
Now that it sets only single path, it is renamed to reflect the same.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:33 -07:00
Parav Pandit
9327c7afdc RDMA/cma: Provide a function to set RoCE path record L2 parameters
Introduce a helper function to set path record L2 fields for RoCE.
This includes setting GID type, destination mac address and netdev
ifindex.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:33 -07:00
Parav Pandit
608bc44634 RDMA/cma: Use the right net namespace for the rdma_cm_id
The net namespace is set in addr during create_rdma_id(),
cma_resolve_iboe_route() should use that instead of the
init namespace.

The original code was added in commit fa20105e09 ("IB/cma: Add support
for network namespaces"), but this path wasn't in use back then.

This patch updates the code to use right namespace, as preparation
for improving namespace support.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:33 -07:00
Huy Nguyen
8cf12d7780 IB/core: Increase number of char device minors
There is a need to increase number of possible char devices to support
large number of SR-IOV instances. The current limit is in the range of
64-128 devices/ports. Increase it to support up to 1024.

The patch performs the following steps to refactor the code:
1. Removes the split bitmap for fixed and overflow dev numbers.
2. Pre-allocates the non-legacy major number range during driver
   initialization, choosen for simplicity.
3. Add new define (RDMA_MAX_PORTS) that is shared between all drivers.
   This is the maximum total number of ports on all struct ib_devices.
4. Set RDMA_MAX_PORTS to 1024.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:32 -07:00
Huy Nguyen
008b656f42 IB/core: Remove the locking for character device bitmaps
Remove the locks that protect character device bitmaps of
uverbs, umad and issm.

The character device bitmaps are accessed in "client->add" and
"client->remove" calls from ib_register_device and ib_unregister_device
respectively. These calls are already protected by the "device_mutex"
mutex. Thus, the spinlocks are not needed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10 22:00:32 -07:00
Linus Torvalds
44596f8682 Fourth pull request for 4.15-rc
- One line fix to mlx4 error flow (same as mlx5 fix in last pull request,
   just in the mlx4 driver)
 - Fix a race condition in the IPoIB driver.  This patch is larger than
   just a one line fix, but resolves a race condition in a fairly
   straight forward manner
 - Fix a locking issue in the RDMA netlink code.  This patch is also
   larger than I would like for a late -rc.  It has, however, had a week
   to bake in the rdma tree prior to this pull request
 - One line fix to fix granting remote machine access to memory that they
   don't need and shouldn't have
 - One line fix to correct the fact that our sgid/dgid pair is swapped
   from what you would expect when receiving an incoming connection
   request
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaU+ZkAAoJELgmozMOVy/dLw8P/1f27k9c7Bg91VfuyQeIcSxA
 kyRDdzlkRzuI/6QJ4ErK+IkOH8ADG6UGmQa+fOv1dxG8do+YwVflcY7gEgjJA7fP
 k0oPuGjiq8wrEWZrFGinln38ou0KALYd4F2C32unVYrsIohQLHSr1D6Ttw0W5FA6
 NQG4nVn9FzmilgjqtkW2zOGKw4jdAn57J47tUp49KufuPBTUcxjmZCdaV5AmiuzN
 5JpZUieL49Zoc18pcm1OreqDPZcj5LV1XquDNV+AZgU9+uGKoIb932k6hQjBRuml
 FSePxpPjdN8zX/KVaa4HQHX4U4uMBp0HcRHYME1bDsKwTh/d9xKM/yTPzzCtJz+r
 wmGJ9TPr2nq8blJJq17nSXbaJ4LmzlScCwork3LomdZJi880JwWJlvjFG3M/Yir9
 HvS2zIOUJm+xZBNCDVEayYcBMkXew5XjxETtDwOvfYX8FM419LLk1WOp2y/4LKDD
 hIR8QYkZMl37lMYqWZUghNjR7Rov6jdd30KDiCGdOAO/qszlNyTSL+icWyzc1t/X
 VT4ai7vc0RTicPWwb8H8o8/dQNj8Ed8w5NnMq3hjen+KrTKShkZTMuW+or/E9jZN
 ha9jIzSPLRfOvX6mZRrQVe6hiY3fOWMZXdw7gtehUy2hX7LCSwwbn2v6FcsDxyMQ
 UW6ZVG3ccP9YSY+tBWKg
 =kUnv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:

 - One line fix to mlx4 error flow (same as mlx5 fix in last pull
   request, just in the mlx4 driver)

 - Fix a race condition in the IPoIB driver. This patch is larger than
   just a one line fix, but resolves a race condition in a fairly
   straight forward manner

 - Fix a locking issue in the RDMA netlink code. This patch is also
   larger than I would like for a late -rc. It has, however, had a week
   to bake in the rdma tree prior to this pull request

 - One line fix to fix granting remote machine access to memory that
   they don't need and shouldn't have

 - One line fix to correct the fact that our sgid/dgid pair is swapped
   from what you would expect when receiving an incoming connection
   request

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/srpt: Fix ACL lookup during login
  IB/srpt: Disable RDMA access by the initiator
  RDMA/netlink: Fix locking around __ib_get_device_by_index
  IB/ipoib: Fix race condition in neigh creation
  IB/mlx4: Fix mlx4_ib_alloc_mr error flow
2018-01-08 16:17:31 -08:00
Doug Ledford
f8457d5832 Merge branch 'bart-srpt-for-next' into k.o/wip/dl-for-next
Merging in 12 patch series from Bart that required changes in the
current for-rc branch in order to apply cleanly.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-08 16:06:20 -05:00
Daniel Jurgens
32f69e4be2 {net, IB}/mlx5: Manage port association for multiport RoCE
When mlx5_ib_add is called determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found place it on a list of unaffiliated ports. If a master is
found bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly when mlx5_ib_remove is called determine the port type. If it's
a slave port, unaffiliate it from the master device, otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration all IB object (QP, MR, PD, etc) related
commands should flow through the master mlx5_core_dev, other commands
must be sent to the slave port mlx5_core_mdev, an interface is provide
to get the correct mdev for non IB object commands.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
908d6460b3 IB/core: Change roce_rescan_device to return void
It always returns 0. Change return type to void.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:21 -07:00
Hans Westgaard Ry
e2dda36855 RDMA/core: Add encode/decode FDR/EDR rates
The cases for FDR/EDR signalling speed, were missing in
ib_rate_to_mult and mult_to_ib_rate giving wrong return values when
drivers convert static rate to/from inter-packet-delay.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:50:21 -05:00
Bart Van Assche
02ee9da347 IB/core: Fix two kernel warnings triggered by rxe registration
Eliminate the WARN_ONs that create following two warnings when
registering an rxe device:

WARNING: CPU: 2 PID: 1005 at drivers/infiniband/core/device.c:449 ib_register_device+0x591/0x640 [ib_core]
CPU: 2 PID: 1005 Comm: run_tests Not tainted 4.15.0-rc4-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:ib_register_device+0x591/0x640 [ib_core]
Call Trace:
 rxe_register_device+0x3c6/0x470 [rdma_rxe]
 rxe_add+0x543/0x5e0 [rdma_rxe]
 rxe_net_add+0x37/0xb0 [rdma_rxe]
 rxe_param_set_add+0x5a/0x120 [rdma_rxe]
 param_attr_store+0x5e/0xc0
 module_attr_store+0x19/0x30
 sysfs_kf_write+0x3d/0x50
 kernfs_fop_write+0x116/0x1a0
 __vfs_write+0x23/0x120
 vfs_write+0xbe/0x1b0
 SyS_write+0x44/0xa0
 entry_SYSCALL_64_fastpath+0x23/0x9a

WARNING: CPU: 2 PID: 1005 at drivers/infiniband/core/sysfs.c:1279 ib_device_register_sysfs+0x11d/0x160 [ib_core]
CPU: 2 PID: 1005 Comm: run_tests Tainted: G        W        4.15.0-rc4-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:ib_device_register_sysfs+0x11d/0x160 [ib_core]
Call Trace:
 ib_register_device+0x3f7/0x640 [ib_core]
 rxe_register_device+0x3c6/0x470 [rdma_rxe]
 rxe_add+0x543/0x5e0 [rdma_rxe]
 rxe_net_add+0x37/0xb0 [rdma_rxe]
 rxe_param_set_add+0x5a/0x120 [rdma_rxe]
 param_attr_store+0x5e/0xc0
 module_attr_store+0x19/0x30
 sysfs_kf_write+0x3d/0x50
 kernfs_fop_write+0x116/0x1a0
 __vfs_write+0x23/0x120
 vfs_write+0xbe/0x1b0
 SyS_write+0x44/0xa0
 entry_SYSCALL_64_fastpath+0x23/0x9a

The code should accept either a parent pointer or a fully specified DMA
specification without producing warnings.

Fixes: 99db949403 ("IB/core: Remove ib_device.dma_device")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: stable@vger.kernel.org # v4.11
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Leon Romanovsky
e48e5e198f RDMA/cma: Mark end of CMA ID messages
The commit 1a1c116f3d ("RDMA/netlink: Simplify the put_msg and put_attr")
removes nlmsg_len calculation in ibnl_put_attr causing netlink messages and
caused to miss source and destination addresses.

Fixes: 1a1c116f3d ("RDMA/netlink: Simplify the put_msg and put_attr")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 14:12:47 -07:00
Leon Romanovsky
f8978bd95c RDMA/netlink: Fix locking around __ib_get_device_by_index
Holding locks is mandatory when calling __ib_device_get_by_index,
otherwise there are races during the list iteration with device removal.

Since the locks are static to device.c, __ib_device_get_by_index can
never be called correctly by any user out side the file.

Make the function static and provide a safe function that gets the
correct locks and returns a kref'd pointer. Fix all callers.

Fixes: e5c9469efc ("RDMA/netlink: Add nldev device doit implementation")
Fixes: c3f66f7b00 ("RDMA/netlink: Implement nldev port doit callback")
Fixes: 7d02f605f0 ("RDMA/netlink: Add nldev port dumpit implementation")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 14:11:40 -07:00
Leon Romanovsky
c2409810c0 RDMA/nldev: Refactor setting the nldev handle to a common function
The NLDEV commands are using IB device indexes and names as a handle
for netlink communications. Put all relevant code into one function
to remove code duplication in followup patches.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 13:36:57 -07:00
Leon Romanovsky
924b8900a4 RDMA/core: Replace open-coded variant of put_device
There is an existing function to decrease reference counter
of the device, let's use it.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 13:36:57 -07:00
Leon Romanovsky
b823369b6f RDMA/netlink: Simplify code of autoload modules
The request_module() call is internally wrapped by CONFIG_MODULE,
so there is no need to check it in our RDMA code too.

Refactor to simplify the code.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 13:36:57 -07:00
Linus Torvalds
19286e4a7a Third pull request for 4.15-rc
- cxgb4 fix for an iser testing failure as debugged by Steve and Sagi.
   The problem was a driver bug in the handling of shutting down a QP.
 - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on error
   unwind and a use after free bug
 - Improper congestion counter values on mlx5 when link aggregation is enabled
 - ipoib lockdep regression introduced in this merge window
 - hfi1 regression supporting the device in a VM introduced in a recent patch
 - Typo that breaks future uAPI compatibility in the verbs core
 - More SELinux related oops fixing
 - Fix an oops during error unwind in mlx5
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaRIC/AAoJEDht9xV+IJsaJfQP/1Z97/kDlJGIJQ4vBJ52xdHV
 LfRdmCBqU5nrAihEBpFLRc2S+kaSJbYAY48tRn28Jx6s9dmSvU6v2J2IqhmnM6p6
 ruWLR0Yqjg+xHcw+eaEoscJjRw+jDUEeVOgfbYc0HViWwvMNTrBB32HpAV48HuAl
 aCbM/qrQYXdYuJBImM4glERIpjlvYKoxv4D9xCJhJRRQvTnKOymHzZpKbqNujWxl
 dzCmZeOrw+HVxNW9MHHtUxClBoLNnykfRVKzMcdDjsqJ+Fdo2bY3ksgMvgiatRwY
 NxGfixhouhOz9vjN/ljpWXxTV5TTm6Nrib8XcHuOWjcYn/AFwJMMRsM+1w1AuCKs
 Zviq7QVApZzYuvHw1ewupRGvDX+P13sufD5sbc6cfVUT3w6ZX0Clpspl4++JN4ER
 WvBZikozaviL3w9ir0drlZ6k9BDnjQ6P7wZcBjDZC/j0zXKM65rISZrTsK7TeiTk
 lBNdLCkwZhO0dvafCNwA910tTaXEPhqqAh8Okob2A5U5lUAewd0AEHJusL/iCmSl
 uXnnxu8ik61QzOqwneEHSyVMkOSLEC+kk13fiFAq/LjPUSm9N/MihZd4JNxwSa6W
 4Rah7IKdh9F6qEnaKLPEfHxPhfghhb7O51zCA8mwA/JNCneqc4Gqi0U2JXkuloml
 395aK2aZSShIkZvIwbI8
 =IkGi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is the next batch of for-rc patches from RDMA. It includes the
  fix for the ipoib regression I mentioned last time, and the result of
  a fairly major debugging effort to get iser working reliably on cxgb4
  hardware - it turns out the cxgb4 driver was not handling QP error
  flushing properly causing iser to fail.

   - cxgb4 fix for an iser testing failure as debugged by Steve and
     Sagi. The problem was a driver bug in the handling of shutting down
     a QP.

   - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on
     error unwind and a use after free bug

   - Improper congestion counter values on mlx5 when link aggregation is
     enabled

   - ipoib lockdep regression introduced in this merge window

   - hfi1 regression supporting the device in a VM introduced in a
     recent patch

   - Typo that breaks future uAPI compatibility in the verbs core

   - More SELinux related oops fixing

   - Fix an oops during error unwind in mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/mlx5: Fix mlx5_ib_alloc_mr error flow
  IB/core: Verify that QP is security enabled in create and destroy
  IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
  IB/mlx5: Serialize access to the VMA list
  IB/hfi: Only read capability registers if the capability exists
  IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
  IB/mlx5: Fix congestion counters in LAG mode
  RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
  RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
  RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
  iw_cxgb4: when flushing, complete all wrs in a chain
  iw_cxgb4: reflect the original WR opcode in drain cqes
  iw_cxgb4: Only validate the MSN for successful completions
2017-12-28 23:06:01 -08:00
Jason Gunthorpe
76a895d9e1 Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
Patches for 4.16 that are dependent on patches sent to 4.15-rc.

These are small clean ups for the vmw_pvrdma and i40iw drivers.

* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
  RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
  RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
  RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
  RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
  RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
  i40iw: Change accelerated flag to bool
2017-12-27 21:50:46 -07:00
Randy Dunlap
efac5ac052 infiniband: drop unknown function from core_priv.h
Delete ibnl_chk_listeners() and its kernel-doc comments from the
core_priv.h header file.  There is no such function.

Fixes: 233c195583 ("RDMA/netlink: Reduce exposure of RDMA netlink functions")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 21:27:41 -07:00
Majd Dibbiny
727b7e9a65 IB/core: Make sure that PSN does not overflow
The rq/sq->psn is 24 bits as defined in the IB spec, therefore we mask
out the 8 most significant bits to avoid overflow in modify_qp.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:40:35 -07:00
Moni Shoua
4a50881bba IB/core: Verify that QP is security enabled in create and destroy
The XRC target QP create flow sets up qp_sec only if there is an IB link with
LSM security enabled. However, several other related uAPI entry points blindly
follow the qp_sec NULL pointer, resulting in a possible oops.

Check for NULL before using qp_sec.

Cc: <stable@vger.kernel.org> # v4.12
Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:24:41 -07:00
Moni Shoua
05d14e7b0c IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
If the input command length is larger than the kernel supports an error should
be returned in case the unsupported bytes are not cleared, instead of the
other way aroudn. This matches what all other callers of ib_is_udata_cleared
do and will avoid user ABI problems in the future.

Cc: <stable@vger.kernel.org> # v4.10
Fixes: 189aba99e7 ("IB/uverbs: Extend modify_qp and support packet pacing")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:24:41 -07:00
Don Hiatt
2ef7f2e270 IB/core: Use rdma_cap_opa_mad to check for OPA
Use rdma_cap_opa_mad() to check for OPA to promote code reuse.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:46:11 -07:00
Venkata Sandeep Dhanalakota
af808ece5c IB/SA: Check dlid before SA agent queries for ClassPortInfo
SA queries SM for class port info when there is a LID_CHANGE event.

When a base lid is configured before fm is started ie when smlid is
not yet assigned, SA handles the LID_CHANGE event and tries query SM
with lid 0. This will cause an hang.

[ 1106.958820] INFO: task kworker/2:0:23 blocked for more than 120 seconds.
[ 1106.965082] Tainted: G O 4.12.0+ #1
[ 1106.969602] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
 this message.
[ 1106.977227] kworker/2:0 D 0 23 2 0x00000000
[ 1106.977250] Workqueue: infiniband update_ib_cpi [ib_core]
[ 1106.977261] Call Trace:
[ 1106.977273] __schedule+0x28e/0x860
[ 1106.977285] schedule+0x36/0x80
[ 1106.977298] schedule_timeout+0x1a3/0x2e0
[ 1106.977310] ? radix_tree_iter_tag_clear+0x1b/0x20
[ 1106.977322] ? idr_alloc+0x64/0x90
[ 1106.977334] wait_for_completion+0xe3/0x140
[ 1106.977347] ? wake_up_q+0x80/0x80
[ 1106.977369] update_ib_cpi+0x163/0x210 [ib_core]
[ 1106.977381] process_one_work+0x147/0x370
[ 1106.977394] worker_thread+0x4a/0x390
[ 1106.977406] kthread+0x109/0x140
[ 1106.977418] ? process_one_work+0x370/0x370
[ 1106.977430] ? kthread_park+0x60/0x60
[ 1106.977443] ret_from_fork+0x22/0x30

Always ensure a proper smlid is assigned before querying SM for cpi.

Fixes: ee1c60b1bf ("IB/SA: Modify SA to implicitly cache Class Port info")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:33:30 -07:00
Pravin Shedge
72c7fe90ee drivers: infiniband: remove duplicate includes
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 09:39:35 -07:00
Parav Pandit
16c72e4028 IB/cm: Refactor to avoid setting path record software only fields
When path ah_attr initialization from path record
fails, ib_cm_send_rej() uses av.ah_attr fields to send out reject
message. In such cases initialization of path record software fields
is not needed. Code is simplified for same.

Additionally in current code in cm_req_handler, when ib_get_cached_gid
fails for a given sgid_index of the GID of the GRH of the incoming CM MAD,
error code 12 is sent. This error code refers to primary GID in incoming
CM REQ and not for the GID in in MAD packet.
Therefore code is refactored to send code 5 (unsupported request) for such
error.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:12 -07:00
Parav Pandit
f6bdb14267 IB/{core, umad, cm}: Rename ib_init_ah_from_wc to ib_init_ah_attr_from_wc
Currently ib_init_ah_from_wc initializes address handle attributes and
not the address handle object itself.
To avoid confusion between ah_attr vs ah, ib_init_ah_from_wc is
renamed to ib_init_ah_attr_from_wc to reflect that its initialzes
ah_attr.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:11 -07:00
Parav Pandit
4ad6a0245e IB/{core, cm, cma, ipoib}: Rename ib_init_ah_from_path to ib_init_ah_attr_from_path
Since ib_init_ah_from_path initializes the address handle attribute, it is
renamed to reflect so.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:11 -07:00
Parav Pandit
33f93e1ebc IB/cm: Fix sleeping while spin lock is held
In case of LAP are used for RoCE, it can lead to a problem of sleeping a
context while spin lock is held in below flow.

cm_lap_handler
	->spin_lock
	-> <..switch_case..>
	-> cm_init_av_for_response
		-> ib_init_ah_from_wc
			-> rdma_addr_find_l2_eth_by_grh
				wait_for_completion()

Therefore ah attribute initialization is done for incoming lap requests
outside of the lock context.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:11 -07:00
Parav Pandit
5cf3968afc IB/cm: Handle address handle attribute init error
cm_init_av_by_path depends on ib_init_ah_from_path to initialize ah
attribute and ib_init_ah_from_path() can fail, such error should not
be ignored.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:11 -07:00
Parav Pandit
0c4386ec77 IB/{cm, umad}: Handle av init error
cm_init_av_for_response depends on ib_init_ah_from_wc() whose return
status is ignored.
ib_init_ah_from_wc() can fail and its return status should be handled as
done in this patch.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:10 -07:00
Parav Pandit
dbb12562f7 IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer
Currently there are no users of ib_find_gid for RoCE transport. It is
only used by IPoIB.
Therefore its simplified to ignore RoCE ports and GID type check which
was previously done for every port.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:10 -07:00