Commit Graph

12884 Commits

Author SHA1 Message Date
Guoqing Jiang
e6ab8cf50f RDMA/rtrs: Introduce rtrs_post_send
Since the three functions share the similar logic, let's introduce one
common function for it.

Link: https://lore.kernel.org/r/20201023074353.21946-12-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:41 -03:00
Guoqing Jiang
ffea6ad133 RDMA/rtrs-srv: Kill rtrs_srv_change_state_get_old
This function isn't needed since no caller checks the old_state of sess.

Link: https://lore.kernel.org/r/20201023074353.21946-11-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:41 -03:00
Gioh Kim
c3b16b67d1 RDMA/rtrs-clt: Remove duplicated code
process_info_rsp checks that sg_cnt is zero twice.

Link: https://lore.kernel.org/r/20201023074353.21946-10-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:41 -03:00
Gioh Kim
16101b60e7 RDMA/rtrs-clt: Remove duplicated switch-case handling for CM error events
The events returning the same error value are put together.

Link: https://lore.kernel.org/r/20201023074353.21946-9-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:40 -03:00
Gioh Kim
8bd372ace3 RDMA/rtrs: Remove unnecessary argument dir of rtrs_iu_free
The direction of DMA operation is already in the rtrs_iu

Link: https://lore.kernel.org/r/20201023074353.21946-8-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:40 -03:00
Guoqing Jiang
3c8483f5a4 RDMA/rtrs-srv: Fix typo
It should mean region here.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-7-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:40 -03:00
Guoqing Jiang
d715ff8acb RDMA/rtrs-srv: Don't guard the whole __alloc_srv with srv_mutex
The purpose of srv_mutex is to protect srv_list as in put_srv, so no need
to hold it when allocate memory for srv since it could be time consuming.

Otherwise if one machine has limited memory, rsrv_close_work could be
blocked for a longer time due to the mutex is held by get_or_create_srv
since it can't get memory in time.

  INFO: task kworker/1:1:27478 blocked for more than 120 seconds.
        Tainted: G           O    4.14.171-1-storage #4.14.171-1.3~deb9
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/1:1     D    0 27478      2 0x80000000
  Workqueue: rtrs_server_wq rtrs_srv_close_work [rtrs_server]
  Call Trace:
   ? __schedule+0x38c/0x7e0
   schedule+0x32/0x80
   schedule_preempt_disabled+0xa/0x10
   __mutex_lock.isra.2+0x25e/0x4d0
   ? put_srv+0x44/0x100 [rtrs_server]
   put_srv+0x44/0x100 [rtrs_server]
   rtrs_srv_close_work+0x16c/0x280 [rtrs_server]
   process_one_work+0x1c5/0x3c0
   worker_thread+0x47/0x3e0
   kthread+0xfc/0x130
   ? trace_event_raw_event_workqueue_execute_start+0xa0/0xa0
   ? kthread_create_on_node+0x70/0x70
   ret_from_fork+0x1f/0x30

Let's move all the logics from __find_srv_and_get and __alloc_srv to
get_or_create_srv, and remove the two functions. Then it should be safe
for multiple processes to access the same srv since it is protected with
srv_mutex.

And since we don't want to allocate chunks with srv_mutex held, let's
check the srv->refcount after get srv because the chunks could not be
allocated yet.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-6-jinpu.wang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:40 -03:00
Gioh Kim
f553e7601d RDMA/rtrs-clt: Missing error from rtrs_rdma_conn_established
When rtrs_rdma_conn_established returns error (non-zero value), the error
value is stored in con->cm_err and it cannot trigger
rtrs_rdma_error_recovery. Finally the error of rtrs_rdma_con_established
will be forgot.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-5-jinpu.wang@cloud.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:40 -03:00
Jack Wang
fcf2959da6 RDMA/rtrs-clt: Avoid run destroy_con_cq_qp/create_con_cq_qp in parallel
It could happen two kworkers race with each other:

        CPU0                             CPU1
    addr_resolver kworker           reconnect kworker
    rtrs_clt_rdma_cm_handler
    rtrs_rdma_addr_resolved
    create_con_cq_qp: s.dev_ref++
    "s.dev_ref is 1"
                                    wait in create_cm fails with TIMEOUT
                                    destroy_con_cq_qp: --s.dev_ref
                                    "s.dev_ref is 0"
                                    destroy_con_cq_qp: sess->s.dev = NULL
     rtrs_cq_qp_create -> create_qp(con, sess->dev->ib_pd...)
    sess->dev is NULL, panic.

To fix the problem using mutex to serialize create_con_cq_qp and
destroy_con_cq_qp.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-4-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:39 -03:00
Jack Wang
73385fdbc4 RDMA/rtrs-clt: Remove outdated comment in create_con_cq_qp
As run destroy_con_cq_qp many times doesn't work, remove the comments.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-3-jinpu.wang@cloud.ionos.com
Suggested-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:39 -03:00
Danil Kipnis
2b3062e4d9 RDMA/rtrs-clt: Remove destroy_con_cq_qp in case route resolving failed
We call destroy_con_cq_qp(con) in rtrs_rdma_addr_resolved() in case route
couldn't be resolved and then again in create_cm() because nothing
happens.

Don't call destroy_con_cq_qp from rtrs_rdma_addr_resolved, create_cm()
does the clean up already.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-2-jinpu.wang@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:17:39 -03:00
Lang Cheng
aba457ca89 RDMA/hns: Support owner mode doorbell
The doorbell needs to store PI information into QPC, so the RoCEE should
wait for the results of storing, that is, it needs two bus operations to
complete a doorbell. When ROCEE is in SDI mode, multiple doorbells may be
interlocked because the RoCEE can only handle bus operations serially. So a
flag to mark if HIP09 is working in SDI mode is added. When the SDI flag is
set, the ROCEE will ignore the PI information of the doorbell, continue to
fetch wqe and verify its validity by it's owner_bit.

Link: https://lore.kernel.org/r/1603195493-22741-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:13:41 -03:00
Max Gurtovoy
dae7a75f1f IB/isert: add module param to set sg_tablesize for IO cmd
Currently, iser target support max IO size of 16MiB by default. For some
adapters, allocating this amount of resources might reduce the total
number of possible connections that can be created. For those adapters,
it's preferred to reduce the max IO size to be able to create more
connections. Since there is no handshake procedure for max IO size in iser
protocol, set the default max IO size to 1MiB and add a module parameter
for enabling the option to control it for suitable adapters.

Fixes: 317000b926 ("IB/isert: allocate RW ctxs according to max IO size")
Link: https://lore.kernel.org/r/20201019094628.17202-1-mgurtovoy@nvidia.com
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 13:00:49 -03:00
Bob Pearson
bb3ab2979f RDMA/rxe: Compute PSN windows correctly
The code which limited the number of unacknowledged PSNs was incorrect.
The PSNs are limited to 24 bits and wrap back to zero from 0x00ffffff.
The test was computing a 32 bit value which wraps at 32 bits so that
qp->req.psn can appear smaller than the limit when it is actually larger.

Replace '>' test with psn_compare which is used for other PSN comparisons
and correctly handles the 24 bit size.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20201013170741.3590-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 10:34:20 -03:00
Jing Xiangfeng
5333499c60 RDMA/core: Fix error return in _ib_modify_qp()
Fix to return error code PTR_ERR() from the error handling case instead of
0.

Fixes: 51aab12631 ("RDMA/core: Get xmit slave for LAG")
Link: https://lore.kernel.org/r/20201016075845.129562-1-jingxiangfeng@huawei.com
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 10:32:22 -03:00
Alok Prasad
a2267f8a52 RDMA/qedr: Fix memory leak in iWARP CM
Fixes memory leak in iWARP CM

Fixes: e411e0587e ("RDMA/qedr: Add iWARP connection management functions")
Link: https://lore.kernel.org/r/20201021115008.28138-1-palok@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 09:45:25 -03:00
Jason Gunthorpe
071ba4cc55 RDMA: Add rdma_connect_locked()
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
handler triggers a completion and another thread does rdma_connect() or
the handler directly calls rdma_connect().

In all cases rdma_connect() needs to hold the handler_mutex, but when
handler's are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.

Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.

Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec5381 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 09:14:49 -03:00
Selvin Xavier
b898d5c50c RDMA/bnxt_re: Fix entry size during SRQ create
Only static WQE is supported for SRQ. So always use the max supported SGEs
while calculating SRQ entry size.

Fixes: 2bb3c32c5c ("RDMA/bnxt_re: Change wr posting logic to accommodate variable wqes")
Link: https://lore.kernel.org/r/1602569752-12745-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-27 14:12:26 -03:00
Bob Pearson
eeed696507 RDMA/rxe: Remove unused RXE_MR_TYPE_FMR
This is a left over from the past. It is no longer used.

Link: https://lore.kernel.org/r/20201009165112.271143-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 20:00:27 -03:00
Joe Perches
3c6bff3cf9 RDMA: Convert sysfs kobject * show functions to use sysfs_emit()
Done with cocci script:

@@
identifier k_show;
identifier arg1, arg2, arg3;
@@
ssize_t k_show(struct kobject *
-	arg1
+	kobj
	, struct kobj_attribute *
-	arg2
+	attr
	, char *
-	arg3
+	buf
	)
{
	...
(
-	arg1
+	kobj
|
-	arg2
+	attr
|
-	arg3
+	buf
)
	...
}

@@
identifier k_show;
identifier kobj, attr, buf;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
	return
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier k_show;
identifier kobj, attr, buf;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
	return
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier k_show;
identifier kobj, attr, buf;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
	return
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier k_show;
identifier kobj, attr, buf;
expression chr;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
	return
-	strcpy(buf, chr);
+	sysfs_emit(buf, chr);
	...>
}

@@
identifier k_show;
identifier kobj, attr, buf;
identifier len;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
	len =
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier k_show;
identifier kobj, attr, buf;
identifier len;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
	len =
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier k_show;
identifier kobj, attr, buf;
identifier len;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
	len =
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier k_show;
identifier kobj, attr, buf;
identifier len;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	<...
-	len += scnprintf(buf + len, PAGE_SIZE - len,
+	len += sysfs_emit_at(buf, len,
	...);
	...>
	return len;
}

@@
identifier k_show;
identifier kobj, attr, buf;
expression chr;
@@

ssize_t k_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
	...
-	strcpy(buf, chr);
-	return strlen(buf);
+	return sysfs_emit(buf, chr);
}

Link: https://lore.kernel.org/r/7761c1efaebb96c432c85171d58405c25a824ccd.1602122880.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 20:00:03 -03:00
Joe Perches
1c7fd72687 RDMA: Convert sysfs device * show functions to use sysfs_emit()
Done with cocci script:

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	strcpy(buf, chr);
+	sysfs_emit(buf, chr);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
-	len += scnprintf(buf + len, PAGE_SIZE - len,
+	len += sysfs_emit_at(buf, len,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	...
-	strcpy(buf, chr);
-	return strlen(buf);
+	return sysfs_emit(buf, chr);
}

Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:53:21 -03:00
Gal Pressman
7d66a71488 RDMA/uverbs: Fix false error in query gid IOCTL
Some drivers (such as EFA) have a GID table, but aren't IB/RoCE devices.
Remove the unnecessary rdma_ib_or_roce() check.

This fixes rdma-core failures for EFA when it uses the new ioctl interface
for querying the GID table.

Fixes: 9f85cbe50a ("RDMA/uverbs: Expose the new GID query API to user space")
Link: https://lore.kernel.org/r/20201026082621.32463-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:29:26 -03:00
Jason Gunthorpe
676a80adba RDMA: Remove AH from uverbs_cmd_mask
Drivers that need a uverbs AH should instead set the create_user_ah() op
similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never
implemented so are just deleted.

Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00
Jason Gunthorpe
628c02bf38 RDMA: Remove uverbs cmds from drivers that don't use them
Allowing userspace to invoke these commands is probably going to crash
these drivers as they are not tested and not expecting to use them on a
user object.

For example pvrdma touches cq->ring_state which is not initialized for
user QPs.

These commands are effected:

- IB_USER_VERBS_CMD_REQ_NOTIFY_CQ is ibv_cmd_req_notify_cq() in
  rdma-core, only hfi1, ipath and rxe calls it.

- IB_USER_VERBS_CMD_POLL_CQ is ibv_cmd_poll_cq() in rdma-core, only
  ipath and hfi1 calls it.

- IB_USER_VERBS_CMD_POST_SEND/RECV is ibv_cmd_post_send/recv() in
  rdma-core, only ipath and hfi1 call them.

- IB_USER_VERBS_CMD_POST_SRQ_RECV is ibv_cmd_post_srq_recv() in
  rdma-core, only ipath and hfi1 calls it.

- IB_USER_VERBS_CMD_PEEK_CQ isn't even implemented anywhere

- IB_USER_VERBS_CMD_CREATE/DESTROY_AH is ibv_cmd_create/destroy_ah() in
  rdma-core, only bnxt_re, efa, hfi1, ipath, mlx5, orcrdma, and rxe call
  it.

Link: https://lore.kernel.org/r/10-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00
Jason Gunthorpe
bd2a40cc24 RDMA/core Remove uverbs_ex_cmd_mask
No driver sets it, and the core code sets a maximum mask, simply remove
it.

Disabled operations are now handled either by having a NULL ops pointer,
or by having the common driver callbacks check for unsupported extended
attributes.

Link: https://lore.kernel.org/r/9-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
1f11a7610e RDMA: Check create_flags during create_qp
Each driver should check that the QP attrs create_flags is supported.
Unfortuantely when create_flags was added to the QP attrs the drivers were
not updated. uverbs_ex_cmd_mask was used to block it - even though kernel
drivers use these flags too.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code
to be EOPNOTSUPP.

Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
1c407cb5d7 RDMA: Check flags during create_cq
Each driver should check that the CQ attrs is supported. Unfortuantely
when flags was added to the CQ attrs the drivers were not updated,
uverbs_ex_cmd_mask was used to block it. This was missed when create CQ
was converted to ioctl, so non-zero flags could have been passed into
drivers.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask.

Fixes: 41b2a71fc8 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file")
Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
26e990badd RDMA: Check attr_mask during modify_qp
Each driver should check that it can support the provided attr_mask during
modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block
modify_qp_ex because the driver didn't check RATE_LIMIT.

Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
652caba5b5 RDMA: Check srq_type during create_srq
uverbs was blocking srq_types the driver doesn't support based on the
CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types
during create_srq and move CREATE_XSRQ to the core code.

Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
44ce37bc8b RDMA: Move more uverbs_cmd_mask settings to the core
These functions all depend on the driver providing a specific op:

- REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op
- ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this
  without providing the op
- OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides
  xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow.
- OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd()
- CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq()
- QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but
  sometimes supplies a NULL op.
- RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op
- ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an
  (now deleted) implementation but no userspace

All drivers were checked that no drivers provide the op without also
setting uverbs_cmd_mask so this should have no functional change.

Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
c074bb1e30 RDMA: Remove elements in uverbs_cmd_mask that all drivers set
This is a step toward eliminating uverbs_cmd_mask. Preset this list in the
core code. Only the op reg_user_mr wasn't already being required from the
drivers.

Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:57 -03:00
Jason Gunthorpe
b8e3130dd9 RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions
Since a while now the uverbs layer checks if the driver implements a
function before allowing the ucmd to proceed. This largely obsoletes the
cmd_mask stuff, but there is some tricky bits in drivers preventing it
from being removed.

Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the
core code. These are triggered soley based on the related ops function
pointer.

query_device_ex is not triggered based on an op, but all drivers already
implement something compatible with the extension, so enable it globally
too.

Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:56 -03:00
Jason Gunthorpe
a5c29a262e RDMA/cxgb4: Remove MW support
This driver never enabled IB_USER_VERBS_CMD_ALLOC_MW so memory windows
were not usable from userspace. The kernel side was removed long ago. Drop
this dead code.

Fixes: feb7c1e38b ("IB: remove in-kernel support for memory windows")
Link: https://lore.kernel.org/r/1-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:44 -03:00
Kamal Heib
53839b51a7 RDMA/bnxt_re: Set queue pair state when being queried
The API for ib_query_qp requires the driver to set cur_qp_state on return,
add the missing set.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/20201021114952.38876-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:22:15 -03:00
Parav Pandit
fbdd0049d9 RDMA/mlx5: Fix devlink deadlock on net namespace deletion
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence.  A below call graph
of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem);<- recurrsive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.

Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:18:19 -03:00
Bob Pearson
edebc8407b RDMA/rxe: Fix small problem in network_type patch
The patch referenced below has a typo that results in using the wrong L2
header size for outbound traffic. (V4 <-> V6).

It also breaks kernel-side RC traffic because they use AVs that use
RDMA_NETWORK_XXX enums instead of RXE_NETWORK_TYPE_XXX enums. Fix this by
transcoding between these enum types.

Fixes: e0d696d201 ("RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI")
Link: https://lore.kernel.org/r/20201016211343.22906-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:12:17 -03:00
Linus Torvalds
a1e16bc7d5 RDMA 5.10 pull request
The typical set of driver updates across the subsystem:
 
  - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
    usnic, qib, qedr, cxgb4, hns, bnxt_re
 
  - Various rtrs fixes and updates
 
  - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
    wasn't working right
 
  - Use tracepoints instead of pr_debug in the CM code
 
  - Scrub the locking in ucma and cma to close more syzkaller bugs
 
  - Use tasklet_setup in the subsystem
 
  - Revert the idea that 'destroy' operations are not allowed to fail at
    the driver level. This proved unworkable from a HW perspective.
 
  - Revise how the umem API works so drivers make fewer mistakes using it
 
  - XRC support for qedr
 
  - Convert uverbs objects RWQ and MW to new the allocation scheme
 
  - Large queue entry sizes for hns
 
  - Use hmm_range_fault() for mlx5 On Demand Paging
 
  - uverbs APIs to inspect the GID table instead of sysfs
 
  - Move some of the RDMA code for building large page SGLs into
    lib/scatterlist
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
 mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
 l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
 5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
 46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
 oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
 YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
 /b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
 p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
 ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
 fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
 16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
 =LKpl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
2020-10-17 11:18:18 -07:00
Linus Torvalds
c4cf498dc0 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "155 patches.

  Subsystems affected by this patch series: mm (dax, debug, thp,
  readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
  core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
  binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
  romfs, and fault-injection"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
  lib, uaccess: add failure injection to usercopy functions
  lib, include/linux: add usercopy failure capability
  ROMFS: support inode blocks calculation
  ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
  sched.h: drop in_ubsan field when UBSAN is in trap mode
  scripts/gdb/tasks: add headers and improve spacing format
  scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
  kernel/relay.c: drop unneeded initialization
  panic: dump registers on panic_on_warn
  rapidio: fix the missed put_device() for rio_mport_add_riodev
  rapidio: fix error handling path
  nilfs2: fix some kernel-doc warnings for nilfs2
  autofs: harden ioctl table
  ramfs: fix nommu mmap with gaps in the page cache
  mm: remove the now-unnecessary mmget_still_valid() hack
  mm/gup: take mmap_lock in get_dump_page()
  binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
  coredump: rework elf/elf_fdpic vma_dump_size() into common helper
  coredump: refactor page range dumping into common helper
  coredump: let dump_emit() bail out on short writes
  ...
2020-10-16 11:31:55 -07:00
Jann Horn
4d45e75a99 mm: remove the now-unnecessary mmget_still_valid() hack
The preceding patches have ensured that core dumping properly takes the
mmap_lock.  Thanks to that, we can now remove mmget_still_valid() and all
its users.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:22 -07:00
Maor Gottlieb
c7a198c700 RDMA/ucma: Fix use after free in destroy id flow
ucma_free_ctx() should call to __destroy_id() on all the connection requests
that have not been delivered to user space. Currently it calls on the
context itself and cause to use after free.

Fixes the trace:

   BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108
   Faulting instruction address: 0xc0080000002428f4
   Oops: Kernel access of bad area, sig: 11 [#1]
   Call Trace:
   [c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable)
   [c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm]
   [c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm]
   [c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330
   [c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110
   [c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50
   [c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0
   [c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30
   [c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470
   [c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240
   [c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0

Rename listen_ctx to conn_req_ctx as the poor name was the cause of this
bug.

Fixes: a1d33b70db ("RDMA/ucma: Rework how new connections are passed through event delivery")
Link: https://lore.kernel.org/r/20201012045600.418271-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 14:07:08 -03:00
Bob Pearson
71abf20b28 RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
If skb_clone() is unable to allocate memory for a new sk_buff this is not
detected by the current code.

Check for a NULL return and continue. This is similar to other errors in
this loop over QPs attached to the multicast address and consistent with
the unreliable UD transport.

Fixes: e7ec96fc79 ("RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()")
Addresses-Coverity-ID: 1497804: Null pointer dereferences (NULL_RETURNS)
Link: https://lore.kernel.org/r/20201013184236.5231-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 13:57:55 -03:00
Jason Gunthorpe
e0d696d201 RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
RXE was wrongly using an internal kernel enum as part of its uAPI, split
this out into a dedicated uAPI enum just for RXE. It only uses the IPv4
and IPv6 values.

This was exposed by changing the internal kernel enum definition which
broke RXE.

Fixes: 1c15b4f2a4 ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 13:54:10 -03:00
Jason Gunthorpe
e0477b34d9 RDMA: Explicitly pass in the dma_device to ib_register_device
The code in setup_dma_device has become rather convoluted, move all of
this to the drivers. Drives now pass in a DMA capable struct device which
will be used to setup DMA, or drivers must fully configure the ibdev for
DMA and pass in NULL.

Other than setting the masks in rvt all drivers were doing this already
anyhow.

mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for
DMA based on their hardweare limits in:
__mthca_init_one()
  dma_set_max_seg_size (1G)

__mlx4_init_one()
  dma_set_max_seg_size (1G)

mlx5_pci_init()
  set_dma_caps()
    dma_set_max_seg_size (2G)

Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
instead of 2G as was before.

[1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
[2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/

Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org
Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 13:53:46 -03:00
Jason Gunthorpe
16e7483e6f Merge branch 'dynamic_sg' into rdma.git for-next
From Maor Gottlieb says:

====================
This series extends __sg_alloc_table_from_pages to allow chaining of new
pages to an already initialized SG table.

This allows for drivers to utilize the optimization of merging contiguous
pages without a need to pre allocate all the pages and hold them in a very
large temporary buffer prior to the call to SG table initialization.

The last patch changes the Infiniband core to use the new API. It removes
duplicate functionality from the code and benefits from the optimization
of allocating dynamic SG table from pages.

In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
====================

* branch 'dynamic_sg':
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
2020-10-16 12:40:58 -03:00
Linus Torvalds
9ff9b0d392 networking changes for the 5.10 merge window
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
 traversal in common container configs and improving TCP back-pressure.
 Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
 
 Expand netlink policy support and improve policy export to user space.
 (Ge)netlink core performs request validation according to declared
 policies. Expand the expressiveness of those policies (min/max length
 and bitmasks). Allow dumping policies for particular commands.
 This is used for feature discovery by user space (instead of kernel
 version parsing or trial and error).
 
 Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
 
 Allow more than 255 IPv4 multicast interfaces.
 
 Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
 packets of TCPv6.
 
 In Multi-patch TCP (MPTCP) support concurrent transmission of data
 on multiple subflows in a load balancing scenario. Enhance advertising
 addresses via the RM_ADDR/ADD_ADDR options.
 
 Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
 
 Allow more calls to same peer in RxRPC.
 
 Support two new Controller Area Network (CAN) protocols -
 CAN-FD and ISO 15765-2:2016.
 
 Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
 kernel problem.
 
 Add TC actions for implementing MPLS L2 VPNs.
 
 Improve nexthop code - e.g. handle various corner cases when nexthop
 objects are removed from groups better, skip unnecessary notifications
 and make it easier to offload nexthops into HW by converting
 to a blocking notifier.
 
 Support adding and consuming TCP header options by BPF programs,
 opening the doors for easy experimental and deployment-specific
 TCP option use.
 
 Reorganize TCP congestion control (CC) initialization to simplify life
 of TCP CC implemented in BPF.
 
 Add support for shipping BPF programs with the kernel and loading them
 early on boot via the User Mode Driver mechanism, hence reusing all the
 user space infra we have.
 
 Support sleepable BPF programs, initially targeting LSM and tracing.
 
 Add bpf_d_path() helper for returning full path for given 'struct path'.
 
 Make bpf_tail_call compatible with bpf-to-bpf calls.
 
 Allow BPF programs to call map_update_elem on sockmaps.
 
 Add BPF Type Format (BTF) support for type and enum discovery, as
 well as support for using BTF within the kernel itself (current use
 is for pretty printing structures).
 
 Support listing and getting information about bpf_links via the bpf
 syscall.
 
 Enhance kernel interfaces around NIC firmware update. Allow specifying
 overwrite mask to control if settings etc. are reset during update;
 report expected max time operation may take to users; support firmware
 activation without machine reboot incl. limits of how much impact
 reset may have (e.g. dropping link or not).
 
 Extend ethtool configuration interface to report IEEE-standard
 counters, to limit the need for per-vendor logic in user space.
 
 Adopt or extend devlink use for debug, monitoring, fw update
 in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
 mv88e6xxx, dpaa2-eth).
 
 In mlxsw expose critical and emergency SFP module temperature alarms.
 Refactor port buffer handling to make the defaults more suitable and
 support setting these values explicitly via the DCBNL interface.
 
 Add XDP support for Intel's igb driver.
 
 Support offloading TC flower classification and filtering rules to
 mscc_ocelot switches.
 
 Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
 fixed interval period pulse generator and one-step timestamping in
 dpaa-eth.
 
 Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
 offload.
 
 Add Lynx PHY/PCS MDIO module, and convert various drivers which have
 this HW to use it. Convert mvpp2 to split PCS.
 
 Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
 7-port Mediatek MT7531 IP.
 
 Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
 and wcn3680 support in wcn36xx.
 
 Improve performance for packets which don't require much offloads
 on recent Mellanox NICs by 20% by making multiple packets share
 a descriptor entry.
 
 Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
 subtree to drivers/net. Move MDIO drivers out of the phy directory.
 
 Clean up a lot of W=1 warnings, reportedly the actively developed
 subsections of networking drivers should now build W=1 warning free.
 
 Make sure drivers don't use in_interrupt() to dynamically adapt their
 code. Convert tasklets to use new tasklet_setup API (sadly this
 conversion is not yet complete).
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
 IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
 Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
 r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
 iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
 mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
 wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
 owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
 HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
 =bc1U
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
2020-10-15 18:42:13 -07:00
Heiner Kallweit
3b51788a2d IB/hfi1: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/6cad1a04-f021-d94b-45fd-7cc7cf07367d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13 17:33:48 -07:00
Linus Torvalds
3ad11d7ac8 block-5.10-2020-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
 u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
 jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
 fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
 e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
 6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
 WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
 8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
 c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
 FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
 YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
 ZwfpDh2+Tg==
 =LzyE
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
2020-10-13 12:12:44 -07:00
Håkon Bugge
bf6a47644e IB/mlx4: Convert rej_tmout radix-tree to XArray
Was missed during the initial review of the below patch

Fixes: 227a0e142e ("IB/mlx4: Add support for REJ due to timeout")
Link: https://lore.kernel.org/r/1602253482-6718-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-09 12:34:49 -03:00
Bob Pearson
de55412d02 RDMA/rxe: Fix bug rejecting all multicast packets
Fix a bug in rxe_rcv() that causes all multicast packets to be
dropped. Currently rxe_match_dgid() is called for each packet to verify
that the destination IP address matches one of the entries in the port
source GID table. This is incorrect for IP multicast addresses since they
do not appear in the GID table.

Add code to detect multicast addresses.

Change function name to rxe_chk_dgid() which is clearer.

Link: https://lore.kernel.org/r/20201008212753.265249-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-08 20:22:12 -03:00
Bob Pearson
e7ec96fc79 RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
The changes referenced below replaced sbk_clone)_ by taking additional
references, passing the skb along and then freeing the skb. This
deleted the packets before they could be processed and additionally
passed bad data in each packet. Since pkt is stored in skb->cb
changing pkt->qp changed it for all the packets.

Replace skb_get() by sbk_clone() in rxe_rcv_mcast_pkt() for cases where
multiple QPs are receiving multicast packets on the same address.

Delete kfree_skb() because the packets need to live until they have been
processed by each QP. They are freed later.

Fixes: 86af617641 ("IB/rxe: remove unnecessary skb_clone")
Fixes: fe896ceb57 ("IB/rxe: replace refcount_inc with skb_get")
Link: https://lore.kernel.org/r/20201008203651.256958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-08 20:22:11 -03:00
Bob Pearson
1858d98b83 RDMA/rxe: Remove duplicate entries in struct rxe_mr
Struct rxe_mem had pd, lkey and rkey values both in itself and in the
struct ib_mr which is also included in rxe_mem.

Delete these entries and replace references with the ones in ibmr.Add
mr_pd, mr_lkey and mr_rkey macros which extract these values from mr.

Link: https://lore.kernel.org/r/20201008212818.265303-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-08 20:07:50 -03:00
Colin Ian King
8e71f694e0 IB/rdmavt: Fix sizeof mismatch
An incorrect sizeof is being used, struct rvt_ibport ** is not correct, it
should be struct rvt_ibport *. Note that since ** is the same size as
* this is not causing any issues.  Improve this fix by using
sizeof(*rdi->ports) as this allows us to not even reference the type
of the pointer.  Also remove line breaks as the entire statement can
fit on one line.

Link: https://lore.kernel.org/r/20201008095204.82683-1-colin.king@canonical.com
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: ff6acd6951 ("IB/rdmavt: Add device structure allocation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-08 15:14:17 -03:00
Colin Ian King
73c5265913 RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
An incorrect sizeof is being used, u64 * is not correct, it should be just
u64 for a table of umem_pgs number of u64 items in the pbl_tbl.  Use the
idiom sizeof(*pbl_tbl) to get the object type without the need to
explicitly use u64.

Link: https://lore.kernel.org/r/20201006114700.537916-1-colin.king@canonical.com
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-06 16:56:36 -03:00
Jason Gunthorpe
6ef999f500 RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
This driver is taking the SGL out of the umem and passing it through a
struct bnxt_qplib_sg_info. Instead of passing the SGL pass the umem and
then use rdma_umem_for_each_dma_block() directly.

Move the calls of ib_umem_num_dma_blocks() closer to their actual point of
use, npages is only set for non-umem pbl flows.

Link: https://lore.kernel.org/r/0-v1-b37437a73f35+49c-bnxt_re_dma_block_jgg@nvidia.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-06 16:45:53 -03:00
Ming Lei
2b0d3d3e4f percpu_ref: reduce memory footprint of percpu_ref in fast path
'struct percpu_ref' is often embedded into one user structure, and the
instance is usually referenced in fast path, however actually only
'percpu_count_ptr' is needed in fast path.

So move other fields into one new structure of 'percpu_ref_data', and
allocate it dynamically via kzalloc(), then memory footprint of
'percpu_ref' in fast path is reduced a lot and becomes suitable to put
into hot cacheline of user structure.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06 07:29:36 -06:00
Maor Gottlieb
0c16d9635e RDMA/umem: Move to allocate SG table from pages
Remove the implementation of ib_umem_add_sg_table and instead
call to __sg_alloc_table_from_pages which already has the logic to
merge contiguous pages.

Besides that it removes duplicated functionality, it reduces the
memory consumption of the SG table significantly. Prior to this
patch, the SG table was allocated in advance regardless consideration
of contiguous pages.

In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
E.g. for 100GB memory registration:

	 Number of entries	Size
Before 	      26214400          600.0MB
After            51200		  1.2MB

Link: https://lore.kernel.org/r/20201004154340.1080481-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-05 20:45:45 -03:00
Linus Torvalds
165563c050 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Make sure SKB control block is in the proper state during IPSEC
    ESP-in-TCP encapsulation. From Sabrina Dubroca.

 2) Various kinds of attributes were not being cloned properly when we
    build new xfrm_state objects from existing ones. Fix from Antony
    Antony.

 3) Make sure to keep BTF sections, from Tony Ambardar.

 4) TX DMA channels need proper locking in lantiq driver, from Hauke
    Mehrtens.

 5) Honour route MTU during forwarding, always. From Maciej
    Żenczykowski.

 6) Fix races in kTLS which can result in crashes, from Rohit
    Maheshwari.

 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan
    Jha.

 8) Use correct address family in xfrm state lookups, from Herbert Xu.

 9) A bridge FDB flush should not clear out user managed fdb entries
    with the ext_learn flag set, from Nikolay Aleksandrov.

10) Fix nested locking of netdev address lists, from Taehee Yoo.

11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.

12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.

13) Don't free command entries in mlx5 while comp handler could still be
    running, from Eran Ben Elisha.

14) Error flow of request_irq() in mlx5 is busted, due to an off by one
    we try to free and IRQ never allocated. From Maor Gottlieb.

15) Fix leak when dumping netlink policies, from Johannes Berg.

16) Sendpage cannot be performed when a page is a slab page, or the page
    count is < 1. Some subsystems such as nvme were doing so. Create a
    "sendpage_ok()" helper and use it as needed, from Coly Li.

17) Don't leak request socket when using syncookes with mptcp, from
    Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  net/core: check length before updating Ethertype in skb_mpls_{push,pop}
  net: mvneta: fix double free of txq->buf
  net_sched: check error pointer in tcf_dump_walker()
  net: team: fix memory leak in __team_options_register
  net: typhoon: Fix a typo Typoon --> Typhoon
  net: hinic: fix DEVLINK build errors
  net: stmmac: Modify configuration method of EEE timers
  tcp: fix syn cookied MPTCP request socket leak
  libceph: use sendpage_ok() in ceph_tcp_sendpage()
  scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
  drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
  tcp: use sendpage_ok() to detect misused .sendpage
  nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
  net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
  net: introduce helper sendpage_ok() in include/linux/net.h
  net: usb: pegasus: Proper error handing when setting pegasus' MAC address
  net: core: document two new elements of struct net_device
  netlink: fix policy dump leak
  net/mlx5e: Fix race condition on nhe->n pointer in neigh update
  net/mlx5e: Fix VLAN create flow
  ...
2020-10-05 11:27:14 -07:00
Kamal Heib
5ce2dced8e RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
Report the "ipoib pkey", "mode" and "umcast" netlink attributes for every
IPoiB interface type, not just children created with 'ip link add'.

After setting the rtnl_link_ops for the parent interface, implement the
dellink() callback to block users from trying to remove it.

Fixes: 862096a8bb ("IB/ipoib: Add more rtnl_link_ops callbacks")
Link: https://lore.kernel.org/r/20201004132948.26669-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-05 15:05:45 -03:00
Avihai Horon
9f85cbe50a RDMA/uverbs: Expose the new GID query API to user space
Expose the query GID table and entry API to user space by adding two new
methods and method handlers to the device object.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Link: https://lore.kernel.org/r/20200923165015.2491894-5-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Avihai Horon
c4b4d548fa RDMA/core: Introduce new GID table query API
Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to a
provided buffer.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Avihai Horon
1c15b4f2a4 RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Avihai Horon
3ff4de8f60 RDMA/core: Change rdma_get_gid_attr returned error code
Change the error code returned from rdma_get_gid_attr when the GID entry
is invalid but the GID index is in the gid table size range to -ENODATA
instead of -EINVAL.

This change is done in order to provide a more accurate error reporting to
be used by the new GID query API in user space. Nevertheless, -EINVAL is
still returned from sysfs in the aforementioned case to maintain
compatibility with user space that expects -EINVAL.

Link: https://lore.kernel.org/r/20200923165015.2491894-2-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Alok Prasad
f45271acdf RDMA/qedr: Endianness warnings cleanup
Making a change to fix following sparse warnings reported by kbuild bot.

 CHECK   drivers/infiniband/hw/qedr/verbs.c

drivers/infiniband/hw/qedr/verbs.c:3872:59: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/qedr/verbs.c:3872:59:    expected restricted __le32 [usertype] sge_prod
drivers/infiniband/hw/qedr/verbs.c:3872:59:    got unsigned int [usertype] sge_prod
drivers/infiniband/hw/qedr/verbs.c:3875:59: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/qedr/verbs.c:3875:59:    expected restricted __le32 [usertype] wqe_prod
drivers/infiniband/hw/qedr/verbs.c:3875:59:    got unsigned int [usertype] wqe_prod

Link: https://lore.kernel.org/r/20201001100959.19940-1-palok@marvell.com
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: acca72e2b0 ("RDMA/qedr: SRQ's bug fixes")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 20:50:37 -03:00
Rikard Falkeborn
3c4e919b48 RDMA/rtrs: Constify static struct attribute_group
The only usage of these is to pass their address to sysfs_create_group()
and sysfs_remove_group(), both which takes const pointers. Make it const
to allow the compiler to put them in read-only memory.

Link: https://lore.kernel.org/r/20200930224004.24279-3-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 20:44:52 -03:00
Rikard Falkeborn
42d5179c89 RDMA/core: Constify struct attribute_group
The only usage of the pma_table field in the ib_port struct is to pass its
address to sysfs_create_group() and sysfs_remove_group(). Make it const to
make it possible to constify a couple of static struct
attribute_group. This allows the compiler to put them in read-only memory.

Link: https://lore.kernel.org/r/20200930224004.24279-2-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 20:44:51 -03:00
Yishai Hadas
a03bfc37d5 RDMA/mlx5: Sync device with CPU pages upon ODP MR registration
Sync device with CPU pages upon ODP MR registration. mlx5 already has to
zero the HW's version of the PAS list, may as well deliver a PAS list that
matches the current CPU page tables configuration.

Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:44 -03:00
Yishai Hadas
677cf51f71 RDMA/mlx5: Extend advice MR to support non faulting mode
Extend advice MR to support non faulting mode, this can improve
performance by increasing the populated page tables in the device.

Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas
8bfafde086 IB/core: Enable ODP sync without faulting
Enable ODP sync without faulting, this improves performance by reducing
the number of page faults in the system.

The gain from this option is that the device page table can be aligned
with the presented pages in the CPU page table without causing page
faults.

As of that, the overhead on data path from hardware point of view to
trigger a fault which end-up by calling the driver to bring the pages
will be dropped.

Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas
36f30e486d IB/core: Improve ODP to use hmm_range_fault()
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve
performance in a few aspects:

This includes:
- Dropping the need to allocate and free memory to hold its output

- No need any more to use put_page() to unpin the pages

- The logic to detect contiguous pages is done based on the returned
  order, no need to run per page and evaluate.

In addition, moving to use hmm_range_fault() enables to reduce page faults
in the system with it's snapshot mode, this will be introduced in next
patches from this series.

As part of this, cleanup some flows and use the required data structures
to work with hmm_range_fault().

Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:39:54 -03:00
Jason Gunthorpe
2ee9bf346f RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()
This three thread race can result in the work being run once the callback
becomes NULL:

       CPU1                 CPU2                   CPU3
 netevent_callback()
                     process_one_req()       rdma_addr_cancel()
                      [..]
     spin_lock_bh()
  	set_timeout()
     spin_unlock_bh()

						spin_lock_bh()
						list_del_init(&req->list);
						spin_unlock_bh()

		     req->callback = NULL
		     spin_lock_bh()
		       if (!list_empty(&req->list))
                         // Skipped!
		         // cancel_delayed_work(&req->work);
		     spin_unlock_bh()

		    process_one_req() // again
		     req->callback() // BOOM
						cancel_delayed_work_sync()

The solution is to always cancel the work once it is completed so any
in between set_timeout() does not result in it running again.

Cc: stable@vger.kernel.org
Fixes: 44e75052bc ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.org
Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30 15:29:05 -03:00
Jason Gunthorpe
a6f0b08dba RDMA/core: Remove ucontext->closing
Nothing reads this any more, and the reason for its existence has passed
due to the deferred fput() scheme.

Fixes: 8ea1f989aa ("drivers/IB,usnic: reduce scope of mmap_sem")
Link: https://lore.kernel.org/r/0-v1-df64ff042436+42-uctx_closing_jgg@nvidia.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30 15:27:19 -03:00
Gioh Kim
220aee3021 RDMA/rtrs: Remove unused field of rtrs_iu
list field is not used anywhere

Link: https://lore.kernel.org/r/20200930131407.6438-1-gi-oh.kim@clous.ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30 15:21:16 -03:00
Lang Cheng
cf4c0fb00d RDMA/hns: Remove unused variables and definitions
Some code was removed but the variables were still there, and some
parameters have been changed to be queried from firmware. So the
definitions of them are no longer needed.

Fixes: 2a3d923f87 ("RDMA/hns: Replace magic numbers with #defines")
Fixes: 82e620d9c3 ("RDMA/hns: Modify the data structure of hns_roce_av")
Fixes: 8254746978 ("IB/hns: Implement the add_gid/del_gid and optimize the GIDs management")
Fixes: 21b97f5387 ("RDMA/hns: Fixup qp release bug")
Link: https://lore.kernel.org/r/1601371934-40003-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 14:01:20 -03:00
Leon Romanovsky
d4f40a1fb9 RDMA/i40iw: Remove intermediate pointer that points to the same struct
There is no real need to have an intermediate pointer for the same struct,
remove it, and use struct directly.

Link: https://lore.kernel.org/r/20200926102450.2966017-11-leon@kernel.org
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:07 -03:00
Leon Romanovsky
21c2fe94ab RDMA/mthca: Combine special QP struct with mthca QP
As preparation for the removal of QP allocation logic, we need to ensure
that ib_core allocates the right amount of memory before a call to the
driver create_qp(). It requires from driver to have the same structs for
all types of QPs.

Link: https://lore.kernel.org/r/20200926102450.2966017-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:07 -03:00
Leon Romanovsky
b925c555a1 RDMA/drivers: Remove udata check from special QP
GSI QP can't be created from the user space, hence the udata check is
always false (udata == NULL). Remove that check and simplify the flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Leon Romanovsky
5807bb3205 RDMA/core: Align write and ioctl checks of QP types
The ioctl flow checks that the user provides only a supported list of QP
types, while write flow didn't do it and relied on the driver to check
it. Align those flows to fail as early as possible.

Link: https://lore.kernel.org/r/20200926102450.2966017-8-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Leon Romanovsky
8fd3cd2ae5 RDMA/mlx4: Prepare QP allocation to remove from the driver
Since all mlx4 QP have same storage type, move the QP allocation to be in
one place. This change is preparation to removal of such allocation from
the driver.

Link: https://lore.kernel.org/r/20200926102450.2966017-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Leon Romanovsky
915ec7ed91 RDMA/mlx4: Embed GSI QP into general mlx4_ib QP
Refactor the storage struct of mlx4 GSI QP to be embedded in mlx4_ib
QP. This allows to remove internal memory allocation of QP struct which is
hidden inside the mlx4_ib_create_qp() flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-6-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:49 -03:00
Leon Romanovsky
eebe580feb RDMA/mlx5: Delete not needed GSI QP signal QP type
GSI QP doesn't need signal QP type because it is initialized statically to
zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too.  This
means that the GSI QP signal QP type can be removed.

Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:49 -03:00
Leon Romanovsky
2dc4d6725b RDMA/mlx5: Change GSI QP to have same creation flow like other QPs
There is no reason to have separate create flow for the GSI QP, while
general create_qp routine has all needed checks and ability to allocate
and free the proper struct mlx5_ib_qp.

Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
f8225e3488 RDMA/mlx5: Reuse existing fields in parent QP storage object
Remove duplication of mlx5_ib_qp and mlx5_ib_gsi_qp fields.  This change
returns the memory footprint of mlx5_ib QP to be as it was before
embedding GSI QP.

Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
0d9aef8603 RDMA/mlx5: Embed GSI QP into general mlx5_ib QP
The GSI QPs have different create flow from the regular QPs, but it is not
really needed. Update the code to use mlx5_ib_qp as a storage class for
all outside of GSI calls.

Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Taehee Yoo
eff7423365 net: core: introduce struct netdev_nested_priv for nested interface infrastructure
Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.

In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28 15:00:15 -07:00
Liu Shixin
b942fc0319 RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()
sizeof() when applied to a pointer typed expression should give the size
of the pointed data, even if the data is a pointer.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-25 09:17:42 -03:00
Weihang Li
30b707886a RDMA/hns: Support inline data in extented sge space for RC
HIP08 supports RC inline up to size of 32 Bytes, and all data should be
put into SQWQE. For HIP09, this capability is extended to 1024 Bytes, if
length of data is longer than 32 Bytes, they will be filled into extended
sge space.

Link: https://lore.kernel.org/r/1599744069-9968-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 16:06:57 -03:00
Weihang Li
05df49279f RDMA/hns: Fix missing sq_sig_type when querying QP
The sq_sig_type field should be filled when querying QP, or the users may
get a wrong value.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-9-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:27 -03:00
Weihang Li
fbed9d2be2 RDMA/hns: Fix configuration of ack_req_freq in QPC
The hardware will add AckReq flag in BTH header according to the value of
ack_req_freq to request ACK from responder for the packets with this flag.
It should be greater than or equal to lp_pktn_ini instead of using a fixed
value.

Fixes: 7b9bd73ed1 ("RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC")
Link: https://lore.kernel.org/r/1600509802-44382-8-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:27 -03:00
Wenpeng Liang
99fcf82521 RDMA/hns: Fix the wrong value of rnr_retry when querying qp
The rnr_retry returned to the user is not correct, it should be got from
another fields in QPC.

Fixes: bfe860351e ("RDMA/hns: Fix cast from or to restricted __le32 for driver")
Link: https://lore.kernel.org/r/1600509802-44382-7-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:27 -03:00
Jiaran Zhang
768202a082 RDMA/hns: Solve the overflow of the calc_pg_sz()
calc_pg_sz() may gets a data calculation overflow if the PAGE_SIZE is 64 KB
and hop_num is 2. It is because that all variables involved in calculation
are defined in type of int. So change the type of bt_chunk_size,
buf_chunk_size and obj_per_chunk_default to u64.

Fixes: ba6bb7e974 ("RDMA/hns: Add interfaces to get pf capabilities from firmware")
Link: https://lore.kernel.org/r/1600509802-44382-6-git-send-email-liweihang@huawei.com
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:27 -03:00
Jiaran Zhang
172505cfa3 RDMA/hns: Add check for the validity of sl configuration
According to the RoCE v1 specification, the sl (service level) 0-7 are
mapped directly to priorities 0-7 respectively, sl 8-15 are reserved. The
driver should verify whether the the value of sl is larger than 7, if so,
an exception should be returned.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-5-git-send-email-liweihang@huawei.com
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:27 -03:00
Lang Cheng
c19893fd9c RDMA/hns: Correct typo of hns_roce_create_cq()
Change "initialze" to "initialize".

Fixes: 8f3e9f3ea0 ("IB/hns: Add code for refreshing CQ CI using TPTR")
Link: https://lore.kernel.org/r/1600509802-44382-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:27 -03:00
Yangyang Li
221109e643 RDMA/hns: Add interception for resizing SRQs
HIP08 doesn't support modifying the maximum number of outstanding WR in an
SRQ. However, the driver does not return a failure message, and users may
mistakenly think that the resizing is executed successfully. So the driver
needs to intercept this operation.

Fixes: ffb1308b88 ("RDMA/hns: Move SRQ code to the reasonable place")
Link: https://lore.kernel.org/r/1600509802-44382-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:26 -03:00
Weihang Li
12542f1de1 RDMA/hns: Refactor process about opcode in post_send()
According to the IB specifications, the verbs should return an immediate
error when the users set an unsupported opcode. Furthermore, refactor codes
about opcode in process of post_send to make the difference between opcodes
clearer.

Link: https://lore.kernel.org/r/1600509802-44382-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:56:26 -03:00
Yangyang Li
3cb2c996c9 RDMA/hns: Add support for SCCC in size of 64 Bytes
For HIP09, size of SCCC (Soft Congestion Control Context) is increased to
64 Bytes from 32 Bytes. The hardware will get the configuration of SCCC
from driver instead of using a fixed value.

Link: https://lore.kernel.org/r/1600245806-56321-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:35:11 -03:00
Wenpeng Liang
98912ee82a RDMA/hns: Add support for QPC in size of 512 Bytes
The new version of RoCEE supports using QPC in size of 256B or 512B, so
that HIP09 can supports new congestion control algorithms by using QPC in
larger size.

Link: https://lore.kernel.org/r/1600245806-56321-4-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:35:11 -03:00
Wenpeng Liang
09a5f210f6 RDMA/hns: Add support for CQE in size of 64 Bytes
The new version of RoCEE supports using CQE in size of 32B or 64B. The
performance of bus can be improved by using larger size of CQE.

Link: https://lore.kernel.org/r/1600245806-56321-3-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:35:11 -03:00
Wenpeng Liang
247fc16d73 RDMA/hns: Add support for EQE in size of 64 Bytes
The new version of RoCEE supports using CEQE in size of 4B or 64B, AEQE in
size of 16B or 64B. The performance of bus can be improved by using larger
size of EQE.

Link: https://lore.kernel.org/r/1600245806-56321-2-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24 15:35:10 -03:00
Julia Lawall
3de3c4785b RDMA/efa: Drop double zeroing for sg_init_table()
sg_init_table() zeroes its first argument, so the allocation of that
argument doesn't have to.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,n,flags;
@@

x =
- kcalloc
+ kmalloc_array
  (n,sizeof(*x),flags)
...
sg_init_table(x,n)
// </smpl>

Link: https://lore.kernel.org/r/1600601186-7420-6-git-send-email-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 20:26:47 -03:00
Sindhu, Devale
f2334964e9 i40iw: Add support to make destroy QP synchronous
Occasionally ib_write_bw crash is seen due to access of a pd object in
i40iw_sc_qp_destroy after it is freed. Destroy qp is not synchronous in
i40iw and thus the iwqp object could be referencing a pd object that is
freed by ib core as a result of successful return from i40iw_destroy_qp.

Wait in i40iw_destroy_qp till all QP references are released and destroy
the QP and its associated resources before returning.  Switch to use the
refcount API vs atomic API for lifetime management of the qp.

 RIP: 0010:i40iw_sc_qp_destroy+0x4b/0x120 [i40iw]
 [...]
 RSP: 0018:ffffb4a7042e3ba8 EFLAGS: 00010002
 RAX: 0000000000000000 RBX: 0000000000000001 RCX: dead000000000122
 RDX: ffffb4a7042e3bac RSI: ffff8b7ef9b1e940 RDI: ffff8b7efbf09080
 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
 R10: 8080808080808080 R11: 0000000000000010 R12: ffff8b7efbf08050
 R13: 0000000000000001 R14: ffff8b7f15042928 R15: ffff8b7ef9b1e940
 FS:  0000000000000000(0000) GS:ffff8b7f2fa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000400 CR3: 000000020d60a006 CR4: 00000000001606e0
 Call Trace:
  i40iw_exec_cqp_cmd+0x4d3/0x5c0 [i40iw]
  ? try_to_wake_up+0x1ea/0x5d0
  ? __switch_to_asm+0x40/0x70
  i40iw_process_cqp_cmd+0x95/0xa0 [i40iw]
  i40iw_handle_cqp_op+0x42/0x1a0 [i40iw]
  ? cm_event_handler+0x13c/0x1f0 [iw_cm]
  i40iw_rem_ref+0xa0/0xf0 [i40iw]
  cm_work_handler+0x99c/0xd10 [iw_cm]
  process_one_work+0x1a1/0x360
  worker_thread+0x30/0x380
  ? process_one_work+0x360/0x360
  kthread+0x10c/0x130
  ? kthread_park+0x80/0x80
  ret_from_fork+0x35/0x40

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/20200916131811.2077-1-shiraz.saleem@intel.com
Reported-by: Kamal Heib <kheib@redhat.com>
Signed-off-by: Sindhu, Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 20:23:18 -03:00
Daniel Kranzdorf
b0cff387e1 RDMA/efa: Add messages and RDMA read work requests HW stats
Add separate stats types for send messages and RDMA read work requests.

Link: https://lore.kernel.org/r/20200915141449.8428-3-galpress@amazon.com
Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 20:21:11 -03:00
Gal Pressman
215b88ac45 RDMA/efa: Group keep alive received counter with other SW stats
The keep alive received counter is a software stat, keep it grouped with
all other software stats.  Since all stored stats are software stats,
remove the efa_sw_stats struct and use efa_stats instead.

Link: https://lore.kernel.org/r/20200915141449.8428-2-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 20:21:11 -03:00
Leon Romanovsky
b09c4d7012 RDMA/restrack: Improve readability in task name management
Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of
tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd().

This uniformly makes all restracks add'd using rdma_restrack_add().

Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Leon Romanovsky
c34a23c28c RDMA/restrack: Simplify restrack tracking in kernel flows
Have a single rdma_restrack_add() that adds an entry, there is no reason
to split the user/kernel here, the rdma_restrack_set_task() is responsible
for this difference.

This patch prepares the code to the future requirement of making restrack
is mandatory for managing ib objects.

Link: https://lore.kernel.org/r/20200922091106.2152715-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Leon Romanovsky
13ef5539de RDMA/restrack: Count references to the verbs objects
Refactor the restrack code to make sure the kref inside the restrack entry
properly kref's the object in which it is embedded. This slight change is
needed for future conversions of MR and QP which are refcounted before the
release and kfree.

The ideal flow from ib_core perspective as follows:
* Allocate ib_* structure with rdma_zalloc_*.
* Set everything that is known to ib_core to that newly created object.
* Initialize kref with restrack help
* Call to driver specific allocation functions.
* Insert into restrack DB
....
* Return and release restrack with restrack_put.

Largely this means a rdma_restrack_new() should be called near allocating
the containing structure.

Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Leon Romanovsky
d7ecab1e5f RDMA/mlx5: Don't call to restrack recursively
The restrack is going to manage memory of all IB objects and must be
called before object is created. GSI QP in the mlx5_ib separated between
creating dummy interface and HW object beneath. This was achieved by
double call to ib_create_qp().

In order to skip such reentry call to internal driver create_qp code.

Link: https://lore.kernel.org/r/20200922091106.2152715-3-leon@kernel.org
Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Leon Romanovsky
60aaeffa36 RDMA/cma: Delete from restrack DB after successful destroy
Update the code to have similar destroy pattern like other IB objects.

This change create asymmetry to the rdma_id_private create flow to make
sure that memory is managed by restrack.

Link: https://lore.kernel.org/r/20200922091106.2152715-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:34 -03:00
Jason Gunthorpe
f5449e7480 RDMA/ucma: Rework ucma_migrate_id() to avoid races with destroy
ucma_destroy_id() assumes that all things accessing the ctx will do so via
the xarray. This assumption violated only in the case the FD is being
closed, then the ctx is reached via the ctx_list. Normally this is OK
since ucma_destroy_id() cannot run concurrenty with release(), however
with ucma_migrate_id() is involved this can violated as the close of the
2nd FD can run concurrently with destroy on the first:

                CPU0                      CPU1
        ucma_destroy_id(fda)
                                  ucma_migrate_id(fda -> fdb)
                                       ucma_get_ctx()
        xa_lock()
         _ucma_find_context()
         xa_erase()
        xa_unlock()
                                       xa_lock()
                                        ctx->file = new_file
                                        list_move()
                                       xa_unlock()
                                      ucma_put_ctx()

                                   ucma_close(fdb)
                                      _destroy_id()
                                      kfree(ctx)

        _destroy_id()
          wait_for_completion()
          // boom, ctx was freed

The ctx->file must be modified under the handler and xa_lock, and prior to
modification the ID must be rechecked that it is still reachable from
cur_file, ie there is no parallel destroy or migrate.

To make this work remove the double locking and streamline the control
flow. The double locking was obsoleted by the handler lock now directly
preventing new uevents from being created, and the ctx_list cannot be read
while holding fgets on both files. Removing the double locking also
removes the need to check for the same file.

Fixes: 88314e4dda ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com
Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 20:54:01 -03:00
Jason Gunthorpe
8383da3e4a RDMA/mlx5: Clarify what the UMR is for when creating MRs
Once a mkey is created it can be modified using UMR. This is desirable for
performance reasons. However, different hardware has restrictions on what
modifications are possible using UMR. Make sense of these checks:

- mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be
  altered. Most cases create MRs using 0 access flags (now made clear by
  consistent use of set_mkc_access_pd_addr_fields()), but the old logic
  here was tormented. Make it clear that this is checking if the current
  access_flags can be modified using UMR to different access_flags. It is
  always OK to use UMR to change flags that all HW supports.

- mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to
  enable and update the PAS/XLT. Enabling requires updating the entity
  size, so UMR ends up completely disabled on this old hardware. Make it
  clear why it is disabled. FRWR, ODP and cache always requires
  mlx5_ib_can_load_pas_with_umr().

- mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be
  resized to hold a new PAS list. This only works for cached MR's because
  we don't store the PAS list size in other cases.

To be very clear, arrange things so any pre-created MR's in the cache
check the newly requested access_flags before allowing the MR to leave the
cache. If UMR cannot set the required access_flags the cache fails to
create the MR.

This in turn means relaxed ordering and atomic are now correctly blocked
early for implicit ODP on older HW.

Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
0ec52f0194 RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work
set_reg_wr() always fails if !umr_modify_entity_size_disabled because
mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR
doesn't work and that means the device should not advertise
IB_DEVICE_MEM_MGT_EXTENSIONS.

Fixes: 841b07f99a ("IB/mlx5: Block MR WR if UMR is not possible")
Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
5eb29f0d13 RDMA/mlx5: Make mkeys always owned by the kernel's PD when not enabled
Any mkey that is not enabled and assigned to userspace should have the PD
set to a kernel owned PD.

When cache entries are created for the first time the PDN is set to 0,
which is probably a kernel PD, but be explicit.

When a MR is registered using the hybrid reg_create with UMR xlt & enable
the disabled mkey is pointing at the user PD, keep it pointing at the
kernel until a UMR enables it and sets the user PD.

Fixes: 9ec4483a3f ("IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache")
Link: https://lore.kernel.org/r/20200914112653.345244-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
1c97ca3da0 RDMA/mlx5: Use set_mkc_access_pd_addr_fields() in reg_create()
reg_create() open codes this helper, use the shared code.

Link: https://lore.kernel.org/r/20200914112653.345244-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:42 -03:00
Jason Gunthorpe
2e4e706e09 RDMA/mlx5: Remove dead check for EAGAIN after alloc_mr_from_cache()
alloc_mr_from_cache() no longer returns EAGAIN, this is just dead code
now.

Fixes: aad719dcf3 ("RDMA/mlx5: Allow MRs to be created in the cache synchronously")
Link: https://lore.kernel.org/r/20200914112653.345244-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:42 -03:00
Lijun Ou
22d3e1ed2c RDMA/hns: Set the unsupported wr opcode
hip06 does not support IB_WR_LOCAL_INV, so the ps_opcode should be set to
an invalid value instead of being left uninitialized.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Fixes: a2f3d4479f ("RDMA/hns: Avoid unncessary initialization")
Link: https://lore.kernel.org/r/1600350615-115217-1-git-send-email-oulijun@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 12:52:58 -03:00
Keita Suzuki
3e45410fe3 RDMA/qedr: Fix resource leak in qedr_create_qp
When xa_insert() fails, the acquired resource in qedr_create_qp should
also be freed. However, current implementation does not handle the error.

Fix this by adding a new goto label that calls qedr_free_qp_resources.

Fixes: 1212767e23 ("qedr: Add wrapping generic structure for qpidr and adjust idr routines.")
Link: https://lore.kernel.org/r/20200911125159.4577-1-keitasuzuki.park@sslab.ics.keio.ac.jp
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 12:45:41 -03:00
Potnuri Bharat Teja
8d539c6109 RDMA/iw_cxgb4: Disable delayed ack by default
Receive side delayed ack mode is needed only for certain area networks/
connections. Therefore disable it by default.

Link: https://lore.kernel.org/r/20200909153707.2795-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 11:29:37 -03:00
Jason Gunthorpe
a1255fff5d Merge branch 'mlx_sw_owner_v2' into rdma.git for-next
Leon Romanovsky says:

====================
This series from Alex extends software steering interface to support
devices with extra capability "sw_owner_2" which will replace existing
"sw_owner".
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_sw_owner_v2:
  RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
  RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
  RDMA/mlx5: Add sw_owner_v2 bit capability
2020-09-18 10:31:45 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Aharon Landau
376ceb31ff RDMA: Fix link active_speed size
According to the IB spec active_speed size should be u16 and not u8 as
before. Changing it to allow further extensions in offered speeds.

Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 10:31:24 -03:00
Alex Vesker
54a38b6627 RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
Expose the ICM address to access TIR and QP, this will allow sw_owned_v2
devices to steer traffic to TIRs and QPs same as done with sw_owner
capability.

Link: https://lore.kernel.org/r/20200903073857.1129166-4-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 09:40:41 -03:00
Alex Vesker
8310e32704 RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
sw_owner_v2 will replace sw_owner for future devices, this means that if
sw_owner_v2 is set sw_owner should be ignored and DM allocation is
required for sw_owner_v2 devices to function.

Link: https://lore.kernel.org/r/20200903073857.1129166-3-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 09:40:41 -03:00
Leon Romanovsky
c0a6b5ecc5 RDMA: Convert RWQ table logic to ib_core allocation scheme
Move struct ib_rwq_ind_table allocation to ib_core.

Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:33 -03:00
Leon Romanovsky
d18bb3e152 RDMA: Clean MW allocation and free flows
Move allocation and destruction of memory windows under ib_core
responsibility and clean drivers to ensure that no updates to MW
ib_core structures are done in driver layer.

Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:32 -03:00
Aharon Landau
e27014bdb4 RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
Combine two same enums to avoid duplication.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:03 +03:00
Aharon Landau
639bf4415c net/mlx5: Refactor query port speed functions
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.

And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.

According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.

Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:02 +03:00
Jason Gunthorpe
b5de0c60cc RDMA/cma: Fix use after free race in roce multicast join
The roce path triggers a work queue that continues to touch the id_priv
but doesn't hold any reference on it. Futher, unlike in the IB case, the
work queue is not fenced during rdma_destroy_id().

This can trigger a use after free if a destroy is triggered in the
incredibly narrow window after the queue_work and the work starting and
obtaining the handler_mutex.

The only purpose of this work queue is to run the ULP event callback from
the standard context, so switch the design to use the existing
cma_work_handler() scheme. This simplifies quite a lot of the flow:

- Use the cma_work_handler() callback to launch the work for roce. This
  requires generating the event synchronously inside the
  rdma_join_multicast(), which in turn means the dummy struct
  ib_sa_multicast can become a simple stack variable.

- cm_work_handler() used the id_priv kref, so we can entirely eliminate
  the kref inside struct cma_multicast. Since the cma_multicast never
  leaks into an unprotected work queue the kfree can be done at the same
  time as for IB.

- Eliminating the general multicast.ib requires using cma_set_mgid() in a
  few places to recompute the mgid.

Fixes: 3c86aa70bf ("RDMA/cm: Add RDMA CM support for IBoE devices")
Link: https://lore.kernel.org/r/20200902081122.745412-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:25 -03:00
Jason Gunthorpe
3788d2997b RDMA/cma: Consolidate the destruction of a cma_multicast in one place
Two places were open coding this sequence, and also pull in
cma_leave_roce_mc_group() which was called only once.

Link: https://lore.kernel.org/r/20200902081122.745412-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
1bb5091def RDMA/cma: Remove dead code for kernel rdmacm multicast
There is no kernel user of RDMA CM multicast so this code managing the
multicast subscription of the kernel-only internal QP is dead. Remove it.

This makes the bug fixes in the next patches much simpler.

Link: https://lore.kernel.org/r/20200902081122.745412-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
7e85bcda8b RDMA/cma: Combine cma_ndev_work with cma_work
These are the same thing, except that cma_ndev_work doesn't have a state
transition. Signal no state transition by setting old_state and new_state
== 0.

In all cases the handler function should not be called once
rdma_destroy_id() has progressed passed setting the state.

Link: https://lore.kernel.org/r/20200902081122.745412-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
5cfbf9291e RDMA/cma: Remove cma_comp()
The only place that still uses it is rdma_join_multicast() which is only
doing a sanity check that the caller hasn't done something wrong and
doesn't need the spinlock.

At least in the case of rdma_join_multicast() the information it needs
will remain until the ID is destroyed once it enters these
states. Similarly there is no reason to check for these specific states in
the handler callback, instead use the usual check for a destroyed id under
the handler_mutex.

Link: https://lore.kernel.org/r/20200902081122.745412-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
d490ee52f0 RDMA/cma: Fix locking for the RDMA_CM_LISTEN state
There is a strange unlocked read of the ID state when checking for
reuseaddr. This is because an ID cannot be reusable once it becomes a
listening ID. Instead of using the state to exclude reuse, just clear it
as part of rdma_listen()'s flow to convert reusable into not reusable.

Once a ID goes to listen there is no way back out, and the only use of
reusable is on the bind_list check.

Finally, update the checks under handler_mutex to use READ_ONCE and audit
that once RDMA_CM_LISTEN is observed in a req callback it is stable under
the handler_mutex.

Link: https://lore.kernel.org/r/20200902081122.745412-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
732d41c545 RDMA/cma: Make the locking for automatic state transition more clear
Re-organize things so the state variable is not read unlocked. The first
attempt to go directly from ADDR_BOUND immediately tells us if the ID is
already bound, if we can't do that then the attempt inside
rdma_bind_addr() to go from IDLE to ADDR_BOUND confirms the ID needs
binding.

Link: https://lore.kernel.org/r/20200902081122.745412-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:23 -03:00
Jason Gunthorpe
2a7cec5381 RDMA/cma: Fix locking for the RDMA_CM_CONNECT state
It is currently a bit confusing, but the design is if the handler_mutex
is held, and the state is in RDMA_CM_CONNECT, then the state cannot leave
RDMA_CM_CONNECT without also serializing with the handler_mutex.

Make this clearer by adding a direct assertion, fixing the usage in
rdma_connect and generally using READ_ONCE to read the state value.

Link: https://lore.kernel.org/r/20200902081122.745412-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:23 -03:00
Liu Shixin
3cc30e8dfc RDMA/ipoib: Convert to use DEFINE_SEQ_ATTRIBUTE macro
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Link: https://lore.kernel.org/r/20200916025022.3992627-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-16 13:46:18 -03:00
Parav Pandit
1d7c995820 RDMA/i40iw: Avoid typecast from void to pci_dev
hw_context stores a pci_device pointer. Use the correct type instead of
void * and avoid typecasts.

Link: https://lore.kernel.org/r/20200914123528.270382-1-parav@mellanox.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-16 13:40:29 -03:00
Yuval Basson
06e8d1df46 RDMA/qedr: Add support for user mode XRC-SRQ's
Implement the XRC specific verbs.  The additional QP type introduced new
logic to the rest of the verbs that now require distinguishing whether a
QP has an "RQ" or an "SQ" or both.

Link: https://lore.kernel.org/r/20200722102339.30104-1-ybason@marvell.com
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-16 08:33:31 -03:00
Jason Gunthorpe
4aa1615268 RDMA/core: Fix ordering of CQ pool destruction
rxe will hold a refcount on the IB device as long as CQ objects exist,
this causes destruction of a rxe device to hang if the CQ pool has any
cached CQs since they are being destroyed after the refcount must go to
zero.

Treat the CQ pool like a client and create/destroy it before/after all
other clients. No users of CQ pool can exist past a client remove call.

Link: https://lore.kernel.org/r/e8a240aa-9e9b-3dca-062f-9130b787f29b@acm.org
Fixes: c7ff819aef ("RDMA/core: Introduce shared CQ pool API")
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-14 15:20:18 -03:00
Linus Torvalds
b1df2a0783 RDMA second 5.9-rc pull request
A number of driver bug fixes and a few recent regressions:
 
 - Several bug fixes for bnxt_re. Crashing, incorrect data reported,
   and corruption on new HW
 
 - Memory leak and crash in rxe
 
 - Fix sysfs corruption in rxe if the netdev name is too long
 
 - Fix a crash on error unwind in the new cq_pool code
 
 - Fix kobject panics in rtrs by working device lifetime properly
 
 - Fix a data corruption bug in iser target related to misaligned buffers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl9atAYACgkQOG33FX4g
 mxqSdw//Qi29dnxzVGpsaO4/krd/VmI6NT6eNpgK7Nqx80DaCYer0JhtwZOUxHqK
 KbHIV9XB/f6BSI67c9ydYj4PNX6FpFnoUWQLvqZwip5VM7R6ifIVjm0ap1jCAUSS
 axDLFZOySIOYNhcZ5I+MtY/kxykKBjMteuMXdpBe4FwZ+XSmsC5KkfRH/+FUhjVG
 peL6aRVDv9TByH8w+iZE1wSmVrOphOE1C/jN5TyotQTmKe7IHoXJtkalosYHXFWw
 KZaaz52e4IYKVFl4HIcl6+FfPExhxsyfDtRHluvn+vzY/wFy1RZw6F0BZt7mioy5
 J8R6w82xEe/SNugTGuvIzqXOymmy9H4CrG9pHy4NRMMzC28LGI7qHJgVhr/jZy8+
 GPxR26cywDhPsd4XA2K3mvs7DVSoBUPYlIUnHdYjfBZl/ghColg9+XGyNv6pdrke
 Q7Kog5blcpOAahBX+ElBLvIZXk5oEk5W+3H/M0OeuVMQ/DrMtALrCnwpp4wDKVvO
 9QuYfGgQ+25xbV9kwzckLGo5eedN3cRD/v4hcqvQUZo+9zLYZ/HZRMjpOdrscQ+I
 QL4FgpcLpOASKZ+bYjjpFxK3rNVTDT9CYJw4/hxEaOhxRhtAO1Q9mJdvJTK6dj09
 oR9LPyefQkyKCAt+heWHKKkEYDiwT8U1SlR8STotg24VHIj6Rb4=
 =2DDd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "A number of driver bug fixes and a few recent regressions:

   - Several bug fixes for bnxt_re. Crashing, incorrect data reported,
     and corruption on new HW

   - Memory leak and crash in rxe

   - Fix sysfs corruption in rxe if the netdev name is too long

   - Fix a crash on error unwind in the new cq_pool code

   - Fix kobject panics in rtrs by working device lifetime properly

   - Fix a data corruption bug in iser target related to misaligned
     buffers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/isert: Fix unaligned immediate-data handling
  RDMA/rtrs-srv: Set .release function for rtrs srv device during device init
  RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx'
  RDMA/core: Fix reported speed and width
  RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ
  RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
  RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address
  RDMA/bnxt_re: Restrict the max_gids to 256
  RDMA/bnxt_re: Static NQ depth allocation
  RDMA/bnxt_re: Fix the qp table indexing
  RDMA/bnxt_re: Do not report transparent vlan from QP1
  RDMA/mlx4: Read pkey table length instead of hardcoded value
  RDMA/rxe: Fix panic when calling kmem_cache_create()
  RDMA/rxe: Fix memleak in rxe_mem_init_user
  RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars
  RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
2020-09-11 10:02:36 -07:00
Michal Kalderon
9e054b13b2 RDMA/qedr: Fix function prototype parameters alignment
Alignment of parameters was off by one

Link: https://lore.kernel.org/r/20200902165741.8355-9-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:55 -03:00
Michal Kalderon
fbf58026b2 RDMA/qedr: Fix inline size returned for iWARP
commit 59e8970b37 ("RDMA/qedr: Return max inline data in QP query
result") changed query_qp max_inline size to return the max roce inline
size.  When iwarp was introduced, this should have been modified to return
the max inline size based on protocol.  This size is cached in the device
attributes

Fixes: 69ad0e7fe8 ("RDMA/qedr: Add support for iWARP in user space")
Link: https://lore.kernel.org/r/20200902165741.8355-8-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:55 -03:00
Michal Kalderon
cc293f5420 RDMA/qedr: Fix iWARP active mtu display
Currently iWARP does not support mtu-change. Notify user when MTU changes
that reload of qedr is required for mtu to change.  Display the correct
active mtu.

Fixes: f5b1b1775b ("RDMA/qedr: Add iWARP support in existing verbs")
Link: https://lore.kernel.org/r/20200902165741.8355-7-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:55 -03:00
Michal Kalderon
8a5a10a1a7 RDMA/qedr: Fix return code if accept is called on a destroyed qp
In iWARP, accept could be called after a QP is already destroyed.  In this
case an error should be returned and not success.

Fixes: 82af6d19d8 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr")
Link: https://lore.kernel.org/r/20200902165741.8355-5-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Michal Kalderon
a379ad54e5 RDMA/qedr: Fix use of uninitialized field
dev->attr.page_size_caps was used uninitialized when setting device
attributes

Fixes: ec72fce401 ("qedr: Add support for RoCE HW init")
Link: https://lore.kernel.org/r/20200902165741.8355-4-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Michal Kalderon
0b1eddc196 RDMA/qedr: Fix doorbell setting
Change the doorbell setting so that the maximum value between the last and
current value is set. This is to avoid doorbells being lost.

Fixes: a7efd7773e ("qedr: Add support for PD,PKEY and CQ verbs")
Link: https://lore.kernel.org/r/20200902165741.8355-3-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Michal Kalderon
098e345a1a RDMA/qedr: Fix qp structure memory leak
The qedr_qp structure wasn't freed when the protocol was RoCE.  kmemleak
output when running basic RoCE scenario.

unreferenced object 0xffff927ad7e22c00 (size 1024):
  comm "ib_send_bw", pid 7082, jiffies 4384133693 (age 274.698s)
  hex dump (first 32 bytes):
    00 b0 cd a2 79 92 ff ff 00 3f a1 a2 79 92 ff ff  ....y....?..y...
    00 ee 5c dd 80 92 ff ff 00 f6 5c dd 80 92 ff ff  ..\.......\.....
  backtrace:
    [<00000000b2ba0f35>] qedr_create_qp+0xb3/0x6c0 [qedr]
    [<00000000e85a43dd>] ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x555/0xad0 [ib_uverbs]
    [<00000000fee4d029>] ib_uverbs_cmd_verbs+0xa5a/0xb80 [ib_uverbs]
    [<000000005d622660>] ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
    [<00000000eb4cdc71>] ksys_ioctl+0x87/0xc0
    [<00000000abe6b23a>] __x64_sys_ioctl+0x16/0x20
    [<0000000046e7cef4>] do_syscall_64+0x4d/0x90
    [<00000000c6948f76>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1212767e23 ("qedr: Add wrapping generic structure for qpidr and adjust idr routines.")
Link: https://lore.kernel.org/r/20200902165741.8355-2-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
1d4299ed77 RDMA/ocrdma: Remove fbo from MR
This is always the same value as IOVA masked by the page size, just use
that clearer calculation directly.

It is unclear of ocrdma hardware can actually support a true fbo, if so it
could use a different algorithm to compute the best page size.

Link: https://lore.kernel.org/r/17-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
b3003a7445 RDMA/qedr: Remove fbo and zbva from the MR
zbva is always false, so fbo is never read.

A 'zero-based-virtual-address' is simply IOVA == 0, and the driver already
supports this.

Link: https://lore.kernel.org/r/16-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
81655d3c4a RDMA/mlx4: Use ib_umem_num_dma_blocks()
For the calls linked to mlx4_ib_umem_calc_optimal_mtt_size() use
ib_umem_num_dma_blocks() inside the function, it is just some weird static
default.

All other places are just using it with PAGE_SIZE, switch to
ib_umem_num_dma_blocks().

As this is the last call site, remove ib_umem_num_count().

Link: https://lore.kernel.org/r/15-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
87aebd3f8c RDMA/pvrdma: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count()
This driver always uses PAGE_SIZE.

Link: https://lore.kernel.org/r/14-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
b8387f8189 RDMA/ocrdma: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count()
This driver always uses a DMA array made up of PAGE_SIZE elements, so just
use ib_umem_num_dma_blocks().

Since rdma_for_each_dma_block() always iterates exactly
ib_umem_num_dma_blocks() there is no need for the early exit check in
build_user_pbes(), delete it.

Link: https://lore.kernel.org/r/13-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
cf9ce3c8ab RDMA/hns: Use ib_umem_num_dma_blocks() instead of opencoding
mtr_umem_page_count() does the same thing, replace it with the core code.

Also, ib_umem_find_best_pgsz() should always be called to check that the
umem meets the page_size requirement. If there is a limited set of
page_sizes that work it the pgsz_bitmap should be set to that set. 0 is a
failure and the umem cannot be used.

Lightly tidy the control flow to implement this flow properly.

Link: https://lore.kernel.org/r/12-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
84e71b4d9b RDMA/bnxt: Do not use ib_umem_page_count() or ib_umem_num_pages()
ib_umem_page_count() returns the number of 4k entries required for a DMA
map, but bnxt_re already computes a variable page size. The correct API to
determine the size of the page table array is ib_umem_num_dma_blocks().

Fix the overallocation of the page array in fill_umem_pbl_tbl() when
working with larger page sizes by using the right function. Lightly
re-organize this function to make it clearer.

Replace the other calls to ib_umem_num_pages().

Fixes: d85582517e ("RDMA/bnxt_re: Use core helpers to get aligned DMA address")
Link: https://lore.kernel.org/r/11-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:54 -03:00
Jason Gunthorpe
901bca71cd RDMA/qedr: Use ib_umem_num_dma_blocks() instead of ib_umem_page_count()
The length of the list populated by qedr_populate_pbls() should be
calculated using ib_umem_num_dma_blocks() with the same size/shift passed
to qedr_populate_pbls().

Link: https://lore.kernel.org/r/10-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:53 -03:00
Jason Gunthorpe
68363052ff RDMA/qedr: Use rdma_umem_for_each_dma_block() instead of open-coding
This loop is splitting the DMA SGL into pg_shift sized pages, use the core
code for this directly.

Link: https://lore.kernel.org/r/9-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:53 -03:00
Jason Gunthorpe
22123a0e49 RDMA/i40iw: Use ib_umem_num_dma_pages()
If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
not correct. 'start' should be 'virt'. Change it to use the core code for
page_num and the canonical calculation of page_shift.

Fixes: eb52c0333f ("RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size")
Link: https://lore.kernel.org/r/8-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:53 -03:00
Jason Gunthorpe
1f9b6827c8 RDMA/efa: Use ib_umem_num_dma_pages()
If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
not correct. 'start' should be 'virt'. Change it to use the core code for
page_num and the canonical calculation of page_shift.

Fixes: 40ddb3f020 ("RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size")
Link: https://lore.kernel.org/r/7-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Tested-by: Gal Pressman <galpress@amazon.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:53 -03:00
Jason Gunthorpe
a665aca89a RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
ib_umem_num_pages() should only be used by things working with the SGL in
CPU pages directly.

Drivers building DMA lists should use the new ib_num_dma_blocks() which
returns the number of blocks rdma_umem_for_each_block() will return.

To make this general for DMA drivers requires a different implementation.
Computing DMA block count based on umem->address only works if the
requested page size is < PAGE_SIZE and/or the IOVA == umem->address.

Instead the number of DMA pages should be computed in the IOVA address
space, not umem->address. Thus the IOVA has to be stored inside the umem
so it can be used for these calculations.

For now set it to umem->address by default and fix it up if
ib_umem_find_best_pgsz() was called. This allows drivers to be converted
to ib_umem_num_dma_blocks() safely.

Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:53 -03:00
Jason Gunthorpe
89603f7e7e RDMA/umem: Replace for_each_sg_dma_page with rdma_umem_for_each_dma_block
Generally drivers should be using this core helper to split up the umem
into DMA pages.

These drivers are all probably wrong in some way to pass PAGE_SIZE in as
the HW page size. Either the driver doesn't support other page sizes and
it should use 4096, or the driver does support other page sizes and should
use ib_umem_find_best_pgsz() to select the best HW pages size of the HW
supported set.

The only case it could be correct is if the HW has a global setting for
PAGE_SIZE set at driver initialization time.

Link: https://lore.kernel.org/r/5-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 15:33:17 -03:00
Jason Gunthorpe
ebc24096c4 RDMA/umem: Add rdma_umem_for_each_dma_block()
This helper does the same as rdma_for_each_block(), except it works on a
umem. This simplifies most of the call sites.

Link: https://lore.kernel.org/r/4-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 15:33:17 -03:00
Jason Gunthorpe
3361c29e92 RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()
The calculation in rdma_find_pg_bit() is fairly complicated, and the
function is never called anywhere else. Inline a simpler version into
ib_umem_find_best_pgsz()

Link: https://lore.kernel.org/r/3-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 15:33:17 -03:00
Jason Gunthorpe
10c75ccb54 RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()
rdma_for_each_block() makes assumptions about how the SGL is constructed
that don't work if the block size is below the page size used to to build
the SGL.

The rules for umem SGL construction require that the SG's all be PAGE_SIZE
aligned and we don't encode the actual byte offset of the VA range inside
the SGL using offset and length. So rdma_for_each_block() has no idea
where the actual starting/ending point is to compute the first/last block
boundary if the starting address should be within a SGL.

Fixing the SGL construction turns out to be really hard, and will be the
subject of other patches. For now block smaller pages.

Fixes: 4a35339958 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Link: https://lore.kernel.org/r/2-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 15:33:17 -03:00
Jason Gunthorpe
a40c20dabd RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page boundary
It is possible for a single SGL to span an aligned boundary, eg if the SGL
is

  61440 -> 90112

Then the length is 28672, which currently limits the block size to
32k. With a 32k page size the two covering blocks will be:

  32768->65536 and 65536->98304

However, the correct answer is a 128K block size which will span the whole
28672 bytes in a single block.

Instead of limiting based on length figure out which high IOVA bits don't
change between the start and end addresses. That is the highest useful
page size.

Fixes: 4a35339958 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Link: https://lore.kernel.org/r/1-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 15:33:17 -03:00
Leon Romanovsky
71ff3f6268 RDMA: Make counters destroy symmetrical
Change counters to return failure like any other verbs destroy, however
this flow shouldn't return error at all.

Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
add53535fb RDMA: Restore ability to return error for destroy WQ
Make this interface symmetrical to other destroy paths.

Fixes: a49b1dc7ae ("RDMA: Convert destroy_wq to be void")
Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
d0c45c8556 RDMA: Change XRCD destroy return value
Update XRCD destroy flow to allow command failure.

Fixes: 28ad5f65c3 ("RDMA: Move XRCD to be under ib_core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
43d781b9fa RDMA: Allow fail of destroy CQ
Like any other verbs objects, CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract with mixed IB verbs objects with
DEVX. Such mix causes to the situation where FW and kernel are fully
interdependent on the reference counting of each side.

Kernel verbs and drivers that don't have DEVX flows shouldn't fail.

Fixes: e39afe3d6d ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
7e3c66c9a9 RDMA/core: Delete function indirection for alloc/free kernel CQ
The ib_alloc_cq*() and ib_free_cq*() are solely kernel verbs to manage CQs
and doesn't need extra indirection just to call same functions with
constant parameter NULL as udata.

Link: https://lore.kernel.org/r/20200907120921.476363-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:28 -03:00
Leon Romanovsky
119181d1d4 RDMA: Restore ability to fail on SRQ destroy
In similar way to other IB objects, restore the ability to return error on
SRQ destroy. Strictly speaking, this change is not necessary, and provided
here to ensure a symmetrical interface like other destroy functions.

Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:24 -03:00
Leon Romanovsky
fd89099d63 RDMA/mlx5: Issue FW command to destroy SRQ on reentry
The HW release can fail and leave the system in limbo state, where SRQ is
removed from the table, but can't be destroyed later.  In every reentry,
the initial xa_erase_irq() check will fail.

Rewrite the erase logic to keep index, but don't store the entry
itself. By doing it, we can safely reinsert entry back in the case of
destroy failure.

Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:04:13 -03:00
Leon Romanovsky
9a9ebf8cd7 RDMA: Restore ability to fail on AH destroy
Like any other IB verbs objects, AH are refcounted by ib_core. The release
of those objects are controlled by ib_core with promise that AH destroy
can't fail.

Being SW object for now, this change makes dealloc_ah() to behave like any
other destroy IB flows.

Fixes: d345691471 ("RDMA: Handle AH allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Leon Romanovsky
91a7c58fce RDMA: Restore ability to fail on PD deallocate
The IB verbs objects are counted by the kernel and ib_core ensures that
deallocate PD will success so it will be called once all other objects
that depends on PD will be released. This is achieved by managing various
reference counters on such objects.

The mlx5 driver didn't follow this standard flow when allowed DEVX objects
that are not managed by ib_core to be interleaved with the ones under
ib_core responsibility.

In such interleaved scenarios deallocate command can fail and ib_core will
leave uobject in internal DB and attempt to clean it later to free
resources anyway.

This change partially restores returned value from dealloc_pd() for all
drivers, but keeping in mind that non-DEVX devices and kernel verbs paths
shouldn't fail.

Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Sagi Grimberg
0b089c1ef7 IB/isert: Fix unaligned immediate-data handling
Currently we allocate rx buffers in a single contiguous buffers for
headers (iser and iscsi) and data trailer. This means that most likely the
data starting offset is aligned to 76 bytes (size of both headers).

This worked fine for years, but at some point this broke, resulting in
data corruptions in isert when a command comes with immediate data and the
underlying backend device assumes 512 bytes buffer alignment.

We assume a hard-requirement for all direct I/O buffers to be 512 bytes
aligned. To fix this, we should avoid passing unaligned buffers for I/O.

Instead, we allocate our recv buffers with some extra space such that we
can have the data portion align to 512 byte boundary. This also means that
we cannot reference headers or data using structure but rather
accessors (as they may move based on alignment). Also, get rid of the
wrong __packed annotation from iser_rx_desc as this has only harmful
effects (not aligned to anything).

This affects the rx descriptors for iscsi login and data plane.

Fixes: 3d75ca0ade ("block: introduce multi-page bvec helpers")
Link: https://lore.kernel.org/r/20200904195039.31687-1-sagi@grimberg.me
Reported-by: Stephen Rust <srust@blockbridge.com>
Tested-by: Doug Dumitru <doug@dumitru.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:46:03 -03:00
Md Haris Iqbal
558d52b297 RDMA/rtrs-srv: Incorporate ib_register_client into rtrs server init
The rnbd_server module's communication manager (cm) initialization depends
on the registration of the "network namespace subsystem" of the RDMA CM
agent module. As such, when the kernel is configured to load the
rnbd_server and the RDMA cma module during initialization; and if the
rnbd_server module is initialized before RDMA cma module, a null ptr
dereference occurs during the RDMA bind operation.

Call trace:

  Call Trace:
   ? xas_load+0xd/0x80
   xa_load+0x47/0x80
   cma_ps_find+0x44/0x70
   rdma_bind_addr+0x782/0x8b0
   ? get_random_bytes+0x35/0x40
   rtrs_srv_cm_init+0x50/0x80
   rtrs_srv_open+0x102/0x180
   ? rnbd_client_init+0x6e/0x6e
   rnbd_srv_init_module+0x34/0x84
   ? rnbd_client_init+0x6e/0x6e
   do_one_initcall+0x4a/0x200
   kernel_init_freeable+0x1f1/0x26e
   ? rest_init+0xb0/0xb0
   kernel_init+0xe/0x100
   ret_from_fork+0x22/0x30
  Modules linked in:
  CR2: 0000000000000015

All this happens cause the cm init is in the call chain of the module
init, which is not a preferred practice.

So remove the call to rdma_create_id() from the module init call chain.
Instead register rtrs-srv as an ib client, which makes sure that the
rdma_create_id() is called only when an ib device is added.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20200907103106.104530-1-haris.iqbal@cloud.ionos.com
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:31:08 -03:00
Md Haris Iqbal
39c2d639ca RDMA/rtrs-srv: Set .release function for rtrs srv device during device init
The device .release function was not being set during the device
initialization. This was leading to the below warning, in error cases when
put_srv was called before device_add was called.

Warning:

Device '(null)' does not have a release() function, it is broken and must
be fixed. See Documentation/kobject.txt.

So, set the device .release function during device initialization in the
__alloc_srv() function.

Fixes: baa5b28b7a ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add")
Link: https://lore.kernel.org/r/20200907102216.104041-1-haris.iqbal@cloud.ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:28:14 -03:00
Lijun Ou
a2f3d4479f RDMA/hns: Avoid unncessary initialization
Some variables have been initialized when used. As a result, here removes
some unncessary initial assignment.

Link: https://lore.kernel.org/r/1599547944-30671-1-git-send-email-oulijun@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:23:15 -03:00
YueHaibing
9e712446a8 RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx'
drivers/infiniband/hw/bnxt_re/main.c:1012:25:
 warning: variable ‘qplib_ctx’ set but not used [-Wunused-but-set-variable]

Fixes: f86b31c6a2 ("RDMA/bnxt_re: Static NQ depth allocation")
Link: https://lore.kernel.org/r/20200905121624.32776-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:20:48 -03:00
Jason Gunthorpe
f553246f7f RDMA/core: Change how failing destroy is handled during uobj abort
Currently it triggers a WARN_ON and then goes ahead and destroys the
uobject anyhow, leaking any driver memory.

The only place that leaks driver memory should be during FD close() in
uverbs_destroy_ufile_hw().

Drivers are only allowed to fail destroy uobjects if they guarantee
destroy will eventually succeed. uverbs_destroy_ufile_hw() provides the
loop to give the driver that chance.

Link: https://lore.kernel.org/r/20200902081708.746631-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:16:48 -03:00
Allen Pais
00b3c11879 RDMA/rxe: Convert tasklets to use new tasklet_setup() API
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Link: https://lore.kernel.org/r/20200903060637.424458-6-allen.lkml@gmail.com
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 12:01:53 -03:00
Allen Pais
a23afb448b RDMA/qib: Convert tasklets to use new tasklet_setup() API
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Link: https://lore.kernel.org/r/20200903060637.424458-5-allen.lkml@gmail.com
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 12:01:53 -03:00
Allen Pais
4e95f84999 RDMA/i40iw: Convert tasklets to use new tasklet_setup() API
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Link: https://lore.kernel.org/r/20200903060637.424458-4-allen.lkml@gmail.com
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 12:01:52 -03:00
Allen Pais
55db47d082 RDMA/hfi1: Convert tasklets to use new tasklet_setup() API
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Link: https://lore.kernel.org/r/20200903060637.424458-3-allen.lkml@gmail.com
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 12:01:52 -03:00
Allen Pais
53c2a706ae RDMA/bnxt_re: Convert tasklets to use new tasklet_setup() API
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Link: https://lore.kernel.org/r/20200903060637.424458-2-allen.lkml@gmail.com
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 12:01:52 -03:00
Leon Romanovsky
4b916ed9f9 RDMA/mlx5: Fix potential race between destroy and CQE poll
The SRQ can be destroyed right before mlx5_cmd_get_srq is called.
In such case the latter will return NULL instead of expected SRQ.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 10:12:57 -03:00
Alex Dewar
4f680cb9f1 RDMA/ucma: Fix resource leak on error path
In ucma_process_join(), if the call to xa_alloc() fails, the function will
return without freeing mc. Fix this by jumping to the correct line.

In the process I renamed the jump labels to something more memorable for
extra clarity.

Link: https://lore.kernel.org/r/20200902162454.332828-1-alex.dewar90@gmail.com
Addresses-Coverity-ID: 1496814 ("Resource leak")
Fixes: 95fe51096b ("RDMA/ucma: Remove mc_list and rely on xarray")
Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02 17:06:48 -03:00
Kamal Heib
28b0865714 RDMA/core: Fix reported speed and width
When the returned speed from __ethtool_get_link_ksettings() is
SPEED_UNKNOWN this will lead to reporting a wrong speed and width for
providers that uses ib_get_eth_speed(), fix that by defaulting the
netdev_speed to SPEED_1000 in case the returned value from
__ethtool_get_link_ksettings() is SPEED_UNKNOWN.

Fixes: d41861942f ("IB/core: Add generic function to extract IB speed from netdev")
Link: https://lore.kernel.org/r/20200902124304.170912-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02 16:58:54 -03:00
Xi Wang
8aa64be019 RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ
It's not safe to access the next CQ in list_for_each_entry() after
invoking ib_free_cq(), because the CQ has already been freed in current
iteration.  It should be replaced by list_for_each_entry_safe().

Fixes: c7ff819aef ("RDMA/core: Introduce shared CQ pool API")
Link: https://lore.kernel.org/r/1598963935-32335-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02 15:56:40 -03:00
Kamal Heib
7d11b4787d RDMA/qedr: Fix reported max_pkeys
As qedr driver supports both RoCE and iWarp, make sure to set the
max_pkeys only when running in RoCE mode.

Link: https://lore.kernel.org/r/20200827141655.406185-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02 15:42:36 -03:00
Alex Dewar
524d8ffd07 RDMA/qib: Tidy up process_cc()
This function has a lot of gotos which could be replaced by simple
returns, making the function tidier and less bug prone.

Link: https://lore.kernel.org/r/20200825171242.448447-2-alex.dewar90@gmail.com
Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 13:21:06 -03:00
Alex Dewar
d2598bb809 RDMA/qib: Remove superfluous fallthrough statements
Commit 36a8f01cd2 ("IB/qib: Add congestion control agent
implementation") erroneously marked a couple of switch cases as /*
FALLTHROUGH */, which were later converted to fallthrough statements by
commit df561f6688 ("treewide: Use fallthrough pseudo-keyword"). This
triggered a Coverity warning about unreachable code.

Remove the fallthrough statements.

Link: https://lore.kernel.org/r/20200825171242.448447-1-alex.dewar90@gmail.com
Addresses-Coverity: ("Unreachable code")
Fixes: 36a8f01cd2 ("IB/qib: Add congestion control agent implementation")
Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 13:21:06 -03:00
Jason Gunthorpe
6989aa62d3 Linux 5.9-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9ML+IeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGA8EIAIy/kTbFS0yrE9yV
 hb98oX0z9+EU9YQg9vhaRWwPd+rJF/JMQZLqYcwbhjG9abaUL3T3fEcSAefMHw8E
 LAt+hYzA38dHt7tqhsFQX3vV1VorvDVICBVN0yRPRWKKikq4OPIHzaAR9tleGAF5
 8btQisl1PjN+obwYmLuNb6aX16OCwAF+uXOwehcoJs9dvMNhwtXRzfOflWzOvOo6
 tE0bHErlylLDfLv4ZzEfczTdks4QJZ7C0xLSf3oN9AAynW42Xnhct4hi8qZY/hCf
 CMaqeN4hdpub6TvQIqBdDqMMjEXGFgeNSnAEBQY9VpvUqz8NTu6sQxwgJEKDF5tg
 d81lv2c=
 =uW/F
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc3' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:28:12 -03:00
Bob Pearson
7672dac304 RDMA/rxe: Address an issue with hardened user copy
Change rxe pools to use kzalloc instead of kmem_cache to allocate memory
for rxe objects. The pools are not really necessary and they trigger
hardened user copy warnings as the ioctl framework copies the QP number
directly to userspace.

Also the general project to move object alloation to the core code will
eventually clean these out anyhow.

Link: https://lore.kernel.org/r/20200827163535.2632-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:21:16 -03:00
Bob Pearson
63fa15dbd4 RDMA/rxe: Add SPDX hdrs to rxe source files
Add SPDX headers to all rxe .c and .h files.

Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:20:02 -03:00
Jason Gunthorpe
5d985d724b RDMA/core: Trigger a WARN_ON if the driver causes uobjects to become leaked
Drivers that fail destroy can cause uverbs to leak uobjects. Drivers are
required to always eventually destroy their ubojects, so trigger a WARN_ON
to detect this driver bug.

Link: https://lore.kernel.org/r/0-v1-b1e0ed400ba9+f7-warn_destroy_ufile_hw_jgg@nvidia.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:10:11 -03:00
Weihang Li
074bf2c2c7 RDMA/hns: Get udp sport num dynamically instead of using a fixed value
The UDP source port number in RoCE v2 is used to create entropy for
network routers (ECMP), load balancers and 802.3ad link aggregation
switching that are not aware of RoCE IB headers. Considering that the IB
core has achieved a new interface to get a hashed value of it, the fixed
value of it in QPC and UD WQE in hns driver could be fixed and the port
number is to be set dynamically now.

For QPC of RC, the value could be hashed from flow_lable if the user pass
it in or from remote qpn and local qpn. For WQE of UD, it is set according
to fl or as a random value.

Link: https://lore.kernel.org/r/1598002289-8611-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:03:17 -03:00
Bob Pearson
5f9e2822d1 RDMA/rxe: Fix style warnings
Fixed several minor checkpatch warnings in existing rxe source.

Link: https://lore.kernel.org/r/20200820224638.3212-3-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:51:08 -03:00
Lang Cheng
e0ef0f68c4 RDMA/hns: Add a check for current state before modifying QP
It should be considered an illegal operation if the ULP attempts to modify
a QP from another state to the current hardware state. Otherwise, the ULP
can modify some fields of QPC at any time. For example, for a QP in state
of RTS, modify it from RTR to RTS can change the PSN, which is always not
as expected.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/1598353674-24270-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:46:07 -03:00
Selvin Xavier
097a9d23b7 RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
Driver crashes when destroy_qp is re-tried because of an error
returned. This is because the qp entry was removed from the qp list during
the first call.

Remove qp from the list only if destroy_qp returns success.

The driver will still trigger a WARN_ON due to the memory leaking, but at
least it isn't corrupting memory too.

Fixes: 8dae419f9e ("RDMA/bnxt_re: Refactor queue pair creation code")
Link: https://lore.kernel.org/r/1598292876-26529-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:30:44 -03:00
Naresh Kumar PBS
934d0ac9a6 RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address
When computing the first psn entry, driver checks for page alignment. If
this address is not page aligned,it attempts to compute the offset in that
page for later use by using ALIGN macro. ALIGN macro does not return
offset bytes but the requested aligned address and hence cannot be used
directly to store as offset.  Since driver was using the address itself
instead of offset, it resulted in invalid address when filling the psn
buffer.

Fixed driver to use PAGE_MASK macro to calculate the offset.

Fixes: fddcbbb02a ("RDMA/bnxt_re: Simplify obtaining queue entry from hw ring")
Link: https://lore.kernel.org/r/1598292876-26529-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:30:44 -03:00
Naresh Kumar PBS
847b97887e RDMA/bnxt_re: Restrict the max_gids to 256
Some adapters report more than 256 gid entries. Restrict it to 256 for
now.

Fixes: 1ac5a4047975("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/1598292876-26529-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:30:44 -03:00
Naresh Kumar PBS
f86b31c6a2 RDMA/bnxt_re: Static NQ depth allocation
At first, driver allocates memory for NQ based on qplib_ctx->cq_count and
qplib_ctx->srqc_count.  Later when creating ring, it uses a static value
of 128K -1.

Fixing this with a static value for now.

Fixes: b08fe048a6 ("RDMA/bnxt_re: Refactor net ring allocation function")
Link: https://lore.kernel.org/r/1598292876-26529-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:30:43 -03:00
Selvin Xavier
84cf229f40 RDMA/bnxt_re: Fix the qp table indexing
qp->id can be a value outside the max number of qp. Indexing the qp table
with the id can cause out of bounds crash. So changing the qp table
indexing by (qp->id % max_qp -1).

Allocating one extra entry for QP1. Some adapters create one more than the
max_qp requested to accommodate QP1.  If the qp->id is 1, store the
inforamtion in the last entry of the qp table.

Fixes: f218d67ef0 ("RDMA/bnxt_re: Allow posting when QPs are in error")
Link: https://lore.kernel.org/r/1598292876-26529-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:30:43 -03:00
Selvin Xavier
2d0e60ee32 RDMA/bnxt_re: Do not report transparent vlan from QP1
QP1 Rx CQE reports transparent VLAN ID in the completion and this is used
while reporting the completion for received MAD packet. Check if the vlan
id is configured before reporting it in the work completion.

Fixes: 84511455ac ("RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion")
Link: https://lore.kernel.org/r/1598292876-26529-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:30:43 -03:00
Mark Bloch
ec78b3bd66 RDMA/mlx4: Read pkey table length instead of hardcoded value
If the pkey_table is not available (which is the case when RoCE is not
supported), the cited commit caused a regression where mlx4_devices
without RoCE are not created.

Fix this by returning a pkey table length of zero in procedure
eth_link_query_port() if the pkey-table length reported by the device is
zero.

Link: https://lore.kernel.org/r/20200824110229.1094376-1-leon@kernel.org
Cc: <stable@vger.kernel.org>
Fixes: 1901b91f99 ("IB/core: Fix potential NULL pointer dereference in pkey cache")
Fixes: fa417f7b52 ("IB/mlx4: Add support for IBoE")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:17:05 -03:00
Kamal Heib
d862060a4b RDMA/rxe: Fix panic when calling kmem_cache_create()
To avoid the following kernel panic when calling kmem_cache_create() with
a NULL pointer from pool_cache(), Block the rxe_param_set_add() from
running if the rdma_rxe module is not initialized.

 BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
 RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
 Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
 RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
 RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
 RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
 R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
 R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
 FS:  00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
 Call Trace:
  rxe_alloc+0xc8/0x160 [rdma_rxe]
  rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
  __ib_alloc_pd+0xcb/0x160 [ib_core]
  ib_mad_init_device+0x296/0x8b0 [ib_core]
  add_client_context+0x11a/0x160 [ib_core]
  enable_device_and_get+0xdc/0x1d0 [ib_core]
  ib_register_device+0x572/0x6b0 [ib_core]
  ? crypto_create_tfm+0x32/0xe0
  ? crypto_create_tfm+0x7a/0xe0
  ? crypto_alloc_tfm+0x58/0xf0
  rxe_register_device+0x19d/0x1c0 [rdma_rxe]
  rxe_net_add+0x3d/0x70 [rdma_rxe]
  ? dev_get_by_name_rcu+0x73/0x90
  rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
  parse_args+0x179/0x370
  ? ref_module+0x1b0/0x1b0
  load_module+0x135e/0x17e0
  ? ref_module+0x1b0/0x1b0
  ? __do_sys_init_module+0x13b/0x180
  __do_sys_init_module+0x13b/0x180
  do_syscall_64+0x5b/0x1a0
  entry_SYSCALL_64_after_hwframe+0x65/0xca
 RIP: 0033:0x7f9137ed296e

This can be triggered if a user tries to use the 'module option' which is
not actually a real module option but some idiotic (and thankfully no
obsolete) sysfs interface.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200825151725.254046-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:14:41 -03:00
Kamal Heib
b9caebb290 RDMA/usnic: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core, this callback
can be removed from the usnic provider. The libfabric userspace never
touches the pkey.

Link: https://lore.kernel.org/r/20200820125346.111902-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 09:02:42 -03:00
Dinghao Liu
e3ddd6067e RDMA/rxe: Fix memleak in rxe_mem_init_user
When page_address() fails, umem should be freed just like when
rxe_mem_alloc() fails.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200819075632.22285-1-dinghao.liu@zju.edu.cn
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:45:59 -03:00
Jason Gunthorpe
657360d6c7 RDMA/ucma: Remove closing and the close_wq
Use cancel_work_sync() to ensure that the wq is not running and simply
assign NULL to ctx->cm_id to indicate if the work ran or not. Delete the
close_wq since flush_workqueue() is no longer needed.

Link: https://lore.kernel.org/r/20200818120526.702120-15-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
a1d33b70db RDMA/ucma: Rework how new connections are passed through event delivery
When a new connection is established the RDMA CM creates a new cm_id and
passes it through to the event handler. However inside the UCMA the new ID
is not assigned a ucma_context until the user retrieves the event from a
syscall.

This creates a weird edge condition where a cm_id's context can continue
to point at the listening_id that created it, and a number of additional
edge conditions on event list clean up related to destroying half created
IDs.

There is also a race condition in ucma_get_events() where the
cm_id->context is being assigned without holding the handler_mutex.

Simplify all of this by creating the ucma_context inside the event handler
itself and eliminating the edge case of a half created cm_id. All cm_id's
can be uniformly destroyed via __destroy_id() or via the close_work.

Link: https://lore.kernel.org/r/20200818120526.702120-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
310ca1a7dc RDMA/ucma: Narrow file->mut in ucma_event_handler()
Since the backlog is now an atomic the file->mut is now only protecting
the event_list and ctx_list. Narrow its scope to make it clear

Link: https://lore.kernel.org/r/20200818120526.702120-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
26c15dec49 RDMA/ucma: Change backlog into an atomic
There is no reason to grab the file->mut just to do this inc/dec work. Use
an atomic.

Link: https://lore.kernel.org/r/20200818120526.702120-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:16 -03:00
Jason Gunthorpe
38e03d0926 RDMA/ucma: Add missing locking around rdma_leave_multicast()
All entry points to the rdma_cm from a ULP must be single threaded,
even this error unwinds. Add the missing locking.

Fixes: 7c11910783 ("RDMA/ucma: Put a lock around every call to the rdma_cm layer")
Link: https://lore.kernel.org/r/20200818120526.702120-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
98837c6c3d RDMA/ucma: Fix locking for ctx->events_reported
This value is locked under the file->mut, ensure it is held whenever
touching it.

The case in ucma_migrate_id() is a race, while in ucma_free_uctx() it is
already not possible for the write side to run, the movement is just for
clarity.

Fixes: 88314e4dda ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/20200818120526.702120-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
09e328e47a RDMA/ucma: Fix the locking of ctx->file
ctx->file is changed under the file->mut lock by ucma_migrate_id(), which
is impossible to lock correctly. Instead change ctx->file under the
handler_lock and ctx_table lock and revise all places touching ctx->file
to use this locking when reading ctx->file.

Link: https://lore.kernel.org/r/20200818120526.702120-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
308571debc RDMA/ucma: Do not use file->mut to lock destroying
The only reader of destroying is inside a handler under the handler_mutex,
so directly use the handler_mutex when setting it instead of the larger
file->mut.

As the refcount could be zero here, and the cm_id already freed, and
additional refcount grab around the locking is required to touch the
cm_id.

Link: https://lore.kernel.org/r/20200818120526.702120-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
d114c6feed RDMA/cma: Add missing locking to rdma_accept()
In almost all cases rdma_accept() is called under the handler_mutex by
ULPs from their handler callbacks. The one exception was ucma which did
not get the handler_mutex.

To improve the understand-ability of the locking scheme obtain the mutex
for ucma as well.

This improves how ucma works by allowing it to directly use handler_mutex
for some of its internal locking against the handler callbacks intead of
the global file->mut lock.

There does not seem to be a serious bug here, other than a DISCONNECT event
can be delivered concurrently with accept succeeding.

Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:15 -03:00
Jason Gunthorpe
95fe51096b RDMA/ucma: Remove mc_list and rely on xarray
It is not really necessary to keep a linked list of mcs associated with
each context when we can just scan the xarray to find the right things.

The removes another overloading of file->mut by relying on the xarray
locking for mc instead.

Link: https://lore.kernel.org/r/20200818120526.702120-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:14 -03:00
Jason Gunthorpe
620db1a118 RDMA/ucma: Fix error cases around ucma_alloc_ctx()
The store to ctx->cm_id was based on the idea that _ucma_find_context()
would not return the ctx until it was fully setup.

Without locking this doesn't work properly.

Split things so that the xarray is allocated with NULL to reserve the ID
and once everything is final set the cm_id and store.

Along the way this shows that the error unwind in ucma_get_event() if a
new ctx is created is wrong, fix it up.

Link: https://lore.kernel.org/r/20200818120526.702120-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:14 -03:00
Jason Gunthorpe
c07e12d8e9 RDMA/ucma: Consolidate the two destroy flows
ucma_close() is open coding the tail end of ucma_destroy_id(), consolidate
this duplicated code into a function.

Link: https://lore.kernel.org/r/20200818120526.702120-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:14 -03:00
Jason Gunthorpe
07e266a775 RDMA/ucma: Remove unnecessary locking of file->ctx_list in close
During the file_operations release function it is already not possible
that write() can be running concurrently, remove the extra locking
around the ctx_list.

Link: https://lore.kernel.org/r/20200818120526.702120-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:13 -03:00
Jason Gunthorpe
ca2968c1ef RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx()
Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a
wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming
that the refcount can only go to zero from ucma_destroy_id() which also
removes it from the xarray.

Use refcount_inc_not_zero() instead.

Link: https://lore.kernel.org/r/20200818120526.702120-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:38:13 -03:00
Mark Zhang
7c4b1ab9f1 IB/mlx5: Add DCT RoCE LAG support
When DCT QPs work in RoCE LAG mode:
 1. DCT creation is allowed only when it is supported
 2. The "port" of a DCT QP is assigned in a round-robin way

Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Mark Zhang
8f3243a047 IB/mlx5: Add tx_affinity support for DCI QP
DCI QP supports tx_affinity as well.

Link: https://lore.kernel.org/r/20200818115245.700581-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Chuck Lever
8dc105befe RDMA/cm: Add tracepoints to track MAD send operations
Surface the operation of MAD exchanges during connection
establishment. Some samples:

[root@klimt ~]# trace-cmd report -F ib_cma
cpus=4
     kworker/0:4-123   [000]    60.677388: icm_send_rep:         local_id=1965336542 remote_id=1096195961 state=REQ_RCVD lap_state=LAP_UNINIT
   kworker/u8:11-391   [002]    60.678808: icm_send_req:         local_id=1982113758 remote_id=0 state=IDLE lap_state=LAP_UNINIT
     kworker/0:4-123   [000]    60.679652: icm_send_rtu:         local_id=1982113758 remote_id=1079418745 state=REP_RCVD lap_state=LAP_UNINIT
            nfsd-1954  [001]    60.691350: icm_send_rep:         local_id=1998890974 remote_id=1129750393 state=MRA_REQ_SENT lap_state=LAP_UNINIT
            nfsd-1954  [003]    62.017931: icm_send_drep:        local_id=1998890974 remote_id=1129750393 state=TIMEWAIT lap_state=LAP_UNINIT

Link: https://lore.kernel.org/r/159767240197.2968.12048458026453596018.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 19:41:41 -03:00
Chuck Lever
75874b3d50 RDMA/cm: Replace pr_debug() call sites with tracepoints
In the interest of converging on a common instrumentation infrastructure,
modernize the pr_debug() call sites added by commit 119bf81793 ("IB/cm:
Add debug prints to ib_cm"). The new tracepoints appear in a new "ib_cma"
subsystem.

The conversion is somewhat mechanical. Someone more familiar with the
semantics of the recorded information might suggest additional data
capture.

Some benefits include:

- Tracepoints enable "always on" reporting of these errors
- The error records are structured and compact
- Tracepoints provide hooks for eBPF scripts

Sample output:

            nfsd-1954  [003]    62.017901: icm_dreq_skipped:     local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT

Link: https://lore.kernel.org/r/159767239665.2968.10613294222688696646.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 19:41:41 -03:00
Chuck Lever
b3d03daa7c RDMA/core: Move the rdma_show_ib_cm_event() macro
Refactor: Make it globally available in the utilities header.

Link: https://lore.kernel.org/r/159767239131.2968.9520990257041764685.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 16:01:47 -03:00
Gal Pressman
8d9290a4a8 RDMA/efa: Remove redundant udata check from alloc ucontext response
The alloc ucontext flow is always called with a valid udata, there's no
need to test whether it's NULL.

While at it, the 'udata->outlen' check is removed as well as we copy the
minimum between the size of the response and outlen, so in case of zero
outlen, zero bytes will be copied.

Link: https://lore.kernel.org/r/20200818110835.54299-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 14:45:54 -03:00
Kamal Heib
62cbff3267 RDMA/vmw_pvrdma: Fix kernel-doc documentation
Fix the kernel-doc documentation by matching between the functions
definitions and documentation.

Link: https://lore.kernel.org/r/20200820123512.105193-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 14:34:18 -03:00
Mohammad Heib
fd49ddaf7e RDMA/rxe: prevent rxe creation on top of vlan interface
Creating rxe device on top of vlan interface will create a non-functional
device that has an empty gids table and can't be used for rdma cm
communication.

This is caused by the logic in
enum_all_gids_of_dev_cb()/is_eth_port_of_netdev(), which only considers
networks connected to "upper devices" of the configured network device,
resulting in an empty set of gids for a vlan interface, and attempts to
connect via this rdma device fail in cm_init_av_for_response because no
gids can be resolved.

Apparently, this behavior was implemented to fit the HW-RoCE devices that
create RoCE device per port, therefore RXE must behave the same like
HW-RoCE devices and create rxe device per real device only.

In order to communicate via a vlan interface, the user must use the gid
index of the vlan address instead of creating rxe over vlan.

Link: https://lore.kernel.org/r/20200811150415.3693-1-goody698@gmail.com
Signed-off-by: Mohammad Heib <goody698@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 14:12:18 -03:00
Yi Zhang
60b1af64eb RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars
'parent' sysfs reads will yield '\0' bytes when the interface name has 15
chars, and there will no "\n" output.

To reproduce, create one interface with 15 chars:

 [root@test ~]# ip a s enp0s29u1u7u3c2
 2: enp0s29u1u7u3c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
     link/ether 02:21:28:57:47:17 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::ac41:338f:5bcd:c222/64 scope link noprefixroute
        valid_lft forever preferred_lft forever
 [root@test ~]# modprobe rdma_rxe
 [root@test ~]# echo enp0s29u1u7u3c2 > /sys/module/rdma_rxe/parameters/add
 [root@test ~]# cat /sys/class/infiniband/rxe0/parent
 enp0s29u1u7u3c2[root@test ~]#
 [root@test ~]# f="/sys/class/infiniband/rxe0/parent"
 [root@test ~]# echo "$(<"$f")"
 -bash: warning: command substitution: ignored null byte in input
 enp0s29u1u7u3c2

Use scnprintf and PAGE_SIZE to fill the sysfs output buffer.

Cc: stable@vger.kernel.org
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200820153646.31316-1-yi.zhang@redhat.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 13:59:58 -03:00
Md Haris Iqbal
baa5b28b7a RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
There are error cases when we will call free_srv before device kobject is
initialized; in such cases calling put_device generates the following
warning:

 kobject: '(null)' (000000009f5445ed): is not initialized, yet
 kobject_put() is being called.

So call device_initialize() only once when the server is allocated. If we
end up calling put_srv() and subsequently free_srv(), our call to
put_device() would result in deletion of the obj. Call device_add() later
when we actually have a connection. Correspondingly, call device_del()
instead of device_unregister() when srv->dev_ref falls to 0.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20200811092722.2450-1-haris.iqbal@cloud.ionos.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 13:44:53 -03:00
Håkon Bugge
785167a114 IB/mlx4: Adjust delayed work when a dup is observed
When scheduling delayed work to clean up the cache, if the entry already
has been scheduled for deletion, we adjust the delay.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-7-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:22 -03:00
Håkon Bugge
227a0e142e IB/mlx4: Add support for REJ due to timeout
A CM REJ packet with its reason equal to timeout is a special beast in the
sense that it doesn't have a Remote Communication ID nor does it have a
Remote Port GID.

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping
in a cache.

This proxying doesn't not handle said REJ packets.

If the active side abandons its connection attempt after having sent a
REQ, it will send a REJ with the reason being timeout. This example can be
provoked by a simple user-verbs program, which ends up doing:

    rdma_connect(cm_id, &conn_param);
    rdma_destroy_id(cm_id);

using the async librdmacm API.

Having dynamic debug prints enabled in the mlx4_ib driver, we will then
see:

mlx4_ib_demux_cm_handler: Couldn't find an entry for pv_cm_id 0x0, attr_id 0x12

The solution is to introduce a radix-tree. When a REQ packet is received
and handled in mlx4_ib_demux_cm_handler(), we know the connecting peer's
para-virtual cm_id and the destination slave. We then insert an entry into
the tree with said information. We also schedule work to remove this entry
from the tree and free it, in order to avoid memory leak.

When a REJ packet with reason timeout is received, we can look up the
slave in the tree, and deliver the packet to the correct slave.

When a duplicate REQ packet is received, the entry is in the tree. In this
case, we adjust the delayed work in order to avoid a too premature
eviction of the entry.

When cleaning up, we simply traverse the tree and modify any delayed work
to use a zero delay. A subsequent flush of the system_wq will ensure all
entries being wiped out.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-6-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:22 -03:00
Håkon Bugge
7fd1507df7 IB/mlx4: Fix starvation in paravirt mux/demux
The mlx4 driver will proxy MAD packets through the PF driver. A VM or an
instantiated VF will send its MAD packets to the PF driver using
loop-back. The PF driver will be informed by an interrupt, but defer the
handling and polling of CQEs to a worker thread running on an ordered
work-queue.

Consider the following scenario: the VMs will in short proximity in time,
for example due to a network event, send many MAD packets to the PF
driver. Lets say there are K VMs, each sending N packets.

The interrupt from the first VM will start the worker thread, which will
poll N CQEs. A common case here is where the PF driver will multiplex the
packets received from the VMs out on the wire QP.

But before the wire QP has returned a send CQE and associated interrupt,
the other K - 1 VMs have sent their N packets as well.

The PF driver has to multiplex K * N packets out on the wire QP. But the
send-queue on the wire QP has a finite capacity.

So, in this scenario, if K * N is larger than the send-queue capacity of
the wire QP, we will get MAD packets dropped on the floor with this
dynamic debug message:

mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11)

and this despite the fact that the wire send-queue could have capacity,
but the PF driver isn't aware, because the wire send CQEs have not yet
been polled.

We can also have a similar scenario inbound, with a wire recv-queue larger
than the tunnel QP's send-queue. If many remote peers send MAD packets to
the very same VM, the tunnel send-queue destined to the VM could allegedly
be construed to be full by the PF driver.

This starvation is fixed by introducing separate work queues for the wire
QPs vs. the tunnel QPs.

With this fix, using a dual ported HCA, 8 VFs instantiated, we could run
cmtime on each of the 18 interfaces towards a similar configured peer,
each cmtime instance with 800 QPs (all in all 14400 QPs) without a single
CM packet getting lost.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-5-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:22 -03:00
Håkon Bugge
0ae207fb91 IB/mlx4: Separate tunnel and wire bufs parameters
Using CX-3 in virtualized mode, MAD packets are proxied through the PF
driver. The feed is N tunnel QPs, and what is received from the VFs is
multiplexed out on the wire QP. Since this is a many-to-one scenario, it
is better to have separate initialization parameters for the two usages.

The number of wire and tunnel bufs are yanked up to 2K and 512
respectively. With this set of parameters, a system consisting of eight
physical servers, each with eight VMs and 14 I/O servers (BM), can run
switch fail-over without seeing:

mlx4_ib_demux_mad: failed sending GSI to slave 3 via tunnel qp (-11)

or

mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11)

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-4-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:21 -03:00
Håkon Bugge
e7d087fce6 IB/mlx4: Add support for MRA
Using CX-3 in virtualized mode, MAD packets are proxied through the PF
driver. However, the handling lacks support of the MRA (Message Receipt
Acknowledgment) packet. When having dynamic debug enabled, we see tons of:

mlx4_ib_multiplex_cm_handler: id{slave: 7, sl_cm_id: 0x8fcb45a0} is NULL! attr_id: 0x11

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-3-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:21 -03:00
Håkon Bugge
094619449a IB/mlx4: Add and improve logging
Add missing check for success after call to mlx4_ib_send_to_wire() in
mlx4_ib_multiplex_mad().

Amended the existing pr_debug() in mlx4_ib_multiplex_cm_handler() and
mlx4_ib_demux_cm_handler() with attr_id during a lookup failure.

Removed two noisy pr_debug() in mad.c

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-2-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 11:31:21 -03:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Weihang Li
6da06c6291 Revert "RDMA/hns: Reserve one sge in order to avoid local length error"
This patch caused some issues on SEND operation, and it should be reverted
to make the drivers work correctly. There will be a better solution that
has been tested carefully to solve the original problem.

This reverts commit 711195e57d.

Fixes: 711195e57d ("RDMA/hns: Reserve one sge in order to avoid local length error")
Link: https://lore.kernel.org/r/1597829984-20223-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20 08:35:19 -03:00
Kaike Wan
b25e8e85e7 RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE request
The following message occurs when running an AI application with TID RDMA
enabled:

hfi1 0000:7f:00.0: hfi1_0: [QP74] hfi1_tid_timeout 4084
hfi1 0000:7f:00.0: hfi1_0: [QP70] hfi1_tid_timeout 4084

The issue happens when TID RDMA WRITE request is followed by an
IB_WR_RDMA_WRITE_WITH_IMM request, the latter could be completed first on
the responder side. As a result, no ACK packet for the latter could be
sent because the TID RDMA WRITE request is still being processed on the
responder side.

When the TID RDMA WRITE request is eventually completed, the requester
will wait for the IB_WR_RDMA_WRITE_WITH_IMM request to be acknowledged.

If the next request is another TID RDMA WRITE request, no TID RDMA WRITE
DATA packet could be sent because the preceding IB_WR_RDMA_WRITE_WITH_IMM
request is not completed yet.

Consequently the IB_WR_RDMA_WRITE_WITH_IMM will be retried but it will be
ignored on the responder side because the responder thinks it has already
been completed. Eventually the retry will be exhausted and the qp will be
put into error state on the requester side. On the responder side, the TID
resource timer will eventually expire because no TID RDMA WRITE DATA
packets will be received for the second TID RDMA WRITE request.  There is
also risk of a write-after-write memory corruption due to the issue.

Fix by adding a requester side interlock to prevent any potential data
corruption and TID RDMA protocol error.

Fixes: a0b34f75ec ("IB/hfi1: Add interlock between a TID RDMA request and other requests")
Link: https://lore.kernel.org/r/20200811174931.191210.84093.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org> # 5.4.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20 08:31:41 -03:00
Selvin Xavier
a812f2d60a RDMA/bnxt_re: Do not add user qps to flushlist
Driver shall add only the kernel qps to the flush list for clean up.
During async error events from the HW, driver is adding qps to this list
without checking if the qp is kernel qp or not.

Add a check to avoid user qp addition to the flush list.

Fixes: 942c9b6ca8 ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing")
Fixes: c50866e285 ("bnxt_re: fix the regression due to changes in alloc_pbl")
Link: https://lore.kernel.org/r/1596689148-4023-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20 08:31:41 -03:00
Colin Ian King
4469add9d3 RDMA/core: Fix spelling mistake "Could't" -> "Couldn't"
There is a spelling mistake in a pr_warn message. Fix it.

Link: https://lore.kernel.org/r/20200810075824.46770-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20 08:31:41 -03:00
Jason Gunthorpe
c0f4979e90 RDMA/cm: Remove unused cm_class
Previous commits removed all references to the /sys/class/infiniband_cm/
directory represented by the cm_class symbol. Remove the directory and
cm_class.

Fixes: a1a8e4a85c ("rdma: Delete the ib_ucm module")
Link: https://lore.kernel.org/r/0-v1-90096a98c476+205-remove_cm_leftovers_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:43:07 -03:00
Max Gurtovoy
c97119b6d3 IB/isert: remove duplicated error prints
The isert_post_recv function prints an error in case of failures, so no
need for the callers to add another print.

Link: https://lore.kernel.org/r/20200805121231.166162-2-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:22:05 -03:00
Maor Gottlieb
e6ac9f6006 RDMA/mlx5: Enable sniffer when device is in switchdev mode
In order to allow sniffer when the RDMA device is in switchdev mode, we
don't need to set the source port when creating the sniffer rule.

Link: https://lore.kernel.org/r/20200803060214.15328-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:03:32 -03:00
Mark Zhang
c531024bb1 RDMA/mlx5: Add new IB rates support
Support 56, 25, 100, 200 and 50Gbps IB rates in mlx5 driver.

Link: https://lore.kernel.org/r/20200802081712.1993490-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:03:32 -03:00
Gal Pressman
a4e6a1dd57 RDMA/efa: Introduce SRD RNR retry
This patch introduces the ability to configure SRD QPs with the RNR retry
parameter when issuing a modify QP command.

In addition, a capability bit was added to report support to the userspace
library.

Link: https://lore.kernel.org/r/20200731060420.17053-5-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:45 -03:00
Gal Pressman
22c50e0660 RDMA/efa: Introduce SRD QP state machine
This precursory patch adds the SRD QP type state machine, which is
currently identical to the one of UD QP type.  A following patch is going
to change the SRD QP state machine to support RNR retry modifications.

Link: https://lore.kernel.org/r/20200731060420.17053-4-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:45 -03:00
Gal Pressman
ab67badd1c RDMA/efa: Be consistent with modify QP bitmask
The modify QP bitmask was not consistent with other bitmasks used in the
device interface. Remove the bitmask enum and allow usage with
EFA_GET/SET.

Link: https://lore.kernel.org/r/20200731060420.17053-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:44 -03:00
Gal Pressman
34eb009ffe RDMA/efa: Add a generic capability check helper
Instead of adding a new function for each capability added, introduce a
generic helper to query device capabilities.

Link: https://lore.kernel.org/r/20200731060420.17053-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:52:44 -03:00
Leon Romanovsky
d6673746d6 RDMA: Remove constant domain argument from flow creation call
The "domain" argument is constant and modern device (mlx5) doesn't support
anything except IB_FLOW_DOMAIN_USER, so delete this extra parameter and
simplify code.

Link: https://lore.kernel.org/r/20200730081235.1581127-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Leon Romanovsky
70c1430fba RDMA/mlx5: Replace open-coded offsetofend() macro
Clean mlx5_ib from open-coded implementations of offsetofend().

Link: https://lore.kernel.org/r/20200730081235.1581127-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Leon Romanovsky
156f378985 RDMA/mlx5: Simplify multiple else-if cases with switch keyword
Improve readability of fs.c by converting multiple else-if constructions
to be implemented with switch keyword.

Link: https://lore.kernel.org/r/20200730081235.1581127-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Colin Ian King
dfd022a9ea RDMA/usnic: Fix spelling mistake "transistion" -> "transition"
There is a spelling mistake in a usnic_err error message. Fix it.

Link: https://lore.kernel.org/r/20200805141459.23069-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 13:18:53 -03:00
Colin Ian King
d963c524a4 RDMA/hns: Fix spelling mistake "epmty" -> "empty"
There is a spelling mistake in a dev_dbg message. Fix it.

Link: https://lore.kernel.org/r/20200805141111.22804-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 13:18:53 -03:00
Peter Xu
64019a2e46 mm/gup: remove task_struct pointer for all gup code
After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more.  Remove that parameter in the whole gup
stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:04 -07:00
Linus Torvalds
d7806bbd22 RDMA 5.9 merge window pull request
Smaller set of RDMA updates. A smaller number of 'big topics' with the
 majority of changes being driver updates.
 
 - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re
 
 - Removal of dead or redundant code across the drivers
 
 - RAW resource tracker dumps to include a device specific data blob for
   device objects to aide device debugging
 
 - Further advance the IOCTL interface, remove the ability to turn it off.
   Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands
 
 - Remove stubs related to devices with no pkey table
 
 - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a
   device to give higher performance
 
 - Several more static checker, syzkaller and rare crashers fixed
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl8sSA0ACgkQOG33FX4g
 mxpp1w/8Df/KIB38PVHpKraIW10bX03KsXwoskMYCA+ITYWM5ce+P7YF+yXXGs69
 Vh2vUYHlr1RvqXQkq3Y3LjzCPKTYFuNFVQRZF1LrfbfOpSS9aoQqoxwgKs08dibm
 YDeRwueWneksWhXeEZLA0QoKd4kEWrScA/n7VGYQ4YcWw8FLKa9t6OMSGivCrFLu
 QA+sA9nytrvMWC5uJUCdeVwlRnoaICPYHmM5yafOykPyEciRw2jU1kzTRVy5Z0Hu
 iCsXm2lJPcVoMgSjW6SgktY3oBkQeSu3ZZesT3eTM6FJsoDYkuSiKjNmWSZjW1zv
 x6CFGjVVin41rN4FMTeqqnwYoML9Q/obbyHvBHs5MTd5J8tLDhesQj3Ev7CUaUed
 b0s38v+oEL1w22nkOChfeyfh7eLcy3yiszqvkIU9ABk8mF0p1guGQYsfguzbsq0K
 3ZRw/361SxCUBvU6P8CdQbIJlhkH+Un7d81qyt+rhLgaZYm/N+d8auIKUxP1jCxh
 q9hss2Cj2U9eZsA/wGNqV1LNazfEAAj/5qjItMirbRd90FL8h+AP2LfJfC7p+id3
 3BfOui0JbZqNTTl4ftTxPuxtWDEdTPgwi7JvQd/be9HRlSV8DYCSMUzYFn8A+Zya
 cbxjxFuBJWmF+y9csDIVBTdFi+j9hO6notw+G89NznuB3QlPl50=
 =0z2L
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A quiet cycle after the larger 5.8 effort. Substantially cleanup and
  driver work with a few smaller features this time.

   - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re

   - Removal of dead or redundant code across the drivers

   - RAW resource tracker dumps to include a device specific data blob
     for device objects to aide device debugging

   - Further advance the IOCTL interface, remove the ability to turn it
     off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands

   - Remove stubs related to devices with no pkey table

   - A shared CQ scheme to allow multiple ULPs to share the CQ rings of
     a device to give higher performance

   - Several more static checker, syzkaller and rare crashers fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits)
  RDMA/mlx5: Fix flow destination setting for RDMA TX flow table
  RDMA/rxe: Remove pkey table
  RDMA/umem: Add a schedule point in ib_umem_get()
  RDMA/hns: Fix the unneeded process when getting a general type of CQE error
  RDMA/hns: Fix error during modify qp RTS2RTS
  RDMA/hns: Delete unnecessary memset when allocating VF resource
  RDMA/hns: Remove redundant parameters in set_rc_wqe()
  RDMA/hns: Remove support for HIP08_A
  RDMA/hns: Refactor hns_roce_v2_set_hem()
  RDMA/hns: Remove redundant hardware opcode definitions
  RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP
  RDMA/include: Replace license text with SPDX tags
  RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
  RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
  RDMA/cma: Execute rdma_cm destruction from a handler properly
  RDMA/cma: Remove unneeded locking for req paths
  RDMA/cma: Using the standard locking pattern when delivering the removal event
  RDMA/cma: Simplify DEVICE_REMOVAL for internal_id
  RDMA/efa: Add EFA 0xefa1 PCI ID
  RDMA/efa: User/kernel compatibility handshake mechanism
  ...
2020-08-06 16:43:36 -07:00
Linus Torvalds
47ec5303d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

 2) Support UDP segmentation in code TSO code, from Eric Dumazet.

 3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

 4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

 6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

 8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

10) Switch qca8k driver over to phylink, from Jonathan McDowell.

11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

12) Several conversions over to generic power management, from Vaibhav
    Gupta.

13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

14) Various https url conversions, from Alexander A. Klimov.

15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

20) XDP support for xen-netfront, from Denis Kirjanov.

21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

22) Support EF100 chip in sfc driver, from Edward Cree.

23) Add XDP support to mvpp2 driver, from Matteo Croce.

24) Support MPTCP in sock_diag, from Paolo Abeni.

25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

30) Add rfc4884 support to icmp code, from Willem de Bruijn.

31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

34) Support TCP syncookies in MPTCP, from Flowian Westphal.

35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
  net: thunderx: initialize VF's mailbox mutex before first usage
  usb: hso: remove bogus check for EINPROGRESS
  usb: hso: no complaint about kmalloc failure
  hso: fix bailout in error case of probe
  ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
  selftests/net: relax cpu affinity requirement in msg_zerocopy test
  mptcp: be careful on subflow creation
  selftests: rtnetlink: make kci_test_encap() return sub-test result
  selftests: rtnetlink: correct the final return value for the test
  net: dsa: sja1105: use detected device id instead of DT one on mismatch
  tipc: set ub->ifindex for local ipv6 address
  ipv6: add ipv6_dev_find()
  net: openvswitch: silence suspicious RCU usage warning
  Revert "vxlan: fix tos value before xmit"
  ptp: only allow phase values lower than 1 period
  farsync: switch from 'pci_' to 'dma_' API
  wan: wanxl: switch from 'pci_' to 'dma_' API
  hv_netvsc: do not use VF device if link is down
  dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
  net: macb: Properly handle phylink on at91sam9x
  ...
2020-08-05 20:13:21 -07:00
Michael Guralnik
23fcc7dee2 RDMA/mlx5: Fix flow destination setting for RDMA TX flow table
For RDMA TX flow table, set destination type to be 'port' and prevent
creation of flows with TIR destination.

As RDMA TX is an egress flow table the rules on this flow table should
not forward traffic back to the NIC and should set the destination to be
the port.

Without the setting of this destination type flow rules on the RDMA TX
flow tables are not created as FW invokes a syndrome for undefined
destination for the rule.

Fixes: 24670b1a31 ("net/mlx5: Add support for RDMA TX steering")
Link: https://lore.kernel.org/r/20200803055849.14947-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-05 21:09:39 -03:00
Linus Torvalds
2ed90dbbf7 dma-mapping updates for 5.9
- make support for dma_ops optional
  - move more code out of line
  - add generic support for a dma_ops bypass mode
  - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl8oGscLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNfEhAAmFwd6BBHGwAhXUchoIue5vdNnuY3GiBFRzUdz67W
 zRYYgZYiPjl+MwflRmwPcoWEnGzmweRa2s6OnyDostiCRauioa8BuQfGqJasf1yZ
 D36dFNVHGW0o6pRDUQkd688k/4A6szwuwpq83qi4e8X2I9QzAITHtW8izjfPM923
 FlJzxEFggbB2TvwfUXOZhmpuG4Dog8S7VZ1Uz4QAg0Z/5FDqIKAAG2aZMqCXBbiX
 01E8tr0AqU/jn2xpc8O+DJGFiYIRhqhyNxQbH6qz1Q3xGFSokcLYm3YqkqVOgpn1
 DLs2UFDxWkly/F+wGnYtju7OD9VGPywzOcW125/LIsApYN5R/rYrtQzK41eq7Mp5
 HY3tqgNTIMdnl4so7QXeU4Vxj+lUdPlI26NZGszcM5AVftdTX8KjGdS+0+PBza6i
 i7trwG7J5/DnwiBCvEKoul7Ul1psUMTSvYwINTXRqsU4mZXhhx/mwyXbtruELnkj
 3agM98u6hoalLNjd2aueh+NjMZi1r+MchTrfRvTcxJ+yQ5BoR5kF+iz7eT/LtZ72
 AqWwimsPGNkLHUa0TrqWql5tv90cdDkBZzWXVbixwxRfgynWYLE6jugeIy8hwjFf
 GjO5XKbBwnWPjdSzFsVMPeuNpmr7ZjVHHewy2Q/jWQAIOyeof0VztEl23LN5yUkx
 pc8=
 =90UK
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - make support for dma_ops optional

 - move more code out of line

 - add generic support for a dma_ops bypass mode

 - misc cleanups

* tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping:
  dma-contiguous: cleanup dma_alloc_contiguous
  dma-debug: use named initializers for dir2name
  powerpc: use the generic dma_ops_bypass mode
  dma-mapping: add a dma_ops_bypass flag to struct device
  dma-mapping: make support for dma ops optional
  dma-mapping: inline the fast path dma-direct calls
  dma-mapping: move the remaining DMA API calls out of line
2020-08-04 17:29:57 -07:00
Linus Torvalds
99ea1521a0 Remove uninitialized_var() macro for v5.9-rc1
- Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq
 il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o
 XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp
 UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2
 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8
 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI
 h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z
 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I
 AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG
 k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa
 dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+
 drhmmImUr1YylrtVOw==
 =xHmk
 -----END PGP SIGNATURE-----

Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The
  series has several "by hand" fixes, and then a trivial treewide
  replacement.

   - Clean up non-trivial uses of uninitialized_var()

   - Update documentation and checkpatch for uninitialized_var() removal

   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-04 13:49:43 -07:00
David S. Miller
bd0b33b248 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Resolved kernel/bpf/btf.c using instructions from merge commit
69138b34a7

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-02 01:02:12 -07:00
Kamal Heib
76251e15ea RDMA/rxe: Remove pkey table
The RoCE spec requires RoCE devices to support only the default pkey.
However the rxe driver maintains a 64 enties pkey table and uses only the
first entry. Remove the pkey table and hard code a table of length one
hard wired with the default pkey. Replace all checks of the pkey_table
with a comparison to the default_pkey instead.

Link: https://lore.kernel.org/r/20200721101618.686110-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-31 16:17:56 -03:00
Eric Dumazet
928da37a22 RDMA/umem: Add a schedule point in ib_umem_get()
Mapping as little as 64GB can take more than 10 seconds, triggering issues
on kernels with CONFIG_PREEMPT_NONE=y.

ib_umem_get() already splits the work in 2MB units on x86_64, adding a
cond_resched() in the long-lasting loop is enough to solve the issue.

Note that sg_alloc_table() can still use more than 100 ms, which is also
problematic. This might be addressed later in ib_umem_add_sg_table(),
adding new blocks in sgl on demand.

Link: https://lore.kernel.org/r/20200730015755.1827498-1-edumazet@google.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-31 14:12:30 -03:00
Xi Wang
395f2e8fd3 RDMA/hns: Fix the unneeded process when getting a general type of CQE error
If the hns ROCEE reports a general error CQE (types not specified by the IB
General Specifications), it's no need to change the QP state to error, and
the driver should just skip it.

Fixes: 7c044adca2 ("RDMA/hns: Simplify the cqe code of poll cq")
Link: https://lore.kernel.org/r/1595932941-40613-8-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 15:11:49 -03:00
Lang Cheng
4327bd2c41 RDMA/hns: Fix error during modify qp RTS2RTS
One qp state migrations legal configuration was deleted mistakenly.

Fixes: 357f342946 ("RDMA/hns: Simplify the state judgment code of qp")
Link: https://lore.kernel.org/r/1595932941-40613-7-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 15:11:49 -03:00
Lang Cheng
a5531e9b70 RDMA/hns: Delete unnecessary memset when allocating VF resource
The hns_roce_cmq_setup_basic_desc() can clear the whole desc, so removes
these redundant memset operations.

Link: https://lore.kernel.org/r/1595932941-40613-6-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 15:11:49 -03:00
Weihang Li
eaaa98dedf RDMA/hns: Remove redundant parameters in set_rc_wqe()
There are some functions called by set_rc_wqe() use two parameters:
"void *wqe" and "struct hns_roce_v2_rc_send_wqe *rc_sq_wqe", but the first
one can be got from the second one. So remove the redundant wqe from
related functions.

Link: https://lore.kernel.org/r/1595932941-40613-5-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:31:02 -03:00
Lang Cheng
a247fd28c1 RDMA/hns: Remove support for HIP08_A
HIP08_A is an temporary version and all features of it are supported by
HIP08_B. So remove the relevant code.

Link: https://lore.kernel.org/r/1595932941-40613-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:31:02 -03:00
Weihang Li
cdc1f3e946 RDMA/hns: Refactor hns_roce_v2_set_hem()
The parts about preparing and sending mailbox to hardware is not strongly
related to other codes in hns_roce_v2_set_hem(), and can be encapsulated
into a separate function.

Link: https://lore.kernel.org/r/1595932941-40613-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:31:02 -03:00
Lang Cheng
57005c96b7 RDMA/hns: Remove redundant hardware opcode definitions
HNS_ROCE_SQ_OPCODE_XXXs and HNS_ROCE_V2_WQE_OP_XXXs have same values, so
remove a set of redundant definitions. In addition, remove the suffix of
HNS_ROCE_V2_WQE_OP_BIND_MW_TYPE.

Link: https://lore.kernel.org/r/1595932941-40613-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:31:02 -03:00
Leon Romanovsky
fb448ce87a RDMA/core: Free DIM memory in error unwind
The memory allocated for the DIM wasn't freed in in error unwind path, fix
it by calling to rdma_dim_destroy().

Fixes: da6629793a ("RDMA/core: Provide RDMA DIM support for ULPs")
Link: https://lore.kernel.org/r/20200730082719.1582397-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com <mailto:maxg@mellanox.com>>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:03:33 -03:00
Leon Romanovsky
5d46b289d0 RDMA/core: Stop DIM before destroying CQ
HW destroy operation should be last operation after all possible CQ users
completed their work, so move DIM work cancellation before such destroy
call.

Fixes: da6629793a ("RDMA/core: Provide RDMA DIM support for ULPs")
Link: https://lore.kernel.org/r/20200730082719.1582397-3-leon@kernel.org
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:03:33 -03:00
Leon Romanovsky
7fa84b5708 RDMA/mlx5: Initialize QP mutex for the debug kernels
In DCT and RSS RAW QP creation flows, the QP mutex wasn't initialized and
the magic field inside lock was missing. This caused to the following
kernel warning for kernels build with CONFIG_DEBUG_MUTEXES.

 DEBUG_LOCKS_WARN_ON(lock->magic != lock)
 WARNING: CPU: 3 PID: 16261 at kernel/locking/mutex.c:938 __mutex_lock+0x60e/0x940
 Modules linked in: bonding nf_tables ipip tunnel4 geneve ip6_udp_tunnel udp_tunnel ip6_gre ip6_tunnel tunnel6 ip_gre gre ip_tunnel mlx5_ib mlx5_core mlxfw ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad openvswitch nsh xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core [last unloaded: mlxfw]
 CPU: 3 PID: 16261 Comm: ib_send_bw Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__mutex_lock+0x60e/0x940
 Code: c0 0f 84 6d fa ff ff 44 8b 15 4e 9d ba 00 45 85 d2 0f 85 5d fa ff ff 48 c7 c6 f2 de 2b 82 48 c7 c7 f1 8a 2b 82 e8 d2 4d 72 ff <0f> 0b 4c 8b 4d 88 e9 3f fa ff ff f6 c2 04 0f 84 37 fe ff ff 48 89
 RSP: 0018:ffff88810bb8b870 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: ffff88829f1dd880 RSI: 0000000000000000 RDI: ffffffff81192afa
 RBP: ffff88810bb8b910 R08: 0000000000000000 R09: 0000000000000028
 R10: 0000000000000000 R11: 0000000000003f85 R12: 0000000000000002
 R13: ffff88827d8d3ce0 R14: ffffffffa059f615 R15: ffff8882a4d02610
 FS:  00007f3f6988e740(0000) GS:ffff8882f5b80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000556556158000 CR3: 000000010a63c005 CR4: 0000000000360ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  ? cmd_exec+0x947/0xe60 [mlx5_core]
  ? __mutex_lock+0x76/0x940
  ? mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib]
  mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib]
  mlx5_ib_counter_bind_qp+0x9b/0xe0 [mlx5_ib]
  __rdma_counter_bind_qp+0x6b/0xa0 [ib_core]
  rdma_counter_bind_qp_auto+0x363/0x520 [ib_core]
  _ib_modify_qp+0x316/0x580 [ib_core]
  ib_modify_qp_with_udata+0x19/0x30 [ib_core]
  modify_qp+0x4c4/0x600 [ib_uverbs]
  ib_uverbs_ex_modify_qp+0x87/0xe0 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x129/0x1c0 [ib_uverbs]
  ib_uverbs_cmd_verbs.isra.5+0x5d5/0x11f0 [ib_uverbs]
  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x120/0x120 [ib_uverbs]
  ? lock_acquire+0xb9/0x3a0
  ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs]
  ? ib_uverbs_ioctl+0x175/0x210 [ib_uverbs]
  ib_uverbs_ioctl+0x14b/0x210 [ib_uverbs]
  ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs]
  ksys_ioctl+0x234/0x7d0
  ? exc_page_fault+0x202/0x640
  ? do_syscall_64+0x1f/0x2e0
  __x64_sys_ioctl+0x16/0x20
  do_syscall_64+0x59/0x2e0
  ? asm_exc_page_fault+0x8/0x30
  ? rcu_read_lock_sched_held+0x52/0x60
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: b4aaa1f0b4 ("IB/mlx5: Handle type IB_QPT_DRIVER when creating a QP")
Link: https://lore.kernel.org/r/20200730082719.1582397-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:03:33 -03:00
Mike Marciniszyn
54a485e9ec IB/rdmavt: Fix RQ counting issues causing use of an invalid RWQE
The lookaside count is improperly initialized to the size of the
Receive Queue with the additional +1.  In the traces below, the
RQ size is 384, so the count was set to 385.

The lookaside count is then rarely refreshed.  Note the high and
incorrect count in the trace below:

rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9008 wr_id 55c7206d75a0 qpn c
	qpt 2 pid 3018 num_sge 1 head 1 tail 0, count 385
rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1

The head,tail indicate there is only one RWQE posted although the count
says 385 and we correctly return the element 0.

The next call to rvt_get_rwqe with the decremented count:

rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9058 wr_id 0 qpn c
	qpt 2 pid 3018 num_sge 0 head 1 tail 1, count 384
rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1

Note that the RQ is empty (head == tail) yet we return the RWQE at tail 1,
which is not valid because of the bogus high count.

Best case, the RWQE has never been posted and the rc logic sees an RWQE
that is too small (all zeros) and puts the QP into an error state.

In the worst case, a server slow at posting receive buffers might fool
rvt_get_rwqe() into fetching an old RWQE and corrupt memory.

Fix by deleting the faulty initialization code and creating an
inline to fetch the posted count and convert all callers to use
new inline.

Fixes: f592ae3c99 ("IB/rdmavt: Fracture single lock used for posting and processing RWQEs")
Link: https://lore.kernel.org/r/20200728183848.22226.29132.stgit@awfm-01.aw.intel.com
Reported-by: Zhaojuan Guo <zguo@redhat.com>
Cc: <stable@vger.kernel.org> # 5.4.x
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 15:54:36 -03:00
Mark Zhang
1d70ad0f85 RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP
When dumping QPs bound to a counter, raw QPs should be allowed to dump
without the CAP_NET_RAW privilege. This is consistent with what "rdma res
show qp" does.

Fixes: c4ffee7c9b ("RDMA/netlink: Implement counter dumpit calback")
Link: https://lore.kernel.org/r/20200727095828.496195-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 15:51:19 -03:00
Jack Wang
03ed5a8cda RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
lockdep triggers a warning from time to time when running a regression
test:

 rnbd_client L685: </dev/nullb0@bla> Device disconnected.
 rnbd_client L1756: Unloading module

 workqueue: WQ_MEM_RECLAIM rtrs_client_wq:rtrs_clt_reconnect_work [rtrs_client] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
 WARNING: CPU: 2 PID: 18824 at kernel/workqueue.c:2517 check_flush_dependency+0xad/0x130

The root cause is workqueue core expect flushing should not be done for a
!WQ_MEM_RECLAIM wq from a WQ_MEM_RECLAIM workqueue.

In above case ib_addr workqueue without WQ_MEM_RECLAIM, but rtrs_wq
WQ_MEM_RECLAIM.

To avoid the warning, remove the WQ_MEM_RECLAIM flag.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20200724111508.15734-4-haris.iqbal@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:26:53 -03:00
Danil Kipnis
09e0dbbeed RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
In order to avoid all the clients to start reconnecting at the same time
schedule the reconnect dwork with a random jitter of +[0,8] seconds.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20200724111508.15734-2-haris.iqbal@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:26:53 -03:00
Leon Romanovsky
81530ab08e RDMA/mlx5: Allow providing extra scatter CQE QP flag
Scatter CQE feature relies on two flags MLX5_QP_FLAG_SCATTER_CQE and
MLX5_QP_FLAG_ALLOW_SCATTER_CQE, both of them can be provided without
relation to device capability.

Relax global validity check to allow MLX5_QP_FLAG_ALLOW_SCATTER_CQE QP
flag.

Existing user applications are failing on this new validity check.

Fixes: 90ecb37a75 ("RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags")
Fixes: 37518fa49f ("RDMA/mlx5: Process all vendor flags in one place")
Link: https://lore.kernel.org/r/20200728120255.805733-1-leon@kernel.org
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:19:01 -03:00
Jason Gunthorpe
f6a9d47ae6 RDMA/cma: Execute rdma_cm destruction from a handler properly
When a rdma_cm_id needs to be destroyed after a handler callback fails,
part of the destruction pattern is open coded into each call site.

Unfortunately the blind assignment to state discards important information
needed to do cma_cancel_operation(). This results in active operations
being left running after rdma_destroy_id() completes, and the
use-after-free bugs from KASAN.

Consolidate this entire pattern into destroy_id_handler_unlock() and
manage the locking correctly. The state should be set to
RDMA_CM_DESTROYING under the handler_lock to atomically ensure no futher
handlers are called.

Link: https://lore.kernel.org/r/20200723070707.1771101-5-leon@kernel.org
Reported-by: syzbot+08092148130652a6faae@syzkaller.appspotmail.com
Reported-by: syzbot+a929647172775e335941@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:02 -03:00
Jason Gunthorpe
cc9c037343 RDMA/cma: Remove unneeded locking for req paths
The REQ flows are concerned that once the handler is called on the new
cm_id the ULP can choose to trigger a rdma_destroy_id() concurrently at
any time.

However, this is not true, while the ULP can call rdma_destroy_id(), it
immediately blocks on the handler_mutex which prevents anything harmful
from running concurrently.

Remove the confusing extra locking and refcounts and make the
handler_mutex protecting state during destroy more clear.

Link: https://lore.kernel.org/r/20200723070707.1771101-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:02 -03:00
Jason Gunthorpe
3647a28de1 RDMA/cma: Using the standard locking pattern when delivering the removal event
Whenever an event is delivered to the handler it should be done under the
handler_mutex and upon any non-zero return from the handler it should
trigger destruction of the cm_id.

cma_process_remove() skips some steps here, it is not necessarily wrong
since the state change should prevent any races, but it is confusing and
unnecessary.

Follow the standard pattern here, with the slight twist that the
transition to RDMA_CM_DEVICE_REMOVAL includes a cma_cancel_operation().

Link: https://lore.kernel.org/r/20200723070707.1771101-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:02 -03:00
Jason Gunthorpe
d54f23c09e RDMA/cma: Simplify DEVICE_REMOVAL for internal_id
cma_process_remove() triggers an unconditional rdma_destroy_id() for
internal_id's and skips the event deliver and transition through
RDMA_CM_DEVICE_REMOVAL.

This is confusing and unnecessary. internal_id always has
cma_listen_handler() as the handler, have it catch the
RDMA_CM_DEVICE_REMOVAL event and directly consume it and signal removal.

This way the FSM sequence never skips the DEVICE_REMOVAL case and the
logic in this hard to test area is simplified.

Link: https://lore.kernel.org/r/20200723070707.1771101-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:10:01 -03:00
Gal Pressman
d4f9cb5c5b RDMA/efa: Add EFA 0xefa1 PCI ID
Add support for 0xefa1 devices.

Link: https://lore.kernel.org/r/20200722140312.3651-5-galpress@amazon.com
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:40 -03:00
Gal Pressman
a5d87b6985 RDMA/efa: User/kernel compatibility handshake mechanism
Introduce a mechanism that performs an handshake between the userspace
provider and kernel driver which verifies that the user supports all
required features in order to operate correctly.

The handshake verifies the needed functionality by comparing the reported
device caps and the provider caps. If the device reports a non-zero
capability the appropriate comp mask is required from the userspace
provider in order to allocate the context.

Link: https://lore.kernel.org/r/20200722140312.3651-4-galpress@amazon.com
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:40 -03:00
Gal Pressman
da2924bdca RDMA/efa: Expose minimum SQ size
The device reports the minimum SQ size required for creation.

This patch queries the min SQ size and reports it back to the userspace
library.

Link: https://lore.kernel.org/r/20200722140312.3651-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:39 -03:00
Gal Pressman
556c811f24 RDMA/efa: Expose maximum TX doorbell batch
The device reports the maximum number of bytes to be written before
ringing the doorbell (zero means unlimited).

This patch queries the max batch size and reports it back to the userspace
library.

Link: https://lore.kernel.org/r/20200722140312.3651-2-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:23:39 -03:00
Yamin Friedman
c804af2c1d IB/srpt: use new shared CQ mechanism
Have the driver use shared CQs provided by the rdma core driver.  This
provides the advantage of improved efficiency handling interrupts.

Link: https://lore.kernel.org/r/20200722135629.49467-3-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:10:32 -03:00
Yamin Friedman
c6e6630723 IB/isert: use new shared CQ mechanism
Have the driver use shared CQs provided by the rdma core driver.  Since
this provides similar functionality to iser_comp it has been removed.

Now there is no reason to allocate very large CQs when the driver is
loaded while gaining the advantage of shared CQs. Previously when a single
connection was opened a CQ was opened for every core with enough space for
eight connections, this is a very large overhead that in most cases will
not be utilized.

Link: https://lore.kernel.org/r/20200722135629.49467-2-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:10:31 -03:00
Yamin Friedman
d56a7852ec IB/iser: use new shared CQ mechanism
Have the driver use shared CQs provided by the rdma core driver.  Since
this provides similar functionality to iser_comp it has been removed. Now
there is no reason to allocate very large CQs when the driver is loaded
while gaining the advantage of shared CQs.

Link: https://lore.kernel.org/r/20200722135629.49467-1-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Acked-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 09:10:31 -03:00
Leon Romanovsky
71cab8ef5c RDMA/mlx5: Delete unreachable code
Delete two occurrences of unreachable code discovered by the Coverity.

Link: https://lore.kernel.org/r/20200727095746.495915-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-28 16:25:37 -03:00
Colin Ian King
3540669761 qed: fix assignment of n_rq_elems to incorrect params field
Currently n_rq_elems is being assigned to params.elem_size instead of the
field params.num_elems.  Coverity is detecting this as a double assingment
to params.elem_size and reporting this as an usused value on the first
assignment.  Fix this.

Addresses-Coverity: ("Unused value")
Fixes: b6db3f71c9 ("qed: simplify chain allocation with init params struct")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 12:46:28 -07:00
Michael Chan
bfc6e5fbcb bnxt_en: Update firmware interface to 1.10.1.54.
Main changes are 200G support and fixing the definitions of discard and
error counters to match the hardware definitions.

Because the HWRM_PORT_PHY_QCFG message size has now exceeded the max.
encapsulated response message size of 96 bytes from the PF to the VF,
we now need to cap this message to 96 bytes for forwarding.  The forwarded
response only needs to contain the basic link status and speed information
and can be capped without adding the new information.

v2: Fix bnxt_re compile error.

Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27 11:47:33 -07:00
Li Heng
47fda651d5 RDMA/core: Fix return error value in _ib_modify_qp() to negative
The error codes in _ib_modify_qp() are supposed to be negative errno.

Fixes: 7a5c938b9e ("IB/core: Check for rdma_protocol_ib only after validating port_num")
Link: https://lore.kernel.org/r/1595645787-20375-1-git-send-email-liheng40@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Li Heng <liheng40@huawei.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 12:13:34 -03:00
Jason Gunthorpe
5351a56b1a RDMA/mlx5: Fix prefetch memory leak if get_prefetchable_mr fails
destroy_prefetch_work() must always be called if the work is not going
to be queued. The num_sge also should have been set to i, not i-1
which avoids the condition where it shouldn't have been called in the
first place.

Cc: stable@vger.kernel.org
Fixes: fb985e278a ("RDMA/mlx5: Use SRCU properly in ODP prefetch")
Link: https://lore.kernel.org/r/20200727095712.495652-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:50:20 -03:00
Jason Gunthorpe
31142a4ba6 RDMA/cm: Add min length checks to user structure copies
These are missing throughout ucma, it harmlessly copies garbage from
userspace, but in this new code which uses min to compute the copy length
it can result in uninitialized stack memory. Check for minimum length at
the very start.

  BUG: KMSAN: uninit-value in ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091
  CPU: 0 PID: 8457 Comm: syz-executor069 Not tainted 5.8.0-rc5-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x1df/0x240 lib/dump_stack.c:118
   kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121
   __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
   ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091
   ucma_write+0x5c5/0x630 drivers/infiniband/core/ucma.c:1764
   do_loop_readv_writev fs/read_write.c:737 [inline]
   do_iter_write+0x710/0xdc0 fs/read_write.c:1020
   vfs_writev fs/read_write.c:1091 [inline]
   do_writev+0x42d/0x8f0 fs/read_write.c:1134
   __do_sys_writev fs/read_write.c:1207 [inline]
   __se_sys_writev+0x9b/0xb0 fs/read_write.c:1204
   __x64_sys_writev+0x4a/0x70 fs/read_write.c:1204
   do_syscall_64+0xb0/0x150 arch/x86/entry/common.c:386
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 34e2ab57a9 ("RDMA/ucma: Extend ucma_connect to receive ECE parameters")
Fixes: 0cb15372a6 ("RDMA/cma: Connect ECE to rdma_accept")
Link: https://lore.kernel.org/r/0-v1-d5b86dab17dc+28c25-ucma_syz_min_jgg@nvidia.com
Reported-by: syzbot+086ab5ca9eafd2379aa6@syzkaller.appspotmail.com
Reported-by: syzbot+7446526858b83c8828b2@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:50:00 -03:00
Jason Gunthorpe
7923774368 Merge branch 'mlx5_uar' into rdma.git /for-next
Meir Lichtinger says:

====================
ConnectX-7 supports setting relaxed ordering read/write mkey attribute by
UMR, indicated by new HCA capabilities, so extend mlx5_ib driver to
configure UMR control segment
====================

Based on the mlx5-next branch at
      git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_uar':
  RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
  RDMA/mlx5: Use MLX5_SET macro instead of local structure
  RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
2020-07-27 11:44:36 -03:00
Meir Lichtinger
896ec97353 RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access
flag. ConnectX-7 supports setting relaxed ordering read/write mkey
attribute by UMR, indicated by new HCA capabilities.

With ConnectX-7 driver uses UMR when user set relaxed ordering access
flag, in contrast to previous silicon models. Specifically it includes
setting relvant flags of mkey context mask in UMR control segment, and
relaxed ordering write and read flags in UMR mkey context segment.

Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
Meir Lichtinger
2224635938 RDMA/mlx5: Use MLX5_SET macro instead of local structure
Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX
device data structures instead of using structure defined specifically for
mlx5_ib module.

Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
David S. Miller
a57066b1a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25 17:49:04 -07:00
Gustavo A. R. Silva
6f24b15925 IB/hfi1: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the
new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7-rc7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Link: https://lore.kernel.org/r/20200721133455.GA14363@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:59:55 -03:00
Leon Romanovsky
9b8d846924 RDMA/uverbs: Silence shiftTooManyBitsSigned warning
Fix reported by kbuild warning.

   drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned]
    BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31));
                                                 ^
Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:53:02 -03:00
Leon Romanovsky
29f3fe1d68 RDMA/uverbs: Remove redundant assignments
The kbuild reported the following warning, so clean whole uverbs_cmd.c
file.

   drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment]
    ret = uverbs_request(attrs, &cmd, sizeof(cmd));
        ^
   drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used.
    int    ret = -EINVAL;
   ^

Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:53:02 -03:00
Maor Gottlieb
d4d7f59643 RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flow
According to the locking scheme, mlx5_ib_update_xlt() should be called
with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the
below WARN from lockdep_assert_held():

  WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core
  CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G        W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
  RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00
  RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940
  RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100
  RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001
  R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec
  R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000
  FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib]
   pagefault_mr+0x315/0x440 [mlx5_ib]
   mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib]
   process_one_work+0x215/0x5c0
   worker_thread+0x3c/0x380
   ? process_one_work+0x5c0/0x5c0
   kthread+0x133/0x150
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30

Hold the SRCU during prefetch, even though it strictly isn't needed since
prefetch is holding the num_deferred_work it does make it easier to reason
about.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:44:06 -03:00
Leon Romanovsky
16e51f78a9 RDMA/core: Update write interface to use automatic object lifetime
The automatic object lifetime model allows us to change the write()
interface to have the same logic as the ioctl() path. Update the
create/alloc functions to be in the following format, so the code flow
will be the same:

 * Allocate objects
 * Initialize them
 * Call to the drivers, this is last step that is allowed to fail
 * Finalize object
 * Return response and allow to core code to handle abort/commit
   respectively.

Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:22:30 -03:00
Leon Romanovsky
0f63ef1dd5 RDMA/core: Align abort/commit object scheme for write() and ioctl() paths
Create the same logic flow for the write() interface as we have for the
ioctl() path by making sure that the object is committed or aborted
automatically after HW object creation.

Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 15:57:22 -03:00
Maor Gottlieb
c94e272b57 RDMA/mlx5: Allow SQ modification
Currently the SQ is set to a ready state when the RAW QP is modified to
INIT.  When the TIS is modified, e.g. to change the lag_tx_affinity, then
SQs which are already in the ready state will not be affected.

Open a window to modify the SQ behavior by setting the SQ as ready only
when QP was modified to RTS.

Link: https://lore.kernel.org/r/20200716105416.1423826-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 15:49:19 -03:00
Alexander Lobakin
155065866b qed: add support for different page sizes for chains
Extend current infrastructure to store chain page size in a struct
and use it in all functions instead of fixed QED_CHAIN_PAGE_SIZE.
Its value remains the default one, but can be overridden in
qed_chain_init_params before chain allocation.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Alexander Lobakin
b6db3f71c9 qed: simplify chain allocation with init params struct
To simplify qed_chain_alloc() prototype and call sites, introduce struct
qed_chain_init_params to specify chain params, and pass a pointer to
filled struct to the actual qed_chain_alloc() instead of a long list
of separate arguments.

Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 18:19:03 -07:00
Jason Gunthorpe
a862192e92 RDMA/mlx5: Prevent prefetch from racing with implicit destruction
Prefetch work in mlx5_ib_prefetch_mr_work can be queued and able to run
concurrently with destruction of the implicit MR. The num_deferred_work
was intended to serialize this, but there is a race:

       CPU0                                          CPU1

    mlx5_ib_free_implicit_mr()
      xa_erase(odp_mkeys)
      synchronize_srcu()
      __xa_erase(implicit_children)
                                      mlx5_ib_prefetch_mr_work()
                                        pagefault_mr()
                                         pagefault_implicit_mr()
                                          implicit_get_child_mr()
                                           xa_cmpxchg()
                                        atomic_dec_and_test(num_deferred_mr)
      wait_event(imr->q_deferred_work)
      ib_umem_odp_release(odp_imr)
        kfree(odp_imr)

At this point in mlx5_ib_free_implicit_mr() the implicit_children list is
supposed to be empty forever so that destroy_unused_implicit_child_mr()
and related are not and will not be running.

Since it is not empty the destroy_unused_implicit_child_mr() flow ends up
touching deallocated memory as mlx5_ib_free_implicit_mr() already tore down the
imr parent.

The solution is to flush out the prefetch wq by driving num_deferred_work
to zero after creation of new prefetch work is blocked.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065435.130722-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-21 13:51:35 -03:00
Devesh Sharma
2bb3c32c5c RDMA/bnxt_re: Change wr posting logic to accommodate variable wqes
Modifying the post-send and post-recv to initialize the wqes slot by slot
dynamically depending on the number of max sges requested by consumer at
the time of QP creation.

Changed the QP creation logic to determine the size of SQ and RQ in 16B
slots based on the number of wqe and number of SGEs requested by consumer

Link: https://lore.kernel.org/r/1594822619-4098-6-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:50 -03:00
Devesh Sharma
54ace98443 RDMA/bnxt_re: Add helper data structures
Adding few helper data structure which are useful to initialize hardware
send wqe in variable wqe mode.

Adding a qp flag in HSI to indicate variable wqe is enabled for this qp.

Link: https://lore.kernel.org/r/1594822619-4098-5-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:50 -03:00
Devesh Sharma
5ac5396a6c RDMA/bnxt_re: Pull psn buffer dynamically based on prod
Changing the PSN management memory buffers from statically initialized to
dynamic pull scheme.

During create qp only the start pointers are initialized and during
post-send the psn buffer is pulled based on current producer index.

Adjusting post_send code to accommodate dynamic psn-pull and changing
post_recv code to match post-send code wrt pseudo flush wqe generation.

Link: https://lore.kernel.org/r/1594822619-4098-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Devesh Sharma
159fb4ceac RDMA/bnxt_re: introduce a function to allocate swq
The bnxt_re driver now allocates shadow sq and rq to maintain per wqe
wr_id and few other flags required to support variable wqe. Segregated the
allocation of shadow queue in a separate function and adjust the cqe
polling logic. The new polling logic is based on shadow queue indices.

Link: https://lore.kernel.org/r/1594822619-4098-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Devesh Sharma
1da968e0ef RDMA/bnxt_re: introduce wqe mode to select execution path
The bnxt_re driver need to decide on how much SQ and RQ memory should to
be allocated and which wqe posting/polling algorithm to use.

Making changes to set the wqe-mode to a default value during device
registration sequence. The wqe-mode is passed to the lower layer driver as
well. Going forward in the lower layer driver wqe-mode will be used to
decide execution path. Initializing the wqe-mode to static wqe type for
now.

Link: https://lore.kernel.org/r/1594822619-4098-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:32:49 -03:00
Kamal Heib
ca4beeee98 RDMA/qedr: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed from the common ops and moved to
the RoCE only ops within the qedr driver.

Link: https://lore.kernel.org/r/20200714183414.61069-8-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:17 -03:00
Kamal Heib
c1c5e9fd3a RDMA/i40iw: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-7-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
ce07f1c6a8 RDMA/cxgb4: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-6-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
c4995bd354 RDMA/siw: Remove the query_pkey callback
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.

Link: https://lore.kernel.org/r/20200714183414.61069-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
ab75a6cb8c RDMA/core: Remove query_pkey from the mandatory ops
The query_pkey() isn't mandatory for the iwarp providers, so remove this
requirement from the RDMA core.

Link: https://lore.kernel.org/r/20200714183414.61069-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
6f38efca9b RDMA/core: Allocate the pkey cache only if the pkey_tbl_len is set
Allocate the pkey cache only if the pkey_tbl_len is set by the provider,
also add checks to avoid accessing the pkey cache when it not initialized.

Link: https://lore.kernel.org/r/20200714183414.61069-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Kamal Heib
90efc8b2d4 RDMA/core: Expose pkeys sysfs files only if pkey_tbl_len is set
Expose the pkeys sysfs files only if the pkey_tbl_len is set by the
providers.

Link: https://lore.kernel.org/r/20200714183414.61069-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20 16:18:16 -03:00
Christoph Hellwig
2f9237d4f6 dma-mapping: make support for dma ops optional
Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-19 09:29:23 +02:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
Mikhail Malygin
5f0b2a6093 RDMA/rxe: Prevent access to wr->next ptr afrer wr is posted to send queue
rxe_post_send_kernel() iterates over linked list of wr's, until the
wr->next ptr is NULL.  However if we've got an interrupt after last wr is
posted, control may be returned to the code after send completion callback
is executed and wr memory is freed.

As a result, wr->next pointer may contain incorrect value leading to
panic. Store the wr->next on the stack before posting it.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200716190340.23453-1-m.malygin@yadro.com
Signed-off-by: Mikhail Malygin <m.malygin@yadro.com>
Signed-off-by: Sergey Kojushev <s.kojushev@yadro.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:12:07 -03:00
Michal Kalderon
eb7f84e379 RDMA/qedr: Add EDPM max size to alloc ucontext response
User space should receive the maximum edpm size from kernel driver,
similar to other edpm/ldpm related limits.  Add an additional parameter to
the alloc_ucontext_resp structure for the edpm maximum size.

In addition, pass an indication from user-space to kernel
(and not just kernel to user) that the DPM sizes are supported.

This is for supporting backward-forward compatibility between driver and
lib for everything related to DPM transaction and limit sizes.

This should have been part of commit mentioned in Fixes tag.

Link: https://lore.kernel.org/r/20200707063100.3811-3-michal.kalderon@marvell.com
Fixes: 93a3d05f9d ("RDMA/qedr: Add kernel capability flags for dpm enabled mode")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:01:55 -03:00
Michal Kalderon
bbe4f42452 RDMA/qedr: Add EDPM mode type for user-fw compatibility
In older FW versions the completion flag was treated as the ack flag in
edpm messages.  commit ff937b916e ("qed: Add EDPM mode type for user-fw
compatibility") exposed the FW option of setting which mode the QP is in
by adding a flag to the qedr <-> qed API.

This patch adds the qedr <-> libqedr interface so that the libqedr can set
the flag appropriately and qedr can pass it down to FW.  Flag is added for
backward compatibility with libqedr.

For older libs, this flag didn't exist and therefore set to zero.

Fixes: ac1b36e55a ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/20200707063100.3811-2-michal.kalderon@marvell.com
Signed-off-by: Yuval Bason <yuval.bason@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:01:55 -03:00
Christophe JAILLET
3e9fed7fb6 RDMA/usnic: switch from 'pci_' to 'dma_' API
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script bellow.
It has been compile tested.

When memory is allocated, GFP_ATOMIC should be used to be consistent with
the surrounding code.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Link: https://lore.kernel.org/r/20200711073120.249146-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 15:42:05 -03:00
Gustavo A. R. Silva
535ee8cdbc IB/hfi1: Remove unnecessary fall-through markings
Reorganize the code a bit in a more standard way[1] and remove
unnecessary fall-through markings.

[1] https://lore.kernel.org/lkml/20200708054703.GR207186@unreal/

Link: https://lore.kernel.org/r/20200709235250.GA26678@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 15:00:02 -03:00
Yuval Basson
acca72e2b0 RDMA/qedr: SRQ's bug fixes
QP's with the same SRQ, working on different CQs and running in parallel
on different CPUs could lead to a race when maintaining the SRQ consumer
count, and leads to FW running out of SRQs. Update the consumer
atomically.  Make sure the wqe_prod is updated after the sge_prod due to
FW requirements.

Fixes: 3491c9e799 ("qedr: Add support for kernel mode SRQ's")
Link: https://lore.kernel.org/r/20200708195526.31040-1-ybason@marvell.com
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 14:57:08 -03:00
Max Gurtovoy
317000b926 IB/isert: allocate RW ctxs according to max IO size
Current iSER target code allocates MR pool budget based on queue size.
Since there is no handshake between iSER initiator and target on max IO
size, we'll set the iSER target to support upto 16MiB IO operations and
allocate the correct number of RDMA ctxs according to the factor of MR's
per IO operation. This would guarantee sufficient size of the MR pool for
the required IO queue depth and IO size.

Link: https://lore.kernel.org/r/20200708091908.162263-1-maxg@mellanox.com
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 14:23:22 -03:00
Daria Velikovsky
0829d2da60 RDMA/mlx5: Init dest_type when create flow
When using action drop dest_type was never assigned to any value.  Add
initialization of dest_type to -1 since 0 is valid.

Fixes: f29de9eee7 ("RDMA/mlx5: Add support for drop action in DV steering")
Link: https://lore.kernel.org/r/20200707110259.882276-1-leon@kernel.org
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 14:11:53 -03:00
Kamal Heib
420bd9e2d9 RDMA/rxe: Remove rxe_link_layer()
Instead of returning IB_LINK_LAYER_ETHERNET from rxe_link_layer, return it
directly from get_link_layer callback and remove rxe_link_layer().

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:21 -03:00
Kamal Heib
293d8440a0 RDMA/rxe: Return void from rxe_mem_init_dma()
The return value from rxe_mem_init_dma() is always 0 - change it to be
void and fix the callers accordingly.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:21 -03:00
Kamal Heib
9d576eac63 RDMA/rxe: Return void from rxe_init_port_param()
The return value from rxe_init_port_param() is always 0 - change it to be
void.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:20 -03:00
Kamal Heib
6112ef6282 RDMA/rxe: Drop pointless checks in rxe_init_ports
Both pkey_tbl_len and gid_tbl_len are set in rxe_init_port_param() - so no
need to check if they aren't set.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:20 -03:00
Maor Gottlieb
87c4c774cb RDMA/cm: Protect access to remote_sidr_table
cm.lock must be held while accessing remote_sidr_table. This fixes the
below NULL pointer dereference.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP PTI
  CPU: 2 PID: 7288 Comm: udaddy Not tainted 5.7.0_for_upstream_perf_2020_06_09_15_14_20_38 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:rb_erase+0x10d/0x360
  Code: 00 00 00 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 74 48 f6 02 01 75 af 48 8b 7a 10 48 89 c1 48 83 c9 01 48 89 78 08 48 89 42 10 <48> 89 0f 48 8b 08 48 89 0a 48 83 e1 fc 48 89 10 0f 84 b1 00 00 00
  RSP: 0018:ffffc90000f77c30 EFLAGS: 00010086
  RAX: ffff8883df27d458 RBX: ffff8883df27da58 RCX: ffff8883df27d459
  RDX: ffff8883d183fa58 RSI: ffffffffa01e8d00 RDI: 0000000000000000
  RBP: ffff8883d62ac800 R08: 0000000000000000 R09: 00000000000000ce
  R10: 000000000000000a R11: 0000000000000000 R12: ffff8883df27da00
  R13: ffffc90000f77c98 R14: 0000000000000130 R15: 0000000000000000
  FS:  00007f009f877740(0000) GS:ffff8883f1a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000003d467e003 CR4: 0000000000160ee0
  Call Trace:
   cm_send_sidr_rep_locked+0x15a/0x1a0 [ib_cm]
   ib_send_cm_sidr_rep+0x2b/0x50 [ib_cm]
   cma_send_sidr_rep+0x8b/0xe0 [rdma_cm]
   __rdma_accept+0x21d/0x2b0 [rdma_cm]
   ? ucma_get_ctx+0x2b/0xe0 [rdma_ucm]
   ? _copy_from_user+0x30/0x60
   ucma_accept+0x13e/0x1e0 [rdma_ucm]
   ucma_write+0xb4/0x130 [rdma_ucm]
   vfs_write+0xad/0x1a0
   ksys_write+0x9d/0xb0
   do_syscall_64+0x48/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f009ef60924
  Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 2a ef 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
  RSP: 002b:00007fff843edf38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000055743042e1d0 RCX: 00007f009ef60924
  RDX: 0000000000000130 RSI: 00007fff843edf40 RDI: 0000000000000003
  RBP: 00007fff843ee0e0 R08: 0000000000000000 R09: 0000557430433090
  R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007fff843edf40 R14: 000000000000038c R15: 00000000ffffff00
  CR2: 0000000000000000

Fixes: 6a8824a74b ("RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock")
Link: https://lore.kernel.org/r/20200716105519.1424266-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:58:53 -03:00
Leon Romanovsky
0d1fd39bb2 RDMA/core: Fix race in rdma_alloc_commit_uobject()
The FD should not be installed until all of the setup is completed as the
fd_install() transfers ownership of the kref to the FD table. A thread can
race a close() and trigger concurrent rdma_alloc_commit_uobject() and
uverbs_uobject_fd_release() which, at least, triggers a safety WARN_ON:

  WARNING: CPU: 4 PID: 6913 at drivers/infiniband/core/rdma_core.c:768 uverbs_uobject_fd_release+0x202/0x230
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 4 PID: 6913 Comm: syz-executor.3 Not tainted 5.7.0-rc2 #22
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  [..]
  RIP: 0010:uverbs_uobject_fd_release+0x202/0x230
  Code: fe 4c 89 e7 e8 af 23 fe ff e9 2a ff ff ff e8 c5 fa 61 fe be 03 00 00 00 4c 89 e7 e8 68 eb f5 fe e9 13 ff ff ff e8 ae fa 61 fe <0f> 0b eb ac e8 e5 aa 3c fe e8 50 2b 86 fe e9 6a fe ff ff e8 46 2b
  RSP: 0018:ffffc90008117d88 EFLAGS: 00010293
  RAX: ffff88810e146580 RBX: 1ffff92001022fb1 RCX: ffffffff82d5b902
  RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811951b040
  RBP: ffff88811951b000 R08: ffffed10232a3609 R09: ffffed10232a3609
  R10: ffff88811951b043 R11: 0000000000000001 R12: ffff888100a7c600
  R13: ffff888100a7c650 R14: ffffc90008117da8 R15: ffffffff82d5b700
   ? __uverbs_cleanup_ufile+0x270/0x270
   ? uverbs_uobject_fd_release+0x202/0x230
   ? uverbs_uobject_fd_release+0x202/0x230
   ? __uverbs_cleanup_ufile+0x270/0x270
   ? locks_remove_file+0x282/0x3d0
   ? security_file_free+0xaa/0xd0
   __fput+0x2be/0x770
   task_work_run+0x10e/0x1b0
   exit_to_usermode_loop+0x145/0x170
   do_syscall_64+0x2d0/0x390
   ? prepare_exit_to_usermode+0x17a/0x230
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x414da7
  Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 f4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 36 fc ff ff 8b 44 24
  RSP: 002b:00007fff39d379d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
  RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000414da7
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003
  RBP: 00007fff39d37a3c R08: 0000000400000000 R09: 0000000400000000
  R10: 00007fff39d37910 R11: 0000000000000293 R12: 0000000000000001
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003

Reorder so that fd_install() is the last thing done in
rdma_alloc_commit_uobject().

Fixes: aba94548c9 ("IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit")
Link: https://lore.kernel.org/r/20200716102059.1420681-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:58:53 -03:00
Xi Wang
79d5208386 RDMA/hns: Fix wrong PBL offset when VA is not aligned to PAGE_SIZE
ROCE uses "VA % buf_page_size" to caclulate the offset in the PBL's first
page, the actual PA corresponding to the MR's VA is equal to MR's PA plus
this offset. The first PA in PBL has already been aligned to PAGE_SIZE
after calling ib_umem_get(), but the MR's VA may not. If the buf_page_size
is smaller than the PAGE_SIZE, this will lead the HW to access the wrong
memory because the offset is smaller than expected.

Fixes: 9b2cf76c9f ("RDMA/hns: Optimize PBL buffer allocation process")
Link: https://lore.kernel.org/r/1594726935-45666-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:55:01 -03:00
Weihang Li
7b9bd73ed1 RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC
The RoCE Engine will schedule to another QP after one has sent
(2 ^ lp_pktn_ini) packets. lp_pktn_ini is set in QPC and should be
calculated from 2 factors:

1. current MTU as a integer
2. the RoCE Engine's maximum slice length 64KB

But the driver use MTU as a enum ib_mtu and the max inline capability, the
lp_pktn_ini will be much bigger than expected which may cause traffic of
some QPs to never get scheduled.

Fixes: b713128de7 ("RDMA/hns: Adjust lp_pktn_ini dynamically")
Link: https://lore.kernel.org/r/1594726138-49294-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:52:14 -03:00
Maor Gottlieb
c3d6057e07 RDMA/mlx5: Use xa_lock_irq when access to SRQ table
SRQ table is accessed both from interrupt and process context,
therefore we must use xa_lock_irq.

   inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
   kworker/u17:9/8573   takes:
   ffff8883e3503d30 (&xa->xa_lock#13){?...}-{2:2}, at: mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
   {IN-HARDIRQ-W} state was registered at:
     lock_acquire+0xb9/0x3a0
     _raw_spin_lock+0x25/0x30
     srq_event_notifier+0x2b/0xc0 [mlx5_ib]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     forward_event+0x36/0xc0 [mlx5_core]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     mlx5_eq_async_int+0xc5/0x160 [mlx5_core]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     mlx5_irq_int_handler+0x19/0x30 [mlx5_core]
     __handle_irq_event_percpu+0x43/0x2a0
     handle_irq_event_percpu+0x30/0x70
     handle_irq_event+0x34/0x60
     handle_edge_irq+0x7c/0x1b0
     do_IRQ+0x60/0x110
     ret_from_intr+0x0/0x2a
     default_idle+0x34/0x160
     do_idle+0x1ec/0x220
     cpu_startup_entry+0x19/0x20
     start_secondary+0x153/0x1a0
     secondary_startup_64+0xa4/0xb0
   irq event stamp: 20907
   hardirqs last  enabled at (20907):   _raw_spin_unlock_irq+0x24/0x30
   hardirqs last disabled at (20906):   _raw_spin_lock_irq+0xf/0x40
   softirqs last  enabled at (20746):   __do_softirq+0x2c9/0x436
   softirqs last disabled at (20681):   irq_exit+0xb3/0xc0

   other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&xa->xa_lock#13);
     <Interrupt>
       lock(&xa->xa_lock#13);

    *** DEADLOCK ***

   2 locks held by kworker/u17:9/8573:
    #0: ffff888295218d38 ((wq_completion)mlx5_ib_page_fault){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0
    #1: ffff888401647e78 ((work_completion)(&pfault->work)){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0

   stack backtrace:
   CPU: 0 PID: 8573 Comm: kworker/u17:9 Tainted: GO      5.7.0_for_upstream_min_debug_2020_06_14_11_31_46_41 #1
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
   Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
   Call Trace:
    dump_stack+0x71/0x9b
    mark_lock+0x4f2/0x590
    ? print_shortest_lock_dependencies+0x200/0x200
    __lock_acquire+0xa00/0x1eb0
    lock_acquire+0xb9/0x3a0
    ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    _raw_spin_lock+0x25/0x30
    ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    mlx5_ib_eqe_pf_action+0x257/0xa30 [mlx5_ib]
    ? process_one_work+0x209/0x5f0
    process_one_work+0x27b/0x5f0
    ? __schedule+0x280/0x7e0
    worker_thread+0x2d/0x3c0
    ? process_one_work+0x5f0/0x5f0
    kthread+0x111/0x130
    ? kthread_park+0x90/0x90
    ret_from_fork+0x24/0x30

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200712102641.15210-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:28:16 -03:00
David S. Miller
71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Mark Zhang
cbeb7d896c RDMA/counter: Allow manually bind QPs with different pids to same counter
In manual mode allow bind user QPs with different pids to same counter,
since this is allowed in auto mode.
Bind kernel QPs and user QPs to the same counter are not allowed.

Fixes: 1bd8e0a9d0 ("RDMA/counter: Allow manual mode configuration support")
Link: https://lore.kernel.org/r/20200702082933.424537-4-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:50:53 -03:00
Mark Zhang
c9f557421e RDMA/counter: Only bind user QPs in auto mode
In auto mode only bind user QPs to a dynamic counter, since this feature
is mainly used for system statistic and diagnostic purpose, while there's
no need to counter kernel QPs so far.

Fixes: 99fa331dc8 ("RDMA/counter: Add "auto" configuration mode support")
Link: https://lore.kernel.org/r/20200702082933.424537-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:50:53 -03:00
Mark Zhang
7c97f3aded RDMA/counter: Add PID category support in auto mode
With the "PID" category QPs have same PID will be bound to same counter;
If this category is not set then QPs have different PIDs will be bound
to same counter.

This is implemented for 2 reasons:
1. The counter is a limited resource, while there may be dozens of
   applications, each of which creates several types of QPs, which means
   it may doesn't have enough counter.
2. The system administrator needs all QPs created by all applications
   with same type bound to one counter.

The counter name and PID is only make sense when "PID" category are
configured.

This category can also be used in combine with others, e.g. QP type.

Link: https://lore.kernel.org/r/20200702082933.424537-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:50:53 -03:00
Gal Pressman
6c72a038bf RDMA/mlx5: Remove unused to_mibmr function
The to_mibmr function is unused, remove it.

Link: https://lore.kernel.org/r/20200705141143.47303-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:40:39 -03:00
Leon Romanovsky
0a03715068 RDMA/mlx5: Set PD pointers for the error flow unwind
ib_pd is accessed internally during destroy of the TIR/TIS, but PD
can be not set yet. This leading to the following kernel panic.

  BUG: kernel NULL pointer dereference, address: 0000000000000074
  PGD 8000000079eaa067 P4D 8000000079eaa067 PUD 7ae81067 PMD 0 Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 709 Comm: syz-executor.0 Not tainted 5.8.0-rc3 #41 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:destroy_raw_packet_qp_tis drivers/infiniband/hw/mlx5/qp.c:1189 [inline]
  RIP: 0010:destroy_raw_packet_qp drivers/infiniband/hw/mlx5/qp.c:1527 [inline]
  RIP: 0010:destroy_qp_common+0x2ca/0x4f0 drivers/infiniband/hw/mlx5/qp.c:2397
  Code: 00 85 c0 74 2e e8 56 18 55 ff 48 8d b3 28 01 00 00 48 89 ef e8 d7 d3 ff ff 48 8b 43 08 8b b3 c0 01 00 00 48 8b bd a8 0a 00 00 <0f> b7 50 74 e8 0d 6a fe ff e8 28 18 55 ff 49 8d 55 50 4c 89 f1 48
  RSP: 0018:ffffc900007bbac8 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff88807949e800 RCX: 0000000000000998
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807c180140
  RBP: ffff88807b50c000 R08: 000000000002d379 R09: ffffc900007bba00
  R10: 0000000000000001 R11: 000000000002d358 R12: ffff888076f37000
  R13: ffff88807949e9c8 R14: ffffc900007bbe08 R15: ffff888076f37000
  FS:  00000000019bf940(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000074 CR3: 0000000076d68004 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_create_qp+0xf36/0xf90 drivers/infiniband/hw/mlx5/qp.c:3014
   _ib_create_qp drivers/infiniband/core/core_priv.h:333 [inline]
   create_qp+0x57f/0xd20 drivers/infiniband/core/uverbs_cmd.c:1443
   ib_uverbs_create_qp+0xcf/0x100 drivers/infiniband/core/uverbs_cmd.c:1564
   ib_uverbs_write+0x5fa/0x780 drivers/infiniband/core/uverbs_main.c:664
   __vfs_write+0x3f/0x90 fs/read_write.c:495
   vfs_write+0xc7/0x1f0 fs/read_write.c:559
   ksys_write+0x5e/0x110 fs/read_write.c:612
   do_syscall_64+0x3e/0x70 arch/x86/entry/common.c:359
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x466479
  Code: Bad RIP value.
  RSP: 002b:00007ffd057b62b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
  RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
  RBP: 00000000019bf8fc R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
  R13: 0000000000000bf6 R14: 00000000004cb859 R15: 00000000006fefc0

Fixes: 6c41965d64 ("RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path")
Link: https://lore.kernel.org/r/20200707110612.882962-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 20:15:59 -03:00
Aya Levin
530c8632b5 IB/mlx5: Fix 50G per lane indication
Some released FW versions mistakenly don't set the capability that 50G per
lane link-modes are supported for VFs (ptys_extended_ethernet capability
bit).

Use PTYS.ext_eth_proto_capability instead, as this indication is always
accurate. If PTYS.ext_eth_proto_capability is valid
(has a non-zero value) conclude that the HCA supports 50G per lane.

Otherwise, conclude that the HCA doesn't support 50G per lane.

Fixes: 08e8676f16 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Link: https://lore.kernel.org/r/20200707110612.882962-3-leon@kernel.org
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 20:15:58 -03:00
Kamal Heib
04340645f6 RDMA/siw: Fix reporting vendor_part_id
Move the initialization of the vendor_part_id to be before calling
ib_register_device(), this is needed because the query_device() callback
is called from the context of ib_register_device() before initializing the
vendor_part_id, so the reported value is wrong.

Fixes: bdcf26bf9b ("rdma/siw: network and RDMA core interface")
Link: https://lore.kernel.org/r/20200707130931.444724-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 09:24:45 -03:00
Leon Romanovsky
1e2b5a90de RDMA/mlx5: Delete one-time used functions
Merge them into their callers, usually the only thing the caller did was
to call the one function, so this is clearer.

Link: https://lore.kernel.org/r/20200702081809.423482-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:08:03 -03:00
Leon Romanovsky
d8b7515e25 RDMA/mlx5: Cleanup DEVX initialization flow
Move DEVX initialization and cleanup flows to the devx.c instead of having
almost empty functions in main.c

Link: https://lore.kernel.org/r/20200702081809.423482-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
f7c4ffda0c RDMA/mlx5: Separate flow steering logic from main.c
Move flow steering logic to be in separate file and rename flow.c to be
fs.c because it is better describe the content.

Link: https://lore.kernel.org/r/20200702081809.423482-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
64825827ae RDMA/mlx5: Separate counters from main.c
There are number of counters types supported in mlx5_ib: HW counters,
congestion counters, Q-counters and flow counters. Almost all supporting
code was placed in main.c that made almost impossible to maintain the code
anymore. Let's create separate code namespace for the counters to easy
future generalization effort.

Link: https://lore.kernel.org/r/20200702081809.423482-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
b572ebe667 RDMA/mlx5: Separate restrack callbacks initialization from main.c
The restrack code has separate .c, so move callbacks initialization to
that file to improve code locality.

Link: https://lore.kernel.org/r/20200702081809.423482-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
ac47bf5ef1 RDMA/mlx5: Limit the scope of mlx5_ib_enable_driver function
The mlx5_ib_enable_driver() is local function and doesn't need to be
shared in mlx5_ib, so change it's signature to have static keyword in it.

Link: https://lore.kernel.org/r/20200702081809.423482-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Xi Wang
cc33b23e1e RDMA/hns: Optimize MTR level-0 addressing to access huge page
If hns ROCEE is set to level-0 addressing, the length of the entire buffer
can be used as the page size. The driver needn't to split the buffer into
small units because all pages are continuous.

Link: https://lore.kernel.org/r/1593525696-12570-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 13:15:10 -03:00
Zhu Yanjun
5c99274be8 RDMA/rxe: Skip dgid check in loopback mode
In the loopback tests, the following call trace occurs.

 Call Trace:
  __rxe_do_task+0x1a/0x30 [rdma_rxe]
  rxe_qp_destroy+0x61/0xa0 [rdma_rxe]
  rxe_destroy_qp+0x20/0x60 [rdma_rxe]
  ib_destroy_qp_user+0xcc/0x220 [ib_core]
  uverbs_free_qp+0x3c/0xc0 [ib_uverbs]
  destroy_hw_idr_uobject+0x24/0x70 [ib_uverbs]
  uverbs_destroy_uobject+0x43/0x1b0 [ib_uverbs]
  uobj_destroy+0x41/0x70 [ib_uverbs]
  __uobj_get_destroy+0x39/0x70 [ib_uverbs]
  ib_uverbs_destroy_qp+0x88/0xc0 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb9/0xf0 [ib_uverbs]
  ib_uverbs_cmd_verbs+0xb16/0xc30 [ib_uverbs]

The root cause is that the actual RDMA connection is not created in the
loopback tests and the rxe_match_dgid will fail randomly.

To fix this call trace which appear in the loopback tests, skip check of
the dgid.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200630123605.446959-1-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 13:13:32 -03:00
Leon Romanovsky
28ad5f65c3 RDMA: Move XRCD to be under ib_core responsibility
Update the code to allocate and free ib_xrcd structure in the
ib_core instead of inside drivers.

Link: https://lore.kernel.org/r/20200630101855.368895-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 20:11:24 -03:00
Leon Romanovsky
3b023e1b68 RDMA/core: Create and destroy counters in the ib_core
Move allocation and destruction of counters under ib_core responsibility

Link: https://lore.kernel.org/r/20200630101855.368895-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 20:04:40 -03:00
Yishai Hadas
6c01e6b218 IB/uverbs: Expose UAPI to query MR
Expose UAPI to query MR, this will let user space application that
didn't allocate the MR but has access to by owning the matching command
FD to retrieve its information.

Link: https://lore.kernel.org/r/20200630093916.332097-8-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:34 -03:00
Yishai Hadas
05f71ef979 RDMA/mlx5: Introduce UAPI to query PD attributes
Introduce UAPI to query PD attributes, this can be used to retrieve PD
attributes by having the PD handle of the created one and owning the
command FD for the ucontxet.

Link: https://lore.kernel.org/r/20200630093916.332097-7-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:34 -03:00
Yishai Hadas
0fb556b2b5 RDMA/mlx5: Implement the query ucontext functionality
Implement the query ucontext functionality by returning the original
ucontext data as part of an extra mlx5 attribute that holds the driver
UAPI response.

Link: https://lore.kernel.org/r/20200630093916.332097-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:33 -03:00
Yishai Hadas
45ec21c971 RDMA/mlx5: Refactor mlx5_ib_alloc_ucontext() response
Refactor mlx5_ib_alloc_ucontext() to set its response fields in a
cleaner way.

It includes,
- Move the relevant code to a self contained function.
- Calculate the response length once and drop redundant code all around.
- Reuse previously set ucontext fields once preparing the response.

The self contained function will be used in next patch as part of
implementing the query ucontext functionality.

Link: https://lore.kernel.org/r/20200630093916.332097-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:33 -03:00
Yishai Hadas
1c8fb1ea5a IB/uverbs: Expose UAPI to query ucontext
Expose UAPI to query ucontext, this will let user space application that
didn't allocate the ucontext but has access to by owning the matching
command FD to retrieve the ucontext information.

Link: https://lore.kernel.org/r/20200630093916.332097-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:33 -03:00
Yishai Hadas
04c0a5fcfc IB/uverbs: Set IOVA on IB MR in uverbs layer
Set IOVA on IB MR in uverbs layer to let all drivers have it, this
includes both reg/rereg MR flows.
As part of this change cleaned-up this setting from the drivers that
already did it by themselves in their user flows.

Fixes: e6f0330106 ("mlx4_ib: set user mr attributes in struct ib_mr")
Link: https://lore.kernel.org/r/20200630093916.332097-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:33 -03:00
Yishai Hadas
263427526f IB/uverbs: Enable CQ ioctl commands by default
Enable CQ ioctl commands by default, this functionality is fully mature
to be used over ioctl, no reason to maintain any more the EXP KCONFIG
entry to enable it.

Link: https://lore.kernel.org/r/20200630093916.332097-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:33 -03:00
Maor Gottlieb
6f3ca6f4f5 RDMA/core: Optimize XRC target lookup
Replace the mutex with read write semaphore and use xarray instead of
linked list for XRC target QPs. This will give faster XRC target
lookup. In addition, when QP is closed, don't insert it back to the xarray
if the destroy command failed.

Link: https://lore.kernel.org/r/20200706122716.647338-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:32:23 -03:00
Maor Gottlieb
b73efcb26e RDMA/core: Clean ib_alloc_xrcd() and reuse it to allocate XRC domain
ib_alloc_xrcd() already does the required initialization, so move the
uverbs to call it and save code duplication, while cleaning the function
argument lists of that function.

Link: https://lore.kernel.org/r/20200706122716.647338-3-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:32:23 -03:00
Leon Romanovsky
f4375443b7 RDMA/mlx5: Get XRCD number directly for the internal use
The mlx5_ib creates XRC domain and uses for creating internal SRQ.
However all that is needed is XRCD number and not full blown ib_xrcd
objects.

Update the code to get and store the number only.

Link: https://lore.kernel.org/r/20200706122716.647338-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:32:23 -03:00
Gal Pressman
42a3b15396 RDMA: Remove the udata parameter from alloc_mr callback
Allocating an MR flow can only be initiated by kernel users, and not from
userspace so a udata parameter is redundant.

Link: https://lore.kernel.org/r/20200706120343.10816-4-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:25:53 -03:00
Gal Pressman
b64b74b1d5 RDMA/core: Remove ib_alloc_mr_user function
Allocating an MR flow can only be initiated by kernel users, and not from
userspace. As a result, the udata parameter is always being passed as
NULL. Rename ib_alloc_mr_user function to ib_alloc_mr and remove the udata
parameter.

Link: https://lore.kernel.org/r/20200706120343.10816-3-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:25:53 -03:00
Gal Pressman
c5f42b2105 RDMA/core: Check for error instead of success in alloc MR function
The common kernel pattern is to check for error, not success.  Flip the if
statement accordingly and keep the main flow unindented.

Link: https://lore.kernel.org/r/20200706120343.10816-2-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:24:20 -03:00
Chuck Lever
c367124e6c RDMA/core: Clean up tracepoint headers
There's no need for core/trace.c to include rdma/ib_verbs.h twice.

Link: https://lore.kernel.org/r/20200702141946.3775.51943.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 14:54:46 -03:00
Jason Gunthorpe
f11f3f76c7 Merge branch 'mlx5_ipoib_qpn' into rdma.git for-next
Michael Guralnik says:

====================
This series handles IPoIB child interface creation with setting
interface's HW address.

In current implementation, lladdr requested by user is ignored and
overwritten. Child interface gets the same GID as the parent interface and
a QP number which is assigned by the underlying drivers.

In this series we fix this behavior so that user's requested address is
assigned to the newly created interface.

As specific QP number request is not supported for all vendors, QP number
requested by user will still be overwritten when this is not supported.

Behavior of creation of child interfaces through the sysfs mechanism or
without specifying a requested address, stays the same.
====================

Based on the mlx5-next branch at
      git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_ipoib_qpn':
  RDMA/ipoib: Handle user-supplied address when creating child
  net/mlx5: Enable QP number request when creating IPoIB underlay QP

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 14:29:58 -03:00
Michael Guralnik
87fb5c1ccb RDMA/ipoib: Handle user-supplied address when creating child
Use the address supplied by user when creating a child interface.

Previously, the address requested by the user was ignored and overridden
with parent's GID and the random QP number assigned to the child.

Link: https://lore.kernel.org/r/20200623110105.1225750-3-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 12:57:58 -03:00
Maor Gottlieb
d473f4dc2f RDMA/mlx5: Introduce ODP prefetch counter
For debugging purpose it will be easier to understand if prefetch works
okay if it has its own counter. Introduce ODP prefetch counter and count
per MR the total number of prefetched pages.

In addition remove comment which is not relevant anymore and anyway not in
the correct place.

Link: https://lore.kernel.org/r/20200621104147.53795-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-03 09:16:25 -03:00
Divya Indi
f427f4d621 IB/sa: Resolv use-after-free in ib_nl_make_request()
There is a race condition where ib_nl_make_request() inserts the request
data into the linked list but the timer in ib_nl_request_timeout() can see
it and destroy it before ib_nl_send_msg() is done touching it. This could
happen, for instance, if there is a long delay allocating memory during
nlmsg_new()

This causes a use-after-free in the send_mad() thread:

  [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
  [ <ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
  [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
  [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
  [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma]
  [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma]
  [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
  [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
  [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
  [<ffffffff810a02f9>] process_one_work+0x169/0x4a0
  [<ffffffff810a0b2b>] worker_thread+0x5b/0x560
  [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
  [<ffffffff810a68fb>] kthread+0xcb/0xf0
  [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
  [<ffffffff816ec49a>] ? __schedule+0x24a/0x810
  [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
  [<ffffffff816f25a7>] ret_from_fork+0x47/0x90
  [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180

The ownership rule is once the request is on the list, ownership transfers
to the list and the local thread can't touch it any more, just like for
the normal MAD case in send_mad().

Thus, instead of adding before send and then trying to delete after on
errors, move the entire thing under the spinlock so that the send and
update of the lists are atomic to the conurrent threads. Lightly reoganize
things so spinlock safe memory allocations are done in the final NL send
path and the rest of the setup work is done before and outside the lock.

Fixes: 3ebd2fd0d0 ("IB/sa: Put netlink request into the request list before sending")
Link: https://lore.kernel.org/r/1592964789-14533-1-git-send-email-divya.indi@oracle.com
Signed-off-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 16:05:12 -03:00
Jason Gunthorpe
0cb42c0265 RDMA/core: Fix bogus WARN_ON during ib_unregister_device_queued()
ib_unregister_device_queued() can only be used by drivers using the new
dealloc_device callback flow, and it has a safety WARN_ON to ensure
drivers are using it properly.

However, if unregister and register are raced there is a special
destruction path that maintains the uniform error handling semantic of
'caller does ib_dealloc_device() on failure'. This requires disabling the
dealloc_device callback which triggers the WARN_ON.

Instead of using NULL to disable the callback use a special function
pointer so the WARN_ON does not trigger.

Fixes: d0899892ed ("RDMA/device: Provide APIs from the core code to help unregistration")
Link: https://lore.kernel.org/r/0-v1-a36d512e0a99+762-syz_dealloc_driver_jgg@nvidia.com
Reported-by: syzbot+4088ed905e4ae2b0e13b@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 14:25:42 -03:00
Kaike Wan
2315ec12ee IB/hfi1: Do not destroy link_wq when the device is shut down
The workqueue link_wq should only be destroyed when the hfi1 driver is
unloaded, not when the device is shut down.

Fixes: 71d47008ca ("IB/hfi1: Create workqueue for link events")
Link: https://lore.kernel.org/r/20200623204053.107638.70315.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 13:54:50 -03:00
Kaike Wan
28b70cd923 IB/hfi1: Do not destroy hfi1_wq when the device is shut down
The workqueue hfi1_wq is destroyed in function shutdown_device(), which is
called by either shutdown_one() or remove_one(). The function
shutdown_one() is called when the kernel is rebooted while remove_one() is
called when the hfi1 driver is unloaded. When the kernel is rebooted,
hfi1_wq is destroyed while all qps are still active, leading to a kernel
crash:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
  IP: [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
  PGD 0
  Oops: 0000 [#1] SMP
  Modules linked in: dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) ib_isert iscsi_target_mod target_core_mod ib_ucm mlx4_ib iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm rpcrdma sunrpc irqbypass crc32_pclmul ghash_clmulni_intel rdma_ucm aesni_intel ib_uverbs lrw gf128mul opa_vnic glue_helper ablk_helper ib_iser cryptd ib_umad rdma_cm iw_cm ses enclosure libiscsi scsi_transport_sas pcspkr joydev ib_ipoib(OE) scsi_transport_iscsi ib_cm sg ipmi_ssif mei_me lpc_ich i2c_i801 mei ioatdma ipmi_si dm_multipath ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm hfi1(OE)
  crct10dif_pclmul crct10dif_common crc32c_intel drm ahci mlx4_core libahci rdmavt(OE) igb megaraid_sas ib_core libata drm_panel_orientation_quirks ptp pps_core devlink dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
  CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
  Hardware name: Phegda X2226A/S2600CW, BIOS SE5C610.86B.01.01.0024.021320181901 02/13/2018
  task: ffff8a799ba0d140 ti: ffff8a799bad8000 task.ti: ffff8a799bad8000
  RIP: 0010:[<ffffffff94cb7b02>] [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
  RSP: 0018:ffff8a90dde43d80 EFLAGS: 00010046
  RAX: 0000000000000082 RBX: 0000000000000086 RCX: 0000000000000000
  RDX: ffff8a90b924fcb8 RSI: 0000000000000000 RDI: 000000000000001b
  RBP: ffff8a90dde43db8 R08: ffff8a799ba0d6d8 R09: ffff8a90dde53900
  R10: 0000000000000002 R11: ffff8a90dde43de8 R12: ffff8a90b924fcb8
  R13: 000000000000001b R14: 0000000000000000 R15: ffff8a90d2890000
  FS: 0000000000000000(0000) GS:ffff8a90dde40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000102 CR3: 0000001a70410000 CR4: 00000000001607e0
  Call Trace:
  [<ffffffff94cb8105>] queue_work_on+0x45/0x50
  [<ffffffffc03f781e>] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
  [<ffffffffc03f78a2>] hfi1_schedule_send+0x32/0x70 [hfi1]
  [<ffffffffc02cf2d9>] rvt_rc_timeout+0xe9/0x130 [rdmavt]
  [<ffffffff94ce563a>] ? trigger_load_balance+0x6a/0x280
  [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
  [<ffffffff94ca7f58>] call_timer_fn+0x38/0x110
  [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
  [<ffffffff94caa3bd>] run_timer_softirq+0x24d/0x300
  [<ffffffff94ca0f05>] __do_softirq+0xf5/0x280
  [<ffffffff9537832c>] call_softirq+0x1c/0x30
  [<ffffffff94c2e675>] do_softirq+0x65/0xa0
  [<ffffffff94ca1285>] irq_exit+0x105/0x110
  [<ffffffff953796c8>] smp_apic_timer_interrupt+0x48/0x60
  [<ffffffff95375df2>] apic_timer_interrupt+0x162/0x170
  <EOI>
  [<ffffffff951adfb7>] ? cpuidle_enter_state+0x57/0xd0
  [<ffffffff951ae10e>] cpuidle_idle_call+0xde/0x230
  [<ffffffff94c366de>] arch_cpu_idle+0xe/0xc0
  [<ffffffff94cfc3ba>] cpu_startup_entry+0x14a/0x1e0
  [<ffffffff94c57db7>] start_secondary+0x1f7/0x270
  [<ffffffff94c000d5>] start_cpu+0x5/0x14

The solution is to destroy the workqueue only when the hfi1 driver is
unloaded, not when the device is shut down. In addition, when the device
is shut down, no more work should be scheduled on the workqueues and the
workqueues are flushed.

Fixes: 8d3e71136a ("IB/{hfi1, qib}: Add handling of kernel restart")
Link: https://lore.kernel.org/r/20200623204047.107638.77646.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 13:54:50 -03:00
Leon Romanovsky
f81b4565c1 RDMA/mlx5: Fix legacy IPoIB QP initialization
Legacy IPoIB sets IB_QP_CREATE_NETIF_QP QP create flag and because mlx5
doesn't use this flag, the process_create_flags() failed to create IPoIB
QPs.

Fixes: 2978975ce7 ("RDMA/mlx5: Process create QP flags in one place")
Link: https://lore.kernel.org/r/20200630122147.445847-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 11:17:10 -03:00
Nathan Chancellor
e18321acfb IB/hfi1: Add explicit cast OPA_MTU_8192 to 'enum ib_mtu'
Clang warns:

drivers/infiniband/hw/hfi1/qp.c:198:9: warning: implicit conversion from enumeration type 'enum opa_mtu' to different enumeration type 'enum ib_mtu' [-Wenum-conversion]
                mtu = OPA_MTU_8192;
                    ~ ^~~~~~~~~~~~

enum opa_mtu extends enum ib_mtu. There are typically two ways to deal
with this:

* Remove the expected types and just use 'int' for all parameters and
  types.

* Explicitly cast the enums between each other.

This driver chooses to do the later so do the same thing here.

Fixes: 6d72344cf6 ("IB/ipoib: Increase ipoib Datagram mode MTU's upper limit")
Link: https://lore.kernel.org/r/20200623005224.492239-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1062
Link: https://lore.kernel.org/linux-rdma/20200527040350.GA3118979@ubuntu-s3-xlarge-x86/
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 11:16:52 -03:00
Jason Gunthorpe
65936bf25f RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah()
ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that
the only time flush_workqueue() can be called is if it also clears
IPOIB_FLAG_OPER_UP.

Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky
enough, and lockdep doesn't help us to find it early:

          CPU0               CPU1          CPU2
   __ipoib_ib_dev_flush()
      down_read(vlan_rwsem)

                         ipoib_vlan_add()
                           rtnl_trylock()
                           down_write(vlan_rwsem)

				      ipoib_mcast_carrier_on_task()
					 while (!rtnl_trylock())
					      msleep(20);

      ipoib_flush_ah()
	flush_workqueue(priv->wq)

Clean up the ah_reaper related functions and lifecycle to make sense:

 - Start/Stop of the reaper should only be done in open/stop NDOs, not in
   any other places

 - cancel and flush of the reaper should only happen in the stop NDO.
   cancel is only functional when combined with IPOIB_STOP_REAPER.

 - Non-stop places were flushing the AH's just need to flush out dead AH's
   synchronously and ignore the background task completely. It is fully
   locked and harmless to leave running.

Which ultimately fixes the ABBA deadlock by removing the unnecessary
flush_workqueue() from the problematic place under the vlan_rwsem.

Fixes: efc82eeeae ("IB/ipoib: No longer use flush as a parameter")
Link: https://lore.kernel.org/r/20200625174219.290842-1-kamalheib1@gmail.com
Reported-by: Kamal Heib <kheib@redhat.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 10:46:06 -03:00
Bolarinwa Olayemi Saheed
c4334a99d3 IB/hfi1: Convert PCIBIOS_* errors to generic -E* errors
pcie_speeds() and restore_pci_variables() returns PCIBIOS_ error codes
from PCIe capability accessors.

PCIBIOS_ error codes have positive values. Passing on these values is
inconsistent with functions which return only a negative value on failure.

Before passing on the return value of PCIe capability accessors, call
pcibios_err_to_errno() to convert any positive PCIBIOS_ error codes to
negative generic error values.

Link: https://lore.kernel.org/r/20200615073225.24061-3-refactormyself@gmail.com
Suggested-by: Bjorn Helgaas <bjorn@helgaas.com>
Signed-off-by: Bolarinwa Olayemi Saheed <refactormyself@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-30 13:27:14 -03:00
David S. Miller
b0f46a9754 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2020-06-25

This series contains updates to i40e driver and removes the individual
driver versions from all of the Intel wired LAN drivers.

Shiraz moves the client header so that it can easily be shared between
the i40e LAN driver and i40iw RDMA driver.

Jesse cleans up the unused defines, since they are just dead weight.

Alek reduces the unreasonably long wait time for a PF reset after reboot
by using jiffies to limit the maximum wait time for the PF reset to
succeed.  Added additional logging to let the user know when the driver
transitions into recovery mode.  Adds new device support for our 5 Gbps
NICs.

Todd adds a check to see if MFS is set after warm reboot and notifies
the user when MFS is set to anything lower than the default value.

Arkadiusz fixes a possible race condition, where were holding a
spin-lock while in atomic context.

v2: removed code comments that were no longer applicable in patch 2 of
    the series.  Also removed 'inline' from patch 4 and patch 8 of the
    series.  Also re-arranged code to be able to remove the forward
    function declarations.  Dropped patch 9 of the series, while the
    author works on cleaning up the commit message.
v3: Updated patch 8 description to answer Jakub's questions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26 12:22:34 -07:00
Shiraz Saleem
fe21b6c3a6 i40e: Move client header location
Move i40e_client.h to include/linux/net/intel/*
since its shared between i40iw and i40e.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-25 22:25:13 -07:00
Leon Romanovsky
14c2b89634 RDMA/core: Delete not-used create RWQ table function
The RWQ table is used for RSS uverbs and not in used for the kernel
consumers, delete ib_create_rwq_ind_table() routine that is not
called at all.

Link: https://lore.kernel.org/r/20200624105422.1452290-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 16:46:18 -03:00
Shay Drory
5611074a20 IB/mad: Delete RMPP_STATE_CANCELING state
The cancel_delayed_work can be called under lock since it doesn't sleep.
This makes the RMPP_STATE_CANCELING state not needed anymore, remove it.

Link: https://lore.kernel.org/r/20200621104738.54850-5-leon@kernel.org
Signed-off-by: Shay Drory <shayd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 16:43:45 -03:00
Shay Drory
e41c425349 IB/mad: Change atomics to refcount API
The refcount API provides better safety than atomics API.  Therefore,
change atomic functions to refcount functions.

Link: https://lore.kernel.org/r/20200621104738.54850-4-leon@kernel.org
Signed-off-by: Shay Drory <shayd@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 16:43:45 -03:00
Shay Drory
b9af0e2d5a IB/mad: Issue complete whenever decrements agent refcount
Replace calls of atomic_dec() to mad_agent_priv->refcount with calls to
deref_mad_agent() in order to issue complete. Most likely the refcount is
> 1 at these points, but it is difficult to prove. Performance is not
important on these paths, so be obviously correct.

Link: https://lore.kernel.org/r/20200621104738.54850-3-leon@kernel.org
Signed-off-by: Shay Drory <shayd@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 16:43:44 -03:00
Mike Marciniszyn
38fd98afee IB/hfi1: Add atomic triggered sleep/wakeup
When running iperf in a two host configuration the following trace can
occur:

[  319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out

The issue happens because the current implementation relies on the netif
txq being stopped to control the flushing of the tx list.

There are two resources that the transmit logic can wait on and stop the
txq:
- SDMA descriptors
- Ring space to hold completions

The ring space is tested on the sending side and relieved when the ring is
consumed in the napi tx reaping.

Unfortunately, that reaping can run conncurrently with the workqueue
flushing of the txlist.  If the txq is started just before the workitem
executes, the txlist will never be flushed, leading to the txq being
stuck.

Fix by:
- Adding sleep/wakeup wrappers
  * Use an atomic to control the call to the netif routines inside the
    wrappers

- Use another atomic to record ring space exhaustion
  * Only wakeup when the a ring space exhaustion has happened and it
    relieved

Add additional wrappers to clarify the ring space resource handling.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 16:13:38 -03:00
Mike Marciniszyn
82172b7655 IB/hfi1: Correct -EBUSY handling in tx code
The current code mishandles -EBUSY in two ways:
- The flow change doesn't test the return from the flush and runs on to
  process the current packet racing with the wakeup processing
- The -EBUSY handling for a single packet inserts the tx into the txlist
  after the submit call, racing with the same wakeup processing

Fix the first by dropping the skb and returning NETDEV_TX_OK.

Fix the second by insuring the the list entry within the txreq is inited
when allocated.  This enables the sleep routine to detect that the txreq
has used the non-list api and queue the packet to the txlist.

Both flaws can lead to having the flushing thread executing in causing two
threads to manipulate the txlist.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 16:13:38 -03:00
Dennis Dalessandro
822fbd3741 IB/hfi1: Fix module use count flaw due to leftover module put calls
When the try_module_get calls were removed from opening and closing of the
i2c debugfs file, the corresponding module_put calls were missed.  This
results in an inaccurate module use count that requires a power cycle to
fix.

Fixes: 09fbca8e62 ("IB/hfi1: No need to use try_module_get for debugfs")
Link: https://lore.kernel.org/r/20200623203230.106975.76240.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 15:54:08 -03:00
Dennis Dalessandro
b46925a24a IB/hfi1: Restore kfree in dummy_netdev cleanup
We need to do some rework on the dummy netdev. Calling the free_netdev()
would normally make sense, and that will be addressed in an upcoming
patch. For now just revert the behavior to what it was before keeping the
unused variable removal part of the patch.

The dd->dumm_netdev is mainly used for packet receiving through
alloc_netdev_mqs() for typical net devices. A a result, it should be freed
with kfree instead of free_netdev() that leads to a crash when unloading
the hfi1 module:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8000000855b54067 P4D 8000000855b54067 PUD 84a4f5067 PMD 0
  Oops: 0000 [#1] SMP PTI
  CPU: 73 PID: 10299 Comm: modprobe Not tainted 5.6.0-rc5+ #1
  Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
  RIP: 0010:__hw_addr_flush+0x12/0x80
  Code: 40 00 48 83 c4 08 4c 89 e7 5b 5d 41 5c e9 76 77 18 00 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 1f 48 39 df <48> 8b 2b 75 08 eb 4a 48 89 eb 48 89 c5 48 89 df e8 99 bf d0 ff 84
  RSP: 0018:ffffb40e08783db8 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
  RDX: ffffb40e00000000 RSI: 0000000000000246 RDI: ffff88ab13662298
  RBP: ffff88ab13662000 R08: 0000000000001549 R09: 0000000000001549
  R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88ab13662298
  R13: ffff88ab1b259e20 R14: ffff88ab1b259e42 R15: 0000000000000000
  FS:  00007fb39b534740(0000) GS:ffff88b31f940000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000084d3ea004 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   dev_addr_flush+0x15/0x30
   free_netdev+0x7e/0x130
   hfi1_netdev_free+0x59/0x70 [hfi1]
   remove_one+0x65/0x110 [hfi1]
   pci_device_remove+0x3b/0xc0
   device_release_driver_internal+0xec/0x1b0
   driver_detach+0x46/0x90
   bus_remove_driver+0x58/0xd0
   pci_unregister_driver+0x26/0xa0
   hfi1_mod_cleanup+0xc/0xd54 [hfi1]
   __x64_sys_delete_module+0x16c/0x260
   ? exit_to_usermode_loop+0xa4/0xc0
   do_syscall_64+0x5b/0x200
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 193ba03141 ("IB/hfi1: Use free_netdev() in hfi1_netdev_free()")
Link: https://lore.kernel.org/r/20200623203224.106975.16926.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 15:54:08 -03:00
Kamal Heib
95a5631f6c RDMA/ipoib: Return void from ipoib_ib_dev_stop()
The return value from ipoib_ib_dev_stop() is always 0 - change it to be
void.

Link: https://lore.kernel.org/r/20200623105236.18683-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:31 -03:00
Jason Gunthorpe
3506c37dcc Merge branch 'raw_dumps' into rdma.git for-next
Maor Gottlieb says:

====================
The following series adds support to get the RDMA resource data in RAW
format. The main motivation for doing this is to enable vendors to return
the entire QP/CQ/MR data without a need from the vendor to set each
field separately.
====================

Based on the mlx5-next branch at
      git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies

* branch 'raw_dumps':
  RDMA/mlx5: Add support to get MR resource in RAW format
  RDMA/mlx5: Add support to get CQ resource in RAW format
  RDMA/mlx5: Add support to get QP resource in RAW format
  RDMA: Add support to dump resource tracker in RAW format
  RDMA: Add dedicated CM_ID resource tracker function
  RDMA: Add dedicated QP resource tracker function
  RDMA: Add a dedicated CQ resource tracker function
  RDMA: Add dedicated MR resource tracker function
  RDMA/core: Don't call fill_res_entry for PD
  net/mlx5: Add support in query QP, CQ and MKEY segments
  net/mlx5: Export resource dump interface

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:30 -03:00
Maor Gottlieb
28b5fa687f RDMA/mlx5: Add support to get MR resource in RAW format
Add support to get MR (mkey) resource dump in RAW format.

Link: https://lore.kernel.org/r/20200623113043.1228482-12-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:29 -03:00
Maor Gottlieb
1ccecc88af RDMA/mlx5: Add support to get CQ resource in RAW format
Add support to get CQ resource dump in RAW format.

Link: https://lore.kernel.org/r/20200623113043.1228482-11-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:29 -03:00
Maor Gottlieb
1776dd234a RDMA/mlx5: Add support to get QP resource in RAW format
Add a generic function to use the resource dump mechanism to get the
QP resource data.

Link: https://lore.kernel.org/r/20200623113043.1228482-10-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:29 -03:00
Maor Gottlieb
65959522f8 RDMA: Add support to dump resource tracker in RAW format
Add support to get resource dump in raw format. It enable drivers to
return the entire device specific QP/CQ/MR context without a need from the
driver to set each field separately.

The raw query returns only the device specific data, general data is still
returned by using the existing queries.

Example:

$ rdma res show mr dev mlx5_1 mrn 2 -r -j
[{"ifindex":7,"ifname":"mlx5_1",
"data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}]

Link: https://lore.kernel.org/r/20200623113043.1228482-9-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:29 -03:00
Maor Gottlieb
211cd9459f RDMA: Add dedicated CM_ID resource tracker function
In order to avoid double multiplexing of the resource when it is a cm id,
add a dedicated callback function. In addition remove fill_res_entry which
is not used anymore.

Link: https://lore.kernel.org/r/20200623113043.1228482-8-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-23 11:46:27 -03:00
Maor Gottlieb
5cc34116cc RDMA: Add dedicated QP resource tracker function
In order to avoid double multiplexing of the resource when it is a QP, add
a dedicated callback function.

Link: https://lore.kernel.org/r/20200623113043.1228482-7-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-23 11:46:27 -03:00
Maor Gottlieb
9e2a187a93 RDMA: Add a dedicated CQ resource tracker function
In order to avoid double multiplexing of the resource when it is a CQ, add
a dedicated callback function.

Link: https://lore.kernel.org/r/20200623113043.1228482-6-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-23 11:46:27 -03:00
Maor Gottlieb
f443452900 RDMA: Add dedicated MR resource tracker function
In order to avoid double multiplexing of the resource when it is a MR, add
a dedicated callback function.

Link: https://lore.kernel.org/r/20200623113043.1228482-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-23 11:46:27 -03:00
Maor Gottlieb
24fd6d6f85 RDMA/core: Don't call fill_res_entry for PD
None of the drivers implement it, remove it.

Link: https://lore.kernel.org/r/20200623113043.1228482-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-23 11:46:26 -03:00
Shay Drory
116a1b9f1c IB/mad: Fix use after free when destroying MAD agent
Currently, when RMPP MADs are processed while the MAD agent is destroyed,
it could result in use after free of rmpp_recv, as decribed below:

	cpu-0						cpu-1
	-----						-----
ib_mad_recv_done()
 ib_mad_complete_recv()
  ib_process_rmpp_recv_wc()
						unregister_mad_agent()
						 ib_cancel_rmpp_recvs()
						  cancel_delayed_work()
   process_rmpp_data()
    start_rmpp()
     queue_delayed_work(rmpp_recv->cleanup_work)
						  destroy_rmpp_recv()
						   free_rmpp_recv()
     cleanup_work()[1]
      spin_lock_irqsave(&rmpp_recv->agent->lock) <-- use after free

[1] cleanup_work() == recv_cleanup_handler

Fix it by waiting for the MAD agent reference count becoming zero before
calling to ib_cancel_rmpp_recvs().

Fixes: 9a41e38a46 ("IB/mad: Use IDR for agent IDs")
Link: https://lore.kernel.org/r/20200621104738.54850-2-leon@kernel.org
Signed-off-by: Shay Drory <shayd@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-22 14:57:44 -03:00
Kamal Heib
f6b4c11fc5 RDMA/rxe: Remove unused rxe_mem_map_pages
This function is not in use - delete it.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200622100731.27359-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 14:49:27 -03:00
Kamal Heib
d5fdffe239 RDMA/hfi1: Remove hfi1_create_qp declaration
The function isn't implemented - delete the declaration.

Link: https://lore.kernel.org/r/20200622094709.12981-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 14:49:27 -03:00
Kamal Heib
90cdff90df RDMA/ipoib: Return void from ipoib_mcast_stop_thread()
The return value from ipoib_mcast_stop_thread() is always 0 - change it to
be void.

Link: https://lore.kernel.org/r/20200622092256.6931-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 14:49:27 -03:00
Leon Romanovsky
6eefa839c4 RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata
Don't deref udata if it is NULL

  BUG: kernel NULL pointer dereference, address: 0000000000000030
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000   SMP PTI
  CPU: 2 PID: 1592 Comm: python3 Not tainted 5.7.0-rc6+ #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:create_qp+0x39e/0xae0 [mlx5_ib]
  Code: c0 0d 00 00 bf 10 01 00 00 e8 be a9 e4 e0 48 85 c0 49 89 c2 0f 84 0c 07 00 00 41 8b 85 74 63 01 00 0f c8 a9 00 00 00 10 74 0a <41> 8b 46 30 0f c8 41 89 42 14 41 8b 52 18 41 0f b6 4a 1c 0f ca 89
  RSP: 0018:ffffc9000067f8b0 EFLAGS: 00010206
  RAX: 0000000010170000 RBX: ffff888441313000 RCX: 0000000000000000
  RDX: 0000000000000200 RSI: 0000000000000000 RDI: ffff88845b1d4400
  RBP: ffffc9000067fa60 R08: 0000000000000200 R09: ffff88845b1d4200
  R10: ffff88845b1d4200 R11: ffff888441313000 R12: ffffc9000067f950
  R13: ffff88846ac00140 R14: 0000000000000000 R15: ffff88846c2bc000
  FS:  00007faa1a3c0540(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000030 CR3: 0000000446dca003 CR4: 0000000000760ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   ? __switch_to_asm+0x40/0x70
   ? __switch_to_asm+0x34/0x70
   mlx5_ib_create_qp+0x897/0xfa0 [mlx5_ib]
   ib_create_qp+0x9e/0x300 [ib_core]
   create_qp+0x92d/0xb20 [ib_uverbs]
   ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs]
   ? release_resource+0x30/0x30
   ib_uverbs_create_qp+0xc4/0xe0 [ib_uverbs]
   ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc8/0xf0 [ib_uverbs]
   ib_uverbs_run_method+0x223/0x770 [ib_uverbs]
   ? track_pfn_remap+0xa7/0x100
   ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
   ? remap_pfn_range+0x358/0x490
   ib_uverbs_cmd_verbs.isra.6+0x19b/0x370 [ib_uverbs]
   ? rdma_umap_priv_init+0x82/0xe0 [ib_core]
   ? vm_mmap_pgoff+0xec/0x120
   ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
   ksys_ioctl+0x92/0xb0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x48/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/20200621115959.60126-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 14:40:53 -03:00
Mark Zhang
c1d869d64a RDMA/counter: Query a counter before release
Query a dynamically-allocated counter before release it, to update it's
hwcounters and log all of them into history data. Otherwise all values of
these hwcounters will be lost.

Fixes: f34a55e497 ("RDMA/core: Get sum value of all counters when perform a sysfs stat read")
Link: https://lore.kernel.org/r/20200621110000.56059-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 14:36:56 -03:00
Colton Lewis
11708142bc RDMA: Correct trivial kernel-doc inconsistencies
Silence documentation build warnings by correcting kernel-doc comments.

./drivers/infiniband/core/verbs.c:1004: warning: Function parameter or member 'uobject' not described in 'ib_create_srq_user'
./drivers/infiniband/core/verbs.c:1004: warning: Function parameter or member 'udata' not described in 'ib_create_srq_user'
./drivers/infiniband/core/umem_odp.c:161: warning: Function parameter or member 'ops' not described in 'ib_umem_odp_alloc_child'
./drivers/infiniband/core/umem_odp.c:225: warning: Function parameter or member 'ops' not described in 'ib_umem_odp_get'
./drivers/infiniband/sw/rdmavt/ah.c:104: warning: Excess function parameter 'ah_attr' description in 'rvt_create_ah'
./drivers/infiniband/sw/rdmavt/ah.c:104: warning: Excess function parameter 'create_flags' description in 'rvt_create_ah'
./drivers/infiniband/ulp/iser/iscsi_iser.h:363: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
./drivers/infiniband/ulp/iser/iscsi_iser.h:377: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd0' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd1' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd2' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd3' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd4' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd0' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd1' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd2' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd3' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:342: warning: Function parameter or member 'reserved' not described in 'opa_veswport_summary_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd0' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd1' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd2' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd3' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd4' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd5' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd6' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd7' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd8' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd9' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:460: warning: Function parameter or member 'reserved' not described in 'opa_vnic_vema_mad'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:485: warning: Function parameter or member 'reserved' not described in 'opa_vnic_notice_attr'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:500: warning: Function parameter or member 'reserved' not described in 'opa_vnic_vema_mad_trap'

Link: https://lore.kernel.org/r/5373936.DvuYhMxLoT@laptop.coltonlewis.name
Signed-off-by: Colton Lewis <colton.w.lewis@protonmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 11:57:39 -03:00
Fan Guo
a17f4bed81 RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads()
If ib_dma_mapping_error() returns non-zero value,
ib_mad_post_receive_mads() will jump out of loops and return -ENOMEM
without freeing mad_priv. Fix this memory-leak problem by freeing mad_priv
in this case.

Fixes: 2c34e68f42 ("IB/mad: Check and handle potential DMA mapping errors")
Link: https://lore.kernel.org/r/20200612063824.180611-1-guofan5@huawei.com
Signed-off-by: Fan Guo <guofan5@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-19 09:23:30 -03:00
Jing Xiangfeng
a7ca4c3ebe IB/srpt: Remove WARN_ON from srpt_cm_req_recv
The callers pass the pointer '&req' or 'private_data' to
srpt_cm_req_recv(), and 'private_data' is initialized in srp_send_req().
'sdev' is allocated and stored in srpt_add_one(). It's easy to show that
sdev and req are always valid. So we remove unnecessary WARN_ON.

Link: https://lore.kernel.org/r/20200617140803.181333-1-jingxiangfeng@huawei.com
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 15:18:00 -03:00
Max Gurtovoy
9e0dc7b9e1 RDMA/mlx5: Fix integrity enabled QP creation
create_flags checks was refactored and broke the creation on integrity
enabled QPs and actually broke the NVMe/RDMA and iSER ULP's when using
mlx5 driven devices.

Fixes: 2978975ce7 ("RDMA/mlx5: Process create QP flags in one place")
Link: https://lore.kernel.org/r/20200617130230.2846915-1-leon@kernel.org
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 15:14:57 -03:00
Leon Romanovsky
2c0f5292d5 RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs
Like any other QP type, rely on FW for the RAW_PACKET QPs to decide if ECE
is supported or not. This fixes an inability to create RAW_PACKET QPs with
latest rdma-core with the ECE support.

Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/20200618112507.3453496-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 14:59:12 -03:00
Maor Gottlieb
d44335572f RDMA/mlx5: Fix remote gid value in query QP
Remote gid is not copied to the right address. Fix it by using
rdma_ah_set_dgid_raw to copy the remote gid value from the QP context on
query QP.

Fixes: 70bd7fb876 ("RDMA/mlx5: Remove manually crafted QP context the query call")
Link: https://lore.kernel.org/r/20200618112507.3453496-3-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 14:42:27 -03:00
Leon Romanovsky
6c41965d64 RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path
destroy_qp_common is called for flows where QP is already created by
HW. While it is called from IB/core, the ibqp.* fields will be fully
initialized, but it is not the case if this function is called during QP
creation.

Don't rely on ibqp fields as much as possible and initialize
send_cq/recv_cq as temporal solution till all drivers will be converted to
IB/core QP allocation scheme.

refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 5372 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 5372 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 mlx5_core_put_rsc+0x70/0x80
 destroy_resource_common+0x8e/0xb0
 mlx5_core_destroy_qp+0xaf/0x1d0
 mlx5_ib_destroy_qp+0xeb0/0x1460
 ib_destroy_qp_user+0x2d5/0x7d0
 create_qp+0xed3/0x2130
 ib_uverbs_create_qp+0x13e/0x190
 ? ib_uverbs_ex_create_qp
 ib_uverbs_write+0xaa5/0xdf0
 __vfs_write+0x7c/0x100
 ksys_write+0xc8/0x200
 do_syscall_64+0x9c/0x390
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 08d5397660 ("RDMA/mlx5: Copy response to the user in one place")
Link: https://lore.kernel.org/r/20200617130148.2846643-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 14:26:04 -03:00
Leon Romanovsky
4121fb0db6 RDMA/core: Check that type_attrs is not NULL prior access
In disassociate flow, the type_attrs is set to be NULL, which is in an
implicit way is checked in alloc_uobj() by "if (!attrs->context)".

Change the logic to rely on that check, to be consistent with other
alloc_uobj() places that will fix the following kernel splat.

 BUG: kernel NULL pointer dereference, address: 0000000000000018
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 2743 Comm: python3 Not tainted 5.7.0-rc6-for-upstream-perf-2020-05-23_19-04-38-5 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 RIP: 0010:alloc_begin_fd_uobject+0x18/0xf0 [ib_uverbs]
 Code: 89 43 48 eb 97 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 f5 41 54 55 48 89 fd 53 48 83 ec 08 48 8b 1f <48> 8b 43 18 48 8b 80 80 00 00 00 48 3d 20 10 33 a0 74 1c 48 3d 30
 RSP: 0018:ffffc90001127b70 EFLAGS: 00010282
 RAX: ffffffffa0339fe0 RBX: 0000000000000000 RCX: 8000000000000007
 RDX: fffffffffffffffb RSI: ffffc90001127d28 RDI: ffff88843fe1f600
 RBP: ffff88843fe1f600 R08: ffff888461eb06d8 R09: ffff888461eb06f8
 R10: ffff888461eb0700 R11: 0000000000000000 R12: ffff88846a5f6450
 R13: ffffc90001127d28 R14: ffff88845d7d6ea0 R15: ffffc90001127cb8
 FS: 00007f469bff1540(0000) GS:ffff88846f980000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000018 CR3: 0000000450018003 CR4: 0000000000760ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
 ? xa_store+0x28/0x40
 rdma_alloc_begin_uobject+0x4f/0x90 [ib_uverbs]
 ib_uverbs_create_comp_channel+0x87/0xf0 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs]
 ib_uverbs_cmd_verbs.isra.8+0x96d/0xae0 [ib_uverbs]
 ? get_page_from_freelist+0x3bb/0xf70
 ? _copy_to_user+0x22/0x30
 ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
 ? __wake_up_common_lock+0x87/0xc0
 ib_uverbs_ioctl+0xbc/0x130 [ib_uverbs]
 ksys_ioctl+0x83/0xc0
 ? ksys_write+0x55/0xd0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x48/0x130
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f469ac43267

Fixes: 849e149063 ("RDMA/core: Do not allow alloc_commit to fail")
Link: https://lore.kernel.org/r/20200617061826.2625359-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 10:59:21 -03:00
Yangyang Li
3ec5f54f7a RDMA/hns: Fix an cmd queue issue when resetting
If a IMP reset caused by some hardware errors and hns RoCE driver reset
occurred at the same time, there is a possiblity that the IMP will stop
dealing with command and users can't use the hardware. The logs are as
follows:

 hns3 0000:fd:00.1: cleaned 0, need to clean 1
 hns3 0000:fd:00.1: firmware version query failed -11
 hns3 0000:fd:00.1: Cmd queue init failed
 hns3 0000:fd:00.1: Upgrade reset level
 hns3 0000:fd:00.1: global reset interrupt

The hns NIC driver divides the reset process into 3 status:
initialization, hardware resetting and softwaring restting. RoCE driver
gets reset status by interfaces provided by NIC driver and commands will
not be sent to the IMP if the driver is in any above status. The main
reason for this issue is that there is a time gap between status 1 and 2,
if the RoCE driver sends commands to the IMP during this gap, the IMP will
stop working because it is not ready.

To eliminate the time gap, the hns NIC driver has added a new interface in
commit a4de02287a ("net: hns3: provide .get_cmdq_stat interface for the
client"), so RoCE driver can ensure that no commands will be sent during
resetting.

Link: https://lore.kernel.org/r/1592314778-52822-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 10:48:39 -03:00
Yangyang Li
98a6151907 RDMA/hns: Fix a calltrace when registering MR from userspace
ibmr.device is assigned after MR is successfully registered, but both
write_mtpt() and frmr_write_mtpt() accesses it during the mr registration
process, which may cause the following error when trying to register MR in
userspace and pbl_hop_num is set to 0.

  pc : hns_roce_mtr_find+0xa0/0x200 [hns_roce]
  lr : set_mtpt_pbl+0x54/0x118 [hns_roce_hw_v2]
  sp : ffff00023e73ba20
  x29: ffff00023e73ba20 x28: ffff00023e73bad8
  x27: 0000000000000000 x26: 0000000000000000
  x25: 0000000000000002 x24: 0000000000000000
  x23: ffff00023e73bad0 x22: 0000000000000000
  x21: ffff0000094d9000 x20: 0000000000000000
  x19: ffff8020a6bdb2c0 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000
  x15: 0000000000000000 x14: 0000000000000000
  x13: 0140000000000000 x12: 0040000000000041
  x11: ffff000240000000 x10: 0000000000001000
  x9 : 0000000000000000 x8 : ffff802fb7558480
  x7 : ffff802fb7558480 x6 : 000000000003483d
  x5 : ffff00023e73bad0 x4 : 0000000000000002
  x3 : ffff00023e73bad8 x2 : 0000000000000000
  x1 : 0000000000000000 x0 : ffff0000094d9708
  Call trace:
   hns_roce_mtr_find+0xa0/0x200 [hns_roce]
   set_mtpt_pbl+0x54/0x118 [hns_roce_hw_v2]
   hns_roce_v2_write_mtpt+0x14c/0x168 [hns_roce_hw_v2]
   hns_roce_mr_enable+0x6c/0x148 [hns_roce]
   hns_roce_reg_user_mr+0xd8/0x130 [hns_roce]
   ib_uverbs_reg_mr+0x14c/0x2e0 [ib_uverbs]
   ib_uverbs_write+0x27c/0x3e8 [ib_uverbs]
   __vfs_write+0x60/0x190
   vfs_write+0xac/0x1c0
   ksys_write+0x6c/0xd8
   __arm64_sys_write+0x24/0x30
   el0_svc_common+0x78/0x130
   el0_svc_handler+0x38/0x78
   el0_svc+0x8/0xc

Solve above issue by adding a pointer of structure hns_roce_dev as a
parameter of write_mtpt() and frmr_write_mtpt(), so that both of these
functions can access it before finishing MR's registration.

Fixes: 9b2cf76c9f ("RDMA/hns: Optimize PBL buffer allocation process")
Link: https://lore.kernel.org/r/1592314629-51715-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 10:47:04 -03:00
Leon Romanovsky
ab183d460d RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
Missed steps during ECE handshake left userspace application with less
options for the ECE handshake. Pass ECE options in the additional
transitions.

Fixes: 50aec2c313 ("RDMA/mlx5: Return ECE data after modify QP")
Link: https://lore.kernel.org/r/20200616104536.2426384-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:52:29 -03:00
Mark Zhang
730c891248 RDMA/cma: Protect bind_list and listen_list while finding matching cm id
The bind_list and listen_list must be accessed under a lock, add the
missing locking around the access in cm_ib_id_from_event()

In addition add lockdep asserts to make it clearer what the locking
semantic is here.

  general protection fault: 0000 [#1] SMP NOPTI
  CPU: 226 PID: 126135 Comm: kworker/226:1 Tainted: G OE 4.12.14-150.47-default #1 SLE15
  Hardware name: Cray Inc. Windom/Windom, BIOS 0.8.7 01-10-2020
  Workqueue: ib_cm cm_work_handler [ib_cm]
  task: ffff9c5a60a1d2c0 task.stack: ffffc1d91f554000
  RIP: 0010:cma_ib_req_handler+0x3f1/0x11b0 [rdma_cm]
  RSP: 0018:ffffc1d91f557b40 EFLAGS: 00010286
  RAX: deacffffffffff30 RBX: 0000000000000001 RCX: ffff9c2af5bb6000
  RDX: 00000000000000a9 RSI: ffff9c5aa4ed2f10 RDI: ffffc1d91f557b08
  RBP: ffffc1d91f557d90 R08: ffff9c340cc80000 R09: ffff9c2c0f901900
  R10: 0000000000000000 R11: 0000000000000001 R12: deacffffffffff30
  R13: ffff9c5a48aeec00 R14: ffffc1d91f557c30 R15: ffff9c5c2eea3688
  FS: 0000000000000000(0000) GS:ffff9c5c2fa80000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00002b5cc03fa320 CR3: 0000003f8500a000 CR4: 00000000003406e0
  Call Trace:
  ? rdma_addr_cancel+0xa0/0xa0 [ib_core]
  ? cm_process_work+0x28/0x140 [ib_cm]
  cm_process_work+0x28/0x140 [ib_cm]
  ? cm_get_bth_pkey.isra.44+0x34/0xa0 [ib_cm]
  cm_work_handler+0xa06/0x1a6f [ib_cm]
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to+0x7c/0x4b0
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  process_one_work+0x1da/0x400
  worker_thread+0x2b/0x3f0
  ? process_one_work+0x400/0x400
  kthread+0x118/0x140
  ? kthread_create_on_node+0x40/0x40
  ret_from_fork+0x22/0x40
  Code: 00 66 83 f8 02 0f 84 ca 05 00 00 49 8b 84 24 d0 01 00 00 48 85 c0 0f 84 68 07 00 00 48 2d d0 01
  00 00 49 89 c4 0f 84 59 07 00 00 <41> 0f b7 44 24 20 49 8b 77 50 66 83 f8 0a 75 9e 49 8b 7c 24 28

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to RDMA CM")
Link: https://lore.kernel.org/r/20200616104304.2426081-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:48:54 -03:00
Michal Kalderon
0dfbd5ecf2 RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532
Private data passed to iwarp_cm_handler is copied for connection request /
response, but ignored otherwise.  If junk is passed, it is stored in the
event and used later in the event processing.

The driver passes an old junk pointer during connection close which leads
to a use-after-free on event processing.  Set private data to NULL for
events that don 't have private data.

  BUG: KASAN: use-after-free in ucma_event_handler+0x532/0x560 [rdma_ucm]
  kernel: Read of size 4 at addr ffff8886caa71200 by task kworker/u128:1/5250
  kernel:
  kernel: Workqueue: iw_cm_wq cm_work_handler [iw_cm]
  kernel: Call Trace:
  kernel: dump_stack+0x8c/0xc0
  kernel: print_address_description.constprop.0+0x1b/0x210
  kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm]
  kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm]
  kernel: __kasan_report.cold+0x1a/0x33
  kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm]
  kernel: kasan_report+0xe/0x20
  kernel: check_memory_region+0x130/0x1a0
  kernel: memcpy+0x20/0x50
  kernel: ucma_event_handler+0x532/0x560 [rdma_ucm]
  kernel: ? __rpc_execute+0x608/0x620 [sunrpc]
  kernel: cma_iw_handler+0x212/0x330 [rdma_cm]
  kernel: ? iw_conn_req_handler+0x6e0/0x6e0 [rdma_cm]
  kernel: ? enqueue_timer+0x86/0x140
  kernel: ? _raw_write_lock_irq+0xd0/0xd0
  kernel: cm_work_handler+0xd3d/0x1070 [iw_cm]

Fixes: e411e0587e ("RDMA/qedr: Add iWARP connection management functions")
Link: https://lore.kernel.org/r/20200616093408.17827-1-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:44:45 -03:00
Gal Pressman
0133654d8e RDMA/efa: Set maximum pkeys device attribute
The max_pkeys device attribute was not set in query device verb, set it to
one in order to account for the default pkey (0xffff). This information is
exposed to userspace and can cause malfunction

Fixes: 40909f664d ("RDMA/efa: Add EFA verbs implementation")
Link: https://lore.kernel.org/r/20200614103534.88060-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:41:07 -03:00
Aditya Pakki
90a239ee25 RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq
In case of failure of alloc_ud_wq_attr(), the memory allocated by
rvt_alloc_rq() is not freed. Fix it by calling rvt_free_rq() using the
existing clean-up code.

Fixes: d310c4bf8a ("IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs")
Link: https://lore.kernel.org/r/20200614041148.131983-1-pakki001@umn.edu
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:35:47 -03:00
Leon Romanovsky
1ea7c546b8 RDMA/core: Annotate CMA unlock helper routine
Fix the following sparse error by adding annotation to
cm_queue_work_unlock() that it releases cm_id_priv->lock lock.

 drivers/infiniband/core/cm.c:936:24: warning: context imbalance in
 'cm_queue_work_unlock' - unexpected unlock

Fixes: e83f195aa4 ("RDMA/cm: Pull duplicated code into cm_queue_work_unlock()")
Link: https://lore.kernel.org/r/20200611130045.1994026-1-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:34:42 -03:00
Tom Seewald
6769b275a3 RDMA/siw: Fix pointer-to-int-cast warning in siw_rx_pbl()
The variable buf_addr is type dma_addr_t, which may not be the same size
as a pointer.  To ensure it is the correct size, cast to a uintptr_t.

Fixes: c536277e0d ("RDMA/siw: Fix 64/32bit pointer inconsistency")
Link: https://lore.kernel.org/r/20200610174717.15932-1-tseewald@gmail.com
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-15 16:00:08 -03:00
Kieran Bingham
0dc63bbee0 RDMA/hfi1: Fix trivial mis-spelling of 'descriptor'
The word 'descriptor' is misspelled throughout the tree.

Fix it up accordingly:
    decriptors -> descriptors

Link: https://lore.kernel.org/r/20200609124610.3445662-3-kieran.bingham+renesas@ideasonboard.com
Link: https://lore.kernel.org/r/20200609124610.3445662-12-kieran.bingham+renesas@ideasonboard.com
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-15 15:56:54 -03:00
Tom Seewald
4f5747cf8e RDMA/mlx5: Fix -Wformat warning in check_ucmd_data()
Variables of type size_t should use %zu rather than %lu [1]. The variables
"inlen", "ucmd", "last", and "size" are all size_t, so use the correct
format specifiers.

[1] https://www.kernel.org/doc/html/latest/core-api/printk-formats.html

Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/20200605023012.9527-1-tseewald@gmail.com
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-15 15:39:36 -03:00
Colin Ian King
2ef5612391 RDMA/mlx5: Remove duplicated assignment to resp.response_length
The assignment to resp.response_length is never read since it is being
updated again on the next statement. The assignment is redundant so
removed it.

Fixes: a645a89d9a ("RDMA/mlx5: Return ECE DC support")
Link: https://lore.kernel.org/r/20200604143902.56021-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-15 10:48:38 -03:00
Masahiro Yamada
a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Michel Lespinasse
c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse
d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Mike Rapoport
65fddcfca8 mm: reorder includes after introduction of linux/pgtable.h
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes.  Fix this up with the aid of
the below script and manual adjustments here and there.

	import sys
	import re

	if len(sys.argv) is not 3:
	    print "USAGE: %s <file> <header>" % (sys.argv[0])
	    sys.exit(1)

	hdr_to_move="#include <linux/%s>" % sys.argv[2]
	moved = False
	in_hdrs = False

	with open(sys.argv[1], "r") as f:
	    lines = f.readlines()
	    for _line in lines:
		line = _line.rstrip('
')
		if line == hdr_to_move:
		    continue
		if line.startswith("#include <linux/"):
		    in_hdrs = True
		elif not moved and in_hdrs:
		    moved = True
		    print hdr_to_move
		print line

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport
ca5999fde0 mm: introduce include/linux/pgtable.h
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.

Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport
e31cf2f4ca mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Linus Torvalds
242b233198 RDMA 5.8 merge window pull request
A few large, long discussed works this time. The RNBD block driver has
 been posted for nearly two years now, and the removal of FMR has been a
 recurring discussion theme for a long time. The usual smattering of
 features and bug fixes.
 
 - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa
 
 - Continuing driver cleanups in bnxt_re, hns
 
 - Big cleanup of mlx5 QP creation flows
 
 - More consistent use of src port and flow label when LAG is used and a
   mlx5 implementation
 
 - Additional set of cleanups for IB CM
 
 - 'RNBD' network block driver and target. This is a network block RDMA
   device specific to ionos's cloud environment. It brings strong multipath
   and resiliency capabilities.
 
 - Accelerated IPoIB for HFI1
 
 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds
 
 - Support for exchanging the new IBTA defiend ECE data during RDMA CM
   exchanges
 
 - Removal of the very old and insecure FMR interface from all ULPs and
   drivers. FRWR should be preferred for at least a decade now.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7X/IwACgkQOG33FX4g
 mxp2uw/+MI2S/aXqEBvZfTT8yrkAwqYezS0VeTDnwH/T6UlTMDhHVN/2Ji3tbbX3
 FEKT1i2mnAL5RqUAL1lr9g4sG/bVozrpN46Ws5Lu9dTbIPLKTNPWDuLFQDUShKY7
 OyMI/bRx6anGnsOy20iiBqnrQbrrZj5TECgnmrkAl62QFdcl7aBWe/yYjy4CT11N
 ub+aBXBREN1F1pc0HIjd2tI+8gnZc+mNm1LVVDRH9Capun/pI26qDNh7e6QwGyIo
 n8ItraC8znLwv/nsUoTE7/JRcsTEe6vJI26PQmczZfNJs/4O65G7fZg0eSBseZYi
 qKf7Uwtb3qW0R7jRUMEgFY4DKXVAA0G2ph40HXBuzOSsqlT6HqYMO2wgG8pJkrTc
 qAjoSJGzfAHIsjxzxKI8wKuufCddjCm30VWWU7EKeriI6h1J0uPVqKkQMfYBTkik
 696eZSBycAVgwayOng3XaehiTxOL7qGMTjUpDjUR6UscbiPG919vP+QsbIUuBXdb
 YoddBQJdyGJiaCXv32ciJjo9bjPRRi/bII7Q5qzCNI2mi4ZVbudF4ffzyQvdHtNJ
 nGnpRXoPi7kMvUrKTMPWkFjj0R5/UsPszsA51zbxPydfgBe0Dlc2PrrIG8dlzYAp
 wbV0Lec+iJucKlt7EZtrjz1xOiOOaQt/5/cW1bWqL+wk2t6gAuY=
 =9zTe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A more active cycle than most of the recent past, with a few large,
  long discussed works this time.

  The RNBD block driver has been posted for nearly two years now, and
  flowing through RDMA due to it also introducing a new ULP.

  The removal of FMR has been a recurring discussion theme for a long
  time.

  And the usual smattering of features and bug fixes.

  Summary:

   - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa

   - Continuing driver cleanups in bnxt_re, hns

   - Big cleanup of mlx5 QP creation flows

   - More consistent use of src port and flow label when LAG is used and
     a mlx5 implementation

   - Additional set of cleanups for IB CM

   - 'RNBD' network block driver and target. This is a network block
     RDMA device specific to ionos's cloud environment. It brings strong
     multipath and resiliency capabilities.

   - Accelerated IPoIB for HFI1

   - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple
     async fds

   - Support for exchanging the new IBTA defiend ECE data during RDMA CM
     exchanges

   - Removal of the very old and insecure FMR interface from all ULPs
     and drivers. FRWR should be preferred for at least a decade now"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits)
  RDMA/cm: Spurious WARNING triggered in cm_destroy_id()
  RDMA/mlx5: Return ECE DC support
  RDMA/mlx5: Don't rely on FW to set zeros in ECE response
  RDMA/mlx5: Return an error if copy_to_user fails
  IB/hfi1: Use free_netdev() in hfi1_netdev_free()
  RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr()
  RDMA/core: Move and rename trace_cm_id_create()
  IB/hfi1: Fix hfi1_netdev_rx_init() error handling
  RDMA: Remove 'max_map_per_fmr'
  RDMA: Remove 'max_fmr'
  RDMA/core: Remove FMR device ops
  RDMA/rdmavt: Remove FMR memory registration
  RDMA/mthca: Remove FMR support for memory registration
  RDMA/mlx4: Remove FMR support for memory registration
  RDMA/i40iw: Remove FMR leftovers
  RDMA/bnxt_re: Remove FMR leftovers
  RDMA/mlx5: Remove FMR leftovers
  RDMA/core: Remove FMR pool API
  RDMA/rds: Remove FMR support for memory registration
  RDMA/srp: Remove support for FMR memory registration
  ...
2020-06-05 14:05:57 -07:00
Linus Torvalds
cb8e59cc87 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

 2) Add GSO partial support to igc, from Sasha Neftin.

 3) Several cleanups and improvements to r8169 from Heiner Kallweit.

 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

 5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.

 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

 8) Add sriov and vf support to hinic, from Luo bin.

 9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

18) Several RISCV bpf jit optimizations, from Luke Nelson.

19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

21) Add BPF iterators, from Yonghang Song.

22) Add cable test infrastructure, including ethool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

25) Add CAP_BPF, from Alexei Starovoitov.

26) Support terse dumps in the packet scheduler, from Vlad Buslov.

27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

28) Add devm_register_netdev(), from Bartosz Golaszewski.

29) Minimize qdisc resets, from Cong Wang.

30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
  selftests: net: ip_defrag: ignore EPERM
  net_failover: fixed rollback in net_failover_open()
  Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
  Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
  vmxnet3: allow rx flow hash ops only when rss is enabled
  hinic: add set_channels ethtool_ops support
  selftests/bpf: Add a default $(CXX) value
  tools/bpf: Don't use $(COMPILE.c)
  bpf, selftests: Use bpf_probe_read_kernel
  s390/bpf: Use bcr 0,%0 as tail call nop filler
  s390/bpf: Maintain 8-byte stack alignment
  selftests/bpf: Fix verifier test
  selftests/bpf: Fix sample_cnt shared between two threads
  bpf, selftests: Adapt cls_redirect to call csum_level helper
  bpf: Add csum_level helper for fixing up csum levels
  bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
  sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
  crypto/chtls: IPv6 support for inline TLS
  Crypto/chcr: Fixes a coccinile check error
  Crypto/chcr: Fixes compilations warnings
  ...
2020-06-03 16:27:18 -07:00
Ka-Cheong Poon
fba97dc7fc RDMA/cm: Spurious WARNING triggered in cm_destroy_id()
If the cm_id state is IB_CM_REP_SENT when cm_destroy_id() is called, it
calls cm_send_rej_locked().

In cm_send_rej_locked(), it calls cm_enter_timewait() and the state is
changed to IB_CM_TIMEWAIT.

Now back to cm_destroy_id(), it breaks from the switch statement, and the
next call is WARN_ON(cm_id->state != IB_CM_IDLE).

This triggers a spurious warning. Instead, the code should goto retest
after returning from cm_send_rej_locked() to move the state to IDLE.

Fixes: 67b3c8dcea ("RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy")
Link: https://lore.kernel.org/r/1591191218-9446-1-git-send-email-ka-cheong.poon@oracle.com
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:48:18 -03:00
Leon Romanovsky
a645a89d9a RDMA/mlx5: Return ECE DC support
The DC QPs are many-to-one QP types that means that first connection will
establish ECE options that coming connections should follow.  Due to this
property, the ECE code was removed between first [1] and second [2] ECE
submissions.

This patch returns the dropped code, because ECE is a property of a
connection and like any other connection users are needed to manage this
data. Allow them to set ECE parameter for DC too and avoid need of having
compatibility flag for the DC ECE.

[1]
https://lore.kernel.org/linux-rdma/20200523132243.817936-1-leon@kernel.org/
[2]
https://lore.kernel.org/linux-rdma/20200525174401.71152-1-leon@kernel.org/

Link: https://lore.kernel.org/r/20200602125548.172654-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:45:02 -03:00
Leon Romanovsky
92cd667c0e RDMA/mlx5: Don't rely on FW to set zeros in ECE response
The FW returns zeros in case feature is not enabled, but it is better to
have the capability check and ensure that returned result is cleared.

Fixes: 3e09a427ae ("RDMA/mlx5: Get ECE options from FW during create QP")
Link: https://lore.kernel.org/r/20200602125548.172654-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:45:02 -03:00
Leon Romanovsky
6512f11d38 RDMA/mlx5: Return an error if copy_to_user fails
In theoretical event, the ib_copy_to_udata() can fail, so return -EFAULT
error to the user, so he will destroy the QP.

Fixes: 50aec2c313 ("RDMA/mlx5: Return ECE data after modify QP")
Link: https://lore.kernel.org/r/20200602125548.172654-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:45:01 -03:00
YueHaibing
193ba03141 IB/hfi1: Use free_netdev() in hfi1_netdev_free()
dummy_netdev shold be freed by free_netdev() instead of kfree(). Also
remove unneeded variable 'priv'

Fixes: 4730f4a6c6 ("IB/hfi1: Activate the dummy netdev")
Link: https://lore.kernel.org/r/20200602061635.31224-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 21:35:03 -03:00
Dan Carpenter
87d9e56849 RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr()
The "dmac" variable is used before it is initialized.

Fixes: 494c3b3122 ("RDMA/hns: Refactor the QP context filling process related to WQE buffer configure")
Link: https://lore.kernel.org/r/20200529083918.GA1298465@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Chuck Lever
278f74b39e RDMA/core: Move and rename trace_cm_id_create()
The restrack ID for an rdma_cm_id is not assigned until it is
associated with a device.

Here's an example I captured while testing NFS/RDMA's support for
DEVICE_REMOVAL. The new tracepoint name is "cm_id_attach".

           <...>-4261  [001]   366.581299: cm_event_handler:     cm.id=0 src=0.0.0.0:45919 dst=192.168.2.55:20049 tos=0 ADDR_ERROR (1/-19)
           <...>-4261  [001]   366.581304: cm_event_done:        cm.id=0 src=0.0.0.0:45919 dst=192.168.2.55:20049 tos=0 ADDR_ERROR consumer returns 0
           <...>-1950  [000]   366.581309: cm_id_destroy:        cm.id=0 src=0.0.0.0:45919 dst=192.168.2.55:20049 tos=0
           <...>-7     [001]   369.589400: cm_event_handler:     cm.id=0 src=0.0.0.0:49023 dst=192.168.2.55:20049 tos=0 ADDR_ERROR (1/-19)
           <...>-7     [001]   369.589404: cm_event_done:        cm.id=0 src=0.0.0.0:49023 dst=192.168.2.55:20049 tos=0 ADDR_ERROR consumer returns 0
           <...>-1950  [000]   369.589407: cm_id_destroy:        cm.id=0 src=0.0.0.0:49023 dst=192.168.2.55:20049 tos=0
           <...>-4261  [001]   372.597650: cm_id_attach:         cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 device=mlx4_0
           <...>-4261  [001]   372.597652: cm_event_handler:     cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ADDR_RESOLVED (0/0)
           <...>-4261  [001]   372.597654: cm_event_done:        cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ADDR_RESOLVED consumer returns 0
           <...>-4261  [001]   372.597738: cm_event_handler:     cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ROUTE_RESOLVED (2/0)
           <...>-4261  [001]   372.597740: cm_event_done:        cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ROUTE_RESOLVED consumer returns 0
           <...>-4691  [007]   372.600101: cm_qp_create:         cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 pd.id=2 qp_type=RC send_wr=4091 recv_wr=256 qp_num=530 rc=0
           <...>-4691  [007]   372.600207: cm_send_req:          cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 qp_num=530
           <...>-185   [002]   372.601212: cm_send_mra:          cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0
           <...>-185   [002]   372.601362: cm_send_rtu:          cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0
           <...>-185   [002]   372.601372: cm_event_handler:     cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ESTABLISHED (9/0)
           <...>-185   [002]   372.601379: cm_event_done:        cm.id=0 src=192.168.2.51:47492 dst=192.168.2.55:20049 tos=0 ESTABLISHED consumer returns 0

Fixes: ed999f820a ("RDMA/cma: Add trace points in RDMA Connection Manager")
Link: https://lore.kernel.org/r/20200530174934.21362.56754.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Dan Carpenter
24c567ff75 IB/hfi1: Fix hfi1_netdev_rx_init() error handling
The hfi1_vnic_up() function doesn't check whether hfi1_netdev_rx_init()
returns errors.  In hfi1_vnic_init() we need to change the code to
preserve the error code instead of returning success.

Fixes: 2280740f01 ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Fixes: 4730f4a6c6 ("IB/hfi1: Activate the dummy netdev")
Link: https://lore.kernel.org/r/20200530140224.GA1330098@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Jason Gunthorpe
4d12c04caa RDMA: Remove 'max_map_per_fmr'
Now that FMR support is gone, this attribute can be deleted from all
places.

Link: https://lore.kernel.org/r/13-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Jason Gunthorpe
649392bf75 RDMA: Remove 'max_fmr'
Now that FMR support is gone, this attribute can be deleted from all
places.

Link: https://lore.kernel.org/r/12-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Max Gurtovoy
3a578152a9 RDMA/core: Remove FMR device ops
After removing FMR support from all the RDMA ULPs and providers, there
is no need to keep FMR operation for IB devices.

Link: https://lore.kernel.org/r/11-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Max Gurtovoy
22c9cc2408 RDMA/rdmavt: Remove FMR memory registration
Use FRWR method to register memory by default and remove the ancient and
unsafe FMR method.

Link: https://lore.kernel.org/r/10-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Max Gurtovoy
d6747b3715 RDMA/mthca: Remove FMR support for memory registration
Remove the ancient and unsafe FMR method.

Link: https://lore.kernel.org/r/9-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Max Gurtovoy
1f55b7ab90 RDMA/mlx4: Remove FMR support for memory registration
HCA's that are driven by mlx4 driver support FRWR method to register
memory. Remove the ancient and unsafe FMR method.

Link: https://lore.kernel.org/r/8-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Jason Gunthorpe
f0c73c70db RDMA/i40iw: Remove FMR leftovers
The ibfmr member is never referenced, remove it.

Link: https://lore.kernel.org/r/7-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Jason Gunthorpe
7c08bc1956 RDMA/bnxt_re: Remove FMR leftovers
The bnxt_re_fmr struct is never referenced and the max_fmr items
in bnxt_qplib_dev_attr are never read.

Link: https://lore.kernel.org/r/6-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:53 -03:00
Gal Pressman
d29d58e772 RDMA/mlx5: Remove FMR leftovers
Remove a few leftovers from FMR functionality which are no longer used.

Link: https://lore.kernel.org/r/5-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:53 -03:00
Max Gurtovoy
4e373d5417 RDMA/core: Remove FMR pool API
This ancient and unsafe method for memory registration is no longer used
by any RDMA based ULP. Remove the FMR pool API from the core driver.

Link: https://lore.kernel.org/r/4-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:53 -03:00
Max Gurtovoy
f273ad4f8d RDMA/srp: Remove support for FMR memory registration
FMR is not supported on most recent RDMA devices (that use fast memory
registration mechanism). Also, FMR was recently removed from NFS/RDMA
ULP.

Link: https://lore.kernel.org/r/2-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:53 -03:00
Israel Rukshin
1fc431320a RDMA/iser: Remove support for FMR memory registration
FMR is not supported on most recent RDMA devices (that use fast memory
registration mechanism). Also, FMR was recently removed from NFS/RDMA
ULP.

Link: https://lore.kernel.org/r/1-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:53 -03:00
Linus Torvalds
e0cd920687 Merge branch 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/access_ok updates from Al Viro:
 "Removals of trivially pointless access_ok() calls.

  Note: the fiemap stuff was removed from the series, since they are
  duplicates with part of ext4 series carried in Ted's tree"

* 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vmci_host: get rid of pointless access_ok()
  hfi1: get rid of pointless access_ok()
  usb: get rid of pointless access_ok() calls
  lpfc_debugfs: get rid of pointless access_ok()
  efi_test: get rid of pointless access_ok()
  drm_read(): get rid of pointless access_ok()
  via-pmu: don't bother with access_ok()
  drivers/crypto/ccp/sev-dev.c: get rid of pointless access_ok()
  omapfb: get rid of pointless access_ok() calls
  amifb: get rid of pointless access_ok() calls
  drivers/fpga/dfl-afu-dma-region.c: get rid of pointless access_ok()
  drivers/fpga/dfl-fme-pr.c: get rid of pointless access_ok()
  cm4000_cs.c cmm_ioctl(): get rid of pointless access_ok()
  nvram: drop useless access_ok()
  n_hdlc_tty_read(): remove pointless access_ok()
  tomoyo_write_control(): get rid of pointless access_ok()
  btrfs_ioctl_send(): don't bother with access_ok()
  fat_dir_ioctl(): hadn't needed that access_ok() for more than a decade...
  dlmfs_file_write(): get rid of pointless access_ok()
2020-06-01 16:09:43 -07:00
David S. Miller
1806c13dc2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.

The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31 17:48:46 -07:00
Saeed Mahameed
971ae1ed03 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
net/mlx5: Add ability to read and write ECE options
  net/mlx5: Add support for RDMA TX FT headers modifying
  net/mlx5: Move iseg access helper routines close to mlx5_core driver
  net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
  net/mlx5: Add support in forward to namespace
  {IB/net}/mlx5: Simplify don't trap code
  net/mlx5: Replace zero-length array with flexible-array

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-29 14:38:57 -07:00
Yamin Friedman
c7ff819aef RDMA/core: Introduce shared CQ pool API
Allow a ULP to ask the core to provide a completion queue based on a
least-used search on a per-device CQ pools. The device CQ pools grow in a
lazy fashion when more CQs are requested.

This feature reduces the amount of interrupts when using many QPs.  Using
shared CQs allows for more effcient completion handling. It also reduces
the amount of overhead needed for CQ contexts.

Test setup:
Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz servers.
Running NVMeoF 4KB read IOs over ConnectX-5EX across Spectrum switch.
TX-depth = 32. The patch was applied in the nvme driver on both the target
and initiator. Four controllers are accessed from each core. In the
current test case we have exposed sixteen NVMe namespaces using four
different subsystems (four namespaces per subsystem) from one NVM port.
Each controller allocated X queues (RDMA QPs) and attached to Y CQs.
Before this series we had X == Y, i.e for four controllers we've created
total of 4X QPs and 4X CQs. In the shared case, we've created 4X QPs and
only X CQs which means that we have four controllers that share a
completion queue per core. Until fourteen cores there is no significant
change in performance and the number of interrupts per second is less than
a million in the current case.
==================================================
|Cores|Current KIOPs  |Shared KIOPs  |improvement|
|-----|---------------|--------------|-----------|
|14   |2332           |2723          |16.7%      |
|-----|---------------|--------------|-----------|
|20   |2086           |2712          |30%        |
|-----|---------------|--------------|-----------|
|28   |1971           |2669          |35.4%      |
|=================================================
|Cores|Current avg lat|Shared avg lat|improvement|
|-----|---------------|--------------|-----------|
|14   |767us          |657us         |14.3%      |
|-----|---------------|--------------|-----------|
|20   |1225us         |943us         |23%        |
|-----|---------------|--------------|-----------|
|28   |1816us         |1341us        |26.1%      |
========================================================
|Cores|Current interrupts|Shared interrupts|improvement|
|-----|------------------|-----------------|-----------|
|14   |1.6M/sec          |0.4M/sec         |72%        |
|-----|------------------|-----------------|-----------|
|20   |2.8M/sec          |0.6M/sec         |72.4%      |
|-----|------------------|-----------------|-----------|
|28   |2.9M/sec          |0.8M/sec         |63.4%      |
====================================================================
|Cores|Current 99.99th PCTL lat|Shared 99.99th PCTL lat|improvement|
|-----|------------------------|-----------------------|-----------|
|14   |67ms                    |6ms                    |90.9%      |
|-----|------------------------|-----------------------|-----------|
|20   |5ms                     |6ms                    |-10%       |
|-----|------------------------|-----------------------|-----------|
|28   |8.7ms                   |6ms                    |25.9%      |
|===================================================================

Performance improvement with sixteen disks (sixteen CQs per core) is
comparable.

Link: https://lore.kernel.org/r/1590568495-101621-3-git-send-email-yaminf@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 16:09:02 -03:00
Yamin Friedman
3446cbd2d5 RDMA/core: Add protection for shared CQs used by ULPs
A pre-step for adding shared CQs. Add the infrastructure to prevent shared
CQ users from altering the CQ configurations. For now all cqs are marked
as private (non-shared). The core driver should use the new force
functions to perform resize/destroy/moderation changes that are not
allowed for users of shared CQs.

Link: https://lore.kernel.org/r/1590568495-101621-2-git-send-email-yaminf@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 15:40:51 -03:00
Qiushi Wu
0b8e125e21 RDMA/core: Fix several reference count leaks.
kobject_init_and_add() takes reference even when it fails.  If this
function returns an error, kobject_put() must be called to properly clean
up the memory associated with the object. Previous
commit b8eb718348 ("net-sysfs: Fix reference count leak in
rx|netdev_queue_add_kobject") fixed a similar problem.

Link: https://lore.kernel.org/r/20200528030231.9082-1-wu000273@umn.edu
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 15:35:49 -03:00
Colin Ian King
bcafcdfdae IB/hfi1: Fix spelling mistake "enought" -> "enough"
There is a spelling mistake in an error message. Fix it.

Link: https://lore.kernel.org/r/20200528110709.400935-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 15:27:51 -03:00
Colin Ian King
48062b0a8b RDMA/hns: remove duplicate assignment to pointer raq
The pointer raq is being assigned twice. Fix this by removing one of the
redundant assignments.

Fixes: 14ba87304b ("RDMA/hns: Remove redundant type cast for general pointers")
Link: https://lore.kernel.org/r/20200528150427.420624-1-colin.king@canonical.com
Addressses-Coverity: ("Evaluation order violation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 15:19:57 -03:00
Mark Zhang
802dcc7fc5 RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode
The mlx5 VF driver doesn't set QP tx port affinity because it doesn't know
if the lag is active or not, since the "lag_active" works only for PF
interfaces. In this case for VF interfaces only one lag is used which
brings performance issue.

Add a lag_tx_port_affinity CAP bit; When it is enabled and
"num_lag_ports > 1", then driver always set QP tx affinity, regardless
of lag state.

Link: https://lore.kernel.org/r/20200527055014.355093-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 15:15:30 -03:00
Bart Van Assche
e0cca8b456 RDMA/srpt: Increase max_send_sge
The ib_srpt driver limits max_send_sge to 16. Since that is a workaround
for an mlx4 bug that has been fixed, increase max_send_sge. See also
commit f95ccffc71 ("IB/mlx4: Use 4K pages for kernel QP's WQE buffer").

Link: https://lore.kernel.org/r/20200525172212.14413-5-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 14:49:55 -03:00
Bart Van Assche
66ced2eb2a RDMA/srpt: Reduce max_recv_sge to 1
Since srpt_post_recv() always sets num_sge to 1, reduce the max_recv_sge
parameter that is used at queue pair allocation time to 1.

Link: https://lore.kernel.org/r/20200525172212.14413-4-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 14:49:55 -03:00
Bart Van Assche
d4ee7f3a44 RDMA/srpt: Make debug output more detailed
Since the session name by itself is not sufficient to uniquely identify a
queue pair, include the queue pair number. Show the ASCII channel state
name instead of the numeric value. This change makes the ib_srpt debug
output more consistent.

Link: https://lore.kernel.org/r/20200525172212.14413-3-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 14:49:55 -03:00
Bart Van Assche
87fee61c35 RDMA/srp: Make the channel count configurable per target
Increase the flexibility of the SRP initiator driver by making the channel
count configurable per target instead of only providing a kernel module
parameter for configuring the channel count.

Link: https://lore.kernel.org/r/20200525172212.14413-2-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 14:49:55 -03:00
Al Viro
fd8ec4dd4a hfi1: get rid of pointless access_ok()
pin_user_pages_fast() doesn't need that from its caller.
NB: only reachable from ->ioctl(), and only under USER_DS

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-29 11:06:32 -04:00
Christoph Hellwig
12abc5ee78 tcp: add tcp_sock_set_nodelay
Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
b58f0e8f38 net: add sock_set_reuseaddr
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space
without going through a fake uaccess.

For this the iscsi target now has to formally depend on inet to avoid
a mostly theoretical compile failure.  For actual operation it already
did depend on having ipv4 or ipv6 support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:44 -07:00
Valentine Fatiev
1acba6a817 IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
When connected mode is set, and we have connected and datagram traffic in
parallel, ipoib might crash with double free of datagram skb.

The current mechanism assumes that the order in the completion queue is
the same as the order of sent packets for all QPs. Order is kept only for
specific QP, in case of mixed UD and CM traffic we have few QPs (one UD and
few CM's) in parallel.

The problem:
----------------------------------------------------------

Transmit queue:
-----------------
UD skb pointer kept in queue itself, CM skb kept in spearate queue and
uses transmit queue as a placeholder to count the number of total
transmitted packets.

0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
------------------------------------------------------------
NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
    ^                                  ^
   tail                               head

Completion queue (problematic scenario) - the order not the same as in
the transmit queue:

  1  2  3  4  5  6  7  8  9
------------------------------------
 ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
------------------------------------

1. CM1 'wc' processing
   - skb freed in cm separate ring.
   - tx_tail of transmit queue increased although UD2 is not freed.
     Now driver assumes UD2 index is already freed and it could be used for
     new transmitted skb.

0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
------------------------------------------------------------
NL NL  UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
        ^   ^                       ^
      (Bad)tail                    head
(Bad - Could be used for new SKB)

In this case (due to heavy load) UD2 skb pointer could be replaced by new
transmitted packet UD_NEW, as the driver assumes its free.  At this point
we will have to process two 'wc' with same index but we have only one
pointer to free.

During second attempt to free the same skb we will have NULL pointer
exception.

2. UD2 'wc' processing
   - skb freed according the index we got from 'wc', but it was already
     overwritten by mistake. So actually the skb that was released is the
     skb of the new transmitted packet and not the original one.

3. UD_NEW 'wc' processing
   - attempt to free already freed skb. NUll pointer exception.

The fix:
-----------------------------------------------------------------------

The fix is to stop using the UD ring as a placeholder for CM packets, the
cyclic ring variables tx_head and tx_tail will manage the UD tx_ring, a
new cyclic variables global_tx_head and global_tx_tail are introduced for
managing and counting the overall outstanding sent packets, then the send
queue will be stopped and waken based on these variables only.

Note that no locking is needed since global_tx_head is updated in the xmit
flow and global_tx_tail is updated in the NAPI flow only.  A previous
attempt tried to use one variable to count the outstanding sent packets,
but it did not work since xmit and NAPI flows can run at the same time and
the counter will be updated wrongly. Thus, we use the same simple cyclic
head and tail scheme that we have today for the UD tx_ring.

Fixes: 2c104ea683 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes")
Link: https://lore.kernel.org/r/20200527134705.480068-1-leon@kernel.org
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 21:14:09 -03:00
Leon Romanovsky
50aec2c313 RDMA/mlx5: Return ECE data after modify QP
After users sets the ECE option, FW will return the agreed/supported bits
through an output structures of modify QP stages for regular QPs or
through create QP for the DCT.

Link: https://lore.kernel.org/r/20200526115440.205922-9-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
5f62a521ff RDMA/mlx5: Set ECE options during modify QP
The most common way to set ECE option will be during modify QP command in
INIT2RTR, RTR2RTS and RTS2RTS stages, so update mlx5 to support it.

The new bit in the comp_mask is needed to mark that kernel supports ECE
and can receive data instead of "reserved" field in the struct
mlx5_ib_modify_qp.

Link: https://lore.kernel.org/r/20200526115440.205922-8-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
f18e26af6a RDMA/mlx5: Convert modify QP to use MLX5_SET macros
Instead of hand crafted mlx5_qp_context and mlx5_qp_path use common
MLX5_SET() macros.

Link: https://lore.kernel.org/r/20200526115440.205922-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
70bd7fb876 RDMA/mlx5: Remove manually crafted QP context the query call
As a preparation to removal hand crafted mlx5_qp_context, convert
query_qp_attr() to use proper MLX5_GET() macros.

Link: https://lore.kernel.org/r/20200526115440.205922-6-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
64bae2d455 RDMA/mlx5: Use direct modify QP implementation
As a preparation to removal hand crafted mlx5_qp_context, convert counter
code to use mlx5_cmd_exec_in() directly.

Link: https://lore.kernel.org/r/20200526115440.205922-5-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
e383085c24 RDMA/mlx5: Set ECE options during QP create
Allow users to ask creation of QPs with specific ECE options.  Such early
set even before RDMA-CM connection is established is useful if user knows
exactly which option he needs.

Link: https://lore.kernel.org/r/20200526115440.205922-4-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
3e09a427ae RDMA/mlx5: Get ECE options from FW during create QP
Supported ECE options are returned from FW in the create_qp phase and zero
means that field is not valid. Such default value allows us to reuse
reserved field without worries about comp_mask.

Update create QP API to return ECE options.

Link: https://lore.kernel.org/r/20200526115440.205922-3-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
8094ba0ace RDMA/cma: Provide ECE reject reason
IBTA declares "vendor option not supported" reject reason in REJ messages
if passive side doesn't want to accept proposed ECE options.

Due to the fact that ECE is managed by userspace, there is a need to let
users to provide such rejected reason.

Link: https://lore.kernel.org/r/20200526103304.196371-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:05:05 -03:00
Leon Romanovsky
0cb15372a6 RDMA/cma: Connect ECE to rdma_accept
The rdma_accept() is called by both passive and active sides of CMID
connection to mark readiness to start data transfer. For passive side,
this is called explicitly, for active side, it is called implicitly while
receiving REP message.

Provide ECE data to rdma_accept function needed for passive side to send
that REP message.

Link: https://lore.kernel.org/r/20200526103304.196371-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:05:05 -03:00
Leon Romanovsky
a20652e175 RDMA/cm: Send and receive ECE parameter over the wire
ECE parameters are exchanged through REQ->REP/SIDR_REP messages, this
patch adds the data to provide to other side of CMID communication
channel.

Link: https://lore.kernel.org/r/20200526103304.196371-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:05:05 -03:00
Leon Romanovsky
93531ee7b9 RDMA/ucma: Deliver ECE parameters through UCMA events
Passive side of CMID connection receives ECE request through REQ message
and needs to respond with relevant REP message which will be forwarded to
active side.

The UCMA events interface is responsible for such communication with the
user space (librdmacm). Extend it to provide ECE wire data.

Link: https://lore.kernel.org/r/20200526103304.196371-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:05:05 -03:00
Leon Romanovsky
34e2ab57a9 RDMA/ucma: Extend ucma_connect to receive ECE parameters
Active side of CMID initiates connection through librdmacm's
rdma_connect() and kernel's ucma_connect(). Extend UCMA interface to
handle those new parameters.

Link: https://lore.kernel.org/r/20200526103304.196371-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:05:05 -03:00
Jason Gunthorpe
e4fdf7625b Merge branch 'mellanox/mlx5-next' into rdma.git for/next
From the mlx5-next branch at
  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in following patches

* branch 'mellanox/mlx5-next':
  net/mlx5: Add ability to read and write ECE options
  net/mlx5: Add support for RDMA TX FT headers modifying
  net/mlx5: Move iseg access helper routines close to mlx5_core driver
  net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:01:17 -03:00
Mark Zhang
d246a30615 IB/mlx5: Fix DEVX support for MLX5_CMD_OP_INIT2INIT_QP command
The commit citied in the Fixes line wasn't complete and solved
only part of the problems. Update the mlx5_ib to properly support
MLX5_CMD_OP_INIT2INIT_QP command in the DEVX, that is required when
modify the QP tx_port_affinity.

Fixes: 819f7427ba ("RDMA/mlx5: Add init2init as a modify command")
Link: https://lore.kernel.org/r/20200527135703.482501-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 15:53:21 -03:00
Jason Gunthorpe
c85f4abe66 RDMA/core: Fix double destruction of uobject
Fix use after free when user user space request uobject concurrently for
the same object, within the RCU grace period.

In that case, remove_handle_idr_uobject() is called twice and we will have
an extra put on the uobject which cause use after free.  Fix it by leaving
the uobject write locked after it was removed from the idr.

Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of
UVERBS_LOOKUP_WRITE will do the work.

  refcount_t: underflow; use-after-free.
  WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x94/0xce
   panic+0x234/0x56f
   __warn+0x1cc/0x1e1
   report_bug+0x200/0x310
   fixup_bug.part.11+0x32/0x80
   do_error_trap+0xd3/0x100
   do_invalid_op+0x31/0x40
   invalid_op+0x1e/0x30
  RIP: 0010:refcount_warn_saturate+0xfe/0x1a0
  Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c
  RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc
  RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3
  R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c
  R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08
   uverbs_uobject_put+0xfd/0x140
   __uobj_perform_destroy+0x3d/0x60
   ib_uverbs_close_xrcd+0x148/0x170
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49
  RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc
  R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: 7452a3c745 ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate")
Link: https://lore.kernel.org/r/20200527135534.482279-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 14:22:57 -03:00
Gustavo A. R. Silva
bebcfe85f4 RDMA/core: Use sizeof_field() helper
Make use of the sizeof_field() helper instead of an open-coded version.

Link: https://lore.kernel.org/r/20200527144152.GA22605@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 13:46:05 -03:00
Kamal Heib
ebd6e96b33 RDMA/ipoib: Remove can_sleep parameter from iboib_mcast_alloc
can_sleep is always 0 when iboib_mcast_alloc() is called, so remove it and
use GFP_ATOMIC instead of GFP_KERNEL.

Link: https://lore.kernel.org/r/20200525130305.171509-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 15:48:12 -03:00
Potnuri Bharat Teja
49ea0c036e RDMA/iw_cxgb4: cleanup device debugfs entries on ULD remove
Remove device specific debugfs entries immediately if LLD detaches a
particular ULD device in case of fatal PCI errors.

Link: https://lore.kernel.org/r/20200524190814.17599-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 15:38:35 -03:00
Qiushi Wu
db857e6ae5 RDMA/pvrdma: Fix missing pci disable in pvrdma_pci_probe()
In function pvrdma_pci_probe(), pdev was not disabled in one error
path. Thus replace the jump target “err_free_device” by
"err_disable_pdev".

Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Link: https://lore.kernel.org/r/20200523030457.16160-1-wu000273@umn.edu
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:24:09 -03:00
Yixian Liu
e1b43f07c0 RDMA/hns: Make the end of sge process more clear
Instead of i with the sge number of wr will make the comparision more
clear, that is, when the sge number in wr is small than the maximum
supported sge number in the queue, then a stop sge needed to be filled at
the end of sges in wr.

Link: https://lore.kernel.org/r/1590152579-32364-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:20:46 -03:00
Lang Cheng
e4aaf4bad4 RDMA/hns: Simplify process related to poll cq
Set hns_roce_v2_cq_set_ci to inline type and remove unnecessary
next_cqe_sw_v2().

Link: https://lore.kernel.org/r/1590152579-32364-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:20:45 -03:00
Wenpeng Liang
f226f6765f RDMA/hns: Remove redundant parameters from free_srq/qp_wrid()
The redundant parameters "hr_dev" need to be removed from
free_kernel_wrid() and free_srq_wrid().

Link: https://lore.kernel.org/r/1590152579-32364-3-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:20:45 -03:00
Weihang Li
14ba87304b RDMA/hns: Remove redundant type cast for general pointers
There is no need to do a type cast on genernal pointers, they could be
assigned to any type of variables. In addition, optimize initialization of
some variables and adjust order of them.

Link: https://lore.kernel.org/r/1590152579-32364-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:20:45 -03:00
Xi Wang
8e029d386b RDMA/hns: Optimize the usage of MTR
Currently, the MTR region is configed before hns_roce_mtr_map() is
invoked, but in some scenarios, the region is configed at MTR creation,
the caller need to store this config and call hns_roce_mtr_map() later. So
optimize the usage by wrapping the MTR region config into MTR.

Link: https://lore.kernel.org/r/1589982799-28728-10-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:13 -03:00
Xi Wang
494c3b3122 RDMA/hns: Refactor the QP context filling process related to WQE buffer configure
Split the code related to WQE buffer configure from the QPC filling
process into two functions: config_qp_sq_buf() and config_qp_rq_buf(),
this will make the code more readable.

Link: https://lore.kernel.org/r/1589982799-28728-9-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:13 -03:00
Weihang Li
13aa13dddd RDMA/hns: Change variables representing quantity to unsigned
Number of sge/eqe is always non-negative, they should be defined in type
of unsigned.

Link: https://lore.kernel.org/r/1589982799-28728-8-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:12 -03:00
Weihang Li
82d07a4e46 RDMA/hns: Change all page_shift to unsigned
page_shift is used to calculate the page size, it's always non-negative,
and should be in type of unsigned.

Link: https://lore.kernel.org/r/1589982799-28728-7-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:12 -03:00
Xi Wang
e9f2cd2825 RDMA/hns: Rename QP buffer related function
Rename the function related to QP buffer to make the code more readable.

Link: https://lore.kernel.org/r/1589982799-28728-6-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:12 -03:00
Yangyang Li
b9c93e3aad RDMA/hns: Remove unused code about assert
The codes related to assert are no longer used and need to be deleted.

Link: https://lore.kernel.org/r/1589982799-28728-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:12 -03:00
Lang Cheng
0db6570947 RDMA/hns: Optimize post and poll process
Add unlikely() and likely() to optimize main I/O process code.

Link: https://lore.kernel.org/r/1589982799-28728-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:11 -03:00
Lang Cheng
05e6a5a635 RDMA/hns: Add CQ flag instead of independent enable flag
It's easier to understand and maintain enable flags of cq using a single
field in type of u32 than defining a field for every flags in the
structure hns_roce_cq, and we can add new flags for features more
conveniently in the future.

Link: https://lore.kernel.org/r/1589982799-28728-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:11 -03:00
Lang Cheng
25966e8931 RDMA/hns: Let software PI/CI grow naturally
The hardware can truncate PI/CI when posting or polling, the driver does
not need to do truncation. Therefore keep the software's PI/CI consistent
with it in the hardware.

Link: https://lore.kernel.org/r/1589982799-28728-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25 14:02:11 -03:00
Danil Kipnis
a94dae867c RDMA/rtrs: Get rid of the do_next_path while_next_path macros
The macros do_each_path/while_each_path lead to a smatch warning:

drivers/infiniband/ulp/rtrs/rtrs-clt.c:1196 rtrs_clt_failover_req() warn: inconsistent indenting
drivers/infiniband/ulp/rtrs/rtrs-clt.c:2890 rtrs_clt_request() warn: inconsistent indenting

Also checkpatch complains:
ERROR: Macros with multiple statements should be enclosed in a do - while loop

The macros are used only in two places: for a normal IO path and for the
failover path triggered after errors.

Get rid of the macros and just use a for loop iterating over the list of
paths in both places. It is easier to read and also less lines of code.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20200522053924.528980-1-danil.kipnis@cloud.ionos.com
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-22 15:50:22 -03:00
Md Haris Iqbal
e172037be7 RDMA/rtrs: server: Use already dereferenced rtrs_sess structure
The rtrs_sess structure has already been extracted above from the
rtrs_srv_sess structure. Use that to avoid redundant dereferencing.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20200522082833.1480551-1-haris.phnx@gmail.com
Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-22 15:50:22 -03:00
Maor Gottlieb
63a3345c2d IB/cma: Fix ports memory leak in cma_configfs
The allocated ports structure in never freed. The free function should be
called by release_cma_ports_group, but the group is never released since
we don't remove its default group.

Remove default groups when device group is deleted.

Fixes: 045959db65 ("IB/cma: Add configfs for rdma_cm")
Link: https://lore.kernel.org/r/20200521072650.567908-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-22 15:37:19 -03:00
Maor Gottlieb
189277f381 RDMA/mlx5: Fix NULL pointer dereference in destroy_prefetch_work
q_deferred_work isn't initialized when creating an explicit ODP memory
region. This can lead to a NULL pointer dereference when user performs
asynchronous prefetch MR. Fix it by initializing q_deferred_work for
explicit ODP.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 4 PID: 6074 Comm: kworker/u16:6 Not tainted 5.7.0-rc1-for-upstream-perf-2020-04-17_07-03-39-64 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
  RIP: 0010:__wake_up_common+0x49/0x120
  Code: 04 89 54 24 0c 89 4c 24 08 74 0a 41 f6 01 04 0f 85 8e 00 00 00 48 8b 47 08 48 83 e8 18 4c 8d 67 08 48 8d 50 18 49 39 d4 74 66 <48> 8b 70 18 31 db 4c 8d 7e e8 eb 17 49 8b 47 18 48 8d 50 e8 49 8d
  RSP: 0000:ffffc9000097bd88 EFLAGS: 00010082
  RAX: ffffffffffffffe8 RBX: ffff888454cd9f90 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888454cd9f90
  RBP: ffffc9000097bdd0 R08: 0000000000000000 R09: ffffc9000097bdd0
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff888454cd9f98
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
  FS:  0000000000000000(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000044c19e002 CR4: 0000000000760ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   __wake_up_common_lock+0x7a/0xc0
   destroy_prefetch_work+0x5a/0x60 [mlx5_ib]
   mlx5_ib_prefetch_mr_work+0x64/0x80 [mlx5_ib]
   process_one_work+0x15b/0x360
   worker_thread+0x49/0x3d0
   kthread+0xf5/0x130
   ? rescuer_thread+0x310/0x310
   ? kthread_bind+0x10/0x10
   ret_from_fork+0x1f/0x30

Fixes: de5ed007a0 ("IB/mlx5: Fix implicit ODP race")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200521072504.567406-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:51:50 -03:00
Yishai Hadas
6d1e7ba241 IB/uverbs: Introduce create/destroy QP commands over ioctl
Introduce create/destroy QP commands over the ioctl interface to let it
be extended to get an asynchronous event FD.

Link: https://lore.kernel.org/r/20200519072711.257271-8-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:39:36 -03:00
Yishai Hadas
ef3bc084a8 IB/uverbs: Introduce create/destroy WQ commands over ioctl
Introduce create/destroy WQ commands over the ioctl interface to let it
be extended to get an asynchronous event FD.

Link: https://lore.kernel.org/r/20200519072711.257271-7-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:39:35 -03:00
Yishai Hadas
c3eab946ab IB/uverbs: Introduce create/destroy SRQ commands over ioctl
Introduce create/destroy SRQ commands over the ioctl interface to let it
be extended to get an asynchronous event FD.

Link: https://lore.kernel.org/r/20200519072711.257271-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:39:35 -03:00
Yishai Hadas
cda9ee4942 IB/uverbs: Extend CQ to get its own asynchronous event FD
Extend CQ to get its own asynchronous event FD.
The event FD is an optional attribute, in case wasn't given the ufile
event FD will be used.

Link: https://lore.kernel.org/r/20200519072711.257271-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:34:53 -03:00
Yishai Hadas
98a8890f73 IB/uverbs: Refactor related objects to use their own asynchronous event FD
Refactor related objects to use their own asynchronous event FD.
The ufile event FD will be the default in case an object won't have its own
event FD.

Link: https://lore.kernel.org/r/20200519072711.257271-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:34:53 -03:00
Jason Gunthorpe
0ac8903cbb RDMA/core: Allow the ioctl layer to abort a fully created uobject
While creating a uobject every create reaches a point where the uobject is
fully initialized. For ioctls that go on to copy_to_user this means they
need to open code the destruction of a fully created uobject - ie the
RDMA_REMOVE_DESTROY sort of flow.

Open coding this creates bugs, eg the CQ does not properly flush the
events list when it does its error unwind.

Provide a uverbs_finalize_uobj_create() function which indicates that the
uobject is fully initialized and that abort should call to destroy_hw to
destroy the uobj->object and related.

Methods can call this function if they go on to have error cases after
setting uobj->object. Once done those error cases can simply do return,
without an error unwind.

Link: https://lore.kernel.org/r/20200519072711.257271-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:10:46 -03:00
Jason Gunthorpe
eafd47fc20 Linux 5.7-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7BzV8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg8EH/A2pXMTxtc96RI4S
 sttEsUQqbakFS0Z/2tQPpMGr/qW2e5eHgsTX/a3SiUeZiIXk6f4lMFkMuctzBf7p
 X77cNEDwGOEdbtCXTsMcmKSde7sP2zCXsPB8xTWLyE6rnaFRgikwwkeqgkIKhp1h
 bvOQV0t9HNGvxGAM0iZeOvQAvFl4vd7nS123/MYbir9cugfQUSJRueQ4BiCiJqVE
 6cNA7/vFzDJuFGszzIrJ7HXn/IdQMMWHkvTDjgBw0GZw1mDbGFbfbZwOeTz1ojCt
 smUQ4tIFxBa/VA5zx7dOy2P2keHbSVf4VLkZRPcceT7OqVS65ETmFDp+qt5NdWM5
 vZ8+7/0=
 =CyYH
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc6' into rdma.git for-next

Linux 5.7-rc6

Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
resolved by deleting dr_cq_event, matching how netdev resolved it.

Required for dependencies in the following patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 17:08:27 -03:00
Piotr Stankiewicz
0ad45e5fdc IB/hfi1: Enable the transmit side of the datagram ipoib netdev
This patch hooks the transmit side of the datagram netdev with
ipoib by setting the rdma_netdev_get_params function for the
hfi1 ib_device_ops structue. It also enables the receiving side
by adding the AIP capability into the default capabilities.

Link: https://lore.kernel.org/r/20200511160712.173205.65700.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:58 -03:00
Gary Leshner
8f149b6847 IB/ipoib: Add capability to switch between datagram and connected mode
This is the prerequisite modification to the ipoib ulp to allow a
rdma netdev to obtain the default ndo ops for init/uninit/open/close.

This is accomplished by setting the netdev ops field within the
callback function passed to the netdev allocation routine which
in turn was passed into the rdma netdev allocation routine.

This allows the rdma netdev to call back into the ulp to create the
resources required for connected mode operation.

Additionally as the ulp is not re-entrant, when switching modes,
the number of real tx queues is set to 1 for the connected mode.

For datagram mode the number of real tx queues is set to the
actual number of tx queues specified at the netdev's allocation.

For the internal ulp netdev the number of tx queues defaults to 1.

It is up to the rdma netdev to specify the actual number it can support.

When the driver does not support a rdma netdev for acceleration,
(-ENOTSUPPORTED return code or the verbs function for allocation is
NULL) the ipoib ulp functions are unaffected by using the internal
netdev allocated by the ipoib ulp.

Link: https://lore.kernel.org/r/20200511160706.173205.19086.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:57 -03:00
Grzegorz Andrejczuk
7638c0e965 IB/hfi1: Add packet histogram trace event
Add a simple trace event taking context number and building simple
histogram to print packets distribution between contexts.

Link: https://lore.kernel.org/r/20200511160700.173205.84270.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:57 -03:00
Gary Leshner
b7e159eb00 IB/{hfi1, ipoib, rdma}: Broadcast ping sent packets which exceeded mtu size
When in connected mode ipoib sent broadcast pings which exceeded the mtu
size for broadcast addresses.

Add an mtu attribute to the rdma_netdev structure which ipoib sets to its
mcast mtu size.

The RDMA netdev uses this value to determine if the skb length is too long
for the mtu specified and if it is, drops the packet and logs an error
about the errant packet.

Link: https://lore.kernel.org/r/20200511160655.173205.14546.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:57 -03:00
Grzegorz Andrejczuk
4730f4a6c6 IB/hfi1: Activate the dummy netdev
As described in earlier patches, ipoib netdev will share receive
contexts with existing VNIC netdev through a dummy netdev. The
following changes are made to achieve that:
- Set up netdev receive contexts after user contexts. A function is
  added to count the available netdev receive contexts.
- Add functions to set/get receive map table free index.
- Rename NUM_VNIC_MAP_ENTRIES as NUM_NETDEV_MAP_ENTRIES.
- Let the dummy netdev own the receive contexts instead of VNIC.
- Allocate the dummy netdev when the hfi1 device is added and free it
  when the device is removed.
- Initialize AIP RSM rules when the IpoIb rxq is initialized and
  remove the rules when it is de-initialized.
- Convert VNIC to use the dummy netdev.

Link: https://lore.kernel.org/r/20200511160649.173205.4626.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:56 -03:00
Grzegorz Andrejczuk
370caa5b58 IB/hfi1: Add rx functions for dummy netdev
This patch adds the rx functions for the dummy netdev:
- Functions to allocate/free the dummy netdev.
- Functions to allocate/free receiving contexts for the netdev.
- Functions to initialize/de-initialize the receive queue.
- Functions to enable/disable the receive queue.

Link: https://lore.kernel.org/r/20200511160643.173205.75087.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:56 -03:00
Grzegorz Andrejczuk
0bae02d56b IB/hfi1: Add interrupt handler functions for accelerated ipoib
This patch adds the interrupt handler function, the NAPI poll
function, and its associated helper functions for receiving
accelerated ipoib packets. While we are here, fix the formats
of two error printouts.

Link: https://lore.kernel.org/r/20200511160637.173205.64890.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:56 -03:00
Kaike Wan
6991abcb99 IB/hfi1: Add functions to receive accelerated ipoib packets
Ipoib netdev will share receive contexts with existing VNIC netdev.
To achieve that, a dummy netdev is allocated with hfi1_devdata to
own the receive contexts, and ipoib and VNIC netdevs will be put
on top of it. Each receive context is associated with a single
NAPI object.

This patch adds the functions to receive incoming packets for
accelerated ipoib.

Link: https://lore.kernel.org/r/20200511160631.173205.54184.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:56 -03:00
Grzegorz Andrejczuk
89dcaa366b IB/hfi1: Rename num_vnic_contexts as num_netdev_contexts
Rename num_vnic_contexts as num_ndetdev_contexts since VNIC and ipoib
will share the same set of receive contexts.

Link: https://lore.kernel.org/r/20200511160625.173205.53306.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:55 -03:00
Kaike Wan
6d72344cf6 IB/ipoib: Increase ipoib Datagram mode MTU's upper limit
Currently the ipoib UD mtu is restricted to 4K bytes. Remove this
limitation so that the IPOIB module can potentially use an MTU (in UD
mode) that is bounded by the MTU of the underlying device. A field is
added to the ib_port_attr structure to indicate the maximum physical
MTU the underlying device supports.

Link: https://lore.kernel.org/r/20200511160618.173205.23053.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:55 -03:00
Grzegorz Andrejczuk
19d8b90a50 IB/hfi1: RSM rules for AIP
This is implementation of RSM rule for AIP packets.
AIP rule will use rule RSM2 and will match standard
Infiniband packet containg BTH (LNH==BTH) and
having Dest QPN prefixed with value 0x81. Spread between
receive contexts will be done using source QPN bits.

VNIC and AIP will share receive contexts, so their rules
will point to the same RMT entries and their shared
code is moved to separate functions.
If any of the rules is active RMT mapping will be skipped
for latter.

Changed function hfi1_vnic_is_rsm_full to be more general
and moved it from main header to chip.c.

Changed the order of RSM rules because AIP rule as
more specific one is needed to be placed before more
general QOS rule. Rules are occupying two last RSM
registers.

Link: https://lore.kernel.org/r/20200511160612.173205.73002.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:55 -03:00
Gary Leshner
7f90a5a069 IB/{rdmavt, hfi1}: Implement creation of accelerated UD QPs
Adds capability to create a qpn to be recognized as an accelerated
UD QP for ipoib.

This is accomplished by reserving 0x81 in byte[0] of the qpn as the
prefix for these qp types and reserving qpns between 0x810000 and
0x81ffff.

The hfi1 capability mask already contained a flag for the VNIC netdev.
This has been renamed and extended to include both VNIC and ipoib.

The rvt code to allocate qps now recognizes this flag and sets 0x81
into byte[0] of the qpn.

The code to allocate qpns is modified to reset the qpn numbering when it
is detected that a value is located in byte[0] for a UD QP and it is a
qpn being requested for net dev use. If it is a regular UD QP then it is
allowable to have bits set in byte[0] of the qpn and provide the
previously normal behavior.

The code to free the qpn now checks for the AIP prefix value of 0x81 and
removes it from the qpn before being freed so that the lower 16 bit
number can be reused.

This patch requires minor changes in the IB core and ipoib to facilitate
the creation of accelerated UP QPs.

Link: https://lore.kernel.org/r/20200511160607.173205.11757.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:54 -03:00
Gary Leshner
84e3b19a27 IB/hfi1: Remove module parameter for KDETH qpns
The module parameter for KDETH qpns is being removed in favor
of always using the default value of 0x80 as the qpn prefix.
Defines have been added for various KDETH values including
the prefix of 0x80.
The reserved range now starts at the base value for KDETH
qpns (0x80) and extends up to and including the last qpn for
other reserved QP prefixed types.
Adjust other QP prefixed define names to match KDETH defined
names.

Link: https://lore.kernel.org/r/20200511160600.173205.27508.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:54 -03:00
Gary Leshner
438d7dda98 IB/hfi1: Add the transmit side of a datagram ipoib RDMA netdev
This implements the transmit side of the multiple transmit queue RDMA
netdev used to accelerate ipoib.  The receive side remains the ipoib
internal implementation.

The init/unint/open/stop netdev operations are saved off and called by the
versions within the hfi1 netdev in order to initialize the connected mode
resources present in ipoib thus allowing us to switch modes between
datagram and connected.

The datagram queue pair instantiated by the ipoib ulp is used by this
implementation for its queue pair number and to register with multicast.

The above queue pair is not used on transmit other than its qpn as the
verbs layer is skipped and packets are directly submitted to the sdma
engines.

Link: https://lore.kernel.org/r/20200511160554.173205.1369.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:54 -03:00
Gary Leshner
d99dc602e2 IB/hfi1: Add functions to transmit datagram ipoib packets
This patch implements the mechanism to accelerate the transmit side of
a multiple transmit queue RDMA netdev by submitting the packets to
the SDMA engine directly instead of sending through the verbs layer.

This patch also changes the UD/SEND_ONLY op to output the entropy value
in byte 0 of deth[1]. UD/SEND_ONLY_WITH_IMMEDIATE uses the previous
behavior with no entropy value being output.

The code in the ipoib rdma netdev which submits tx requests upon
successful submission will call trace_sdma_output_ibhdr to output
the ibhdr to the trace buffer.

Link: https://lore.kernel.org/r/20200511160548.173205.45616.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:54 -03:00
Kaike Wan
fe810b509c IB/hfi1: Add accelerated IP capability bit
The accelerated IP capability bit is added to allow users to control
which feature is enabled and disabled.

Link: https://lore.kernel.org/r/20200511160541.173205.96870.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 11:23:53 -03:00
Gal Pressman
e1ca01a902 RDMA/efa: Report host information to the device
The host info feature allows the driver to infrom the EFA device
firmware with system configuration for debugging and troubleshooting
purposes.

The host info buffer is passed as an admin command DMA mapped control
buffer, and is unmapped and freed once the command CQE is consumed.

Currently, the setting of host info is done for each device on its
probe. Failing to set the host info for the device shall not disturb the
probe flow, any errors will be discarded.

Link: https://lore.kernel.org/r/20200512152204.93091-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Guy Tzalik <gtzalik@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 10:05:00 -03:00
Gal Pressman
cc8a635e24 RDMA/efa: Fix setting of wrong bit in get/set_feature commands
When using a control buffer the ctrl_data bit should be set in order to
indicate the control buffer address is valid, not ctrl_data_indirect
which is used when the control buffer itself is indirect.

Fixes: e9c6c53730 ("RDMA/efa: Add common command handlers")
Link: https://lore.kernel.org/r/20200512152204.93091-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 10:05:00 -03:00
Aharon Landau
819f7427ba RDMA/mlx5: Add init2init as a modify command
Missing INIT2INIT entry in the list of modify commands caused DEVX
applications to be unable to modify_qp for this transition state. Add the
MLX5_CMD_OP_INIT2INIT_QP opcode to the list of allowed DEVX opcodes.

Fixes: e662e14d80 ("IB/mlx5: Add DEVX support for modify and query commands")
Link: https://lore.kernel.org/r/20200513095550.211345-1-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 21:02:12 -03:00
Kaike Wan
a35cd6447e IB/qib: Call kobject_put() when kobject_init_and_add() fails
When kobject_init_and_add() returns an error in the function
qib_create_port_files(), the function kobject_put() is not called for the
corresponding kobject, which potentially leads to memory leak.

This patch fixes the issue by calling kobject_put() even if
kobject_init_and_add() fails. In addition, the ppd->diagc_kobj is released
along with other kobjects when the sysfs is unregistered.

Fixes: f931551baf ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Link: https://lore.kernel.org/r/20200512031328.189865.48627.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Suggested-by: Lin Yi <teroincn@gmail.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:58:03 -03:00
Lijun Ou
711195e57d RDMA/hns: Reserve one sge in order to avoid local length error
When rq/srq sge length is smaller than sq sge length, it will produce a
local length error and may cause the bus to hang. Therefore, for rq wqe
and srq wqe, one reserved sge pointing to a reserved mr is used to avoid
this error.

Link: https://lore.kernel.org/r/1588931159-56875-10-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:59 -03:00
Xi Wang
9581a356cc RDMA/hns: Rename macro for defining hns hardware page size
Rename the PAGE_ADDR_SHIFT as HNS_HW_PAGE_SHIFT to make code more
readable.

Link: https://lore.kernel.org/r/1588931159-56875-9-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:59 -03:00
Weihang Li
252067e950 RDMA/hns: Remove redundant memcpy()
srq_context is a local variables and is only used to get some fields from
buffer of mailbox. It's meaningless to copy mailbox's buffer's contents
back to it.

Link: https://lore.kernel.org/r/1588931159-56875-8-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:58 -03:00
Lang Cheng
7b611d2f6e RDMA/hns: Store mr len information into mr obj
The length information should be stored in the struct ib_mr object,
otherwise the length value of a valid mr object would always be 0.

Link: https://lore.kernel.org/r/1588931159-56875-7-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:58 -03:00
Weihang Li
d4d8138741 RDMA/hns: Fix error with to_hr_hem_entries_count()
For ilog2(x), if x is 0 and not a constant variable, it will return
-1. And there will be an error as below:

 hns3 0000:7d:00.0 hns_0: Local work queue 0x8 catast error, sub_event type is: 2

So modify to_hr_hem_entries_shift() to return 0 if conut is 0.

Fixes: 54d6638765 ("RDMA/hns: Optimize WQE buffer size calculating process")
Link: https://lore.kernel.org/r/1588931159-56875-6-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:58 -03:00
Weihang Li
6968aeb5aa RDMA/hns: Fix wrong assignment of SRQ's max_wr
srq's attribute max_wr should be 1 less than the total count of wqe.

Fixes: ffb1308b88 ("RDMA/hns: Move SRQ code to the reasonable place")
Link: https://lore.kernel.org/r/1588931159-56875-5-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:58 -03:00
Wenpeng Liang
053c0acf52 RDMA/hns: Fix assignment to ba_pg_sz of eqe
When allocating eq buffer, the size of base address page should be defined
by eqe_ba_pg_sz instead of srqwqe_ba_pg_sz.

Fixes: 477a0a3870 ("RDMA/hns: Optimize 0 hop addressing for EQE buffer")
Link: https://lore.kernel.org/r/1588931159-56875-4-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:57 -03:00
Lang Cheng
441c88d5b3 RDMA/hns: Fix cmdq parameter of querying pf timer resource
The firmware has reduced the number of descriptions of command
HNS_ROCE_OPC_QUERY_PF_TIMER_RES to 1. The driver needs to adapt, otherwise
the hardware will report error 4(CMD_NEXT_ERR).

Fixes: 0e40dc2f70 ("RDMA/hns: Add timer allocation support for hip08")
Link: https://lore.kernel.org/r/1588931159-56875-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:57 -03:00
Lijun Ou
349be27650 RDMA/hns: Bugfix for querying qkey
The qkey queried through the query ud qp verb is a fixed value and it
should be read from qp context.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1588931159-56875-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:54:57 -03:00
Gustavo A. R. Silva
bd25c8066f RDMA/siw: Replace one-element array and use struct_size() helper
The current codebase makes use of one-element arrays in the following
form:

struct something {
    int length;
    u8 data[1];
};

struct something *instance;

instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL);
instance->length = size;
memcpy(instance->data, source, size);

but the preferred mechanism to declare variable-length types such as
these ones is a flexible array member[1][2], introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on. So, replace
the one-element array with a flexible-array member.

Also, make use of the new struct_size() helper to properly calculate the
size of struct siw_pbl.

This issue was found with the help of Coccinelle and, audited and fixed
_manually_.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Link: https://lore.kernel.org/r/20200519233018.GA6105@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:51:55 -03:00
Danil Kipnis
d6ea395072 rnbd/rtrs: Pass max segment size from blk user to the rdma library
When Block Device Layer is disabled, BLK_MAX_SEGMENT_SIZE is undefined.
The rtrs is a transport library and should compile independently of the
block layer. The desired max segment size should be passed down by the
user.

Introduce max_segment_size parameter for the rtrs_clt_open() call.

Fixes: f7a7a5c228 ("block/rnbd: client: main functionality")
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Fixes: cb80329c94 ("RDMA/rtrs: client: private header with client structs and functions")
Fixes: b5c27cdb09 ("RDMA/rtrs: public interface header to establish RDMA connections")
Link: https://lore.kernel.org/r/20200519111419.924170-1-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:43:26 -03:00
Wei Yongjun
6b31afcef5 RDMA/rtrs: server: Fix some error return code
Fix to return negative error code -ENOMEM from the some error handling
cases instead of 0, as done elsewhere in this function.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Fixes: 91b11610af ("RDMA/rtrs: server: sysfs interface functions")
Link: https://lore.kernel.org/r/20200519091912.134358-1-weiyongjun1@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:42:25 -03:00
Gustavo A. R. Silva
e198408670 RDMA/rtrs: client: Fix function return on success
Remove the if-statement and return the value contained in _err_,
unconditionally.

Link: https://lore.kernel.org/r/20200519163612.GA6043@embeddedor
Addresses-Coverity-ID: 1493753 ("Identical code for different branches")
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:41:41 -03:00
Dan Carpenter
bf1d8edb38 RDMA/rtrs: Fix a couple off by one bugs in rtrs_srv_rdma_done()
These > comparisons should be >= to prevent accessing one element beyond
the end of the buffer.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20200519154525.GA66801@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:40:21 -03:00
Dan Carpenter
b386cd65d9 RDMA/rtrs: Fix some signedness bugs in error handling
The problem is that "req->sg_cnt" is an unsigned int so if "nr" is
negative, it gets type promoted to a high positive value and the condition
is false.  This patch fixes it by handling negatives separately.

Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20200519133223.GN2078@kadam
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 20:40:20 -03:00
Michael Guralnik
ecf814e0e1 net/mlx5: Add support for RDMA TX FT headers modifying
Support adding header modifying actions to the RDMA TX flow table.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-18 09:21:46 -07:00
Kamal Heib
23bbd5818e RDMA/srpt: Fix disabling device management
Avoid disabling device management for devices that don't support
Management datagrams (MADs) by checking if the "mad_agent" pointer is
initialized before calling ib_modify_port, also fix the error flow in
srpt_refresh_port() to disable device management if
ib_register_mad_agent() fail.

Fixes: 09f8a1486d ("RDMA/srpt: Fix handling of SR-IOV and iWARP ports")
Link: https://lore.kernel.org/r/20200514114720.141139-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 20:43:19 -03:00
Shay Drory
daeee97690 RDMA/mlx5: Update mlx5_ib driver name
Current description doesn't include new devices, change it by updating to
have more generic description and remove DRIVER_NAME and DRIVER_VERSION
defines.

Link: https://lore.kernel.org/r/20200513095304.210240-1-leon@kernel.org
Signed-off-by: Shay Drory <shayd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 20:40:20 -03:00
Xiongfeng Wang
a8f5c1f1a5 RDMA/srpt: Add a newline when printing parameter 'srpt_service_guid' by sysfs
When I cat module parameter 'srpt_service_guid', it displays as follows.
It is better to add a newline for easy reading.

[root@hulk-202 ~]# cat /sys/module/ib_srpt/parameters/srpt_service_guid
0x0205cdfffe8346b9[root@hulk-202 ~]#

Link: https://lore.kernel.org/r/1589182629-27743-1-git-send-email-wangxiongfeng2@huawei.com
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 20:37:52 -03:00
Jason Gunthorpe
b0810b037d RDMA/core: Consolidate ib_create_srq flows
The uverbs layer largely duplicate the code in ib_create_srq(), with the
slight difference that it passes in a udata. Move all the code together
into ib_create_srq_user() and provide an inline for kernel users, similar
to other create calls.

Link: https://lore.kernel.org/r/20200506082444.14502-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 20:35:25 -03:00
Yishai Hadas
dbd6725286 RDMA/uverbs: Fix create WQ to use the given user handle
Fix create WQ to use the given user handle, in addition dropped some
duplicated code from this flow.

Fixes: fd3c7904db ("IB/core: Change idr objects to use the new schema")
Fixes: f213c05272 ("IB/uverbs: Add WQ support")
Link: https://lore.kernel.org/r/20200506082444.14502-9-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 20:29:32 -03:00
Yishai Hadas
b19a530b00 RDMA/uverbs: Cleanup wq/srq context usage from uverbs layer
Both wq_context and srq_context are some leftover from the past in uverbs
layer, they are not really in use, drop them.

Link: https://lore.kernel.org/r/20200506082444.14502-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 20:27:25 -03:00
Jack Wang
745b6a3d4a RDMA/rtrs: a bit of documentation
README with description of major sysfs entries, sysfs documentation has
been moved to ABI dir as suggested by Bart.

Link: https://lore.kernel.org/r/20200511135131.27580-15-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:15 -03:00
Jack Wang
c013fbc1fd RDMA/rtrs: include client and server modules into kernel compilation
Add rtrs Makefile, Kconfig and also corresponding lines into upper layer
infiniband/ulp files.

Link: https://lore.kernel.org/r/20200511135131.27580-14-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:15 -03:00
Jack Wang
91b11610af RDMA/rtrs: server: sysfs interface functions
This is the sysfs interface to rtrs sessions on server side:

  /sys/class/rtrs-server/<SESS-NAME>/
    *** rtrs session accepted from a client peer
    |
    |- paths/<SRC@DST>/
       *** established paths from a client in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- hca_name
       |  *** HCA name
       |
       |- hca_port
       |  *** HCA port
       |
       |- stats/
          *** current path statistics
          |
	  |- rdma

Link: https://lore.kernel.org/r/20200511135131.27580-13-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:15 -03:00
Jack Wang
c4f07c60bb RDMA/rtrs: server: statistics functions
This introduces set of functions used on server side to account statistics
of RDMA data sent/received.

Link: https://lore.kernel.org/r/20200511135131.27580-12-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:15 -03:00
Jack Wang
9cb8374804 RDMA/rtrs: server: main functionality
This is main functionality of rtrs-server module, which accepts set of
RDMA connections (so called rtrs session), creates/destroys sysfs entries
associated with rtrs session and notifies upper layer
(user of RTRS API) about RDMA requests or link events.

Link: https://lore.kernel.org/r/20200511135131.27580-11-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:14 -03:00
Jack Wang
787f78a6b0 RDMA/rtrs: server: private header with server structs and functions
This header describes main structs and functions used by rtrs-server
module, mainly for accepting rtrs sessions, creating/destroying sysfs
entries, accounting statistics on server side.

Link: https://lore.kernel.org/r/20200511135131.27580-10-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:14 -03:00
Jack Wang
215378b838 RDMA/rtrs: client: sysfs interface functions
This is the sysfs interface to rtrs sessions on client side:

  /sys/class/rtrs-client/<SESS-NAME>/
    *** rtrs session created by rtrs_clt_open() API call
    |
    |- max_reconnect_attempts
    |  *** number of reconnect attempts for session
    |
    |- add_path
    |  *** adds another connection path into rtrs session
    |
    |- paths/<SRC@DST>/
       *** established paths to server in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- reconnect
       |  *** reconnect path
       |
       |- remove_path
       |  *** remove current path
       |
       |- state
       |  *** retrieve current path state
       |
       |- hca_port
       |  *** HCA port number
       |
       |- hca_name
       |  *** HCA name
       |
       |- stats/
          *** current path statistics
          |
	  |- cpu_migration
	  |- rdma
	  |- reconnects
	  |- reset_all

Link: https://lore.kernel.org/r/20200511135131.27580-9-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:14 -03:00
Jack Wang
89dd4c3bdc RDMA/rtrs: client: statistics functions
This introduces set of functions used on client side to account statistics
of RDMA data sent/received, amount of IOs inflight, latency, cpu
migrations, etc.  Almost all statistics are collected using percpu
variables.

Link: https://lore.kernel.org/r/20200511135131.27580-8-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:14 -03:00
Jack Wang
6a98d71dae RDMA/rtrs: client: main functionality
This is main functionality of rtrs-client module, which manages set of
RDMA connections for each rtrs session, does multipathing, load balancing
and failover of RDMA requests.

Link: https://lore.kernel.org/r/20200511135131.27580-7-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:13 -03:00
Jack Wang
cb80329c94 RDMA/rtrs: client: private header with client structs and functions
This header describes main structs and functions used by rtrs-client
module, mainly for managing rtrs sessions, creating/destroying sysfs
entries, accounting statistics on client side.

Link: https://lore.kernel.org/r/20200511135131.27580-6-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:13 -03:00
Jack Wang
c0894b3ea6 RDMA/rtrs: core: lib functions shared between client and server modules
This is a set of library functions existing as a rtrs-core module, used by
client and server modules.

Mainly these functions wrap IB and RDMA calls and provide a bit higher
abstraction for implementing of RTRS protocol on client or server sides.

Link: https://lore.kernel.org/r/20200511135131.27580-5-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:13 -03:00
Jack Wang
91fddedd43 RDMA/rtrs: private headers with rtrs protocol structs and helpers
These are common private headers with rtrs protocol structures, logging,
sysfs and other helper functions, which are used on both client and server
sides.

Link: https://lore.kernel.org/r/20200511135131.27580-4-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:13 -03:00
Jack Wang
b5c27cdb09 RDMA/rtrs: public interface header to establish RDMA connections
Introduce public header which provides set of API functions to establish
RDMA connections from client to server machine using RTRS protocol, which
manages RDMA connections for each session, does multipathing and load
balancing.

Main functions for client (active) side:

 rtrs_clt_open() - Creates set of RDMA connections incapsulated
                    in IBTRS session and returns pointer on RTRS
		    session object.
 rtrs_clt_close() - Closes RDMA connections associated with RTRS
                     session.
 rtrs_clt_request() - Requests zero-copy RDMA transfer to/from
                       server.

Main functions for server (passive) side:

 rtrs_srv_open() - Starts listening for RTRS clients on specified
                    port and invokes RTRS callbacks for incoming
		    RDMA requests or link events.
 rtrs_srv_close() - Closes RTRS server context.

Link: https://lore.kernel.org/r/20200511135131.27580-3-danil.kipnis@cloud.ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 18:57:13 -03:00
David S. Miller
da07f52d3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Move the bpf verifier trace check into the new switch statement in
HEAD.

Resolve the overlapping changes in hinic, where bug fixes overlap
the addition of VF support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 13:48:59 -07:00
Leon Romanovsky
59dde4d19c RDMA/mlx5: Fix query_srq_cmd() function
The output buffer used in mlx5_cmd_exec_inout() was wrongly changed from
pre-allocated srq_out pointer to an input "out" point. That leads to
unpredictable results in the get_srqc() call later.

Fixes: 31578defe4 ("RDMA/mlx5: Update mlx5_ib to use new cmd interface")
Link: https://lore.kernel.org/r/20200513100809.246315-1-leon@kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 16:01:50 -03:00
Daria Velikovsky
f29de9eee7 RDMA/mlx5: Add support for drop action in DV steering
When drop action is used the matching packet will stop processing in
steering and will be dropped. This functionality will allow users to drop
matching packets.

Link: https://lore.kernel.org/r/20200504054227.271486-1-leon@kernel.org
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:58:54 -03:00
Maor Gottlieb
8c112a5f29 RDMA/mlx5: Add support in steering default miss
User can configure default miss rule in order to skip matching in the user
domain and forward the packet to the kernel steering domain.  When user
requests a default miss rule, we add steering rule to forward the traffic
to the next namespace.

Link: https://lore.kernel.org/r/20200504053012.270689-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:55:41 -03:00
Maor Gottlieb
b9019507aa RDMA/mlx5: Refactor DV create flow
Move part of the code that get the destinations into function so the code
will be more readable.  In addition change the variables definition to be
in reversed christmas tree.

Link: https://lore.kernel.org/r/20200504053012.270689-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:55:40 -03:00
Jason Gunthorpe
10c2615513 Merge branch 'mellanox/mlx5-next' into rdma.git for/next
From the mlx5-next branch at
  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in following patches

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support in forward to namespace
  {IB/net}/mlx5: Simplify don't trap code
  net/mlx5: Replace zero-length array with flexible-array

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:54:19 -03:00
Maor Gottlieb
14c129e301 {IB/net}/mlx5: Simplify don't trap code
The fs_core already supports creation of rules with multiple
actions/destinations. Refactor fs_core to handle the case
when don't trap rule is created with destination. Adapt the
calling code in the driver.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-05-13 18:56:18 +03:00
Jason Gunthorpe
a0e46db4e7 RDMA/cm: Increment the refcount inside cm_find_listen()
All callers need the 'get', so do it in a central place before returning
the pointer.

Link: https://lore.kernel.org/r/20200506074701.9775-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:54 -03:00
Jason Gunthorpe
51e8463cfc RDMA/cm: Remove needless cm_id variable
Just put the expression in the only reader

Link: https://lore.kernel.org/r/20200506074701.9775-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:54 -03:00
Jason Gunthorpe
1cc44279f2 RDMA/cm: Remove the cm_free_id() wrapper function
Just call xa_erase directly during cm_destroy_id()

Link: https://lore.kernel.org/r/20200506074701.9775-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:54 -03:00
Jason Gunthorpe
cfa68b0d04 RDMA/cm: Make find_remote_id() return a cm_id_private
The only caller doesn't care about the timewait, so acquire and return the
cm_id_private from the function.

Link: https://lore.kernel.org/r/20200506074701.9775-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:54 -03:00
Jason Gunthorpe
09fb406a56 RDMA/cm: Add a note explaining how the timewait is eventually freed
The way the cm_timewait_info is converted into a work and then freed
is very subtle and surprising, add a note clarifying the lifetime
here.

Link: https://lore.kernel.org/r/20200506074701.9775-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:53 -03:00
Jason Gunthorpe
9767a27e1a RDMA/cm: Pass the cm_id_private into cm_cleanup_timewait
Also rename it to cm_remove_remote(). This function now removes the
tracking of the remote ID/QPN in the redblack trees from a cm_id_private.

Replace a open-coded version with a call. The open coded version was
deleting only the remote_id, however at this call site the qpn can not
have been in the RB tree either, so the cm_remove_remote() will do the
same.

Link: https://lore.kernel.org/r/20200506074701.9775-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:53 -03:00
Jason Gunthorpe
e83f195aa4 RDMA/cm: Pull duplicated code into cm_queue_work_unlock()
While unlocking a spinlock held by the caller is a disturbing pattern,
this extensively duplicated code is even worse. Pull all the duplicates
into a function and explain the purpose of the algorithm.

The on creation side call in cm_req_handler() which is different has been
micro-optimized on the basis that the work_count == -1 during creation,
remove that and just use the normal function.

Link: https://lore.kernel.org/r/20200506074701.9775-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:53 -03:00
Danit Goldberg
42113eed8f RDMA/cm: Remove unused store to ret in cm_rej_handler
The 'goto out' label doesn't read ret, so don't set it.

Link: https://lore.kernel.org/r/20200506074701.9775-4-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:53 -03:00
Jason Gunthorpe
d3552fb65d RDMA/cm: Remove return code from add_cm_id_to_port_list
This cannot happen, all callers pass in one of the two pointers. Use
a WARN_ON guard instead.

Link: https://lore.kernel.org/r/20200506074701.9775-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:53 -03:00
Jason Gunthorpe
f8f2a576cb RDMA/addr: Mark addr_resolve as might_sleep()
Under one path through ib_nl_fetch_ha() this calls nlmsg_new(GFP_KERNEL)
which is a sleeping call. This is a very rare path, so mark fetch_ha() and
the module external entry point that conditionally calls through to
fetch_ha() as might_sleep().

Link: https://lore.kernel.org/r/20200506074701.9775-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 21:32:52 -03:00
Lang Cheng
90ae0b57e4 RDMA/hns: Combine enable flags of qp
It's easier to understand and maintain enable flags of qp using a single
field in type of unsigned long than defining a field for every flags in
the structure hns_roce_qp, and we can add new flags for features more
conveniently in the future.

Link: https://lore.kernel.org/r/1588674607-25337-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 20:37:06 -03:00
Weihang Li
30661322b8 RDMA/hns: Extend capability flags for HIP08_C
12 bits is not enough for HIP08_C, so extend a new field in length of 16
bits for it.

Link: https://lore.kernel.org/r/1588674607-25337-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 20:37:06 -03:00
Leon Romanovsky
17793833f8 RDMA/ucma: Return stable IB device index as identifier
The librdmacm uses node_guid as identifier to correlate between IB devices
and CMA devices. However FW resets cause to such "connection" to be lost
and require from the user to restart its application.

Extend UCMA to return IB device index, which is stable identifier.

Link: https://lore.kernel.org/r/20200504132541.355710-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 19:52:27 -03:00
Jason Gunthorpe
ccfdbaa5cf RDMA/uverbs: Move IB_EVENT_DEVICE_FATAL to destroy_uobj
When multiple async FDs were allowed to exist the idea was for all
broadcast events to be delivered to all async FDs, however
IB_EVENT_DEVICE_FATAL was missed.

Instead of having ib_uverbs_free_hw_resources() special case the global
async_fd, have it cause the event during the uobject destruction. Every
async fd is now a uobject so simply generate the IB_EVENT_DEVICE_FATAL
while destroying the async fd uobject. This ensures every async FD gets a
copy of the event.

Fixes: d680e88e20 ("RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOC")
Link: https://lore.kernel.org/r/20200507063348.98713-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 17:02:25 -03:00
Jason Gunthorpe
c485b19d52 RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event
The commit below moved all of the destruction to the disassociate step and
cleaned up the event channel during destroy_uobj.

However, when ib_uverbs_free_hw_resources() pushes IB_EVENT_DEVICE_FATAL
and then immediately goes to destroy all uobjects this causes
ib_uverbs_free_event_queue() to discard the queued event if userspace
hasn't already read() it.

Unlike all other event queues async FD needs to defer the
ib_uverbs_free_event_queue() until FD release. This still unregisters the
handler from the IB device during disassociation.

Fixes: 3e032c0e92 ("RDMA/core: Make ib_uverbs_async_event_file into a uobject")
Link: https://lore.kernel.org/r/20200507063348.98713-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 17:02:25 -03:00
Potnuri Bharat Teja
c8b1f340e5 RDMA/iw_cxgb4: Fix incorrect function parameters
While reading the TCB field in t4_tcb_get_field32() the wrong mask is
passed as a parameter which leads the driver eventually to a kernel
panic/app segfault from access to an illegal SRQ index while flushing the
SRQ completions during connection teardown.

Fixes: 11a27e2121 ("iw_cxgb4: complete the cached SRQ buffers")
Link: https://lore.kernel.org/r/20200511185608.5202-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 11:47:48 -03:00
Maor Gottlieb
50bbe3d34f RDMA/core: Fix double put of resource
Do not decrease the reference count of resource tracker object twice in
the error flow of res_get_common_doit.

Fixes: c5dfe0ea6f ("RDMA/nldev: Add resource tracker doit callback")
Link: https://lore.kernel.org/r/20200507062942.98305-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 11:47:48 -03:00
Jack Morgenstein
1901b91f99 IB/core: Fix potential NULL pointer dereference in pkey cache
The IB core pkey cache is populated by procedure ib_cache_update().
Initially, the pkey cache pointer is NULL. ib_cache_update allocates a
buffer and populates it with the device's pkeys, via repeated calls to
procedure ib_query_pkey().

If there is a failure in populating the pkey buffer via ib_query_pkey(),
ib_cache_update does not replace the old pkey buffer cache with the
updated one -- it leaves the old cache as is.

Since initially the pkey buffer cache is NULL, when calling
ib_cache_update the first time, a failure in ib_query_pkey() will cause
the pkey buffer cache pointer to remain NULL.

In this situation, any calls subsequent to ib_get_cached_pkey(),
ib_find_cached_pkey(), or ib_find_cached_pkey_exact() will try to
dereference the NULL pkey cache pointer, causing a kernel panic.

Fix this by checking the ib_cache_update() return value.

Fixes: 8faea9fd4a ("RDMA/cache: Move the cache per-port data into the main ib_port_data")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/20200507071012.100594-1-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 11:47:48 -03:00
Mike Marciniszyn
fa8dac3968 IB/hfi1: Fix another case where pq is left on waitlist
The commit noted below fixed a case where a pq is left on the sdma wait
list.

It however missed another case.

user_sdma_send_pkts() has two calls from hfi1_user_sdma_process_request().

If the first one fails as indicated by -EBUSY, the pq will be placed on
the waitlist as by design.

If the second call then succeeds, the pq is still on the waitlist setting
up a race with the interrupt handler if a subsequent request uses a
different SDMA engine

Fix by deleting the first call.

The use of pcount and the intent to send a short burst of packets followed
by the larger balance of packets was never correctly implemented, because
the two calls always send pcount packets no matter what.  A subsequent
patch will correct that issue.

Fixes: 9a293d1e21 ("IB/hfi1: Ensure pq is not left on waitlist")
Link: https://lore.kernel.org/r/20200504130917.175613.43231.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 11:47:48 -03:00
Denis V. Lunev
856ec7f646 IB/i40iw: Remove bogus call to netdev_master_upper_dev_get()
Local variable netdev is not used in these calls.

It should be noted, that this change is required to work in bonded mode.
Otherwise we would get the following assert:

 "RTNL: assertion failed at net/core/dev.c (5665)"

With the calltrace as follows:
	dump_stack+0x19/0x1b
	netdev_master_upper_dev_get+0x61/0x70
	i40iw_addr_resolve_neigh+0x1e8/0x220
	i40iw_make_cm_node+0x296/0x700
	? i40iw_find_listener.isra.10+0xcc/0x110
	i40iw_receive_ilq+0x3d4/0x810
	i40iw_puda_poll_completion+0x341/0x420
	i40iw_process_ceq+0xa5/0x280
	i40iw_ceq_dpc+0x1e/0x40
	tasklet_action+0x83/0x140
	__do_softirq+0x125/0x2bb
	call_softirq+0x1c/0x30
	do_softirq+0x65/0xa0
	irq_exit+0x105/0x110
	do_IRQ+0x56/0xf0
	common_interrupt+0x16a/0x16a
	? cpuidle_enter_state+0x57/0xd0
	cpuidle_idle_call+0xde/0x230
	arch_cpu_idle+0xe/0xc0
	cpu_startup_entry+0x14a/0x1e0
	start_secondary+0x1f7/0x270
	start_cpu+0x5/0x14

Link: https://lore.kernel.org/r/20200428131511.11049-1-den@openvz.org
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 11:47:48 -03:00
Jack Morgenstein
6693ca95bd IB/mlx4: Test return value of calls to ib_get_cached_pkey
In the mlx4_ib_post_send() flow, some functions call ib_get_cached_pkey()
without checking its return value. If ib_get_cached_pkey() returns an
error code, these functions should return failure.

Fixes: 1ffeb2eb8b ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support")
Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Fixes: e622f2f4ad ("IB: split struct ib_send_wr")
Link: https://lore.kernel.org/r/20200426075921.130074-1-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 11:47:48 -03:00
Sudip Mukherjee
bb43c8e382 RDMA/rxe: Always return ERR_PTR from rxe_create_mmap_info()
The commit below modified rxe_create_mmap_info() to return ERR_PTR's but
didn't update the callers to handle them. Modify rxe_create_mmap_info() to
only return ERR_PTR and fix all error checking after
rxe_create_mmap_info() is called.

Ensure that all other exit paths properly set the error return.

Fixes: ff23dfa134 ("IB: Pass only ib_udata in function prototypes")
Link: https://lore.kernel.org/r/20200425233545.17210-1-sudipm.mukherjee@gmail.com
Link: https://lore.kernel.org/r/20200511183742.GB225608@mwanda
Cc: stable@vger.kernel.org [5.4+]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12 11:47:00 -03:00
Colin Ian King
52c81f47f0 RDMA/mlx5: Remove duplicated assignment to variable rcqe_sz
The variable rcqe_sz is being unnecessarily assigned twice, fix this by
removing one of the duplicates.

Fixes: 8bde2c509e ("RDMA/mlx5: Update all DRIVER QP places to use QP subtype")
Link: https://lore.kernel.org/r/20200507151610.52636-1-colin.king@canonical.com
Addresses-Coverity: ("Evaluation order violation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-07 21:02:58 -03:00
David S. Miller
3793faad7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts were all overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 22:10:13 -07:00
Mark Bloch
42caf9cb59 RDMA/mlx5: Allow only raw Ethernet QPs when RoCE isn't enabled
When operating in switchdev mode or using devlink to disable RoCE
only raw Ethernet QPs are allowed to be created.

When in switchdev mode this can lead to passing an invalid port number
as part of the modify qp firmware cmd and will lead to a syndrome
reported back to the user, such as:

 * mlx5_cmd_check:803:(pid 50148): RST2INIT_QP(0x502) op_mod(0x0) failed,
   status bad parameter(0x3), syndrome (0x177405).

Internal UD QP might be used to test for write combining support (even if
externally we report RoCE as disabled) check for that specific flag and
allow is specifically.

Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Link: https://lore.kernel.org/r/20200506071602.7177-3-leon@kernel.org
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:52:01 -03:00
Mark Bloch
8d93efb8c5 RDMA/mlx5: Assign profile before calling stages
Assign the profile to the IB device before executing stages. This will
allow to check which profile is being used from within a stage.

Link: https://lore.kernel.org/r/20200506071602.7177-2-leon@kernel.org
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:52:01 -03:00
Leon Romanovsky
029e88fd1e RDMA/mlx5: Move all WR logic from qp.c to separate file
Split qp.c by removing all WR logic to separate file.

Link: https://lore.kernel.org/r/20200506065513.4668-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:42:45 -03:00
Max Gurtovoy
6671cde83d RDMA/mlx5: Refactor mlx5_post_send() to improve readability
Add small helpers in order to avoid code duplication and improve code
readability. Decrease the amount of code in the gigantic post_send
function and divide it to readable methods that will help in code
maintenance in the future.

Link: https://lore.kernel.org/r/20200506065513.4668-3-leon@kernel.org
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:42:45 -03:00
Leon Romanovsky
31578defe4 RDMA/mlx5: Update mlx5_ib to use new cmd interface
Reuse newly introduced mlx5_cmd_exec_in() and mlx5_cmd_exec_inout() to
reduce code duplication in mlx5_ib module.

Link: https://lore.kernel.org/r/20200506065513.4668-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:42:45 -03:00
Wenpeng Liang
e4faa478c6 RDMA/hns: Remove redundant assignment of caps
These caps are assigned in query_pf_caps() or set_default_caps(), and
should not be assigned out of these two functions.

Link: https://lore.kernel.org/r/1588242691-12913-4-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:33:20 -03:00
Weihang Li
b713128de7 RDMA/hns: Adjust lp_pktn_ini dynamically
lp_pktn_ini means the number of loopback slice packets for long messages,
it should depend on MTU(fixed to 4096B currently) and max size of SQ
inline.

Link: https://lore.kernel.org/r/1588242691-12913-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:33:20 -03:00
Weihang Li
23190b8f47 RDMA/hns: Fix comments with non-English symbols
There is a comments with some chinese semicolons that cause encoding
issues each time hns_roc_hw_v2.h was modified from a IDE. So fix this by
using correct symbols.

Link: https://lore.kernel.org/r/1588242691-12913-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:33:19 -03:00
Xi Wang
67954a6e37 RDMA/hns: Optimize SRQ buffer size calculating process
Optimize the SRQ's WQE buffer parameters calculating process to make the
codes more readable by using new functions about multi-hop addressing to
calculating capabilities of SRQ.

Link: https://lore.kernel.org/r/1588071823-40200-6-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:26:44 -03:00
Yixian Liu
ffb1308b88 RDMA/hns: Move SRQ code to the reasonable place
Just move the SRQ related code to more reasonable place, and unify format
of some prints.

Link: https://lore.kernel.org/r/1588071823-40200-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:26:43 -03:00
Xi Wang
54d6638765 RDMA/hns: Optimize WQE buffer size calculating process
Optimize the QP's WQE buffer parameters calculating process to make the
codes more readable mainly by merging calculation of extended sge space of
kernel and userspace. In addition, add some inline functions to simply
codes about multi-hop addressing.

Link: https://lore.kernel.org/r/1588071823-40200-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:26:43 -03:00
Xi Wang
2929c40f08 RDMA/hns: Remove unused MTT functions
The MTT (Memory Translate Table) interface is no longer used to configure
the buffer address to BT (Base Address Table) that requires driver
mapping.  Because the MTT is not compatible with multi-hop addressing of
the hip08, it is replaced by MTR (Memory Translate Region) interface, and
all the MTT functions should be removed.

Link: https://lore.kernel.org/r/1588071823-40200-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:26:43 -03:00
Xi Wang
9b2cf76c9f RDMA/hns: Optimize PBL buffer allocation process
PBL table has its own implementation for multi-hop addressing currently,
but for the hardware, all table's addressing use the same logic, there is
no need to implement repeatedly. So optimize the PBL buffer allocation
process by using the mtr's interfaces.

Link: https://lore.kernel.org/r/1588071823-40200-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:26:43 -03:00
Mark Zhang
5ac55dfc6d RDMA/mlx5: Set UDP source port based on the grh.flow_label
Calculate UDP source port based on the grh.flow_label. If grh.flow_label
is not valid, we will use minimal supported UDP source port.

Link: https://lore.kernel.org/r/20200504051935.269708-6-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 16:51:44 -03:00
Mark Zhang
f665340519 RDMA/cma: Initialize the flow label of CM's route path record
If flow label is not set by the user or it's not IPv4, initialize it with
the cma src/dst based on the "Kernighan and Ritchie's hash function".

Link: https://lore.kernel.org/r/20200504051935.269708-5-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 16:51:44 -03:00
Mark Zhang
2b880b2e5e RDMA/mlx5: Define RoCEv2 udp source port when set path
Calculate and set UDP source port based on the flow label. If flow label
is not defined in GRH then calculate it based on lqpn/rqpn.

Link: https://lore.kernel.org/r/20200504051935.269708-4-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 16:51:44 -03:00
Maor Gottlieb
9611d53aa1 RDMA/core: Consider flow label when building skb
Use rdma_flow_label_to_udp_sport to calculate the UDP source port of the
RoCEV2 packet.

Link: https://lore.kernel.org/r/20200504051935.269708-3-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 16:51:43 -03:00
Jason Gunthorpe
11a0ae4c4b RDMA: Allow ib_client's to fail when add() is called
When a client is added it isn't allowed to fail, but all the client's have
various failure paths within their add routines.

This creates the very fringe condition where the client was added, failed
during add and didn't set the client_data. The core code will then still
call other client_data centric ops like remove(), rename(), get_nl_info(),
and get_net_dev_by_params() with NULL client_data - which is confusing and
unexpected.

If the add() callback fails, then do not call any more client ops for the
device, even remove.

Remove all the now redundant checks for NULL client_data in ops callbacks.

Update all the add() callbacks to return error codes
appropriately. EOPNOTSUPP is used for cases where the ULP does not support
the ib_device - eg because it only works with IB.

Link: https://lore.kernel.org/r/20200421172440.387069-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 11:57:33 -03:00
Maor Gottlieb
04c349a965 RDMA/mad: Remove snoop interface
Snoop interface is not used. Remove it.

Link: https://lore.kernel.org/r/20200413132408.931084-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 11:50:22 -03:00
Dan Carpenter
37e31d2d26 i40iw: Fix error handling in i40iw_manage_arp_cache()
The i40iw_arp_table() function can return -EOVERFLOW if
i40iw_alloc_resource() fails so we can't just test for "== -1".

Fixes: 4e9042e647 ("i40iw: add hw and utils files")
Link: https://lore.kernel.org/r/20200422092211.GA195357@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-04 14:12:35 -03:00
Gal Pressman
f86e34374a RDMA/efa: Count admin commands errors
Add a new stat that counts admin commands failures, which might help when
debugging different issues.

Link: https://lore.kernel.org/r/20200420062213.44577-4-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:32:14 -03:00
Gal Pressman
eca5757f80 RDMA/efa: Count mmap failures
Add a new stat that counts mmap failures, which might help when debugging
different issues.

Link: https://lore.kernel.org/r/20200420062213.44577-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:32:14 -03:00
Gal Pressman
b2ea69b3b4 RDMA/efa: Report create CQ error counter
Create CQ errors are already being counted, report them along all other
counters.

Link: https://lore.kernel.org/r/20200420062213.44577-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:32:14 -03:00
Maor Gottlieb
cfc1a89e44 RDMA/mlx5: Set lag tx affinity according to slave
The patch sets the lag tx affinity of the data QPs and the GSI QPs
according to the LAG xmit slave.

For GSI QPs, in case the link layer is Ethenet (RoCE) we create two GSI
QPs, one for each physical port. When the driver selects the GSI QP, it
will consider the port affinity result.  For connected QPs, the driver
sets the affinity of the xmit slave.

The above, ensures that RC QP and it's corresponding GSI QP will transmit
from the same physical port.

Link: https://lore.kernel.org/r/20200430192146.12863-17-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:54 -03:00
Maor Gottlieb
5163b2743a RDMA/mlx5: Refactor affinity related code
Move affinity related code in modify qp to function.  It's a preparation
for next patch the extend the affinity calculation to consider the xmit
slave.

Link: https://lore.kernel.org/r/20200430192146.12863-16-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:54 -03:00
Maor Gottlieb
51aab12631 RDMA/core: Get xmit slave for LAG
Add a call to rdma_lag_get_ah_roce_slave() when the address handle is
created. Lower driver can use it to select the QP's affinity port.

Link: https://lore.kernel.org/r/20200430192146.12863-15-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:54 -03:00
Maor Gottlieb
bd3920eac1 RDMA/core: Add LAG functionality
Add support to get the RoCE LAG xmit slave by building skb of the RoCE
packet and call to master_get_xmit_slave.  If driver wants to get the
slave assume all slaves are available, then need to set
RDMA_LAG_FLAGS_HASH_ALL_SLAVES in flags.

Link: https://lore.kernel.org/r/20200430192146.12863-14-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:54 -03:00
Maor Gottlieb
fa5d010c56 RDMA: Group create AH arguments in struct
Following patch adds additional argument to the create AH function, so it
make sense to group ah_attr and flags arguments in struct.

Link: https://lore.kernel.org/r/20200430192146.12863-13-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:53 -03:00
Aharon Landau
0eacc574aa RDMA/mlx5: Verify that QP is created with RQ or SQ
RAW packet QP and underlay QP must be created with either
RQ or SQ, check that.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200427154636.381474-37-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
968f0b6f9c RDMA/mlx5: Consolidate into special function all create QP calls
Finish separation to blocks of mlx5_ib_create_qp() functions,
so all internal create QP implementation are located in one place.

Link: https://lore.kernel.org/r/20200427154636.381474-36-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
6367da46d3 RDMA/mlx5: Remove redundant destroy QP call
After major refactoring in create QP flow, it is no needed to call
to destroy QP in XRC_TGT flow.

Link: https://lore.kernel.org/r/20200427154636.381474-35-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
08d5397660 RDMA/mlx5: Copy response to the user in one place
Update all the places in create QP flows to copy response
to the user in one place.

Link: https://lore.kernel.org/r/20200427154636.381474-34-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
6f2cf76e6e RDMA/mlx5: Handle udate outlen checks in one place
Place in one function all udata size checks. This will allow
us move ib_copy_to_udata() in general place and ensure that
it will be performed after call to the FW.

Link: https://lore.kernel.org/r/20200427154636.381474-33-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
5d6fffed1c RDMA/mlx5: Promote RSS RAW QP flags check to higher level
Move check that user didn't supplied RSS RAW QP unsupported
command flags to the function that checks all such flags.

Link: https://lore.kernel.org/r/20200427154636.381474-32-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
f78d358cec RDMA/mlx5: Group all create QP parameters to simplify in-kernel interfaces
The amount of parameters passed in and out between internal mlx5
create QP functions is too large to easily follow the flow. Change
it by grouping all create QP parameter into one structure.

Link: https://lore.kernel.org/r/20200427154636.381474-31-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
747c519cdb RDMA/mlx5: Reduce amount of duplication in QP destroy
Delete both PD argument and checks if udata was provided, in favour
of unified destroy QP functions.

Link: https://lore.kernel.org/r/20200427154636.381474-30-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
98fc1126c4 RDMA/mlx5: Separate to user/kernel create QP flows
The kernel and user create QP flows have very little common code,
separate them to simplify the future work of creating per-type
create_*_qp() functions.

Link: https://lore.kernel.org/r/20200427154636.381474-29-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
04bcc1c2d0 RDMA/mlx5: Separate XRC_TGT QP creation from common flow
XRC_TGT QP doesn't fail into kernel or user flow separation. It is
initiated by the user, but is created through in-kernel verbs flow
and doesn't have PD and udata in similar way to kernel QPs.

So let's separate creation of that QP type from the common flow.

Link: https://lore.kernel.org/r/20200427154636.381474-28-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
21aad80b17 RDMA/mlx5: Globally parse DEVX UID
Remove duplication in parsing of DEVX UID.

Link: https://lore.kernel.org/r/20200427154636.381474-27-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
0ce300b15a RDMA/mlx5: Delete impossible inlen check
The inlen is set to be above zero in all flows before
and can't be negative at this stage.

Link: https://lore.kernel.org/r/20200427154636.381474-26-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
03c4077b28 RDMA/mlx5: Rely on existence of udata to separate kernel/user flows
Instead of keeping special field to separate kernel/user create/destroy
flows, rely on existence of udata pointer. All allocation flows are
using kzalloc() and leave uninitialized pointers as NULL which makes
MLX5_QP_EMPTY and MLX5_QP_KERNEL flows to be the same.

Link: https://lore.kernel.org/r/20200427154636.381474-25-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
76883a6cc1 RDMA/mlx5: Remove second user copy in create_user_qp
Combine copy_from_user() from create_user_qp() and general code.

Link: https://lore.kernel.org/r/20200427154636.381474-24-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
5ce0592b0e RDMA/mlx5: Combine copy of create QP command in RSS RAW QP
Change the create QP flow to handle all copy_from_user() operations in
one place.

Link: https://lore.kernel.org/r/20200427154636.381474-23-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
266424eba6 RDMA/mlx5: Promote RSS RAW QP attribute check in higher level
Perform check of attributes of RAW PACKET QP in separate function.

Link: https://lore.kernel.org/r/20200427154636.381474-22-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
7aede1a25f RDMA/mlx5: Store QP type in the vendor QP structure
QP type is stored in the IB/core QP struct, but it doesn't have all the
needed information, like internal QP type used in the driver itself.
Update mlx5_ib to have cached QP type which includes both IBTA and
Mellanox specific one.

Such change allows us to make even further cleanup of QP creation flow.

Link: https://lore.kernel.org/r/20200427154636.381474-21-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:42 -03:00
Leon Romanovsky
3ae7e66a01 RDMA/mlx5: Delete unsupported QP types
There is no need to explicitly check unsupported QP types,
rely on  "default" keyword in switch-case to catch them.

Link: https://lore.kernel.org/r/20200427154636.381474-20-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:42 -03:00
Jason Gunthorpe
dfb25edd97 Merge branch 'mlx5_ib_qp_refactor_1' into rdma.git for-next
Leon Romanovsky says:

====================
This is first part of series which tries to return some sanity to
mlx5_ib_create_qp() function. Such refactoring is required to make
extension of that function with less worries of breaking driver.

Extra goal of such refactoring is to ensure that QP is allocated at the
beginning of function and released at the end. It will allow us to move QP
allocation to be under IB/core responsibility.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_ib_qp_refactor_1': (66 commits)
  RDMA/mlx5: Process all vendor flags in one place
  RDMA/mlx5: Return all configured create flags through query QP
  RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags
  RDMA/mlx5: Use flags_en mechanism to mark QP created with WQE signature
  RDMA/mlx5: Process create QP flags in one place
  RDMA/mlx5: Delete create QP flags obfuscation
  RDMA/mlx5: Initial separation of RAW_PACKET QP from common flow
  RDMA/mlx5: Remove second copy from user for non RSS RAW QPs
  RDMA/mlx5: Move DRIVER QP flags check into separate function
  RDMA/mlx5: Update all DRIVER QP places to use QP subtype
  RDMA/mlx5: Split scatter CQE configuration for DCT QP
  RDMA/mlx5: Separate create QP flows to be based on type
  RDMA/mlx5: Set QP subtype immediately when it is known
  RDMA/mlx5: Avoid setting redundant NULL for XRC QPs
  RDMA/mlx5: Prepare QP allocation for future removal
  RDMA/mlx5: Perform check if QP creation flow is valid
  RDMA/mlx5: Delete impossible GSI port check
  RDMA/mlx5: Organize QP types checks in one place

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 21:44:51 -03:00
Leon Romanovsky
37518fa49f RDMA/mlx5: Process all vendor flags in one place
Check that vendor flags provided through ucmd are valid.

Link: https://lore.kernel.org/r/20200427154636.381474-19-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:24 -03:00
Leon Romanovsky
a8f3ea61e1 RDMA/mlx5: Return all configured create flags through query QP
The "flags" field in struct mlx5_ib_qp contains all UAPI flags
configured at the create QP stage. Return all the data as is
without masking.

Link: https://lore.kernel.org/r/20200427154636.381474-18-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:24 -03:00
Leon Romanovsky
90ecb37a75 RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags
In similar way to wqe_sig, the scat_cqe was treated differently from
other create QP vendor flags. Change it to be similar to other flags
and use flags_en mechanism.

Link: https://lore.kernel.org/r/20200427154636.381474-17-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
c95e6d5397 RDMA/mlx5: Use flags_en mechanism to mark QP created with WQE signature
MLX5_QP_FLAG_SIGNATURE is exposed to the users but in the kernel
the create_qp flow treated it differently from other MLX5_QP_FLAG_*s.
Fix it by ditching wq_sig boolean variable and use general flag_en
mechanism.

Link: https://lore.kernel.org/r/20200427154636.381474-16-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
2978975ce7 RDMA/mlx5: Process create QP flags in one place
create_flags is checked in too many places and scattered across all
the code, consolidate all the checks inside one function, so we will
be easily see the flow. As part of such change, delete unreachable code,
because IB/core is responsible sanitize the input.

Link: https://lore.kernel.org/r/20200427154636.381474-15-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
2be08c308f RDMA/mlx5: Delete create QP flags obfuscation
There is no point in redefinition of stable and exposed to users create
flags. Their values won't be changed and it is equal to used by the
mlx5. Delete the mlx5 definitions and use IB/core fields.

Link: https://lore.kernel.org/r/20200427154636.381474-14-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
5d0dc3d96c RDMA/mlx5: Initial separation of RAW_PACKET QP from common flow
Create initial function for IB_QPT_RAW_PACKET flow.

Link: https://lore.kernel.org/r/20200427154636.381474-13-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
2dfac92dbb RDMA/mlx5: Remove second copy from user for non RSS RAW QPs
Change the common code to use already copied user command buffer.

Link: https://lore.kernel.org/r/20200427154636.381474-12-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
2fdddbd5c9 RDMA/mlx5: Move DRIVER QP flags check into separate function
Perform validation of DRIVER QP in relevant function.

Link: https://lore.kernel.org/r/20200427154636.381474-11-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
8bde2c509e RDMA/mlx5: Update all DRIVER QP places to use QP subtype
Instead of overwriting QP init attributes with driver QP subtype,
use that subtype directly. This change will allow us to remove
logic which cached QP init attributes.

Link: https://lore.kernel.org/r/20200427154636.381474-10-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
fd9dab7edc RDMA/mlx5: Split scatter CQE configuration for DCT QP
DCT QPs have separate creation flow and can be easily extracted
from configure_responder_scat_cqe(), this makes both updated
functions more clear.

Link: https://lore.kernel.org/r/20200427154636.381474-9-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
47c806121a RDMA/mlx5: Separate create QP flows to be based on type
Move driver QP creation flow to separate functions to simplify
the create_qp() and allow future separation of create_qp_common()
to subtypes.

Link: https://lore.kernel.org/r/20200427154636.381474-8-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
318d2b06fb RDMA/mlx5: Set QP subtype immediately when it is known
There is no need to delay QP subtype assignment to the end of the
create_qp() function and it is better to move it to be immediately
after it is checked so we would be able to rewrite later checks
to be based on it and not on over-written struct ib_qp_init_attr.

Link: https://lore.kernel.org/r/20200427154636.381474-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
c86936e6eb RDMA/mlx5: Avoid setting redundant NULL for XRC QPs
There is no need to set NULL in recv_cq and send_cq, they are already
set to NULL by the IB/core logic.

Link: https://lore.kernel.org/r/20200427154636.381474-6-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
9c2ba4ede4 RDMA/mlx5: Prepare QP allocation for future removal
Unify the QP memory allocation across different paths,
so it will be in one place.

Link: https://lore.kernel.org/r/20200427154636.381474-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
2242cc25ce RDMA/mlx5: Perform check if QP creation flow is valid
Fast check that kernel and user flows provides enough
data to create QP.

Link: https://lore.kernel.org/r/20200427154636.381474-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:20 -03:00
Leon Romanovsky
1265d9f7a5 RDMA/mlx5: Delete impossible GSI port check
GSI QP is created in the kernel with very strict parameters,
there is no possible way that port number will be wrong in
such flow.

Link: https://lore.kernel.org/r/20200427154636.381474-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:20 -03:00
Leon Romanovsky
6eb7edffb2 RDMA/mlx5: Organize QP types checks in one place
Perform check if QP type is supported in one place at the beginning of
the create_qp function instead of current implementation with checks
buried inside of the code.

Link: https://lore.kernel.org/r/20200427154636.381474-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:20 -03:00
Raed Salem
244faedfd4 net/mlx5: Refactor imm_inval_pkey field in cqe struct
The imm_inval_pkey field can hold four different types of data,
depends on the usage, the data could be one of the below:
- Immediate field of the received message
- Invalidate rkey
- Pkey of the packet
- Flow table metadata

Current implementation doesn't reflect the intended usage of the
field at usage time.

Reflect the different types by replace this field with a union,
modify code where this field is used to reflect its intended
usage.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:15 -07:00
Erez Shitrit
dff8e2d152 net/mlx5: Use aligned variable while allocating ICM memory
The alignment value is part of the input structure, so use it and spare
extra memory allocation when is not needed.
Now, using the new ability when allocating icm for Direct-Rule
insertion.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:10 -07:00
Huy Nguyen
d65dbedfd2 net/mlx5: Add support for COPY steering action
Add COPY type to modify_header action. IPsec feature is the first
feature that needs COPY steering action.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-28 12:44:44 -07:00
Leon Romanovsky
f0abc761bb RDMA/core: Fix race between destroy and release FD object
The call to ->lookup_put() was too early and it caused an unlock of the
read/write protection of the uobject after the FD was put. This allows a
race:

     CPU1                                 CPU2
 rdma_lookup_put_uobject()
   lookup_put_fd_uobject()
     fput()
				   fput()
				     uverbs_uobject_fd_release()
				       WARN_ON(uverbs_try_lock_object(uobj,
					       UVERBS_LOOKUP_WRITE));
   atomic_dec(usecnt)

Fix the code by changing the order, first unlock and call to
->lookup_put() after that.

Fixes: 3832125624 ("IB/core: Add support for idr types")
Link: https://lore.kernel.org/r/20200423060122.6182-1-leon@kernel.org
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 15:40:41 -03:00
Sudip Mukherjee
47c370c1a5 IB/rdmavt: Always return ERR_PTR from rvt_create_mmap_info()
The commit below modified rvt_create_mmap_info() to return ERR_PTR's but
didn't update the callers to handle them. Modify rvt_create_mmap_info() to
only return ERR_PTR and fix all error checking after
rvt_create_mmap_info() was called.

Fixes: ff23dfa134 ("IB: Pass only ib_udata in function prototypes")
Link: https://lore.kernel.org/r/20200424173146.10970-1-sudipm.mukherjee@gmail.com
Cc: stable@vger.kernel.org [5.4+]
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 15:31:26 -03:00
Lang Cheng
a97bf49f82 RDMA/hns: Simplify the status judgment code of hns_roce_v1_m_qp()
Use status table to reduce cyclomatic complexity.

Link: https://lore.kernel.org/r/1586938475-37049-7-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 10:19:11 -03:00
Lang Cheng
357f342946 RDMA/hns: Simplify the state judgment code of qp
Use state table to make the qp state migrate code more readable.

Link: https://lore.kernel.org/r/1586938475-37049-6-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 10:19:11 -03:00
Lang Cheng
7c044adca2 RDMA/hns: Simplify the cqe code of poll cq
Encapsulate codes to get status of cqe into a function and use map table
instead of switch-case to reduce cyclomatic complexity of
hns_roce_v2_poll_one().

Link: https://lore.kernel.org/r/1586938475-37049-5-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 10:19:10 -03:00
Lang Cheng
a3de9e8381 RDMA/hns: Simplify the qp state convert code
Use type map table to reduce the cyclomatic complexity.

Link: https://lore.kernel.org/r/1586938475-37049-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 10:19:10 -03:00
Lijun Ou
375898e83d RDMA/hns: Optimize hns_roce_v2_set_mac()
Removes the unnecessary memset opertaion and adjust style of some lines in
hns_roce_v2_set_mac().

Link: https://lore.kernel.org/r/1586938475-37049-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 10:19:10 -03:00
Lijun Ou
9976ea27b5 RDMA/hns: Optimize hns_roce_config_link_table()
Remove the unnecessary memset operation and adjust style of some lines in
hns_roce_config_link_table().

Link: https://lore.kernel.org/r/1586938475-37049-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24 10:19:10 -03:00
Leon Romanovsky
e0b4b4722d net/mlx5: Update transobj.c new cmd interface
Do mass update of transobj.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:16 +03:00
Leon Romanovsky
5d1c9a114a net/mlx5: Update vport.c to new cmd interface
Do mass update of vport.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:02 +03:00
Leon Romanovsky
322f3d45a1 RDMA/bnxt: Delete 'nq_ptr' variable which is not used
The variable "nq_ptr" is set but never used, this generates the following
warning while compiling kernel with W=1 option.

drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq':
drivers/infiniband/hw/bnxt_re/qplib_fp.c:303:25: warning:
   variable 'nq_ptr' set but not used [-Wunused-but-set-variable]
303 |  struct nq_base *nqe, **nq_ptr;
    |

Fixes: fddcbbb02a ("RDMA/bnxt_re: Simplify obtaining queue entry from hw ring")
Link: https://lore.kernel.org/r/20200419132046.123887-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 17:05:13 -03:00
Xi Wang
744b7bdfa7 RDMA/hns: Support 0 hop addressing for CQE buffer
Add the zero hop addressing support by using mtr interface for CQE buffer,
so the hns driver can support addressing hopnum between 0 to 3 for CQE.

Link: https://lore.kernel.org/r/1586779091-51410-7-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 16:22:11 -03:00
Xi Wang
6fd610c573 RDMA/hns: Support 0 hop addressing for SRQ buffer
Add the zero hop addressing support by using mtr interface for SRQ buffer,
so the hns driver can support addressing hopnum between 0 to 3 for SRQ.

Link: https://lore.kernel.org/r/1586779091-51410-6-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 16:21:37 -03:00
Xi Wang
d563099e3e RDMA/hns: Support 0 hop addressing for WQE buffer
Add the zero hop addressing support by using new mtr interface for WQE
buffer and simple mtr invoking process, so WQE buffer can support hopnum
between 0 to 3.

Link: https://lore.kernel.org/r/1586779091-51410-5-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:54 -03:00
Xi Wang
477a0a3870 RDMA/hns: Optimize 0 hop addressing for EQE buffer
Use the new mtr interface to simple the hop 0 addressing and multihop
addressing process.

Link: https://lore.kernel.org/r/1586779091-51410-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:54 -03:00
Xi Wang
cc23267aed RDMA/hns: Optimize hns buffer allocation flow
When the value of nbufs is 1, the buffer is in direct mode, which may cause
confusion. So optimizes current codes to make it easier to maintain and
understand.

Link: https://lore.kernel.org/r/1586779091-51410-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:54 -03:00
Xi Wang
3c873161a0 RDMA/hns: Add support for addressing when hopnum is 0
Currently, WQE and EQE table have already used the mtr interface to config
and access memory by multi-hop addressing when hopnum is from 1 to 3. But
if hopnum is 0, each table need write its own but repetitive logic, and
many duplicate code exists in the mtr interfaces invoke process.

So wraps the public logic as 3 functions: hns_roce_mtr_create(),
hns_roce_mtr_destroy() and hns_roce_mtr_map() to support hopnum ranges from
0 to 3. In addition, makes the mtr interfaces easier to use.

Link: https://lore.kernel.org/r/1586779091-51410-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:53 -03:00
Leon Romanovsky
83a2670212 RDMA/core: Fix overwriting of uobj in case of error
In case of failure to get file, the uobj is overwritten and causes to
supply bad pointer as an input to uverbs_uobject_put().

  BUG: KASAN: null-ptr-deref in atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline]
  BUG: KASAN: null-ptr-deref in refcount_sub_and_test include/linux/refcount.h:253 [inline]
  BUG: KASAN: null-ptr-deref in refcount_dec_and_test include/linux/refcount.h:281 [inline]
  BUG: KASAN: null-ptr-deref in kref_put include/linux/kref.h:64 [inline]
  BUG: KASAN: null-ptr-deref in uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57
  Write of size 4 at addr 0000000000000030 by task syz-executor.4/1691

  CPU: 1 PID: 1691 Comm: syz-executor.4 Not tainted 5.6.0 #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x94/0xce lib/dump_stack.c:118
   __kasan_report+0x10c/0x190 mm/kasan/report.c:515
   kasan_report+0x32/0x50 mm/kasan/common.c:625
   check_memory_region_inline mm/kasan/generic.c:187 [inline]
   check_memory_region+0x16d/0x1c0 mm/kasan/generic.c:193
   atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline]
   refcount_sub_and_test include/linux/refcount.h:253 [inline]
   refcount_dec_and_test include/linux/refcount.h:281 [inline]
   kref_put include/linux/kref.h:64 [inline]
   uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57
   alloc_begin_fd_uobject+0x1d0/0x250 drivers/infiniband/core/rdma_core.c:486
   rdma_alloc_begin_uobject+0xa8/0xf0 drivers/infiniband/core/rdma_core.c:509
   __uobj_alloc include/rdma/uverbs_std_types.h:117 [inline]
   ib_uverbs_create_comp_channel+0x16d/0x230 drivers/infiniband/core/uverbs_cmd.c:982
   ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:665
   __vfs_write+0x7c/0x100 fs/read_write.c:494
   vfs_write+0x168/0x4a0 fs/read_write.c:558
   ksys_write+0xc8/0x200 fs/read_write.c:611
   do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:295
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x466479
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007efe9f6a7c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
  RDX: 0000000000000018 RSI: 0000000020000040 RDI: 0000000000000003
  RBP: 00007efe9f6a86bc R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
  R13: 0000000000000bf2 R14: 00000000004cb80a R15: 00000000006fefc0

Fixes: 849e149063 ("RDMA/core: Do not allow alloc_commit to fail")
Link: https://lore.kernel.org/r/20200421082929.311931-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:53:42 -03:00
Leon Romanovsky
0fb00941dc RDMA/core: Prevent mixed use of FDs between shared ufiles
FDs can only be used on the ufile that created them, they cannot be mixed
to other ufiles. We are lacking a check to prevent it.

  BUG: KASAN: null-ptr-deref in atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline]
  BUG: KASAN: null-ptr-deref in atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline]
  BUG: KASAN: null-ptr-deref in fput_many+0x1a/0x140 fs/file_table.c:336
  Write of size 8 at addr 0000000000000038 by task syz-executor179/284

  CPU: 0 PID: 284 Comm: syz-executor179 Not tainted 5.5.0-rc5+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x94/0xce lib/dump_stack.c:118
   __kasan_report+0x18f/0x1b7 mm/kasan/report.c:510
   kasan_report+0xe/0x20 mm/kasan/common.c:639
   check_memory_region_inline mm/kasan/generic.c:185 [inline]
   check_memory_region+0x15d/0x1b0 mm/kasan/generic.c:192
   atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline]
   atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline]
   fput_many+0x1a/0x140 fs/file_table.c:336
   rdma_lookup_put_uobject+0x85/0x130 drivers/infiniband/core/rdma_core.c:692
   uobj_put_read include/rdma/uverbs_std_types.h:96 [inline]
   _ib_uverbs_lookup_comp_file drivers/infiniband/core/uverbs_cmd.c:198 [inline]
   create_cq+0x375/0xba0 drivers/infiniband/core/uverbs_cmd.c:1006
   ib_uverbs_create_cq+0x114/0x140 drivers/infiniband/core/uverbs_cmd.c:1089
   ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:769
   __vfs_write+0x7c/0x100 fs/read_write.c:494
   vfs_write+0x168/0x4a0 fs/read_write.c:558
   ksys_write+0xc8/0x200 fs/read_write.c:611
   do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:294
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x44ef99
  Code: 00 b8 00 01 00 00 eb e1 e8 74 1c 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c4 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffc0b74c028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 00007ffc0b74c030 RCX: 000000000044ef99
  RDX: 0000000000000040 RSI: 0000000020000040 RDI: 0000000000000005
  RBP: 00007ffc0b74c038 R08: 0000000000401830 R09: 0000000000401830
  R10: 00007ffc0b74c038 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000000000000000 R14: 00000000006be018 R15: 0000000000000000

Fixes: cf8966b347 ("IB/core: Add support for fd objects")
Link: https://lore.kernel.org/r/20200421082929.311931-2-leon@kernel.org
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:53:23 -03:00
Jason Gunthorpe
39c011a538 RDMA/uverbs: Fix a race with disassociate and exit_mmap()
If uverbs_user_mmap_disassociate() is called while the mmap is
concurrently doing exit_mmap then the ordering of the
rdma_user_mmap_entry_put() is not reliable.

The put must be done before uvers_user_mmap_disassociate() returns,
otherwise there can be a use after free on the ucontext, and a left over
entry in the xarray. If the put is not done here then it is done during
rdma_umap_close() later.

Add the missing put to the error exit path.

  WARNING: CPU: 7 PID: 7111 at drivers/infiniband/core/rdma_core.c:810 uverbs_destroy_ufile_hw+0x2a5/0x340 [ib_uverbs]
  Modules linked in: bonding ipip tunnel4 geneve ip6_udp_tunnel udp_tunnel ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre mlx5_ib mlx5_core mlxfw pci_hyperv_intf act_ct nf_flow_table ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad 8021q garp mrp openvswitch nsh nf_conncount nfsv3 nfs_acl xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache overlay rpcrdma ib_isert iscsi_target_mod ib_iser kvm_intel ib_srpt iTCO_wdt target_core_mod iTCO_vendor_support kvm ib_srp nf_nat irqbypass crc32_pclmul crc32c_intel nf_conntrack rfkill nf_defrag_ipv6 virtio_net nf_defrag_ipv4 pcspkr ghash_clmulni_intel i2c_i801 net_failover failover i2c_core lpc_ich mfd_core rdma_cm ib_cm iw_cm button ib_core sunrpc sch_fq_codel ip_tables serio_raw [last unloaded: tunnel4]
  CPU: 7 PID: 7111 Comm: python3 Tainted: G        W         5.6.0-rc6-for-upstream-dbg-2020-03-21_06-41-26-18 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:uverbs_destroy_ufile_hw+0x2a5/0x340 [ib_uverbs]
  Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 74 49 8b 84 24 08 01 00 00 48 85 c0 0f 84 13 ff ff ff 48 89 ef ff d0 e9 09 ff ff ff <0f> 0b e9 77 ff ff ff e8 0f d8 fa e0 e9 c5 fd ff ff e8 05 d8 fa e0
  RSP: 0018:ffff88840e0779a0 EFLAGS: 00010286
  RAX: dffffc0000000000 RBX: ffff8882a7721c00 RCX: 0000000000000000
  RDX: 1ffff11054ee469f RSI: ffffffff8446d7e0 RDI: ffff8882a77234f8
  RBP: ffff8882a7723400 R08: ffffed1085c0112c R09: 0000000000000001
  R10: 0000000000000001 R11: ffffed1085c0112b R12: ffff888403c30000
  R13: 0000000000000002 R14: ffff8882a7721cb0 R15: ffff8882a7721cd0
  FS:  00007f2046089700(0000) GS:ffff88842de00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f7cfe9a6e20 CR3: 000000040b8ac006 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ib_uverbs_remove_one+0x273/0x480 [ib_uverbs]
   ? up_write+0x15c/0x4a0
   remove_client_context+0xa6/0xf0 [ib_core]
   disable_device+0x12d/0x200 [ib_core]
   ? remove_client_context+0xf0/0xf0 [ib_core]
   ? mnt_get_count+0x1d0/0x1d0
   __ib_unregister_device+0x79/0x150 [ib_core]
   ib_unregister_device+0x21/0x30 [ib_core]
   __mlx5_ib_remove+0x91/0x110 [mlx5_ib]
   ? __mlx5_ib_remove+0x110/0x110 [mlx5_ib]
   mlx5_remove_device+0x241/0x310 [mlx5_core]
   mlx5_unregister_device+0x4d/0x1e0 [mlx5_core]
   mlx5_unload_one+0xc0/0x260 [mlx5_core]
   remove_one+0x5c/0x160 [mlx5_core]
   pci_device_remove+0xef/0x2a0
   ? pcibios_free_irq+0x10/0x10
   device_release_driver_internal+0x1d8/0x470
   unbind_store+0x152/0x200
   ? sysfs_kf_write+0x3b/0x180
   ? sysfs_file_ops+0x160/0x160
   kernfs_fop_write+0x284/0x460
   ? __sb_start_write+0x243/0x3a0
   vfs_write+0x197/0x4a0
   ksys_write+0x156/0x1e0
   ? __x64_sys_read+0xb0/0xb0
   ? do_syscall_64+0x73/0x1330
   ? do_syscall_64+0x73/0x1330
   do_syscall_64+0xe7/0x1330
   ? down_write_nested+0x3e0/0x3e0
   ? syscall_return_slowpath+0x970/0x970
   ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
   ? lockdep_hardirqs_off+0x1de/0x2d0
   ? trace_hardirqs_off_thunk+0x1a/0x1c
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7f20a3ff0cdb
  Code: 53 48 89 d5 48 89 f3 48 83 ec 18 48 89 7c 24 08 e8 5a fd ff ff 48 89 ea 41 89 c0 48 89 de 48 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 90 fd ff ff 48
  RSP: 002b:00007f2046087040 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 00007f2038016df0 RCX: 00007f20a3ff0cdb
  RDX: 000000000000000d RSI: 00007f2038016df0 RDI: 0000000000000018
  RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000100 R11: 0000000000000293 R12: 00007f2046e29630
  R13: 00007f20280035a0 R14: 0000000000000018 R15: 00007f2038016df0

Fixes: c043ff2cfb ("RDMA: Connect between the mmap entry and the umap_priv structure")
Link: https://lore.kernel.org/r/20200413132136.930388-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:43:46 -03:00
Aharon Landau
2d7e3ff7b6 RDMA/mlx5: Set GRH fields in query QP on RoCE
GRH fields such as sgid_index, hop limit, et. are set in the QP context
when QP is created/modified.

Currently, when query QP is performed, we fill the GRH fields only if the
GRH bit is set in the QP context, but this bit is not set for RoCE. Adjust
the check so we will set all relevant data for the RoCE too.

Since this data is returned to userspace, the below is an ABI regression.

Fixes: d8966fcd4c ("IB/core: Use rdma_ah_attr accessor functions")
Link: https://lore.kernel.org/r/20200413132028.930109-1-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:43:46 -03:00
Leon Romanovsky
333fbaa025 net/mlx5: Move QP logic to mlx5_ib
The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:21 +03:00
Leon Romanovsky
42f9bbd112 RDMA/mlx5: Alphabetically sort build artifacts
Sort .o objects in makefile to make addition of new object
less cumbersome.

Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
9c275ee4ad net/mlx5: Delete not-used cmd header
The structures defined in the cmd header are not used and can be safely
removed from the driver. This patch removes that file and deletes all
relevant includes.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
bfd745f8f3 RDMA/mlx5: Delete Q counter allocations command
Remove mlx5_ib implementation of Q counter allocation logic
together with cleaning boolean which controlled validity of the
counter. It is not needed, because counter_id == 0 means that
counter is not valid.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
66247fbb28 net/mlx5: Remove Q counter low level helper APIs
mlx5 core users are encouraged to use low level API (mlx5_cmd_exec)
without the need of helper functions, do this for q counters, remove
helper functions and call mlx5_cmd_exec directly from users.

This will help reduce the total amount of code and reduction of the
mlx5_core symbol table.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Max Gurtovoy
95a776e8a6 RDMA/rw: use DIV_ROUND_UP to calculate nr_ops
Don't open-code DIV_ROUND_UP() kernel macro.

Link: https://lore.kernel.org/r/20200413133905.933343-1-leon@kernel.org
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-15 11:34:49 -03:00
Jason Gunthorpe
6e051971b0 RDMA/siw: Fix potential siw_mem refcnt leak in siw_fastreg_mr()
siw_fastreg_mr() invokes siw_mem_id2obj(), which returns a local reference
of the siw_mem object to "mem" with increased refcnt.  When
siw_fastreg_mr() returns, "mem" becomes invalid, so the refcount should be
decreased to keep refcount balanced.

The issue happens in one error path of siw_fastreg_mr(). When "base_mr"
equals to NULL but "mem" is not NULL, the function forgets to decrease the
refcnt increased by siw_mem_id2obj() and causes a refcnt leak.

Reorganize the flow so that the goto unwind can be used as expected.

Fixes: b9be6f18cf ("rdma/siw: transmit path")
Link: https://lore.kernel.org/r/1586939949-69856-1-git-send-email-xiyuyang19@fudan.edu.cn
Reported-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-15 11:26:51 -03:00
Alaa Hleihel
c08cfb2d8d RDMA/mlx4: Initialize ib_spec on the stack
Initialize ib_spec on the stack before using it, otherwise we will have
garbage values that will break creating default rules with invalid parsing
error.

Fixes: a37a1a4284 ("IB/mlx4: Add mechanism to support flow steering over IB links")
Link: https://lore.kernel.org/r/20200413132235.930642-1-leon@kernel.org
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 20:42:31 -03:00
Leon Romanovsky
dd302ee41e RDMA/cma: Limit the scope of rdma_is_consumer_reject function
The function is local to cma.c, so let's limit its scope.

Link: https://lore.kernel.org/r/20200413132323.930869-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 19:44:12 -03:00
Devesh Sharma
8ce111d00e RDMA/bnxt_re: Remove dead code from rcfw
In the previous refactoring serise there were few leftover functions which
are not is use anymore.  Removed them as it is a dead code.

Fixes: 6f53196bc5 ("RDMA/bnxt_re: Refactor doorbell management functions")
Link: https://lore.kernel.org/r/1585851136-2316-5-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:39:35 -03:00
Devesh Sharma
fddcbbb02a RDMA/bnxt_re: Simplify obtaining queue entry from hw ring
Restructring the data path and control path queue management code to
simplify the way a queue element is extracted from the hardware ring.

Introduced a new function which will give a pointer to the next ring item
depending upon the current cons/prod index in the hardware queue.

Further, there are hardcoding when size of queue entry is calculated,
replacing it with an inline function. This function would be easier to
expand if need going forward.

The code section to initialize the PSN search areas has also been
restructured and couple of functions has been added there.

Link: https://lore.kernel.org/r/1585851136-2316-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:39:35 -03:00
Devesh Sharma
c78671a4e6 RDMA/bnxt_re: Update missing hsi data structures
Adding fast path support data structure into hardware HSI. These
structures are header only definition of RQE/SRQE/SQE. This is to help
calculating the size of hardware wqe size.

Link: https://lore.kernel.org/r/1585851136-2316-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:39:35 -03:00
Devesh Sharma
99bf84e24e RDMA/bnxt_re: Reduce device page size detection code
Getting rid of the repeated code in the driver when deciding on the page
size of the hardware ring memory. A new common function would translate
the ring page size into device specific page size.

Link: https://lore.kernel.org/r/1585851136-2316-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:39:34 -03:00
Zou Wei
4f95308911 IB/qib: Remove unused variable ret
This patch fixes below warnings reported by coccicheck

drivers/infiniband/hw/qib/qib_iba7322.c:6878:8-11:
Unneeded variable: "ret". Return "0" on line 6907
drivers/infiniband/hw/qib/qib_iba7322.c:2378:5-8:
Unneeded variable: "ret". Return "0" on line 2513

Link: https://lore.kernel.org/r/1586745724-107477-1-git-send-email-zou_wei@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:34:17 -03:00
Mauro Carvalho Chehab
255e636df4 IB: Fix some documentation warnings
Parsing verbs.c with kernel-doc produce some warnings:

	./drivers/infiniband/core/verbs.c:2579: WARNING: Unexpected indentation.
	./drivers/infiniband/core/verbs.c:2581: WARNING: Block quote ends without a blank line; unexpected unindent.
	./drivers/infiniband/core/verbs.c:2613: WARNING: Unexpected indentation.
	./drivers/infiniband/core/verbs.c:2579: WARNING: Unexpected indentation.
	./drivers/infiniband/core/verbs.c:2581: WARNING: Block quote ends without a blank line; unexpected unindent.
	./drivers/infiniband/core/verbs.c:2613: WARNING: Unexpected indentation.

Address them by adding an extra blank line and converting the parameters
on one of the arguments to a table.

Link: https://lore.kernel.org/r/4c5466d0f450c5a9952138150c3485740b37f9c5.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:31:28 -03:00
Jason Gunthorpe
1587982e70 RDMA: Remove a few extra calls to ib_get_client_data()
These four places already have easy access to the client data, just use
that instead.

Link: https://lore.kernel.org/r/0-v1-fae83f600b4a+68-less_get_client_data%25jgg@mellanox.com
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:05:08 -03:00
Dan Carpenter
9836535158 RDMA/cm: Fix an error check in cm_alloc_id_priv()
The xa_alloc_cyclic_irq() function returns either 0 or 1 on success and
negatives on error.  This code treats 1 as an error and returns ERR_PTR(1)
which will cause an Oops in the caller.

Fixes: ae78ff3a0f ("RDMA/cm: Convert local_id_table to XArray")
Link: https://lore.kernel.org/r/20200407093714.GA80285@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 16:01:38 -03:00
Jason Gunthorpe
eb356e6dc1 RDMA/uverbs: Make the event_queue fds return POLLERR when disassociated
If is_closed is set, and the event list is empty, then read() will return
-EIO without blocking. After setting is_closed in
ib_uverbs_free_event_queue(), we do trigger a wake_up on the poll_wait,
but the fops->poll() function does not check it, so poll will continue to
sleep on an empty list.

Fixes: 14e23bd6d2 ("RDMA/core: Fix locking in ib_uverbs_event_read")
Link: https://lore.kernel.org/r/0-v1-ace813388969+48859-uverbs_poll_fix%25jgg@mellanox.com
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 15:56:34 -03:00
Yishai Hadas
cf26deff90 RDMA/mlx5: Fix udata response upon SRQ creation
Fix udata response upon SRQ creation to use the UAPI structure (i.e.
mlx5_ib_create_srq_resp). It did not zero the reserved field in userspace.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200406173540.1466477-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 15:56:34 -03:00
Zhu Yanjun
0184afd15a RDMA/rxe: Set default vendor ID
The RXE driver doesn't set vendor_id and user space applications see
zeros. This causes to pyverbs tests to fail with the following traceback,
because the expectation is to have valid vendor_id.

Traceback (most recent call last):
  File "tests/test_device.py", line 51, in test_query_device
    self.verify_device_attr(attr)
  File "tests/test_device.py", line 77, in verify_device_attr
    assert attr.vendor_id != 0

In order to fix it, we will set vendor_id 0XFFFFFF, according to the IBTA
v1.4 A3.3.1 VENDOR INFORMATION section.

"""
A vendor that produces a generic controller (i.e., one that supports a
standard I/O protocol such as SRP), which does not have vendor specific
device drivers, may use the value of 0xFFFFFF in the VendorID field.
"""

Before:

hca_id: rxe0
        transport:                      InfiniBand (0)
        fw_ver:                         0.0.0
        node_guid:                      5054:00ff:feaa:5363
        sys_image_guid:                 5054:00ff:feaa:5363
        vendor_id:                      0x0000

After:

hca_id: rxe0
        transport:                      InfiniBand (0)
        fw_ver:                         0.0.0
        node_guid:                      5054:00ff:feaa:5363
        sys_image_guid:                 5054:00ff:feaa:5363
        vendor_id:                      0xffffff

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200406173501.1466273-1-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 15:52:38 -03:00
Leon Romanovsky
0c6949c3d1 RDMA/cm: Fix missing RDMA_CM_EVENT_REJECTED event after receiving REJ message
The cm_reset_to_idle() call before formatting event changed the CM_ID
state from IB_CM_REQ_RCVD to be IB_CM_IDLE. It caused to wrong value of
CM_REJ_MESSAGE_REJECTED field.

The result of that was that rdma_reject() calls in the passive side didn't
generate RDMA_CM_EVENT_REJECTED event in the active side.

Fixes: 81ddb41f87 ("RDMA/cm: Allow ib_send_cm_rej() to be done under lock")
Link: https://lore.kernel.org/r/20200406173242.1465911-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 15:50:49 -03:00
Colin Ian King
f70968f05d i40iw: fix null pointer dereference on a null wqe pointer
Currently the null check for wqe is incorrect and lets a null wqe
be passed to set_64bit_val and this indexes into the null pointer
causing a null pointer dereference.  Fix this by fixing the null
pointer check to return an error if wqe is null.

Link: https://lore.kernel.org/r/20200401224921.405279-1-colin.king@canonical.com
Addresses-Coverity: ("dereference after a null check")
Fixes: 4b34e23f4e ("i40iw: Report correct firmware version")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 15:44:50 -03:00
Linus Torvalds
919dce2470 RDMA 5.7 pull request
The majority of the patches are cleanups, refactorings and clarity
 improvements
 
 - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
 
 - Lots of cleanup patches for hns
 
 - Convert more places to use refcount
 
 - Aggressively lock the RDMA CM code that syzkaller says isn't working
 
 - Work to clarify ib_cm
 
 - Use the new ib_device lifecycle model in bnxt_re
 
 - Fix mlx5's MR cache which seems to be failing more often with the new
   ODP code
 
 - mlx5 'dynamic uar' and 'tx steering' user interfaces
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CSr0ACgkQOG33FX4g
 mxrtKg//XovbOfYAO7nC05FtGz9iEkIUBiwQOjUojgNSi6RMNDqRW1bmqKUugm1o
 9nXA6tw+fueEvUNSD541SCcxkUZJzWvubO9wHB6N3Fgy68N3Vf2rKV3EBBTh99rK
 Cb7rnmTTN6izRyI1wdyP2sjDJGyF8zvsgIrG2sibzLnqlyScrnD98YS0FdPZUfOa
 a1mmXBN/T7eaQ4TbE3lLLzGnifRlGmZ5vxEvOQmAlOdqAfIKQdbbW7oCRLVIleso
 gfQlOOvIgzHRwQ3VrFa3i6ETYtywXq7EgmQxCjqPVJQjCA79n5TLBkP1iRhvn8xi
 3+LO4YCkiSJ/NjTA2d9KwT6K4djj3cYfcueuqo2MoXXr0YLiY6TLv1OffKcUIq7c
 LM3d4CSwIAG+C2FZwaQrdSGa2h/CNfLAEeKxv430zggeDNKlwHJPV5w3rUJ8lT56
 wlyT7Lzosl0O9Z/1BSLYckTvbBCtYcmanVyCfHG8EJKAM1/tXy5LS8btJ3e51rPu
 XekR9ELrTTA2CTuoSCQGP6J0dBD2U7qO4XRCQ9N5BYLrI6RdP7Z4xYzzey49Z3Cx
 JaF86eurM7nS5biUszTtwww8AJMyYicB+0VyjBfk+mhv90w8tS1vZ1aZKzaQ1L6Z
 jWn8WgIN4rWY0YGQs6PiovT1FplyGs3p1wNmjn92WO0wZZ3WsmQ=
 =ae+a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "The majority of the patches are cleanups, refactorings and clarity
  improvements.

  This cycle saw some more activity from Syzkaller, I think we are now
  clean on all but one of those bugs, including the long standing and
  obnoxious rdma_cm locking design defect. Continue to see many drivers
  getting cleanups, with a few new user visible features.

  Summary:

   - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1

   - Lots of cleanup patches for hns

   - Convert more places to use refcount

   - Aggressively lock the RDMA CM code that syzkaller says isn't
     working

   - Work to clarify ib_cm

   - Use the new ib_device lifecycle model in bnxt_re

   - Fix mlx5's MR cache which seems to be failing more often with the
     new ODP code

   - mlx5 'dynamic uar' and 'tx steering' user interfaces"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
  RDMA/bnxt_re: make bnxt_re_ib_init static
  IB/qib: Delete struct qib_ivdev.qp_rnd
  RDMA/hns: Fix uninitialized variable bug
  RDMA/hns: Modify the mask of QP number for CQE of hip08
  RDMA/hns: Reduce the maximum number of extend SGE per WQE
  RDMA/hns: Reduce PFC frames in congestion scenarios
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
  IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
  IB/hfi1: Fix memory leaks in sysfs registration and unregistration
  IB/mlx5: Move to fully dynamic UAR mode once user space supports it
  IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
  IB/mlx5: Extend QP creation to get uar page index from user space
  IB/mlx5: Extend CQ creation to get uar page index from user space
  IB/mlx5: Expose UAR object and its alloc/destroy commands
  IB/hfi1: Get rid of a warning
  RDMA/hns: Remove redundant judgment of qp_type
  RDMA/hns: Remove redundant assignment of wc->smac when polling cq
  RDMA/hns: Remove redundant qpc setup operations
  RDMA/hns: Remove meaningless prints
  ...
2020-04-01 18:18:18 -07:00
Linus Torvalds
29d9f30d4c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Fix the iwlwifi regression, from Johannes Berg.

   2) Support BSS coloring and 802.11 encapsulation offloading in
      hardware, from John Crispin.

   3) Fix some potential Spectre issues in qtnfmac, from Sergey
      Matyukevich.

   4) Add TTL decrement action to openvswitch, from Matteo Croce.

   5) Allow paralleization through flow_action setup by not taking the
      RTNL mutex, from Vlad Buslov.

   6) A lot of zero-length array to flexible-array conversions, from
      Gustavo A. R. Silva.

   7) Align XDP statistics names across several drivers for consistency,
      from Lorenzo Bianconi.

   8) Add various pieces of infrastructure for offloading conntrack, and
      make use of it in mlx5 driver, from Paul Blakey.

   9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

  10) Lots of parallelization improvements during configuration changes
      in mlxsw driver, from Ido Schimmel.

  11) Add support to devlink for generic packet traps, which report
      packets dropped during ACL processing. And use them in mlxsw
      driver. From Jiri Pirko.

  12) Support bcmgenet on ACPI, from Jeremy Linton.

  13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
      Starovoitov, and your's truly.

  14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

  15) Fix sysfs permissions when network devices change namespaces, from
      Christian Brauner.

  16) Add a flags element to ethtool_ops so that drivers can more simply
      indicate which coalescing parameters they actually support, and
      therefore the generic layer can validate the user's ethtool
      request. Use this in all drivers, from Jakub Kicinski.

  17) Offload FIFO qdisc in mlxsw, from Petr Machata.

  18) Support UDP sockets in sockmap, from Lorenz Bauer.

  19) Fix stretch ACK bugs in several TCP congestion control modules,
      from Pengcheng Yang.

  20) Support virtual functiosn in octeontx2 driver, from Tomasz
      Duszynski.

  21) Add region operations for devlink and use it in ice driver to dump
      NVM contents, from Jacob Keller.

  22) Add support for hw offload of MACSEC, from Antoine Tenart.

  23) Add support for BPF programs that can be attached to LSM hooks,
      from KP Singh.

  24) Support for multiple paths, path managers, and counters in MPTCP.
      From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
      and others.

  25) More progress on adding the netlink interface to ethtool, from
      Michal Kubecek"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
  net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
  cxgb4/chcr: nic-tls stats in ethtool
  net: dsa: fix oops while probing Marvell DSA switches
  net/bpfilter: remove superfluous testing message
  net: macb: Fix handling of fixed-link node
  net: dsa: ksz: Select KSZ protocol tag
  netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
  net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
  net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
  net: stmmac: create dwmac-intel.c to contain all Intel platform
  net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
  net: dsa: bcm_sf2: Add support for matching VLAN TCI
  net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
  net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
  net: dsa: bcm_sf2: Disable learning for ASP port
  net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
  net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
  net: dsa: b53: Restore VLAN entries upon (re)configuration
  net: dsa: bcm_sf2: Fix overflow checks
  hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
  ...
2020-03-31 17:29:33 -07:00
Linus Torvalds
a776c270a0 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The EFI changes in this cycle are much larger than usual, for two
  (positive) reasons:

   - The GRUB project is showing signs of life again, resulting in the
     introduction of the generic Linux/UEFI boot protocol, instead of
     x86 specific hacks which are increasingly difficult to maintain.
     There's hope that all future extensions will now go through that
     boot protocol.

   - Preparatory work for RISC-V EFI support.

  The main changes are:

   - Boot time GDT handling changes

   - Simplify handling of EFI properties table on arm64

   - Generic EFI stub cleanups, to improve command line handling, file
     I/O, memory allocation, etc.

   - Introduce a generic initrd loading method based on calling back
     into the firmware, instead of relying on the x86 EFI handover
     protocol or device tree.

   - Introduce a mixed mode boot method that does not rely on the x86
     EFI handover protocol either, and could potentially be adopted by
     other architectures (if another one ever surfaces where one
     execution mode is a superset of another)

   - Clean up the contents of 'struct efi', and move out everything that
     doesn't need to be stored there.

   - Incorporate support for UEFI spec v2.8A changes that permit
     firmware implementations to return EFI_UNSUPPORTED from UEFI
     runtime services at OS runtime, and expose a mask of which ones are
     supported or unsupported via a configuration table.

   - Partial fix for the lack of by-VA cache maintenance in the
     decompressor on 32-bit ARM.

   - Changes to load device firmware from EFI boot service memory
     regions

   - Various documentation updates and minor code cleanups and fixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  efi/libstub/arm: Fix spurious message that an initrd was loaded
  efi/libstub/arm64: Avoid image_base value from efi_loaded_image
  partitions/efi: Fix partition name parsing in GUID partition entry
  efi/x86: Fix cast of image argument
  efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
  efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
  efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
  efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
  efi/x86: Ignore the memory attributes table on i386
  efi/x86: Don't relocate the kernel unless necessary
  efi/x86: Remove extra headroom for setup block
  efi/x86: Add kernel preferred address to PE header
  efi/x86: Decompress at start of PE image load address
  x86/boot/compressed/32: Save the output address instead of recalculating it
  efi/libstub/x86: Deal with exit() boot service returning
  x86/boot: Use unsigned comparison for addresses
  efi/x86: Avoid using code32_start
  efi/x86: Make efi32_pe_entry() more readable
  efi/x86: Respect 32-bit ABI in efi32_pe_entry()
  efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
  ...
2020-03-30 16:13:08 -07:00
YueHaibing
b4d8ddf835 RDMA/bnxt_re: make bnxt_re_ib_init static
Fix sparse warning:

drivers/infiniband/hw/bnxt_re/main.c:1313:5:
 warning: symbol 'bnxt_re_ib_init' was not declared. Should it be static?

Link: https://lore.kernel.org/r/20200330110219.24448-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-30 15:03:19 -03:00
Saeed Mahameed
e999a7343d Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:11 -07:00
David S. Miller
f0b5989745 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment conflict in mac80211.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:25:29 -07:00
George Spelvin
3e87f43130 IB/qib: Delete struct qib_ivdev.qp_rnd
I was checking the field to see if it needed the full get_random_bytes()
and discovered it's unused.

Only compile-tested, as I don't have the hardware, but I'm still pretty
confident.

Link: https://lore.kernel.org/r/202003281643.02SGh6eG002694@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29 11:15:16 -03:00
Gustavo A. R. Silva
d35dc58dd2 RDMA/hns: Fix uninitialized variable bug
There is a potential execution path in which variable *ret* is returned
without being properly initialized, previously.

Fix this by initializing variable *ret* to 0.

Link: https://lore.kernel.org/r/20200328023539.GA32016@embeddedor
Addresses-Coverity-ID: 1491917 ("Uninitialized scalar variable")
Fixes: 2f49de21f3 ("RDMA/hns: Optimize mhop get flow for multi-hop addressing")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29 11:09:55 -03:00
Lang Cheng
90e735aecc RDMA/hns: Modify the mask of QP number for CQE of hip08
The hip08 supports up to 1M QPs, so the qpn mask of cqe should be
modified.

Link: https://lore.kernel.org/r/1585194018-4381-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29 11:04:21 -03:00
Lang Cheng
019cd05ce5 RDMA/hns: Reduce the maximum number of extend SGE per WQE
Just reduce the default number to 64 for backward compatibility, the
driver can still get this configuration from the firmware.

Link: https://lore.kernel.org/r/1585194018-4381-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29 11:04:21 -03:00
Jihua Tao
9d04d56c47 RDMA/hns: Reduce PFC frames in congestion scenarios
The original value means sending 16 packets at a time, and it should be
configured to 0 which means sending 1 packet instead. It is modified to
reduce the number of PFC frames to make sure the performance meets
expectations when flow control is enabled on hip08.

Link: https://lore.kernel.org/r/1585194018-4381-2-git-send-email-liweihang@huawei.com
Signed-off-by: Jihua Tao <taojihua4@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29 11:04:21 -03:00
Jason Gunthorpe
dbdf8909d0 Merge branch 'mlx5_tx_steering' into rdma.git for-next
Leon Romanovsky says:

====================
Those two patches from Michael extends mlx5_core and mlx5_ib flow steering
to support RDMA TX in similar way to already supported RDMA RX.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_tx_steering':
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
2020-03-27 13:26:59 -03:00
Michael Guralnik
af9c38411d RDMA/mlx5: Add support for RDMA TX flow table
Enable user application to add rules for RDMA TX steering table.
Rules in this steering table will allow to steer transmitted RDMA
traffic.

Link: https://lore.kernel.org/r/20200324061425.1570190-3-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 13:24:48 -03:00
Kaike Wan
dfb5394f80 IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
When kobject_init_and_add() returns an error in the function
hfi1_create_port_files(), the function kobject_put() is not called for the
corresponding kobject, which potentially leads to memory leak.

This patch fixes the issue by calling kobject_put() even if
kobject_init_and_add() fails.

Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200326163813.21129.44280.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 13:13:36 -03:00
Kaike Wan
5c15abc432 IB/hfi1: Fix memory leaks in sysfs registration and unregistration
When the hfi1 driver is unloaded, kmemleak will report the following
issue:

unreferenced object 0xffff8888461a4c08 (size 8):
comm "kworker/0:0", pid 5, jiffies 4298601264 (age 2047.134s)
hex dump (first 8 bytes):
73 64 6d 61 30 00 ff ff sdma0...
backtrace:
[<00000000311a6ef5>] kvasprintf+0x62/0xd0
[<00000000ade94d9f>] kobject_set_name_vargs+0x1c/0x90
[<0000000060657dbb>] kobject_init_and_add+0x5d/0xb0
[<00000000346fe72b>] 0xffffffffa0c5ecba
[<000000006cfc5819>] 0xffffffffa0c866b9
[<0000000031c65580>] 0xffffffffa0c38e87
[<00000000e9739b3f>] local_pci_probe+0x41/0x80
[<000000006c69911d>] work_for_cpu_fn+0x16/0x20
[<00000000601267b5>] process_one_work+0x171/0x380
[<0000000049a0eefa>] worker_thread+0x1d1/0x3f0
[<00000000909cf2b9>] kthread+0xf8/0x130
[<0000000058f5f874>] ret_from_fork+0x35/0x40

This patch fixes the issue by:

- Releasing dd->per_sdma[i].kobject in hfi1_unregister_sysfs().
  - This will fix the memory leak.

- Calling kobject_put() to unwind operations only for those entries in
   dd->per_sdma[] whose operations have succeeded (including the current
   one that has just failed) in hfi1_verbs_register_sysfs().

Cc: <stable@vger.kernel.org>
Fixes: 0cb2aa690c ("IB/hfi1: Add sysfs interface for affinity setup")
Link: https://lore.kernel.org/r/20200326163807.21129.27371.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 13:13:36 -03:00
Yishai Hadas
0a2fd01c28 IB/mlx5: Move to fully dynamic UAR mode once user space supports it
Move to fully dynamic UAR mode once user space supports it.  In this case
we prevent any legacy mode of UARs on the allocated context and prevent
redundant allocation of the static ones.

Link: https://lore.kernel.org/r/20200324060143.1569116-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:05 -03:00
Leon Romanovsky
2152862298 IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
struct mlx5_bfreg_info is used by mlx5_ib only but is exposed to both RDMA
and netdev parts of mlx5 driver. Move that struct to mlx5_ib namespace,
clean vertical space alignment and convert lib_uar_4k from bool to
bitfield.

Link: https://lore.kernel.org/r/20200324060143.1569116-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Yishai Hadas
ac42a5ee92 IB/mlx5: Extend QP creation to get uar page index from user space
Extend QP creation to get uar page index from user space, this mode can be
used with the UAR dynamic mode APIs to allocate/destroy a UAR object.

As part of enabling this option blocked the weird/un-supported cross
channel option which uses index 0 hard-coded.

This QP flag wasn't exposed to user space as part of any formal upstream
release, the dynamic option can allow having valid UAR page index instead.

Link: https://lore.kernel.org/r/20200324060143.1569116-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Yishai Hadas
64d99f6a62 IB/mlx5: Extend CQ creation to get uar page index from user space
Extend CQ creation to get uar page index from user space, this mode can be
used with the UAR dynamic mode APIs to allocate/destroy a UAR object.

Link: https://lore.kernel.org/r/20200324060143.1569116-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Yishai Hadas
342ee59de9 IB/mlx5: Expose UAR object and its alloc/destroy commands
Expose UAR object and its alloc/destroy commands to be used over the ioctl
interface by user space applications.

This API supports both BF & NC modes and enables a dynamic allocation of
UARs once really needed.

As the number of driver objects were limited by the core ones when the
merged tree is prepared, had to decrease the number of core objects to
enable the new UAR object usage.

Link: https://lore.kernel.org/r/20200324060143.1569116-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Mauro Carvalho Chehab
a4da83c215 IB/hfi1: Get rid of a warning
The right markup for a variable is @foo, and not @foo[].

Using a wrong markup caused this warning:

	./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:243: WARNING: Inline strong start-string without end-string.

Link: https://lore.kernel.org/r/9dce702510505556d75a13d9641e09218a4b4a65.1584456635.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:51:04 -03:00
Weihang Li
e0b0722643 RDMA/hns: Remove redundant judgment of qp_type
Type of qp has been checked in check_send_valid(), so this judgment should
be removed.

Link: https://lore.kernel.org/r/1584674622-52773-11-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:29 -03:00
Weihang Li
cd4a70bb7d RDMA/hns: Remove redundant assignment of wc->smac when polling cq
The field smac in ib_wc was used for create AH and then it will be treated
as destination mac address in UD sqwqe, but related code about filling smac
into AH has been removed in core. Actually, the dmac in UD sqwqe is parsed
from the dgid in grh which is passed in by ULP now, so this assignment
should be removed.

Link: https://lore.kernel.org/r/1584674622-52773-10-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:29 -03:00
Lang Cheng
f4c5d869c8 RDMA/hns: Remove redundant qpc setup operations
Before calling modify_qp_reset_to_init(), the entire qpc mask has been
cleared, so it is no longer necessary to clear the specific fields in the
mask.

Link: https://lore.kernel.org/r/1584674622-52773-9-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:28 -03:00
Wenpeng Liang
bceda6e67b RDMA/hns: Remove meaningless prints
ceq and aeq is a ring buffer, consumer index of them will be set to zero
after reaching the maximum value. The warning should be removed or it may
mislead the users.

Link: https://lore.kernel.org/r/1584674622-52773-8-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:28 -03:00
Lang Cheng
f91b919687 RDMA/hns: Remove definition of cq doorbell structure
The struct hns_roce_v2_cq_db is unused, it should be removed.

Link: https://lore.kernel.org/r/1584674622-52773-7-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:28 -03:00
Lang Cheng
fd72926c33 RDMA/hns: Adjust the qp status value sequence of the hardware
Interchange SQD and SQE to match the protocol.

Link: https://lore.kernel.org/r/1584674622-52773-6-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:28 -03:00
Lijun Ou
99e713f8da RDMA/hns: Optimize hns_roce_alloc_vf_resource()
The capbilities of hardware should be got at first and then used in
hns_roce_alloc_vf_resource(). Also removes an unnecessary if ... else
condition in it.

Link: https://lore.kernel.org/r/1584674622-52773-5-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:27 -03:00
Lang Cheng
d398d4ca5f RDMA/hns: Simplify attribute judgment code
Combine attribute flags before masking them.

Link: https://lore.kernel.org/r/1584674622-52773-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:27 -03:00
Weihang Li
30d41e18c3 RDMA/hns: Fix a wrong judgment of return value
hns_roce_alloc_mtt_range() never return -1, ret should be checked
whether it is zero instead of -1.

Fixes: 1ceb0b11a8 ("RDMA/hns: Fix non-standard error codes")
Link: https://lore.kernel.org/r/1584674622-52773-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:27 -03:00
Lijun Ou
ae1c61489c RDMA/hns: Unify format of prints
Use ibdev_err/dbg/warn() instead of dev_err/dbg/warn(), and modify some
prints into format of "failed to do something, ret = n".

Link: https://lore.kernel.org/r/1584674622-52773-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:52:26 -03:00
Sergey Gorenko
26e28deb81 IB/iser: Always check sig MR before putting it to the free pool
libiscsi calls the check_protection transport handler only if SCSI-Respose
is received. So, the handler is never called if iSCSI task is completed
for some other reason like a timeout or error handling. And this behavior
looks correct. But the iSER does not handle this case properly because it
puts a non-checked signature MR to the free pool. Then the error occurs at
reusing the MR because it is not allowed to invalidate a signature MR
without checking.

This commit adds an extra check to iser_unreg_mem_fastreg(), which is a
part of the task cleanup flow. Now the signature MR is checked there if it
is needed.

Link: https://lore.kernel.org/r/20200325151210.1548-1-sergeygo@mellanox.com
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:46:54 -03:00
Zhu Yanjun
d0ca2c35dd RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices
The RXE driver doesn't set sys_image_guid and user space applications see
zeros. This causes to pyverbs tests to fail with the following traceback,
because the IBTA spec requires to have valid sys_image_guid.

 Traceback (most recent call last):
   File "./tests/test_device.py", line 51, in test_query_device
     self.verify_device_attr(attr)
   File "./tests/test_device.py", line 74, in verify_device_attr
     assert attr.sys_image_guid != 0

In order to fix it, set sys_image_guid to be equal to node_guid.

Before:
 5: rxe0: ... node_guid 5054:00ff:feaa:5363 sys_image_guid
 0000:0000:0000:0000

After:
 5: rxe0: ... node_guid 5054:00ff:feaa:5363 sys_image_guid
 5054:00ff:feaa:5363

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200323112800.1444784-1-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 16:45:29 -03:00
Takashi Iwai
23ab5261e2 IB/hfi1: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the actual
output size, the succeeding calls may go beyond the given buffer limit.
Fix it by replacing with scnprintf().

Link: https://lore.kernel.org/r/20200319154641.23711-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 15:06:14 -03:00
Avihai Horon
987914ab84 RDMA/cm: Update num_paths in cma_resolve_iboe_route error flow
After a successful allocation of path_rec, num_paths is set to 1, but any
error after such allocation will leave num_paths uncleared.

This causes to de-referencing a NULL pointer later on. Hence, num_paths
needs to be set back to 0 if such an error occurs.

The following crash from syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
  CPU: 0 PID: 357 Comm: syz-executor060 Not tainted 4.18.0+ #311
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:ib_copy_path_rec_to_user+0x94/0x3e0
  Code: f1 f1 f1 f1 c7 40 0c 00 00 f4 f4 65 48 8b 04 25 28 00 00 00 48 89
  45 c8 31 c0 e8 d7 60 24 ff 48 8d 7b 4c 48 89 f8 48 c1 e8 03 <42> 0f b6
  14 30 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
  RSP: 0018:ffff88006586f980 EFLAGS: 00010207
  RAX: 0000000000000009 RBX: 0000000000000000 RCX: 1ffff1000d5fe475
  RDX: ffff8800621e17c0 RSI: ffffffff820d45f9 RDI: 000000000000004c
  RBP: ffff88006586fa50 R08: ffffed000cb0df73 R09: ffffed000cb0df72
  R10: ffff88006586fa70 R11: ffffed000cb0df73 R12: 1ffff1000cb0df30
  R13: ffff88006586fae8 R14: dffffc0000000000 R15: ffff88006aff2200
  FS: 00000000016fc880(0000) GS:ffff88006d000000(0000)
  knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000040 CR3: 0000000063fec000 CR4: 00000000000006b0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
  ? ib_copy_path_rec_from_user+0xcc0/0xcc0
  ? __mutex_unlock_slowpath+0xfc/0x670
  ? wait_for_completion+0x3b0/0x3b0
  ? ucma_query_route+0x818/0xc60
  ucma_query_route+0x818/0xc60
  ? ucma_listen+0x1b0/0x1b0
  ? sched_clock_cpu+0x18/0x1d0
  ? sched_clock_cpu+0x18/0x1d0
  ? ucma_listen+0x1b0/0x1b0
  ? ucma_write+0x292/0x460
  ucma_write+0x292/0x460
  ? ucma_close_id+0x60/0x60
  ? sched_clock_cpu+0x18/0x1d0
  ? sched_clock_cpu+0x18/0x1d0
  __vfs_write+0xf7/0x620
  ? ucma_close_id+0x60/0x60
  ? kernel_read+0x110/0x110
  ? time_hardirqs_on+0x19/0x580
  ? lock_acquire+0x18b/0x3a0
  ? finish_task_switch+0xf3/0x5d0
  ? _raw_spin_unlock_irq+0x29/0x40
  ? _raw_spin_unlock_irq+0x29/0x40
  ? finish_task_switch+0x1be/0x5d0
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? security_file_permission+0x172/0x1e0
  vfs_write+0x192/0x460
  ksys_write+0xc6/0x1a0
  ? __ia32_sys_read+0xb0/0xb0
  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
  ? do_syscall_64+0x1d/0x470
  do_syscall_64+0x9e/0x470
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 3c86aa70bf ("RDMA/cm: Add RDMA CM support for IBoE devices")
Link: https://lore.kernel.org/r/20200318101741.47211-1-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 14:43:12 -03:00
Maor Gottlieb
ba80013fba RDMA/mlx5: Block delay drop to unprivileged users
It has been discovered that this feature can globally block the RX port,
so it should be allowed for highly privileged users only.

Fixes: 03404e8ae652("IB/mlx5: Add support to dropless RQ")
Link: https://lore.kernel.org/r/20200322124906.1173790-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-25 09:56:30 -03:00
Yishai Hadas
1f3db16188 IB/mlx5: Generally use the WC auto detection test result
Now that we have direct and reliable detection of WC support by the
system, use is broadly. The only case we have to worry about is when the
WC autodetector cannot run.

For this fringe case generally assume that that WC is available, except in
the well defined case of no PAT support on x86 which is tested by calling
arch_can_pci_mmap_wc().

If WC is wrongly assumed to be available then it causes a small
performance hit on paths in userspace that are tuned to the assumption
that WC is available. There is no functional loss.

It is very unlikely that any platforms exist that lack WC and also care
about the micro optimization of WC in the fringe case where autodetection
does not work.

By removing the fairly bogus CONFIG tests this makes WC work broadly on
all arches and all platforms.

Link: https://lore.kernel.org/r/20200318100323.46659-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 20:22:21 -03:00
Xi Wang
38dcb35048 RDMA/hns: Optimize mhop put flow for multi-hop addressing
Optimizes hns_roce_table_mhop_get() by encapsulating code about clearing
hem into clear_mhop_hem(), which will make the code flow clearer.

Link: https://lore.kernel.org/r/1584417324-2255-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 20:18:56 -03:00
Xi Wang
2f49de21f3 RDMA/hns: Optimize mhop get flow for multi-hop addressing
Splits hns_roce_table_mhop_get() into 4 sub-functions to make the code flow
clearer.

Link: https://lore.kernel.org/r/1584417324-2255-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 20:18:56 -03:00
Selvin Xavier
b1d56fdcb6 RDMA/bnxt_re: Wait for all the CQ events before freeing CQ data structures
Destroy CQ command to firmware returns the num_cnq_events as a
response. This indicates the driver about the number of CQ events
generated for this CQ. Driver should wait for all these events before
freeing the CQ host structures.  Also, add routine to clean all the
pending notification for the CQs getting destroyed. This avoids the
possibility of accessing the CQ data structures after its freed.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/1584120842-3200-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 20:15:36 -03:00
Leon Romanovsky
950bf4f177 RDMA/mlx5: Fix access to wrong pointer while performing flush due to error
The main difference between send and receive SW completions is related to
separate treatment of WQ queue. For receive completions, the initial index
to be flushed is stored in "tail", while for send completions, it is in
deleted "last_poll".

  CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.el8.ppc64le #1
  Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
  NIP:  c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
  REGS: c0000036cc9db940 TRAP: 0400   Tainted: G           OE    --------- -t -  (4.18.0-147.el8.ppc64le)
  MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004488  XER: 20040000
  CFAR: c00800000e586af0 IRQMASK: 0
  GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
  GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
  GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
  GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
  GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
  GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
  NIP [c000003c7c00a000] 0xc000003c7c00a000
  LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
  Call Trace:
  [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
  [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
  [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
  [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
  [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
  [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80

Fixes: 8e3b688301 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 19:54:57 -03:00
Mike Marciniszyn
2d47fbacf2 RDMA/core: Ensure security pkey modify is not lost
The following modify sequence (loosely based on ipoib) will lose a pkey
modifcation:

- Modify (pkey index, port)
- Modify (new pkey index, NO port)

After the first modify, the qp_pps list will have saved the pkey and the
unit on the main list.

During the second modify, get_new_pps() will fetch the port from qp_pps
and read the new pkey index from qp_attr->pkey_index.  The state will
still be zero, or IB_PORT_PKEY_NOT_VALID. Because of the invalid state,
the new values will never replace the one in the qp pps list, losing the
new pkey.

This happens because the following if statements will never correct the
state because the first term will be false. If the code had been executed,
it would incorrectly overwrite valid values.

  if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
	  new_pps->main.state = IB_PORT_PKEY_VALID;

  if (!(qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) && qp_pps) {
	  new_pps->main.port_num = qp_pps->main.port_num;
	  new_pps->main.pkey_index = qp_pps->main.pkey_index;
	  if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
		  new_pps->main.state = IB_PORT_PKEY_VALID;
  }

Fix by joining the two if statements with an or test to see if qp_pps is
non-NULL and in the correct state.

Fixes: 1dd017882e ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Link: https://lore.kernel.org/r/20200313124704.14982.55907.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 19:53:25 -03:00
Dan Carpenter
a766fa8473 IB/mlx5: Fix a NULL vs IS_ERR() check
The kzalloc() function returns NULL, not error pointers.

Fixes: 30f2fe40c7 ("IB/mlx5: Introduce UAPIs to manage packet pacing")
Link: https://lore.kernel.org/r/20200320132641.GF95012@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 19:47:55 -03:00
Andrew Morton
5fb5186383 RDMA/siw: Suppress uninitialized var warning
drivers/infiniband/sw/siw/siw_qp_rx.c: In function siw_proc_send:
./include/linux/spinlock.h:288:3: warning: flags may be used uninitialized in this function [-Wmaybe-uninitialized]
   _raw_spin_unlock_irqrestore(lock, flags); \
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/sw/siw/siw_qp_rx.c:335:16: note: flags was declared here
  unsigned long flags;

Link: https://lore.kernel.org/r/20200323184627.ZWPg91uin%akpm@linux-foundation.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-23 22:22:37 -03:00
Mike Marciniszyn
9a293d1e21 IB/hfi1: Ensure pq is not left on waitlist
The following warning can occur when a pq is left on the dmawait list and
the pq is then freed:

  WARNING: CPU: 47 PID: 3546 at lib/list_debug.c:29 __list_add+0x65/0xc0
  list_add corruption. next->prev should be prev (ffff939228da1880), but was ffff939cabb52230. (next=ffff939cabb52230).
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
  nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci libahci i2c_algo_bit dca libata ptp pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 47 PID: 3546 Comm: wrf.exe Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.41.1.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  Call Trace:
  [<ffffffff91f65ac0>] dump_stack+0x19/0x1b
  [<ffffffff91898b78>] __warn+0xd8/0x100
  [<ffffffff91898bff>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff91a1dabe>] ? ___slab_alloc+0x24e/0x4f0
  [<ffffffff91b97025>] __list_add+0x65/0xc0
  [<ffffffffc03926a5>] defer_packet_queue+0x145/0x1a0 [hfi1]
  [<ffffffffc0372987>] sdma_check_progress+0x67/0xa0 [hfi1]
  [<ffffffffc03779d2>] sdma_send_txlist+0x432/0x550 [hfi1]
  [<ffffffff91a20009>] ? kmem_cache_alloc+0x179/0x1f0
  [<ffffffffc0392973>] ? user_sdma_send_pkts+0xc3/0x1990 [hfi1]
  [<ffffffffc0393e3a>] user_sdma_send_pkts+0x158a/0x1990 [hfi1]
  [<ffffffff918ab65e>] ? try_to_del_timer_sync+0x5e/0x90
  [<ffffffff91a3fe1a>] ? __check_object_size+0x1ca/0x250
  [<ffffffffc0395546>] hfi1_user_sdma_process_request+0xd66/0x1280 [hfi1]
  [<ffffffffc034e0da>] hfi1_aio_write+0xca/0x120 [hfi1]
  [<ffffffff91a4245b>] do_sync_readv_writev+0x7b/0xd0
  [<ffffffff91a4409e>] do_readv_writev+0xce/0x260
  [<ffffffff918df69f>] ? pick_next_task_fair+0x5f/0x1b0
  [<ffffffff918db535>] ? sched_clock_cpu+0x85/0xc0
  [<ffffffff91f6b16a>] ? __schedule+0x13a/0x860
  [<ffffffff91a442c5>] vfs_writev+0x35/0x60
  [<ffffffff91a4447f>] SyS_writev+0x7f/0x110
  [<ffffffff91f78ddb>] system_call_fastpath+0x22/0x27

The issue happens when wait_event_interruptible_timeout() returns a value
<= 0.

In that case, the pq is left on the list. The code continues sending
packets and potentially can complete the current request with the pq still
on the dmawait list provided no descriptor shortage is seen.

If the pq is torn down in that state, the sdma interrupt handler could
find the now freed pq on the list with list corruption or memory
corruption resulting.

Fix by adding a flush routine to ensure that the pq is never on a list
after processing a request.

A follow-up patch series will address issues with seqlock surfaced in:
https://lore.kernel.org/r/20200320003129.GP20941@ziepe.ca

The seqlock use for sdma will then be converted to a spin lock since the
list_empty() doesn't need the protection afforded by the sequence lock
currently in use.

Fixes: a0d406934a ("staging/rdma/hfi1: Add page lock limit check for SDMA requests")
Link: https://lore.kernel.org/r/20200320200200.23203.37777.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-23 21:57:57 -03:00
Leon Romanovsky
fa8a44f6b2 RDMA/efa: Use in-kernel offsetofend() to check field availability
Remove custom and duplicated variant of offsetofend().

Link: https://lore.kernel.org/r/20200310091438.248429-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 21:06:37 -03:00
Kaike Wan
5ab17a24cb IB/hfi1: Remove kobj from hfi1_devdata
The field kobj was added to hfi1_devdata structure to manage the life time
of the hfi1_devdata structure for PSM accesses:

commit e11ffbd575 ("IB/hfi1: Do not free hfi1 cdev parent structure early")

Later another mechanism user_refcount/user_comp was introduced to provide
the same functionality:

commit acd7c8fe14 ("IB/hfi1: Fix an Oops on pci device force remove")

This patch will remove this kobj field, as it is no longer needed.

Link: https://lore.kernel.org/r/20200316210500.7753.4145.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 19:53:47 -03:00
Mike Marciniszyn
d61ba1b9ae IB/rdmavt: Delete unused routine
This routine was obsoleted by the patch below.

Delete it.

Fixes: a2a074ef39 ("RDMA: Handle ucontext allocations by IB/core")
Link: https://lore.kernel.org/r/20200316210454.7753.94689.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 19:53:47 -03:00
Lang Cheng
026ded3734 RDMA/hns: Check if depth of qp is 0 before configure
Depth of qp shouldn't be allowed to be set to zero, after ensuring that,
subsequent process can be simplified. And when qp is changed from reset to
reset, the capability of minimum qp depth was used to identify hardware of
hip06, it should be changed into a more readable form.

Link: https://lore.kernel.org/r/1584006624-11846-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 19:30:36 -03:00
Sindhu, Devale
4b34e23f4e i40iw: Report correct firmware version
The driver uses a hard-coded value for FW version and reports an
inconsistent FW version between ibv_devinfo and
/sys/class/infiniband/i40iw/fw_ver.

Retrieve the FW version via a Control QP (CQP) operation and report it
consistently across sysfs and query device.

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/20200313214406.2159-1-shiraz.saleem@intel.com
Reported-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Sindhu, Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 13:53:44 -03:00
Xi Wang
d6a3627e31 RDMA/hns: Optimize wqe buffer set flow for post send
Splits hns_roce_v2_post_send() into three sub-functions: set_rc_wqe(),
set_ud_wqe() and update_sq_db() to simplify the code.

Link: https://lore.kernel.org/r/1583839084-31579-6-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 10:23:12 -03:00
Xi Wang
1133401412 RDMA/hns: Optimize base address table config flow for qp buffer
Currently, before the qp is created, a page size needs to be calculated
for the base address table to store all base addresses in the mtr. As a
result, the parameter configuration of the mtr is complex. So integrate
the process of calculating the base table page size into the hem related
interface to simplify the process of using mtr.

Link: https://lore.kernel.org/r/1583839084-31579-5-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 10:23:12 -03:00
Xi Wang
e363f7de4e RDMA/hns: Optimize the wr opcode conversion from ib to hns
Simplify the wr opcode conversion from ib to hns by using a map table
instead of the switch-case statement.

Link: https://lore.kernel.org/r/1583839084-31579-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 10:23:12 -03:00
Xi Wang
00a59d30f3 RDMA/hns: Optimize wqe buffer filling process for post send
Encapsulates the wqe buffer process details for datagram seg, fast mr seg
and atomic seg.

Link: https://lore.kernel.org/r/1583839084-31579-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 10:23:12 -03:00
Xi Wang
6c6e39212b RDMA/hns: Rename wqe buffer related functions
There are serval global functions related to wqe buffer in the hns driver
and are called in different files. These symbols cannot directly represent
the namespace they belong to. So add prefix 'hns_roce_' to 3 wqe buffer
related global functions: get_recv_wqe(), get_send_wqe(), and
get_send_extend_sge().

Link: https://lore.kernel.org/r/1583839084-31579-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-18 10:23:11 -03:00
Selvin Xavier
4e88cef11d RDMA/bnxt_re: Remove unnecessary sched count
Since the lifetime of bnxt_re_task is controlled by the kref of device,
sched_count is no longer required.  Remove it.

Link: https://lore.kernel.org/r/1584117207-2664-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 20:15:03 -03:00
Jason Gunthorpe
8a6c617047 RDMA/bnxt_re: Fix lifetimes in bnxt_re_task
A work queue cannot just rely on the ib_device not being freed, it must
hold a kref on the memory so that the BNXT_RE_FLAG_IBDEV_REGISTERED check
works.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/1584117207-2664-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 20:15:03 -03:00
Jason Gunthorpe
3cae58047c RDMA/bnxt_re: Use ib_device_try_get()
There are a couple places in this driver running from a work queue that
need the ib_device to be registered. Instead of using a broken internal
bit rely on the new core code to guarantee device registration.

Link: https://lore.kernel.org/r/1584117207-2664-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 20:15:03 -03:00
Jason Gunthorpe
67b3c8dcea RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy
The first switch statement in cm_destroy_id() tries to move the ID to
either IB_CM_IDLE or IB_CM_TIMEWAIT. Both states will block concurrent
MAD handlers from progressing.

Previous patches removed the unreliably lock/unlock sequences in this
flow, this patch removes the extra locking steps and adds the missing
parts to guarantee that destroy reaches IB_CM_IDLE. There is no point in
leaving the ID in the IB_CM_TIMEWAIT state the memory about to be kfreed.

Rework things to hold the lock across all the state transitions and
directly assert when done that it ended up in IB_CM_IDLE as expected.

This was accompanied by a careful audit of all the state transitions here,
which generally did end up in IDLE on their success and non-racy paths.

Link: https://lore.kernel.org/r/20200310092545.251365-16-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:54 -03:00
Jason Gunthorpe
6a8824a74b RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock
The first thing ib_send_cm_sidr_rep() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.

Get rid of the cm_reject_sidr_req() wrapper so each call site can call the
locked or unlocked version as required.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().

Link: https://lore.kernel.org/r/20200310092545.251365-15-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:54 -03:00
Jason Gunthorpe
81ddb41f87 RDMA/cm: Allow ib_send_cm_rej() to be done under lock
The first thing ib_send_cm_rej() does is obtain the lock, so use the usual
unlocked wrapper, locked actor pattern here.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().

While here simplify some of the logic in the implementation.

Link: https://lore.kernel.org/r/20200310092545.251365-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:54 -03:00
Jason Gunthorpe
87cabf3e09 RDMA/cm: Allow ib_send_cm_drep() to be done under lock
The first thing ib_send_cm_drep() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().

Link: https://lore.kernel.org/r/20200310092545.251365-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:53 -03:00
Jason Gunthorpe
e029fdc068 RDMA/cm: Allow ib_send_cm_dreq() to be done under lock
The first thing ib_send_cm_dreq() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.

This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().

Link: https://lore.kernel.org/r/20200310092545.251365-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:53 -03:00
Jason Gunthorpe
00777a68ae RDMA/cm: Add some lockdep assertions for cm_id_priv->lock
These functions all touch state, so must be called under the lock.
Inspection shows this is currently true.

Link: https://lore.kernel.org/r/20200310092545.251365-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:53 -03:00
Jason Gunthorpe
d1de9a8807 RDMA/cm: Add missing locking around id.state in cm_dup_req_handler
All accesses to id.state must be done under the spinlock.

Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:53 -03:00
Jason Gunthorpe
c206f8bad1 RDMA/cm: Make it clearer how concurrency works in cm_req_handler()
ib_crate_cm_id() immediately places the id in the xarray, and publishes it
into the remote_id and remote_qpn rbtrees. This makes it visible to other
threads before it is fully set up.

It appears the thinking here was that the states IB_CM_IDLE and
IB_CM_REQ_RCVD do not allow any MAD handler or lookup in the remote_id and
remote_qpn rbtrees to advance.

However, cm_rej_handler() does take an action on IB_CM_REQ_RCVD, which is
not really expected by the design.

Make the whole thing clearer:
 - Keep the new cm_id out of the xarray until it is completely set up.
   This directly prevents MAD handlers and all rbtree lookups from seeing
   the pointer.
 - Move all the trivial setup right to the top so it is obviously done
   before any concurrency begins
 - Move the mutation of the cm_id_priv out of cm_match_id() and into the
   caller so the state transition is obvious
 - Place the manipulation of the work_list at the end, under lock, after
   the cm_id is placed in the xarray. The work_count cannot change on an
   ID outside the xarray.
 - Add some comments

Link: https://lore.kernel.org/r/20200310092545.251365-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:52 -03:00
Jason Gunthorpe
083bfdbfd5 RDMA/cm: Make it clear that there is no concurrency in cm_sidr_req_handler()
ib_create_cm_id() immediately places the id in the xarray, so it is visible
to network traffic.

The state is initially set to IB_CM_IDLE and all the MAD handlers will
test this state under lock and refuse to advance from IDLE, so adding to
the xarray is harmless.

Further, the set to IB_CM_SIDR_REQ_RCVD also excludes all MAD handlers.

However, the local_id isn't even used for SIDR mode, and there will be no
input MADs related to the newly created ID.

So, make the whole flow simpler so it can be understood:
 - Do not put the SIDR cm_id in the xarray. This directly shows that there
   is no concurrency
 - Delete the confusing work_count and pending_list manipulations. This
   mechanism is only used by MAD handlers and timewait, neither of which
   apply to SIDR.
 - Add a few comments and rename 'cur_cm_id_priv' to 'listen_cm_id_priv'
 - Move other loose sets up to immediately after cm_id creation so that
   the cm_id is fully configured right away. This fixes an oversight where
   the service_id will not be returned back on a IB_SIDR_UNSUPPORTED
   reject.

Link: https://lore.kernel.org/r/20200310092545.251365-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:52 -03:00
Jason Gunthorpe
153a2e432e RDMA/cm: Read id.state under lock when doing pr_debug()
The lock should not be dropped before doing the pr_debug() print as it is
accessing data protected by the lock, such as id.state.

Fixes: 119bf81793 ("IB/cm: Add debug prints to ib_cm")
Link: https://lore.kernel.org/r/20200310092545.251365-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:52 -03:00
Jason Gunthorpe
98f67156a8 RDMA/cm: Simplify establishing a listen cm_id
Any manipulation of cm_id->state must be done under the cm_id_priv->lock,
the two routines that added listens did not follow this rule, because they
never participate in any concurrent access around the state.

However, since this exception makes the code hard to understand, simplify
the flow so that it can be fully locked:
 - Move manipulation of listen_sharecount into cm_insert_listen() so it is
   trivially under the cm.lock without having to expose the cm.lock to the
   caller.
 - Push the cm.lock down into cm_insert_listen() and have the function
   increment the reference count before returning an existing pointer.
 - Split ib_cm_listen() into an cm_init_listen() and do not call
   ib_cm_listen() from ib_cm_insert_listen()
 - Make both ib_cm_listen() and ib_cm_insert_listen() directly call
   cm_insert_listen() under their cm_id_priv->lock which does both a
   collision detect and, if needed, the insert (atomically)
 - Enclose all state manipulation within the cm_id_priv->lock, notice this
   set can be done safely after cm_insert_listen() as no reader is allowed
   to read the state without holding the lock.
 - Do not set the listen cm_id in the xarray, as it is never correct to
   look it up. This makes the concurrency simpler to understand.

Many needless error unwinds are removed in the process.

Link: https://lore.kernel.org/r/20200310092545.251365-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:52 -03:00
Jason Gunthorpe
2305d6864a RDMA/cm: Make the destroy_id flow more robust
Too much of the destruction is very carefully sensitive to the state
and various other things. Move more code to the unconditional path and
add several WARN_ONs to check consistency.

Link: https://lore.kernel.org/r/20200310092545.251365-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:52 -03:00
Jason Gunthorpe
bede86a39d RDMA/cm: Remove a race freeing timewait_info
When creating a cm_id during REQ the id immediately becomes visible to the
other MAD handlers, and shortly after the state is moved to IB_CM_REQ_RCVD

This allows cm_rej_handler() to run concurrently and free the work:

        CPU 0                                CPU1
 cm_req_handler()
  ib_create_cm_id()
  cm_match_req()
    id_priv->state = IB_CM_REQ_RCVD
                                       cm_rej_handler()
                                         cm_acquire_id()
                                         spin_lock(&id_priv->lock)
                                         switch (id_priv->state)
  					   case IB_CM_REQ_RCVD:
                                            cm_reset_to_idle()
                                             kfree(id_priv->timewait_info);
   goto destroy
  destroy:
    kfree(id_priv->timewait_info);
                                             id_priv->timewait_info = NULL

Causing a double free or worse.

Do not free the timewait_info without also holding the
id_priv->lock. Simplify this entire flow by making the free unconditional
during cm_destroy_id() and removing the confusing special case error
unwind during creation of the timewait_info.

This also fixes a leak of the timewait if cm_destroy_id() is called in
IB_CM_ESTABLISHED with an XRC TGT QP. The state machine will be left in
ESTABLISHED while it needed to transition through IB_CM_TIMEWAIT to
release the timewait pointer.

Also fix a leak of the timewait_info if the caller mis-uses the API and
does ib_send_cm_reqs().

Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:51 -03:00
Jason Gunthorpe
ca21cb7fb1 RDMA/cm: Fix checking for allowed duplicate listens
The test here typod the cm_id_priv to use, it used the one that was
freshly allocated. By definition the allocated one has the matching
cm_handler and zero context, so the condition was always true.

Instead check that the existing listening ID is compatible with the
proposed handler so that it can be shared, as was originally intended.

Fixes: 067b171b86 ("IB/cm: Share listening CM IDs")
Link: https://lore.kernel.org/r/20200310092545.251365-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:51 -03:00
Jason Gunthorpe
e8dc4e885c RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id()
xa_alloc_cyclic() is a SMP release to be paired with some later acquire
during xa_load() as part of cm_acquire_id().

As such, xa_alloc_cyclic() must be done after the cm_id is fully
initialized, in particular, it absolutely must be after the
refcount_set(), otherwise the refcount_inc() in cm_acquire_id() may not
see the set.

As there are several cases where a reader will be able to use the
id.local_id after cm_acquire_id in the IB_CM_IDLE state there needs to be
an unfortunate split into a NULL allocate and a finalizing xa_store.

Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-17 17:05:51 -03:00
Weihang Li
9e57a9aa69 RDMA/hns: Fix wrong judgments of udata->outlen
These judgments were used to keep the compatibility with older versions of
userspace that don't have the field named "cap_flags" in structure
hns_roce_ib_create_cq_resp. But it will be wrong to compare outlen with
the size of resp if another new field were added in resp. oulen should be
compared with the end offset of cap_flags in resp.

Fixes: 4f8f0d5e33 ("RDMA/hns: Package the flow of creating cq")
Link: https://lore.kernel.org/r/1583845569-47257-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:36:58 -03:00
Kaike Wan
941224e094 IB/rdmavt: Free kernel completion queue when done
When a kernel ULP requests the rdmavt to create a completion queue, it
allocated the queue and set cq->kqueue to point to it. However, when the
completion queue is destroyed, cq->queue is freed instead, leading to a
memory leak:

https://lore.kernel.org/r/215235485.15264050.1583334487658.JavaMail.zimbra@redhat.com

 unreferenced object 0xffffc90006639000 (size 12288):
 comm "kworker/u128:0", pid 8, jiffies 4295777598 (age 589.085s)
    hex dump (first 32 bytes):
      4d 00 00 00 4d 00 00 00 00 c0 08 ac 8b 88 ff ff  M...M...........
      00 00 00 00 80 00 00 00 00 00 00 00 10 00 00 00  ................
    backtrace:
      [<0000000035a3d625>] __vmalloc_node_range+0x361/0x720
      [<000000002942ce4f>] __vmalloc_node.constprop.30+0x63/0xb0
      [<00000000f228f784>] rvt_create_cq+0x98a/0xd80 [rdmavt]
      [<00000000b84aec66>] __ib_alloc_cq_user+0x281/0x1260 [ib_core]
      [<00000000ef3764be>] nvme_rdma_cm_handler+0xdb7/0x1b80 [nvme_rdma]
      [<00000000936b401c>] cma_cm_event_handler+0xb7/0x550 [rdma_cm]
      [<00000000d9c40b7b>] addr_handler+0x195/0x310 [rdma_cm]
      [<00000000c7398a03>] process_one_req+0xdd/0x600 [ib_core]
      [<000000004d29675b>] process_one_work+0x920/0x1740
      [<00000000efedcdb5>] worker_thread+0x87/0xb40
      [<000000005688b340>] kthread+0x327/0x3f0
      [<0000000043a168d6>] ret_from_fork+0x3a/0x50

This patch fixes the issue by freeing cq->kqueue instead.

Fixes: 239b0e52d8 ("IB/hfi1: Move rvt_cq_wc struct into uapi directory")
Link: https://lore.kernel.org/r/20200313123957.14343.43879.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org> # 5.4.x
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:29:59 -03:00
Jason Gunthorpe
d613bd64c6 Merge branch 'mlx5_mr_cache' into rdma.git for-next
Leon Romanovsky says:

====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_mr-cache':
  RDMA/mlx5: Allow MRs to be created in the cache synchronously
  RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
  RDMA/mlx5: Fix locking in MR cache work queue
  RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
  RDMA/mlx5: Fix MR cache size and limit debugfs
  RDMA/mlx5: Always remove MRs from the cache before destroying them
  RDMA/mlx5: Simplify how the MR cache bucket is located
  RDMA/mlx5: Rename the tracking variables for the MR cache
  RDMA/mlx5: Replace spinlock protected write with atomic var
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation
2020-03-13 11:11:07 -03:00
Jason Gunthorpe
aad719dcf3 RDMA/mlx5: Allow MRs to be created in the cache synchronously
If the cache is completely out of MRs, and we are running in cache mode,
then directly, and synchronously, create an MR that is compatible with the
cache bucket using a sleeping mailbox command. This ensures that the
thread that is waiting for the MR absolutely will get one.

When a MR allocated in this way becomes freed then it is compatible with
the cache bucket and will be recycled back into it.

Deletes the very buggy ent->compl scheme to create a synchronous MR
allocation.

Link: https://lore.kernel.org/r/20200310082238.239865-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:02 -03:00
Jason Gunthorpe
1c78a21a0c RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
Currently if the work queue is running then it is in 'hysteresis' mode and
will fill until the cache reaches the high water mark. This implicit state
is very tricky and doesn't interact with pending very well.

Instead of self re-scheduling the work queue after the add_keys() has
started to create the new MR, have the queue scheduled from
reg_mr_callback() only after the requested MR has been added.

This avoids the bad design of an in-rush of queue'd work doing back to
back add_keys() until EAGAIN then sleeping. The add_keys() will be paced
one at a time as they complete, slowly filling up the cache.

Also, fix pending to be only manipulated under lock.

Link: https://lore.kernel.org/r/20200310082238.239865-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:02 -03:00
Jason Gunthorpe
b9358bdbc7 RDMA/mlx5: Fix locking in MR cache work queue
All of the members of mlx5_cache_ent must be accessed while holding the
spinlock, add the missing spinlock in the __cache_work_func().

Using cache->stopped and flush_workqueue() is an inherently racy way to
shutdown self-scheduling work on a queue. Replace it with ent->disabled
under lock, and always check disabled before queuing any new work. Use
cancel_work_sync() to shutdown the queue.

Use READ_ONCE/WRITE_ONCE for dev->last_add to manage concurrency as
coherency is less important here.

Split fill_delay from the bitfield. C bitfield updates are not atomic and
this is just a mess. Use READ_ONCE/WRITE_ONCE, but this could also use
test_bit()/set_bit().

Link: https://lore.kernel.org/r/20200310082238.239865-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:02 -03:00
Jason Gunthorpe
ad2d3ef46d RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
Accesses to these members needs to be locked. There is no reason not to
hold a spinlock while calling queue_work(), so move the tests into a
helper and always call it under lock.

The helper should be called when available_mrs is adjusted.

Link: https://lore.kernel.org/r/20200310082238.239865-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
a1d8854aae RDMA/mlx5: Fix MR cache size and limit debugfs
The size_write function is supposed to adjust the total_mr's to match the
user's request, but lacks locking and safety checking.

total_mrs can only be adjusted by at most available_mrs. mrs already
assigned to users cannot be revoked. Ensure that the user provides a
target value within the range of available_mrs and within the high/low
water mark.

limit_write has confusing and wrong sanity checking, and doesn't have the
ability to deallocate on limit reduction.

Since both functions use the same algorithm to adjust the available_mrs,
consolidate it into one function and write it correctly. Fix the locking
and by holding the spinlock for all accesses to ent->X.

Always fail if the user provides a malformed string.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200310082238.239865-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
1769c4c575 RDMA/mlx5: Always remove MRs from the cache before destroying them
The cache bucket tracks the total number of MRs that exists, both inside
and outside of the cache. Removing a MR from the cache (by setting
cache_ent to NULL) without updating total_mrs will cause the tracking to
leak and be inflated.

Further fix the rereg_mr path to always destroy the MR. reg_create will
always overwrite all the MR data in mlx5_ib_mr, so the MR must be
completely destroyed, in all cases, before this function can be
called. Detach the MR from the cache and unconditionally destroy it to
avoid leaking HW mkeys.

Fixes: afd1417404 ("IB/mlx5: Use direct mkey destroy command upon UMR unreg failure")
Fixes: 56e11d628c ("IB/mlx5: Added support for re-registration of MRs")
Link: https://lore.kernel.org/r/20200310082238.239865-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
b91e1751fb RDMA/mlx5: Simplify how the MR cache bucket is located
There are many bad APIs here that are accepting a cache bucket index
instead of a bucket pointer. Many of the callers already have a bucket
pointer, so this results in a lot of confusing uses of order2idx().

Pass the struct mlx5_cache_ent into add_keys(), remove_keys(), and
alloc_cached_mr().

Once the MR is in the cache, store the cache bucket pointer directly in
the MR, replacing the 'bool allocated_from cache'.

In the end there is only one place that needs to form index from order,
alloc_mr_from_cache(). Increase the safety of this function by disallowing
it from accessing cache entries in the ODP special area.

Link: https://lore.kernel.org/r/20200310082238.239865-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
7c8691a396 RDMA/mlx5: Rename the tracking variables for the MR cache
The old names do not clearly indicate the intent.

Link: https://lore.kernel.org/r/20200310082238.239865-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Saeed Mahameed
f743ff3b37 RDMA/mlx5: Replace spinlock protected write with atomic var
mkey variant calculation was spinlock protected to make it atomic, replace
that with one atomic variable.

Link: https://lore.kernel.org/r/20200310082238.239865-4-leon@kernel.org
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:00 -03:00
Michael Guralnik
a3cfdd3928 {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:10 +02:00
Saeed Mahameed
fc6a9f86f0 {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:04 +02:00
Saeed Mahameed
54c62e13ad {IB,net}/mlx5: Setup mkey variant before mr create command invocation
On reg_mr_callback() mlx5_ib is recalculating the mkey variant which is
wrong and will lead to using a different key variant than the one
submitted to firmware on create mkey command invocation.

To fix this, we store the mkey variant before invoking the firmware
command and use it later on completion (reg_mr_callback).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:00 +02:00
Leon Romanovsky
a4f994a059 RDMA/cm: Delete not implemented CM peer to peer communication
Peer to peer support was never implemented, so delete it to make code less
clutter.

Link: https://lore.kernel.org/r/20200310091438.248429-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 10:46:53 -03:00
Leon Romanovsky
a762d460a0 RDMA/mlx5: Use offsetofend() instead of duplicated variant
Convert mlx5 driver to use offsetofend() instead of its duplicated
variant.

Link: https://lore.kernel.org/r/20200310091438.248429-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 10:45:12 -03:00
Leon Romanovsky
282e79c1c6 RDMA/mlx4: Delete duplicated offsetofend implementation
Convert mlx4 to use in-kernel offsetofend() instead
of its duplicated implementation.

Link: https://lore.kernel.org/r/20200310091438.248429-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 10:42:48 -03:00
Jason Gunthorpe
5bdfa85401 RDMA/mad: Do not crash if the rdma device does not have a umad interface
Non-IB devices do not have a umad interface and the client_data will be
left set to NULL. In this case calling get_nl_info() will try to kref a
NULL cdev causing a crash:

  general protection fault, probably for non-canonical address 0xdffffc00000000ba: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x00000000000005d0-0x00000000000005d7]
  CPU: 0 PID: 20851 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:kobject_get+0x35/0x150 lib/kobject.c:640
  Code: 53 e8 3f b0 8b f9 4d 85 e4 0f 84 a2 00 00 00 e8 31 b0 8b f9 49 8d 7c 24 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f  b6 04 02 48 89 fa
+83 e2 07 38 d0 7f 08 84 c0 0f 85 eb 00 00 00
  RSP: 0018:ffffc9000946f1a0 EFLAGS: 00010203
  RAX: dffffc0000000000 RBX: ffffffff85bdbbb0 RCX: ffffc9000bf22000
  RDX: 00000000000000ba RSI: ffffffff87e9d78f RDI: 00000000000005d4
  RBP: ffffc9000946f1b8 R08: ffff8880581a6440 R09: ffff8880581a6cd0
  R10: fffffbfff154b838 R11: ffffffff8aa5c1c7 R12: 0000000000000598
  R13: 0000000000000000 R14: ffffc9000946f278 R15: ffff88805cb0c4d0
  FS:  00007faa9e8af700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b30121000 CR3: 000000004515d000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   get_device+0x25/0x40 drivers/base/core.c:2574
   __ib_get_client_nl_info+0x205/0x2e0 drivers/infiniband/core/device.c:1861
   ib_get_client_nl_info+0x35/0x180 drivers/infiniband/core/device.c:1881
   nldev_get_chardev+0x575/0xac0 drivers/infiniband/core/nldev.c:1621
   rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:195 [inline]
   rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
   rdma_nl_rcv+0x5d9/0x980 drivers/infiniband/core/netlink.c:259
   netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
   netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1329
   netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1918
   sock_sendmsg_nosec net/socket.c:652 [inline]
   sock_sendmsg+0xd7/0x130 net/socket.c:672
   ____sys_sendmsg+0x753/0x880 net/socket.c:2343
   ___sys_sendmsg+0x100/0x170 net/socket.c:2397
   __sys_sendmsg+0x105/0x1d0 net/socket.c:2430
   __do_sys_sendmsg net/socket.c:2439 [inline]
   __se_sys_sendmsg net/socket.c:2437 [inline]
   __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
   do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Cc: stable@kernel.org
Fixes: 8f71bb0030 ("RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEV")
Link: https://lore.kernel.org/r/20200310075339.238090-1-leon@kernel.org
Reported-by: syzbot+46fe08363dbba223dec5@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 10:32:02 -03:00
Jason Gunthorpe
f2f2b3bbf0 RDMA/core: Fix missing error check on dev_set_name()
If name memory allocation fails the name will be left empty and
device_add_one() will crash:

  kobject: (0000000004952746): attempted to be registered with empty name!
  WARNING: CPU: 0 PID: 329 at lib/kobject.c:234 kobject_add_internal+0x7ac/0x9a0 lib/kobject.c:234
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 329 Comm: syz-executor.5 Not tainted 5.6.0-rc2-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x197/0x210 lib/dump_stack.c:118
   panic+0x2e3/0x75c kernel/panic.c:221
   __warn.cold+0x2f/0x3e kernel/panic.c:582
   report_bug+0x289/0x300 lib/bug.c:195
   fixup_bug arch/x86/kernel/traps.c:174 [inline]
   fixup_bug arch/x86/kernel/traps.c:169 [inline]
   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
   do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
   invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
  RIP: 0010:kobject_add_internal+0x7ac/0x9a0 lib/kobject.c:234
  Code: 1a 98 ca f9 e9 f0 f8 ff ff 4c 89 f7 e8 6d 98 ca f9 e9 95 f9 ff ff e8 c3 f0 8b f9 4c 89 e6 48 c7 c7 a0 0e 1a 89 e8 e3 41 5c f9 <0f> 0b 41 bd ea ff ff ff e9 52 ff ff ff e8 a2 f0 8b f9 0f 0b e8 9b
  RSP: 0018:ffffc90005b27908 EFLAGS: 00010286
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000040000 RSI: ffffffff815eae46 RDI: fffff52000b64f13
  RBP: ffffc90005b27960 R08: ffff88805aeba480 R09: ffffed1015d06659
  R10: ffffed1015d06658 R11: ffff8880ae8332c7 R12: ffff8880a37fd000
  R13: 0000000000000000 R14: ffff888096691780 R15: 0000000000000001
   kobject_add_varg lib/kobject.c:390 [inline]
   kobject_add+0x150/0x1c0 lib/kobject.c:442
   device_add+0x3be/0x1d00 drivers/base/core.c:2412
   add_one_compat_dev drivers/infiniband/core/device.c:901 [inline]
   add_one_compat_dev+0x46a/0x7e0 drivers/infiniband/core/device.c:857
   rdma_dev_init_net+0x2eb/0x490 drivers/infiniband/core/device.c:1120
   ops_init+0xb3/0x420 net/core/net_namespace.c:137
   setup_net+0x2d5/0x8b0 net/core/net_namespace.c:327
   copy_net_ns+0x29e/0x5a0 net/core/net_namespace.c:468
   create_new_namespaces+0x403/0xb50 kernel/nsproxy.c:108
   unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:229
   ksys_unshare+0x444/0x980 kernel/fork.c:2955
   __do_sys_unshare kernel/fork.c:3023 [inline]
   __se_sys_unshare kernel/fork.c:3021 [inline]
   __x64_sys_unshare+0x31/0x40 kernel/fork.c:3021
   do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Link: https://lore.kernel.org/r/20200309193200.GA10633@ziepe.ca
Cc: stable@kernel.org
Fixes: 4e0f7b9070 ("RDMA/core: Implement compat device/sysfs tree in net namespace")
Reported-by: syzbot+ab4dae63f7d310641ded@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 10:30:11 -03:00
Jason Gunthorpe
7aefa6237c RDMA/nl: Do not permit empty devices names during RDMA_NLDEV_CMD_NEWLINK/SET
Empty device names cannot be added to sysfs and crash with:

  kobject: (00000000f9de3792): attempted to be registered with empty name!
  WARNING: CPU: 1 PID: 10856 at lib/kobject.c:234 kobject_add_internal+0x7ac/0x9a0 lib/kobject.c:234
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 1 PID: 10856 Comm: syz-executor459 Not tainted 5.6.0-rc3-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x197/0x210 lib/dump_stack.c:118
   panic+0x2e3/0x75c kernel/panic.c:221
   __warn.cold+0x2f/0x3e kernel/panic.c:582
   report_bug+0x289/0x300 lib/bug.c:195
   fixup_bug arch/x86/kernel/traps.c:174 [inline]
   fixup_bug arch/x86/kernel/traps.c:169 [inline]
   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
   do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
   invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
  RIP: 0010:kobject_add_internal+0x7ac/0x9a0 lib/kobject.c:234
  Code: 7a ca ca f9 e9 f0 f8 ff ff 4c 89 f7 e8 cd ca ca f9 e9 95 f9 ff ff e8 13 25 8c f9 4c 89 e6 48 c7 c7 a0 08 1a 89 e8 a3 76 5c f9 <0f> 0b 41 bd ea ff ff ff e9 52 ff ff ff e8 f2 24 8c f9 0f 0b e8 eb
  RSP: 0018:ffffc90002006eb0 EFLAGS: 00010286
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff815eae46 RDI: fffff52000400dc8
  RBP: ffffc90002006f08 R08: ffff8880972ac500 R09: ffffed1015d26659
  R10: ffffed1015d26658 R11: ffff8880ae9332c7 R12: ffff888093034668
  R13: 0000000000000000 R14: ffff8880a69d7600 R15: 0000000000000001
   kobject_add_varg lib/kobject.c:390 [inline]
   kobject_add+0x150/0x1c0 lib/kobject.c:442
   device_add+0x3be/0x1d00 drivers/base/core.c:2412
   ib_register_device drivers/infiniband/core/device.c:1371 [inline]
   ib_register_device+0x93e/0xe40 drivers/infiniband/core/device.c:1343
   rxe_register_device+0x52e/0x655 drivers/infiniband/sw/rxe/rxe_verbs.c:1231
   rxe_add+0x122b/0x1661 drivers/infiniband/sw/rxe/rxe.c:302
   rxe_net_add+0x91/0xf0 drivers/infiniband/sw/rxe/rxe_net.c:539
   rxe_newlink+0x39/0x90 drivers/infiniband/sw/rxe/rxe.c:318
   nldev_newlink+0x28a/0x430 drivers/infiniband/core/nldev.c:1538
   rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:195 [inline]
   rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
   rdma_nl_rcv+0x5d9/0x980 drivers/infiniband/core/netlink.c:259
   netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
   netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1329
   netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1918
   sock_sendmsg_nosec net/socket.c:652 [inline]
   sock_sendmsg+0xd7/0x130 net/socket.c:672
   ____sys_sendmsg+0x753/0x880 net/socket.c:2343
   ___sys_sendmsg+0x100/0x170 net/socket.c:2397
   __sys_sendmsg+0x105/0x1d0 net/socket.c:2430
   __do_sys_sendmsg net/socket.c:2439 [inline]
   __se_sys_sendmsg net/socket.c:2437 [inline]
   __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
   do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Prevent empty names when checking the name provided from userspace during
newlink and rename.

Fixes: 3856ec4b93 ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support")
Fixes: 05d940d3a3 ("RDMA/nldev: Allow IB device rename through RDMA netlink")
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200309191648.GA30852@ziepe.ca
Reported-and-tested-by: syzbot+da615ac67d4dbea32cbc@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 10:23:38 -03:00
David S. Miller
1d34357931 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 22:34:48 -07:00
David S. Miller
bf3347c4d1 Merge branch 'ct-offload' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux 2020-03-12 12:34:23 -07:00
Alex Vesker
41e684ef3f IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
Until now the flex parser capability was used in ib_query_device() to
indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp.

Newer devices and firmware will have configurations with the flexparser
but without mpls support.

Testing for the flex parser capability was a mistake, the tunnel_stateless
capability was intended for detecting mpls and was introduced at the same
time as the flex parser capability.

Otherwise userspace will be incorrectly informed that a future device
supports MPLS when it does not.

Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.17
Fixes: e818e255a5 ("IB/mlx5: Expose MPLS related tunneling offloads")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:41:35 -03:00
Erez Shitrit
0897f301bc RDMA/mlx5: Remove duplicate definitions of SW_ICM macros
Those macros are already defined in include/linux/mlx5/driver.h, so delete
their duplicate variants.

Link: https://lore.kernel.org/r/20200310075706.238592-1-leon@kernel.org
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:39:09 -03:00
Mark Zhang
ec16b6bbda RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
When we read the global counter and there's any dynamic counter allocated,
the value of a hwcounter is the sum of the default counter and all dynamic
counters. So the number of hwcounters of a dynamically allocated counter
must be same as of the default counter, otherwise there will be read
violations.

This fixes the KASAN slab-out-of-bounds bug:

  BUG: KASAN: slab-out-of-bounds in rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
  Read of size 8 at addr ffff8884192a5778 by task rdma/10138

  CPU: 7 PID: 10138 Comm: rdma Not tainted 5.5.0-for-upstream-dbg-2020-02-06_18-30-19-27 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0xb7/0x10b
   print_address_description.constprop.4+0x1e2/0x400
   ? rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   __kasan_report+0x15c/0x1e0
   ? mlx5_ib_query_q_counters+0x13f/0x270 [mlx5_ib]
   ? rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   kasan_report+0xe/0x20
   rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   ? rdma_counter_query_stats+0xd0/0xd0 [ib_core]
   ? memcpy+0x34/0x50
   ? nla_put+0xe2/0x170
   nldev_stat_get_doit+0x9c7/0x14f0 [ib_core]
   ...
   do_syscall_64+0x95/0x490
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7fcc457fe65a
  Code: bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 fa f1 2b 00 45 89 c9 4c 63 d1 48 63 ff 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 f3 c3 0f 1f 40 00 41 55 41 54 4d 89 c5 55
  RSP: 002b:00007ffc0586f868 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcc457fe65a
  RDX: 0000000000000020 RSI: 00000000013db920 RDI: 0000000000000003
  RBP: 00007ffc0586fa90 R08: 00007fcc45ac10e0 R09: 000000000000000c
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004089c0
  R13: 0000000000000000 R14: 00007ffc0586fab0 R15: 00000000013dc9a0

  Allocated by task 9700:
   save_stack+0x19/0x80
   __kasan_kmalloc.constprop.7+0xa0/0xd0
   mlx5_ib_counter_alloc_stats+0xd1/0x1d0 [mlx5_ib]
   rdma_counter_alloc+0x16d/0x3f0 [ib_core]
   rdma_counter_bind_qpn_alloc+0x216/0x4e0 [ib_core]
   nldev_stat_set_doit+0x8c2/0xb10 [ib_core]
   rdma_nl_rcv_msg+0x3d2/0x730 [ib_core]
   rdma_nl_rcv+0x2a8/0x400 [ib_core]
   netlink_unicast+0x448/0x620
   netlink_sendmsg+0x731/0xd10
   sock_sendmsg+0xb1/0xf0
   __sys_sendto+0x25d/0x2c0
   __x64_sys_sendto+0xdd/0x1b0
   do_syscall_64+0x95/0x490
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 18d422ce8c ("IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support")
Link: https://lore.kernel.org/r/20200305124052.196688-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:36:47 -03:00
Zhu Yanjun
2d870c5bd0 RDMA/core: Remove the duplicate header file
The header file rdma_core.h is duplicate, so let's remove it.

Fixes: 622db5b643 ("RDMA/core: Add trace points to follow MR allocation")
Link: https://lore.kernel.org/r/20200310091656.249696-1-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:34:54 -03:00
Christophe JAILLET
24a5b0ce71 RDMA/bnxt_re: Remove a redundant 'memset'
'wqe' is already zeroed at the top of the 'while' loop, just a few lines
below, and is not used outside of the loop.

So there is no need to zero it again, or for the variable to be declared
outside the loop.

Link: https://lore.kernel.org/r/20200308065442.5415-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:33:21 -03:00
Jason Gunthorpe
0f9826f475 RDMA/odp: Fix leaking the tgid for implicit ODP
The tgid used to be part of ib_umem_free_notifier(), when it was reworked
it got moved to release, but it should have been unconditional as all umem
alloc paths get the tgid.

As is, creating an implicit ODP will leak the tgid reference.

Link: https://lore.kernel.org/r/20200304181607.GA22412@ziepe.ca
Cc: stable@kernel.org
Fixes: f25a546e65 ("RDMA/odp: Use mmu_interval_notifier_insert()")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:29:07 -03:00
Jason Gunthorpe
32ac9e4399 RDMA/cma: Teach lockdep about the order of rtnl and lock
This lock ordering only happens when bonding is enabled and a certain
bonding related event fires. However, since it can happen this is a global
restriction on lock ordering.

Teach lockdep about the order directly and unconditionally so bugs here
are found quickly.

See https://syzkaller.appspot.com/bug?extid=55de90ab5f44172b0c90

Link: https://lore.kernel.org/r/20200227203651.GA27185@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:27:00 -03:00
Max Gurtovoy
6798241483 RDMA/rw: map P2P memory correctly for signature operations
Since RDMA rw API support operations with P2P memory sg list, make sure to
map/unmap the scatter list for signature operation correctly.

Link: https://lore.kernel.org/r/20200220100819.41860-2-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 12:56:17 -03:00
Jason Gunthorpe
6f00a54c2c Linux 5.6-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5lkYceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGpHQH/RJrzcaZHo4lw88m
 Jf7vBZ9DYUlRgqE0pxTHWmodNObKRqpwOUGflUcWbb/7GD2LQUfeqhSECVQyTID9
 N9y7FcPvx321Qhc3EkZ24DBYk0+DQ0K2FVUrSa/PxO0n7czxxXWaLRDmlSULEd3R
 D4pVs3zEWOBXJHUAvUQ5R+lKfkeWKNeeepeh+rezuhpdWFBRNz4Jjr5QUJ8od5xI
 sIwobYmESJqTRVBHqW8g2T2/yIsFJ78GCXs8DZLe1wxh40UbxdYDTA0NDDTHKzK6
 lxzBgcmKzuge+1OVmzxLouNWMnPcjFlVgXWVerpSy3/SIFFkzzUWeMbqm6hKuhOn
 wAlcIgI=
 =VQUc
 -----END PGP SIGNATURE-----

Merge tag 'v5.6-rc5' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 12:49:09 -03:00
Jason Gunthorpe
3e3cf2e82c Merge branch 'mlx5_packet_pacing' into rdma.git for-next
Yishai Hadas Says:

====================
Expose raw packet pacing APIs to be used by DEVX based applications.  The
existing code was refactored to have a single flow with the new raw APIs.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_packet_pacing':
  IB/mlx5: Introduce UAPIs to manage packet pacing
  net/mlx5: Expose raw packet pacing APIs
2020-03-10 11:54:17 -03:00
Yishai Hadas
30f2fe40c7 IB/mlx5: Introduce UAPIs to manage packet pacing
Introduce packet pacing uobject and its alloc and destroy
methods.

This uobject holds mlx5 packet pacing context according to the device
specification and enables managing packet pacing device entries that are
needed by DEVX applications.

Link: https://lore.kernel.org/r/20200219190518.200912-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 11:53:52 -03:00
Ingo Molnar
6120681bdf Merge branch 'efi/urgent' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-08 09:57:58 +01:00
Linus Torvalds
61a09258f2 Second RDMA 5.6 pull request
- Fix busted syzkaller fix in 'get_new_pps' - this turned out to crash on
   certain HW configurations
 
 - Bug fixes for various missed things in error unwinds
 
 - Add a missing rcu_read_lock annotation in hfi/qib
 
 - Fix two ODP related regressions from the recent mmu notifier changes
 
 - Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm
 
 - Revert an old patch in CMA as it is now shown to not be allocating port
   numbers properly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl5iWSQACgkQOG33FX4g
 mxoadw//ZkIcG25OMhgc4iqOXT+brCCYosdi1MB8ptcW/lx+t2jH8VD9cd8kOW4M
 VfFIpiuqVc6U06BpoRJkSV3Ix5Hiw0nQVD9q1mNiqSs0fyAuJG0NGtVeqWWXSFFC
 ptHzn1z5Aw9GV2necS+nJcZ3NceMW/rP255LHioqVfj7xSFJiymXfncH7YwQZOop
 S88Dr3m+DibW+ueVwvtLPvSPaWL40NGZo4sNuITrfiJuHYvstWedUMtYkGCGjrmT
 bUI7lpYgsakVTlM2LTtlAFrAoL/adkfrNbiCVLqGLpoy3DIdXVscQzt9CRnCP1iF
 t1l0jY+2YNAMMfjktLDnhUU7wfAwgw/XTNoqzlRCAAiTp7D8+eo560Txj9xyjGw+
 spxGOWuDEVWlBOFHHltRbQ13QZ06vA7yg0YqoIuEg86c+X38NoVEA3sRf59v05qM
 XqPcdIBusjRfd8kZsk07uYbp5VQsNHSfL2ZtxAFwiWFr4stjBcwqrx3sFw5610uZ
 Pt6uWN6JlGRb7A35I0ZuRwWhN1HTFkd7rIKK3d5hTWcqefH6JAkZldMsG0qt/YW2
 nRnoZhUNwtP2YI6eOTpskQCyK41tqP5tC84k1GMBuAxMYw40FFqN9/M7v0h9NWq7
 Eq8BMjbLB6DDR8cBJk7uoYfpYM6slnGLlDGfrLRR9j1oWv6iuCY=
 =SFSu
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Nothing particularly exciting, some small ODP regressions from the mmu
  notifier rework, another bunch of syzkaller fixes, and a bug fix for a
  botched syzkaller fix in the first rc pull request.

   - Fix busted syzkaller fix in 'get_new_pps' - this turned out to
     crash on certain HW configurations

   - Bug fixes for various missed things in error unwinds

   - Add a missing rcu_read_lock annotation in hfi/qib

   - Fix two ODP related regressions from the recent mmu notifier
     changes

   - Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm

   - Revert an old patch in CMA as it is now shown to not be allocating
     port numbers properly"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/iwcm: Fix iwcm work deallocation
  RDMA/siw: Fix failure handling during device creation
  RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
  RDMA/odp: Ensure the mm is still alive before creating an implicit child
  RDMA/core: Fix protection fault in ib_mr_pool_destroy
  IB/mlx5: Fix implicit ODP race
  IB/hfi1, qib: Ensure RCU is locked when accessing list
  RDMA/core: Fix pkey and port assignment in get_new_pps
  RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
  RDMA/rw: Fix error flow during RDMA context initialization
  RDMA/core: Fix use of logical OR in get_new_pps
  Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
2020-03-07 19:52:55 -06:00
Jakub Kicinski
524250a324 RDMA/ipoib: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-06 22:45:55 -08:00
Colin Ian King
0aeb3622ea RDMA/hns: fix spelling mistake "attatch" -> "attach"
There is a spelling mistake in an error message. Fix it.

Link: https://lore.kernel.org/r/20200304081045.81164-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:36:13 -04:00
Bernard Metzler
810dbc6908 RDMA/iwcm: Fix iwcm work deallocation
The dealloc_work_entries() function must update the work_free_list pointer
while freeing its entries, since potentially called again on same list. A
second iteration of the work list caused system crash. This happens, if
work allocation fails during cma_iw_listen() and free_cm_id() tries to
free the list again during cleanup.

Fixes: 922a8e9fb2 ("RDMA: iWARP Connection Manager.")
Link: https://lore.kernel.org/r/20200302181614.17042-1-bmt@zurich.ibm.com
Reported-by: syzbot+cb0c054eabfba4342146@syzkaller.appspotmail.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:28:25 -04:00
Bernard Metzler
12e5eef0f4 RDMA/siw: Fix failure handling during device creation
A failing call to ib_device_set_netdev() during device creation caused
system crash due to xa_destroy of uninitialized xarray hit by device
deallocation. Fixed by moving xarray initialization before potential
device deallocation.

Fixes: bdcf26bf9b ("rdma/siw: network and RDMA core interface")
Link: https://lore.kernel.org/r/20200302155814.9896-1-bmt@zurich.ibm.com
Reported-by: syzbot+2e80962bedd9559fe0b3@syzkaller.appspotmail.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:26:23 -04:00
Bernard Metzler
33fb27fd54 RDMA/siw: Fix passive connection establishment
Holding the rtnl_lock while iterating a devices interface address list
potentially causes deadlocks with the cma_netdev_callback. While this was
implemented to limit the scope of a wildcard listen to addresses of the
current device only, a better solution limits the scope of the socket to
the device. This completely avoiding locking, and also results in
significant code simplification.

Fixes: c421651fa2 ("RDMA/siw: Add missing rtnl_lock around access to ifa")
Link: https://lore.kernel.org/r/20200228173534.26815-1-bmt@zurich.ibm.com
Reported-by: syzbot+55de90ab5f44172b0c90@syzkaller.appspotmail.com
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:22:46 -04:00
Parav Pandit
79db784e79 IB/mlx5: Fix missing congestion control debugfs on rep rdma device
Cited commit missed to include low level congestion control related
debugfs stage initialization.  This resulted in missing debugfs entries
for cc_params of a RDMA device.

Add them back.

Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Link: https://lore.kernel.org/r/20200227125407.99803-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:20:14 -04:00
Parav Pandit
9e3aaf6883 IB/mlx5: Add np_min_time_between_cnps and rp_max_rate debug params
Add two debugfs parameters described below.

np_min_time_between_cnps - Minimum time between sending CNPs from the
                           port.
                           Unit = microseconds.
                           Default = 0 (no min wait time; generated
                           based on incoming ECN marked packets).

rp_max_rate - Maximum rate at which reaction point node can transmit.
              Once this limit is reached, RP is no longer rate limited.
              Unit = Mbits/sec
              Default = 0 (full speed)

Link: https://lore.kernel.org/r/20200227125246.99472-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:20:03 -04:00
Mark Zhang
78f34a16c2 RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
This fixes the kernel crash when a RDMA_NLDEV_CMD_STAT_SET command is
received, but the QP number parameter is not available.

  iwpm_register_pid: Unable to send a nlmsg (client = 2)
  infiniband syz1: RDMA CMA: cma_listen_on_dev, error -98
  general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  CPU: 0 PID: 9754 Comm: syz-executor069 Not tainted 5.6.0-rc2-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:nla_get_u32 include/net/netlink.h:1474 [inline]
  RIP: 0010:nldev_stat_set_doit+0x63c/0xb70 drivers/infiniband/core/nldev.c:1760
  Code: fc 01 0f 84 58 03 00 00 e8 41 83 bf fb 4c 8b a3 58 fd ff ff 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 6d
  RSP: 0018:ffffc900068bf350 EFLAGS: 00010247
  RAX: dffffc0000000000 RBX: ffffc900068bf728 RCX: ffffffff85b60470
  RDX: 0000000000000000 RSI: ffffffff85b6047f RDI: 0000000000000004
  RBP: ffffc900068bf750 R08: ffff88808c3ee140 R09: ffff8880a25e6010
  R10: ffffed10144bcddc R11: ffff8880a25e6ee3 R12: 0000000000000000
  R13: ffff88809acb0000 R14: ffff888092a42c80 R15: 000000009ef2e29a
  FS:  0000000001ff0880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f4733e34000 CR3: 00000000a9b27000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
    rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:195 [inline]
    rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
    rdma_nl_rcv+0x5d9/0x980 drivers/infiniband/core/netlink.c:259
    netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1329
    netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1918
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xd7/0x130 net/socket.c:672
    ____sys_sendmsg+0x753/0x880 net/socket.c:2343
    ___sys_sendmsg+0x100/0x170 net/socket.c:2397
    __sys_sendmsg+0x105/0x1d0 net/socket.c:2430
    __do_sys_sendmsg net/socket.c:2439 [inline]
    __se_sys_sendmsg net/socket.c:2437 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x4403d9
  Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
  RSP: 002b:00007ffc0efbc5c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
  RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9
  RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004
  RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8
  R10: 000000000000004a R11: 0000000000000246 R12: 0000000000401c60
  R13: 0000000000401cf0 R14: 0000000000000000 R15: 0000000000000000

Fixes: b389327df9 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink")
Link: https://lore.kernel.org/r/20200227125111.99142-1-leon@kernel.org
Reported-by: syzbot+bd4af81bc51ee0283445@syzkaller.appspotmail.com
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:17:10 -04:00
Jason Gunthorpe
a4e63bce14 RDMA/odp: Ensure the mm is still alive before creating an implicit child
Registration of a mmu_notifier requires the caller to hold a mmget() on
the mm as registration is not permitted to race with exit_mmap(). There is
a BUG_ON inside the mmu_notifier to guard against this.

Normally creating a umem is done against current which implicitly holds
the mmget(), however an implicit ODP child is created from a pagefault
work queue and is not guaranteed to have a mmget().

Call mmget() around this registration and abort faulting if the MM has
gone to exit_mmap().

Before the patch below the notifier was registered when the implicit ODP
parent was created, so there was no chance to register a notifier outside
of current.

Fixes: c571feca2d ("RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'")
Link: https://lore.kernel.org/r/20200227114118.94736-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:56:07 -04:00
Maor Gottlieb
e38b55ea04 RDMA/core: Fix protection fault in ib_mr_pool_destroy
Fix NULL pointer dereference in the error flow of ib_create_qp_user
when accessing to uninitialized list pointers - rdma_mrs and sig_mrs.
The following crash from syzkaller revealed it.

  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 23167 Comm: syz-executor.1 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:ib_mr_pool_destroy+0x81/0x1f0
  Code: 00 00 fc ff df 49 c1 ec 03 4d 01 fc e8 a8 ea 72 fe 41 80 3c 24 00
  0f 85 62 01 00 00 48 8b 13 48 89 d6 4c 8d 6a c8 48 c1 ee 03 <42> 80 3c
  3e 00 0f 85 34 01 00 00 48 8d 7a 08 4c 8b 02 48 89 fe 48
  RSP: 0018:ffffc9000951f8b0 EFLAGS: 00010046
  RAX: 0000000000040000 RBX: ffff88810f268038 RCX: ffffffff82c41628
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9000951f850
  RBP: ffff88810f268020 R08: 0000000000000004 R09: fffff520012a3f0a
  R10: 0000000000000001 R11: fffff520012a3f0a R12: ffffed1021e4d007
  R13: ffffffffffffffc8 R14: 0000000000000246 R15: dffffc0000000000
  FS:  00007f54bc788700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000116920002 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   rdma_rw_cleanup_mrs+0x15/0x30
   ib_destroy_qp_user+0x674/0x7d0
   ib_create_qp_user+0xb01/0x11c0
   create_qp+0x1517/0x2130
   ib_uverbs_create_qp+0x13e/0x190
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f54bc787c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
  RDX: 0000000000000040 RSI: 0000000020000540 RDI: 0000000000000003
  RBP: 00007f54bc787c70 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54bc7886bc
  R13: 00000000004ca2ec R14: 000000000070ded0 R15: 0000000000000005

Fixes: a060b5629a ("IB/core: generic RDMA READ/WRITE API")
Link: https://lore.kernel.org/r/20200227112708.93023-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:43:02 -04:00
Artemy Kovalyov
de5ed007a0 IB/mlx5: Fix implicit ODP race
Following race may occur because of the call_srcu and the placement of
the synchronize_srcu vs the xa_erase.

CPU0				   CPU1

mlx5_ib_free_implicit_mr:	   destroy_unused_implicit_child_mr:
 xa_erase(odp_mkeys)
 synchronize_srcu()
				    xa_lock(implicit_children)
				    if (still in xarray)
				       atomic_inc()
				       call_srcu()
				    xa_unlock(implicit_children)
 xa_erase(implicit_children):
   xa_lock(implicit_children)
   __xa_erase()
   xa_unlock(implicit_children)

 flush_workqueue()
				   [..]
				    free_implicit_child_mr_rcu:
				     (via call_srcu)
				      queue_work()

 WARN_ON(atomic_read())
				   [..]
				    free_implicit_child_mr_work:
				     (via wq)
				      free_implicit_child_mr()
 mlx5_mr_cache_invalidate()
				     mlx5_ib_update_xlt() <-- UMR QP fail
				     atomic_dec()

The wait_event() solves the race because it blocks until
free_implicit_child_mr_work() completes.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200227113918.94432-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:25:00 -04:00
Alexander Lobakin
91b74bf531 IB/mlx5: Optimize u64 division on 32-bit arches
Commit f164be8c03 ("IB/mlx5: Extend caps stage to handle VAR
capabilities") introduced a straight "/" division of the u64 variable
"bar_size".

This was fixed with commit 685eff5131 ("IB/mlx5: Use div64_u64 for
num_var_hw_entries calculation"). However, div64_u64() is redundant here
as mlx5_var_table::stride_size is of type u32.  Make the actual code way
more optimized on 32-bit kernels using div_u64() and fix 80 chars
break-through by the way.

Fixes: 685eff5131 ("IB/mlx5: Use div64_u64 for num_var_hw_entries calculation")
Link: https://lore.kernel.org/r/20200217073629.8051-1-alobakin@dlink.ru
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:11:49 -04:00
Jason Gunthorpe
c13cac2a21 Linux 5.6-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5cOX8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGw0AH/0nex1uEpzUTm+Gw
 D8QPFr3y61sYLu7sIMVt39+Zl6OSxvOOX14QIM/mrNrzjjRI8EXGvYgES5gSO4on
 6NLS6/64c1oQThDHCsxusKoSWLZ9KqP2vRPt7tZjn7DZMzEsuLhlINKBeupcqALX
 FnOBr768P+if/j0WcDR2pBaMg3ch+XC5sfYav7kapjgWUqCx9BvrHKLXXdlEGUC0
 7Ku7PH+nF7CIHiTay+i89odvOd8aLGsa/SUf5XGauKkH65VgQkmksgPeZUPqTnyC
 MEsyLJLfn4AP3ySwqzfSLac8jqZG8FGBt4DgM2MQBHibctzfeMIznfcfh/A8+Edx
 jqLKLAs=
 =4075
 -----END PGP SIGNATURE-----

Merge tag 'v5.6-rc4' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:11:06 -04:00
Kamal Heib
bb8865f435 RDMA/providers: Fix return value when QP type isn't supported
The proper return code is "-EOPNOTSUPP" when the requested QP type is
not supported by the provider.

Link: https://lore.kernel.org/r/20200130082049.463-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 12:13:42 -04:00
Michael Guralnik
5e29d1443c RDMA/mlx5: Prevent UMR usage with RO only when we have RO caps
Relaxed ordering is not supported in UMR so we are disabling UMR usage
when user passes relaxed ordering access flag.

Enable using UMR when user requested relaxed ordering but there are no
relaxed ordering capabilities.

This will prevent user from unnecessarily registering a new mkey.

Fixes: d6de0bb185 ("RDMA/mlx5: Set relaxed ordering when requested")
Link: https://lore.kernel.org/r/20200227113834.94233-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:38 -04:00
YueHaibing
75d0366508 RDMA/bnxt_re: Remove set but not used variables 'pg' and 'idx'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/bnxt_re/qplib_rcfw.c: In function '__send_message':
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:101:10: warning:
 variable 'idx' set but not used [-Wunused-but-set-variable]
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:101:6: warning:
 variable 'pg' set but not used [-Wunused-but-set-variable]

commit cee0c7bba4 ("RDMA/bnxt_re: Refactor command queue management
code") involved this, but not used.

Link: https://lore.kernel.org/r/20200227064900.92255-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:38 -04:00
YueHaibing
a0b404a98e RDMA/bnxt_re: Remove set but not used variable 'dev_attr'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_create_gsi_qp':
drivers/infiniband/hw/bnxt_re/ib_verbs.c:1283:30: warning:
 variable 'dev_attr' set but not used [-Wunused-but-set-variable]

commit 8dae419f9e ("RDMA/bnxt_re: Refactor queue pair creation code")
involved this, but not used, so remove it.

Link: https://lore.kernel.org/r/20200227064542.91205-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:38 -04:00
YueHaibing
6be2067d1e RDMA/bnxt_re: Remove set but not used variable 'pg_size'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/bnxt_re/qplib_res.c: In function '__alloc_pbl':
drivers/infiniband/hw/bnxt_re/qplib_res.c:109:13: warning:
 variable 'pg_size' set but not used [-Wunused-but-set-variable]

commit 0c4dcd6028 ("RDMA/bnxt_re: Refactor hardware queue memory
allocation") involved this, but not used, so remove it.

Link: https://lore.kernel.org/r/20200227064209.87893-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:38 -04:00
Selvin Xavier
66832705c4 RDMA/bnxt_re: Use driver_unregister and unregistration API
Using the new unregister APIs provided by the core.  Provide the
dealloc_driver hook for the core to callback at the time of device
un-registration.

bnxt_re VF resources are created by the corresponding PF driver.  During
ib_unregister_driver, PF might get removed before VF and this could cause
failure when VFs are removed. Driver is explicitly queuing the removal of
VF devices before calling ib_unregister_driver.

Link: https://lore.kernel.org/r/1582731932-26574-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:38 -04:00
Selvin Xavier
c2b777a959 RDMA/bnxt_re: Refactor device add/remove functionalities
- bnxt_re_ib_reg() handles two main functionalities - initializing the
   device and registering with the IB stack.  Split it into 2 functions
   i.e. bnxt_re_dev_init() and bnxt_re_ib_init() to account for the same
   thereby improve modularity. Do the same for
   bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
   and bnxt_re_ib_uninit().

 - Simplify the code by combining the different steps to add and remove
   the device into two functions.

 - Report correct netdev link state during device register

Link: https://lore.kernel.org/r/1582731932-26574-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:37 -04:00
Dennis Dalessandro
817a68a658 IB/hfi1, qib: Ensure RCU is locked when accessing list
The packet handling function, specifically the iteration of the qp list
for mad packet processing misses locking RCU before running through the
list. Not only is this incorrect, but the list_for_each_entry_rcu() call
can not be called with a conditional check for lock dependency. Remedy
this by invoking the rcu lock and unlock around the critical section.

This brings MAD packet processing in line with what is done for non-MAD
packets.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20200225195445.140896.41873.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:21 -04:00
Gal Pressman
ff6629f88c RDMA/efa: Do not delay freeing of DMA pages
When destroying a DMA mmapped object, there is no need to artificially
delay the freeing of the pages to the mmap entry removal.  Since the vma
keeps a reference count on these pages, free_pages_exact can be called on
the destroy verb as it won't really free the pages until the reference
count is cleared (in case the user hasn't called munmap yet).

Remove the special handling of DMA pages and call free_pages_exact on
destroy_qp/cq. The mmap entry removal is moved to the beginning of the
destroy flows, so the driver can safely free the pages.

Link: https://lore.kernel.org/r/20200225114010.21790-4-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 12:12:12 -04:00
Gal Pressman
56a7a721dd RDMA/efa: Properly document the interrupt mask register
The fact that the LSB in the register is the enable bit should not be an
implicit assumption between the driver and the device, properly document
that in the register definition.

Link: https://lore.kernel.org/r/20200225114010.21790-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 12:12:12 -04:00
Gal Pressman
88d033077b RDMA/efa: Unified getters/setters for device structs bitmask access
Use unified macros for device structs access instead of open coding the
shifts and masks over and over again.

Link: https://lore.kernel.org/r/20200225114010.21790-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 12:12:04 -04:00
Xi Wang
cfec045b82 RDMA/hns: Optimize qp doorbell allocation flow
Encapsulate the kernel qp doorbell allocation related code into 2
functions: alloc_qp_db() and free_qp_db().

Link: https://lore.kernel.org/r/1582526258-13825-8-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:35:11 -04:00
Xi Wang
b37c413997 RDMA/hns: Optimize kernel qp wrid allocation flow
Encapsulate the kernel qp wrid allocation related code into 2 functions:
alloc_kernel_wrid() and free_kernel_wrid().

Link: https://lore.kernel.org/r/1582526258-13825-7-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:35:11 -04:00
Xi Wang
ae85bf92ef RDMA/hns: Optimize qp param setup flow
Encapsulate the qp param setup related code into set_qp_param().

Link: https://lore.kernel.org/r/1582526258-13825-6-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:35:11 -04:00
Xi Wang
24c22112b9 RDMA/hns: Optimize qp buffer allocation flow
Encapsulate qp buffer allocation related code into 3 functions:
alloc_qp_buf(), map_wqe_buf() and free_qp_buf().

Link: https://lore.kernel.org/r/1582526258-13825-5-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:35:11 -04:00
Xi Wang
df83a66e1b RDMA/hns: Optimize qp number assign flow
Encapsulate the code associated with the qp number assignment into
alloc_qpn() and free_qpn().

Link: https://lore.kernel.org/r/1582526258-13825-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:35:10 -04:00
Xi Wang
b71961d1da RDMA/hns: Optimize qp context create and destroy flow
Rename the qp context related functions and adjusts the code location to
distinguish between the qp context and the entire qp.

Link: https://lore.kernel.org/r/1582526258-13825-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:35:10 -04:00
Xi Wang
e365b26c6b RDMA/hns: Optimize qp destroy flow
Wrap the duplicate code in hip08 and hip06 qp destruction process as
hns_roce_qp_destroy() to simply the qp destroy flow.

Link: https://lore.kernel.org/r/1582526258-13825-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:35:10 -04:00
Yixian Liu
75c994e694 RDMA/hns: Stop doorbell update while qp state error
There are two paths to update qp producer index into hardware now, one
path is doorbell in post verbs (send and recv), the another is mailbox in
modify qp verb which is called by flush process. This will lead the
hardware to be broken to correctly generate flush cqe.  With stopping
doorbell update and holding qp spinlock in modify qp during flush process,
the problem can be solved.

Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Link: https://lore.kernel.org/r/1582367158-27030-3-git-send-email-liuyixian@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:28:31 -04:00
Yixian Liu
0fc99566f6 RDMA/hns: Use flush framework for the case in aeq
As now we already have flush framework, using it instead of current flush
process for qp error in asynchronized interrupt (aeq).

Link: https://lore.kernel.org/r/1582367158-27030-2-git-send-email-liuyixian@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:28:31 -04:00
Lang Cheng
dfaf2854b0 RDMA/hns: Treat revision HIP08_A as a special case
Set revisions that equal to or higher than HIP08_B as default to maintain
backward compatibility.

Link: https://lore.kernel.org/r/1582363039-10714-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:24:08 -04:00
Maor Gottlieb
801b67f3ea RDMA/core: Fix pkey and port assignment in get_new_pps
When port is part of the modify mask, then we should take it from the
qp_attr and not from the old pps. Same for PKEY. Otherwise there are
panics in some configurations:

  RIP: 0010:get_pkey_idx_qp_list+0x50/0x80 [ib_core]
  Code: c7 18 e8 13 04 30 ef 0f b6 43 06 48 69 c0 b8 00 00 00 48 03 85 a0 04 00 00 48 8b 50 20 48 8d 48 20 48 39 ca 74 1a 0f b7 73 04 <66> 39 72 10 75 08 eb 10 66 39 72 10 74 0a 48 8b 12 48 39 ca 75 f2
  RSP: 0018:ffffafb3480932f0 EFLAGS: 00010203
  RAX: ffff98059ababa10 RBX: ffff980d926e8cc0 RCX: ffff98059ababa30
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff98059ababa28
  RBP: ffff98059b940000 R08: 00000000000310c0 R09: ffff97fe47c07480
  R10: 0000000000000036 R11: 0000000000000200 R12: 0000000000000071
  R13: ffff98059b940000 R14: ffff980d87f948a0 R15: 0000000000000000
  FS:  00007f88deb31740(0000) GS:ffff98059f600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000010 CR3: 0000000853e26001 CR4: 00000000001606e0
  Call Trace:
   port_pkey_list_insert+0x3d/0x1b0 [ib_core]
   ? kmem_cache_alloc_trace+0x215/0x220
   ib_security_modify_qp+0x226/0x3a0 [ib_core]
   _ib_modify_qp+0xcf/0x390 [ib_core]
   ipoib_init_qp+0x7f/0x200 [ib_ipoib]
   ? rvt_modify_port+0xd0/0xd0 [rdmavt]
   ? ib_find_pkey+0x99/0xf0 [ib_core]
   ipoib_ib_dev_open_default+0x1a/0x200 [ib_ipoib]
   ipoib_ib_dev_open+0x96/0x130 [ib_ipoib]
   ipoib_open+0x44/0x130 [ib_ipoib]
   __dev_open+0xd1/0x160
   __dev_change_flags+0x1ab/0x1f0
   dev_change_flags+0x23/0x60
   do_setlink+0x328/0xe30
   ? __nla_validate_parse+0x54/0x900
   __rtnl_newlink+0x54e/0x810
   ? __alloc_pages_nodemask+0x17d/0x320
   ? page_fault+0x30/0x50
   ? _cond_resched+0x15/0x30
   ? kmem_cache_alloc_trace+0x1c8/0x220
   rtnl_newlink+0x43/0x60
   rtnetlink_rcv_msg+0x28f/0x350
   ? kmem_cache_alloc+0x1fb/0x200
   ? _cond_resched+0x15/0x30
   ? __kmalloc_node_track_caller+0x24d/0x2d0
   ? rtnl_calcit.isra.31+0x120/0x120
   netlink_rcv_skb+0xcb/0x100
   netlink_unicast+0x1e0/0x340
   netlink_sendmsg+0x317/0x480
   ? __check_object_size+0x48/0x1d0
   sock_sendmsg+0x65/0x80
   ____sys_sendmsg+0x223/0x260
   ? copy_msghdr_from_user+0xdc/0x140
   ___sys_sendmsg+0x7c/0xc0
   ? skb_dequeue+0x57/0x70
   ? __inode_wait_for_writeback+0x75/0xe0
   ? fsnotify_grab_connector+0x45/0x80
   ? __dentry_kill+0x12c/0x180
   __sys_sendmsg+0x58/0xa0
   do_syscall_64+0x5b/0x200
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f88de467f10

Link: https://lore.kernel.org/r/20200227125728.100551-1-leon@kernel.org
Cc: <stable@vger.kernel.org>
Fixes: 1dd017882e ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28 11:16:08 -04:00
Jason Gunthorpe
c14dfddbd8 RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
The algorithm pre-allocates a cm_id since allocation cannot be done while
holding the cm.lock spinlock, however it doesn't free it on one error
path, leading to a memory leak.

Fixes: 067b171b86 ("IB/cm: Share listening CM IDs")
Link: https://lore.kernel.org/r/20200221152023.GA8680@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27 16:43:29 -04:00
Leon Romanovsky
699d9e7542 RDMA/opa_vnic: Delete driver version
The default version provided by "ethtool -i" it the correct way
to identify Driver version. There is no need to overwrite it.

Link: https://lore.kernel.org/r/20200220071239.231800-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27 16:40:40 -04:00
Leon Romanovsky
9687072071 RDMA/ipoib: Don't set constant driver version
There is no need to set driver version in in-tree kernel code.

Link: https://lore.kernel.org/r/20200220071239.231800-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27 16:40:40 -04:00
Jason Gunthorpe
7c11910783 RDMA/ucma: Put a lock around every call to the rdma_cm layer
The rdma_cm must be used single threaded.

This appears to be a bug in the design, as it does have lots of locking
that seems like it should allow concurrency. However, when it is all said
and done every single place that uses the cma_exch() scheme is broken, and
all the unlocked reads from the ucma of the cm_id data are wrong too.

syzkaller has been finding endless bugs related to this.

Fixing this in any elegant way is some enormous amount of work. Take a
very big hammer and put a mutex around everything to do with the
ucma_context at the top of every syscall.

Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Link: https://lore.kernel.org/r/20200218210432.GA31966@ziepe.ca
Reported-by: syzbot+adb15cf8c2798e4e0db4@syzkaller.appspotmail.com
Reported-by: syzbot+e5579222b6a3edd96522@syzkaller.appspotmail.com
Reported-by: syzbot+4b628fcc748474003457@syzkaller.appspotmail.com
Reported-by: syzbot+29ee8f76017ce6cf03da@syzkaller.appspotmail.com
Reported-by: syzbot+6956235342b7317ec564@syzkaller.appspotmail.com
Reported-by: syzbot+b358909d8d01556b790b@syzkaller.appspotmail.com
Reported-by: syzbot+6b46b135602a3f3ac99e@syzkaller.appspotmail.com
Reported-by: syzbot+8458d13b13562abf6b77@syzkaller.appspotmail.com
Reported-by: syzbot+bd034f3fdc0402e942ed@syzkaller.appspotmail.com
Reported-by: syzbot+c92378b32760a4eef756@syzkaller.appspotmail.com
Reported-by: syzbot+68b44a1597636e0b342c@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27 16:40:40 -04:00
Kamal Heib
25baba217c RDMA/siw: Fix setting active_{speed, width} attributes
Make sure to set the active_{speed, width} attributes to avoid reporting
the same values regardless of the underlying device.

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20200218095911.26614-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Tested-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27 16:40:40 -04:00
Jason Gunthorpe
65a1662015 RDMA/bnxt_re: Using vmalloc requires including vmalloc.h
Add it

Fixes: 0c4dcd6028 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-26 13:24:41 -04:00
Ingo Molnar
e9765680a3 EFI updates for v5.7:
This time, the set of changes for the EFI subsystem is much larger than
 usual. The main reasons are:
 - Get things cleaned up before EFI support for RISC-V arrives, which will
   increase the size of the validation matrix, and therefore the threshold to
   making drastic changes,
 - After years of defunct maintainership, the GRUB project has finally started
   to consider changes from the distros regarding UEFI boot, some of which are
   highly specific to the way x86 does UEFI secure boot and measured boot,
   based on knowledge of both shim internals and the layout of bootparams and
   the x86 setup header. Having this maintenance burden on other architectures
   (which don't need shim in the first place) is hard to justify, so instead,
   we are introducing a generic Linux/UEFI boot protocol.
 
 Summary of changes:
 - Boot time GDT handling changes (Arvind)
 - Simplify handling of EFI properties table on arm64
 - Generic EFI stub cleanups, to improve command line handling, file I/O,
   memory allocation, etc.
 - Introduce a generic initrd loading method based on calling back into
   the firmware, instead of relying on the x86 EFI handover protocol or
   device tree.
 - Introduce a mixed mode boot method that does not rely on the x86 EFI
   handover protocol either, and could potentially be adopted by other
   architectures (if another one ever surfaces where one execution mode
   is a superset of another)
 - Clean up the contents of struct efi, and move out everything that
   doesn't need to be stored there.
 - Incorporate support for UEFI spec v2.8A changes that permit firmware
   implementations to return EFI_UNSUPPORTED from UEFI runtime services at
   OS runtime, and expose a mask of which ones are supported or unsupported
   via a configuration table.
 - Various documentation updates and minor code cleanups (Heinrich)
 - Partial fix for the lack of by-VA cache maintenance in the decompressor
   on 32-bit ARM. Note that these patches were deliberately put at the
   beginning so they can be used as a stable branch that will be shared with
   a PR containing the complete fix, which I will send to the ARM tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl5S7WYACgkQwjcgfpV0
 +n1jmQgAmwV3V8pbgB4mi4P2Mv8w5Zj5feUe6uXnTR2AFv5nygLcTzdxN+TU/6lc
 OmZv2zzdsAscYlhuUdI/4t4cXIjHAZI39+UBoNRuMqKbkbvXCFscZANLxvJjHjZv
 FFbgUk0DXkF0BCEDuSLNavidAv4b3gZsOmHbPfwuB8xdP05LbvbS2mf+2tWVAi2z
 ULfua/0o9yiwl19jSS6iQEPCvvLBeBzTLW7x5Rcm/S0JnotzB59yMaeqD7jO8JYP
 5PvI4WM/l5UfVHnZp2k1R76AOjReALw8dQgqAsT79Q7+EH3sNNuIjU6omdy+DFf4
 tnpwYfeLOaZ1SztNNfU87Hsgnn2CGw==
 =pDE3
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core

Pull EFI updates for v5.7 from Ard Biesheuvel:

This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:

 - Get things cleaned up before EFI support for RISC-V arrives, which will
   increase the size of the validation matrix, and therefore the threshold to
   making drastic changes,

 - After years of defunct maintainership, the GRUB project has finally started
   to consider changes from the distros regarding UEFI boot, some of which are
   highly specific to the way x86 does UEFI secure boot and measured boot,
   based on knowledge of both shim internals and the layout of bootparams and
   the x86 setup header. Having this maintenance burden on other architectures
   (which don't need shim in the first place) is hard to justify, so instead,
   we are introducing a generic Linux/UEFI boot protocol.

Summary of changes:

 - Boot time GDT handling changes (Arvind)

 - Simplify handling of EFI properties table on arm64

 - Generic EFI stub cleanups, to improve command line handling, file I/O,
   memory allocation, etc.

 - Introduce a generic initrd loading method based on calling back into
   the firmware, instead of relying on the x86 EFI handover protocol or
   device tree.

 - Introduce a mixed mode boot method that does not rely on the x86 EFI
   handover protocol either, and could potentially be adopted by other
   architectures (if another one ever surfaces where one execution mode
   is a superset of another)

 - Clean up the contents of struct efi, and move out everything that
   doesn't need to be stored there.

 - Incorporate support for UEFI spec v2.8A changes that permit firmware
   implementations to return EFI_UNSUPPORTED from UEFI runtime services at
   OS runtime, and expose a mask of which ones are supported or unsupported
   via a configuration table.

 - Various documentation updates and minor code cleanups (Heinrich)

 - Partial fix for the lack of by-VA cache maintenance in the decompressor
   on 32-bit ARM. Note that these patches were deliberately put at the
   beginning so they can be used as a stable branch that will be shared with
   a PR containing the complete fix, which I will send to the ARM tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-26 15:21:22 +01:00
Ard Biesheuvel
d79b348c35 infiniband: hfi1: Use EFI GetVariable only when available
Replace the EFI runtime services check with one that tells us whether
EFI GetVariable() is implemented by the firmware.

Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:59:42 +01:00
Linus Torvalds
b98b809c0a SCSI fixes on 20200221
Four non-core fixes.  Two are reverts of target fixes which turned out
 to have unwanted side effects, one is a revert of an RDMA fix with the
 same problem and the final one fixes an incorrect warning about memory
 allocation failures in megaraid_sas (the driver actually reduces the
 allocation size until it succeeds).
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXlBhuyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdMGAQCR8Qi2
 m2kPgccUvJwVmnJ+DRJ3MRRX3Kn0IJIDoIc0IgEA6/W33+7xY8qQ0uahOyOT90tz
 g7Y2I7TxQ+dsL9pqs80=
 =JIpx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four non-core fixes.

  Two are reverts of target fixes which turned out to have unwanted side
  effects, one is a revert of an RDMA fix with the same problem and the
  final one fixes an incorrect warning about memory allocation failures
  in megaraid_sas (the driver actually reduces the allocation size until
  it succeeds)"

Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
  scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
  scsi: megaraid_sas: silence a warning
  scsi: Revert "target/core: Inline transport_lun_remove_cmd()"
2020-02-22 11:00:52 -08:00
Devesh Sharma
6ccad8483b RDMA/bnxt_re: use ibdev based message printing functions
Replacing the dev_err/dbg/warn with ibdev_err/dbg/warn. In the IB device
provider driver these functions are recommended to use.

Currently qplib layer function calls has not been replaced due to
unavailability of ib_device pointer at that layer.

Link: https://lore.kernel.org/r/1581786665-23705-9-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:44 -04:00
Devesh Sharma
6f53196bc5 RDMA/bnxt_re: Refactor doorbell management functions
Moving all the fast path doorbell functions at one place under
qplib_res.h. To pass doorbell record information a new structure
bnxt_qplib_db_info has been introduced.  Every roce object holds an
instance of this structure and doorbell information is initialized during
resource creation.

When DB is rung only the current queue index is read from hardware ring
and rest of the data is taken from pre-initialized dbinfo structure.

Link: https://lore.kernel.org/r/1581786665-23705-8-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:44 -04:00
Devesh Sharma
9555352bac RDMA/bnxt_re: Refactor notification queue management code
Cleaning up the notification queue data structures and management
code. The CQ and SRQ event handlers have been type defined instead of
in-place declaration. NQ doorbell register descriptor has been added in
base NQ structure.  The nq->vector has been renamed to nq->msix_vec.

Link: https://lore.kernel.org/r/1581786665-23705-7-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:43 -04:00
Devesh Sharma
cee0c7bba4 RDMA/bnxt_re: Refactor command queue management code
Refactoring the command queue (rcfw) management code. A new data-structure
is introduced to describe the bar register.  each object which deals with
mmio space should have a descriptor structure. This structure specifically
hold DB register information.  Thus, slow path creq structure now hold a
bar register descriptor.

Further cleanup the rcfw structure to introduce the command queue context
and command response event queue context structures. Rest of the rcfw
related code has been touched to incorporate these three structures.

Link: https://lore.kernel.org/r/1581786665-23705-6-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:43 -04:00
Devesh Sharma
b08fe048a6 RDMA/bnxt_re: Refactor net ring allocation function
Introducing a new attribute structure to reduce the long list of arguments
passed in bnxt_re_net_ring_alloc() function.

The caller of bnxt_re_net_ring_alloc should fill in the list of attributes
in bnxt_re_ring_attr structure and then pass the pointer to the function.

Link: https://lore.kernel.org/r/1581786665-23705-5-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:43 -04:00
Devesh Sharma
0c4dcd6028 RDMA/bnxt_re: Refactor hardware queue memory allocation
At top level there are three major data structure addition.  viz
bnxt_qplib_hwq_attr, bnxt_qplib_sg_info and bnxt_qplib_tqm_ctx

Intorduction of first data structure reduces the arguments list to
bnxt_re_alloc_init_hwq() function. There are changes all over the driver
code to incorporate this new structure. The caller needs to fill the
attribute data structure and pass to this function.

The second data structure is to pass memory region description
viz. sghead, page_size and page_shift. There are changes all over the
driver code to initialize bnxt_re_sg_info data structure. The new data
structure helps to reduce the argument list of __alloc_pbl() function
call.

Till now the TQM rings related members were not collected under any
specific data-structure making it hard to manage. The third data
sctructure bnxt_qplib_tqm_ctx is added to refactor the TQM queue
allocation and initialization.

Link: https://lore.kernel.org/r/1581786665-23705-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:43 -04:00
Devesh Sharma
0cfb329db9 RDMA/bnxt_re: Replace chip context structure with pointer
The chip_ctx member in bnxt_re_dev structure is now a pointer to struct
bnxt_qplib_chip_ctx. Since the member type has changed there are changes
in rest of the code wherever dev->chip_ctx is used.

Link: https://lore.kernel.org/r/1581786665-23705-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:43 -04:00
Devesh Sharma
8dae419f9e RDMA/bnxt_re: Refactor queue pair creation code
Restructuring the bnxt_re_create_qp function. Listing below the major
changes:
 - Monolithic central part of create_qp where attributes are initialized
   is now enclosed in one function and this new function has few more
   sub-functions.
 - Top level qp limit checking code moved to a function.
 - GSI QP creation and GSI Shadow qp creation code is handled in a sub
   function.

Link: https://lore.kernel.org/r/1581786665-23705-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21 20:21:42 -04:00
Gustavo A. R. Silva
5b361328ca RDMA: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Link: https://lore.kernel.org/r/20200213010425.GA13068@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> # added a few more
2020-02-20 13:33:51 -04:00
Lang Cheng
52c5e9e749 RDMA/hns: Initialize all fields of doorbells to zero
Prevent uninitialized fields when new fields are added, and make code look
simpler.

Link: https://lore.kernel.org/r/1582162471-50361-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-20 13:22:02 -04:00
Max Gurtovoy
6affca140c RDMA/rw: Fix error flow during RDMA context initialization
In case the SGL was mapped for P2P DMA operation, we must unmap it using
pci_p2pdma_unmap_sg during the error unwind of rdma_rw_ctx_init()

Fixes: 7f73eac3a7 ("PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()")
Link: https://lore.kernel.org/r/20200220100819.41860-1-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-20 13:19:21 -04:00
Colin Ian King
8d8d2b76ac RDMA/hns: fix spelling mistake: "attatch" -> "attach"
There is a spelling mistake in a dev_err error message. Fix it.

Link: https://lore.kernel.org/r/20200214003338.6573-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-20 13:11:23 -04:00
Paul Blakey
0f0d3827c0 net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
Multi chain support requires the miss path to continue the processing
from the last chain id, and for that we need to save the chain
miss tag (a mapping for 32bit chain id) on reg_c0 which will
come in a next patch.

Currently reg_c0 is exclusively used to store the source port
metadata, giving it 32bit, it is created from 16bits of vcha_id,
and 16bits of vport number.

We will move this source port metadata to upper 16bits, and leave the
lower bits for the chain miss tag. We compress the reg_c0 source port
metadata to 16bits by taking 8 bits from vhca_id, and 8bits from
the vport number.

Since we compress the vport number to 8bits statically, and leave two
top ids for special PF/ECPF numbers, we will only support a max of 254
vports with this strategy.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-19 17:49:48 -08:00
Bart Van Assche
fb3063d319 RDMA/rxe: Fix configuration of atomic queue pair attributes
From the comment above the definition of the roundup_pow_of_two() macro:

     The result is undefined when n == 0.

Hence only pass positive values to roundup_pow_of_two(). This patch fixes
the following UBSAN complaint:

  UBSAN: Undefined behaviour in ./include/linux/log2.h:57:13
  shift exponent 64 is too large for 64-bit type 'long unsigned int'
  Call Trace:
   dump_stack+0xa5/0xe6
   ubsan_epilogue+0x9/0x26
   __ubsan_handle_shift_out_of_bounds.cold+0x4c/0xf9
   rxe_qp_from_attr.cold+0x37/0x5d [rdma_rxe]
   rxe_modify_qp+0x59/0x70 [rdma_rxe]
   _ib_modify_qp+0x5aa/0x7c0 [ib_core]
   ib_modify_qp+0x3b/0x50 [ib_core]
   cma_modify_qp_rtr+0x234/0x260 [rdma_cm]
   __rdma_accept+0x1a7/0x650 [rdma_cm]
   nvmet_rdma_cm_handler+0x1286/0x14cd [nvmet_rdma]
   cma_cm_event_handler+0x6b/0x330 [rdma_cm]
   cma_ib_req_handler+0xe60/0x22d0 [rdma_cm]
   cm_process_work+0x30/0x140 [ib_cm]
   cm_req_handler+0x11f4/0x1cd0 [ib_cm]
   cm_work_handler+0xb8/0x344e [ib_cm]
   process_one_work+0x569/0xb60
   worker_thread+0x7a/0x5d0
   kthread+0x1e6/0x210
   ret_from_fork+0x24/0x30

Link: https://lore.kernel.org/r/20200217205714.26937-1-bvanassche@acm.org
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:56:57 -04:00
Selvin Xavier
0a01623b74 RDMA/bnxt_re: Use rdma_read_gid_hw_context to retrieve HW gid index
bnxt_re HW maintains a GID table with only a single entry for the two
duplicate GID entries (v1 and v2). Driver needs to map stack gid index to
the HW table gid index.  Use the new API rdma_read_gid_hw_context () to
retrieve the HW GID context to get the HW table index.

Link: https://lore.kernel.org/r/1582107594-5180-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:54:25 -04:00
Selvin Xavier
779820c2e1 RDMA/core: Add helper function to retrieve driver gid context from gid attr
Adding a helper function to retrieve the driver gid context from the gid
attr.

Link: https://lore.kernel.org/r/1582107594-5180-2-git-send-email-selvin.xavier@broadcom.com
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:54:25 -04:00
Nathan Chancellor
4ca501d6aa RDMA/core: Fix use of logical OR in get_new_pps
Clang warns:

../drivers/infiniband/core/security.c:351:41: warning: converting the
enum constant to a boolean [-Wint-in-bool-context]
        if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
                                               ^
1 warning generated.

A bitwise OR should have been used instead.

Fixes: 1dd017882e ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Link: https://lore.kernel.org/r/20200217204318.13609-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/889
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:42:12 -04:00
Jason Gunthorpe
167b95ec88 RDMA/ucma: Use refcount_t for the ctx->ref
Don't use an atomic as a refcount.

Link: https://lore.kernel.org/r/20200218191657.GA29724@ziepe.ca
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 16:41:00 -04:00
Parav Pandit
e4103312d7 Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
This reverts commit 219d2e9dfd.

The call chain below requires the cm_id_priv's destination address to be
setup before performing rdma_bind_addr(). Otherwise source port allocation
fails as cma_port_is_unique() no longer sees the correct tuple to allow
duplicate users of the source port.

rdma_resolve_addr()
  cma_bind_addr()
    rdma_bind_addr()
      cma_get_port()
        cma_alloc_any_port()
          cma_port_is_unique() <- compared with zero daddr

This can result in false failures to connect, particularly if the source
port range is restricted.

Fixes: 219d2e9dfd ("RDMA/cma: Simplify rdma_resolve_addr() error flow")
Link: https://lore.kernel.org/r/20200212072635.682689-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19 14:25:52 -04:00
Jason Gunthorpe
b72bfc965e RDMA/core: Get rid of ib_create_qp_user
This function accepts a udata but does nothing with it, and is never
passed a !NULL udata. Rename it to ib_create_qp which was the only caller
and remove the udata.

Link: https://lore.kernel.org/r/20200213191911.GA9898@ziepe.ca
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-18 20:27:37 -04:00
Bart Van Assche
76261ada16 scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
Since commit 04060db411 introduces soft lockups when toggling network
interfaces, revert it.

Link: https://marc.info/?l=target-devel&m=158157054906196
Cc: Rahul Kundu <rahul.kundu@chelsio.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Dakshaja Uppalapati <dakshaja@chelsio.com>
Fixes: 04060db411 ("scsi: RDMA/isert: Fix a recently introduced regression related to logout")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-14 17:13:53 -05:00
Jason Gunthorpe
685eff5131 IB/mlx5: Use div64_u64 for num_var_hw_entries calculation
On i386:

ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!
ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!

Fixes: f164be8c03 ("IB/mlx5: Extend caps stage to handle VAR capabilities")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reported-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-14 15:21:52 -04:00
Yixian Liu
b53742865e RDMA/hns: Delayed flush cqe process with workqueue
HiP08 RoCE hardware lacks ability(a known hardware problem) to flush
outstanding WQEs if QP state gets into errored mode for some reason.  To
overcome this hardware problem and as a workaround, when QP is detected to
be in errored state during various legs like post send, post receive
etc[1], flush needs to be performed from the driver.

The earlier patch[1] sent to solve the hardware limitation explained in
the cover-letter had a bug in the software flushing leg. It acquired mutex
while modifying QP state to errored state and while conveying it to the
hardware using the mailbox. This caused leg to sleep while holding
spin-lock and caused crash.

Suggested Solution: we have proposed to defer the flushing of the QP in
the Errored state using the workqueue to get around with the limitation of
our hardware.

This patch specifically adds the calls to the flush handler from where
parts of the code like post_send/post_recv etc. when the QP state gets
into the errored mode.

[1] https://patchwork.kernel.org/patch/10534271/

Link: https://lore.kernel.org/r/1580983005-13899-3-git-send-email-liuyixian@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 16:44:21 -04:00
Yixian Liu
ffd541d457 RDMA/hns: Add the workqueue framework for flush cqe handler
HiP08 RoCE hardware lacks ability(a known hardware problem) to flush
outstanding WQEs if QP state gets into errored mode for some reason.  To
overcome this hardware problem and as a workaround, when QP is detected to
be in errored state during various legs like post send, post receive etc
[1], flush needs to be performed from the driver.

The earlier patch[1] sent to solve the hardware limitation explained in
the cover-letter had a bug in the software flushing leg. It acquired mutex
while modifying QP state to errored state and while conveying it to the
hardware using the mailbox. This caused leg to sleep while holding
spin-lock and caused crash.

Suggested Solution:
we have proposed to defer the flushing of the QP in the Errored state
using the workqueue to get around with the limitation of our hardware.

This patch adds the framework of the workqueue and the flush handler
function.

[1] https://patchwork.kernel.org/patch/10534271/

Link: https://lore.kernel.org/r/1580983005-13899-2-git-send-email-liuyixian@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 16:44:20 -04:00
Michael Guralnik
f03d9fadfe RDMA/core: Add weak ordering dma attr to dma mapping
For memory regions registered with IB_ACCESS_RELAXED_ORDERING will be dma
mapped with the DMA_ATTR_WEAK_ORDERING.

This will allow reads and writes to the mapping to be weakly ordered, such
change can enhance performance on some supporting architectures.

Link: https://lore.kernel.org/r/20200212073559.684139-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 13:38:02 -04:00
Leon Romanovsky
1dd017882e RDMA/core: Fix protection fault in get_pkey_idx_qp_list
We don't need to set pkey as valid in case that user set only one of pkey
index or port number, otherwise it will be resulted in NULL pointer
dereference while accessing to uninitialized pkey list.  The following
crash from Syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
  Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
  8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
  02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
  RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
  RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
  RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
  R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
  R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
  FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   port_pkey_list_insert+0xd7/0x7c0
   ib_security_modify_qp+0x6fa/0xfc0
   _ib_modify_qp+0x8c4/0xbf0
   modify_qp+0x10da/0x16d0
   ib_uverbs_modify_qp+0x9a/0x100
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Message-Id: <20200212080651.GB679970@unreal>
2020-02-13 12:31:56 -04:00
Leon Romanovsky
ca750d4a9c RDMA/ucma: Mask QPN to be 24 bits according to IBTA
IBTA declares QPN as 24bits, mask input to ensure that kernel
doesn't get higher bits and ensure by adding WANR_ONCE() that
other CM users do the same.

Link: https://lore.kernel.org/r/20200212072635.682689-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 11:38:19 -04:00
Zhu Yanjun
8ac0e6641c RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq
When run stress tests with RXE, the following Call Traces often occur

  watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
  ...
  Call Trace:
  <IRQ>
  create_object+0x3f/0x3b0
  kmem_cache_alloc_node_trace+0x129/0x2d0
  __kmalloc_reserve.isra.52+0x2e/0x80
  __alloc_skb+0x83/0x270
  rxe_init_packet+0x99/0x150 [rdma_rxe]
  rxe_requester+0x34e/0x11a0 [rdma_rxe]
  rxe_do_task+0x85/0xf0 [rdma_rxe]
  tasklet_action_common.isra.21+0xeb/0x100
  __do_softirq+0xd0/0x298
  irq_exit+0xc5/0xd0
  smp_apic_timer_interrupt+0x68/0x120
  apic_timer_interrupt+0xf/0x20
  </IRQ>
  ...

The root cause is that tasklet is actually a softirq. In a tasklet
handler, another softirq handler is triggered. Usually these softirq
handlers run on the same cpu core. So this will cause "soft lockup Bug".

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 10:51:09 -04:00
Leon Romanovsky
9b6d3bbc13 RDMA/mlx5: Prevent overflow in mmap offset calculations
The cmd and index variables declared as u16 and the result is supposed to
be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) <<
16" to be u64 and leaves the end result to be u16.

Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 10:39:23 -04:00
Yonatan Cohen
9ea04d0df6 IB/umad: Fix kernel crash while unloading ib_umad
When disassociating a device from umad we must ensure that the sysfs
access is prevented before blocking the fops, otherwise assumptions in
syfs don't hold:

	    CPU0            	        CPU1
	 ib_umad_kill_port()        ibdev_show()
	    port->ib_dev = NULL
                                      dev_name(port->ib_dev)

The prior patch made an error in moving the device_destroy(), it should
have been split into device_del() (above) and put_device() (below). At
this point we already have the split, so move the device_del() back to its
original place.

  kernel stack
  PF: error_code(0x0000) - not-present page
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
  RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
  RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
  RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
  R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
  R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
  FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
  Call Trace:
   dev_attr_show+0x15/0x50
   sysfs_kf_seq_show+0xb8/0x1a0
   seq_read+0x12d/0x350
   vfs_read+0x89/0x140
   ksys_read+0x55/0xd0
   do_syscall_64+0x55/0x1b0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9:

Fixes: cf7ad30302 ("IB/umad: Avoid destroying device while it is accessed")
Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 10:00:50 -04:00
Yishai Hadas
a8af8694a5 RDMA/mlx5: Fix async events cleanup flows
As in the prior patch, the devx code is not fully cleaning up its
event_lists before finishing driver_destroy allowing a later read to
trigger user after free conditions.

Re-arrange things so that the event_list is always empty after destroy and
ensure it remains empty until the file is closed.

Fixes: f7c8416cce ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 09:48:17 -04:00
Michael Guralnik
a0767da777 RDMA/core: Add missing list deletion on freeing event queue
When the uobject file scheme was revised to allow device disassociation
from the file it became possible for read() to still happen the driver
destroys the uobject.

The old clode code was not tolerant to concurrent read, and when it was
moved to the driver destroy it creates a bug.

Ensure the event_list is empty after driver destroy by adding the missing
list_del(). Otherwise read() can trigger a use after free and double
kfree.

Fixes: f7c8416cce ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 09:44:49 -04:00
Krishnamraju Eraparaju
663218a3e7 RDMA/siw: Remove unwanted WARN_ON in siw_cm_llp_data_ready()
Warnings like below can fill up the dmesg while disconnecting RDMA
connections.
Hence, remove the unwanted WARN_ON.

  WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw]
  RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw]
  Call Trace:
   <IRQ>
   tcp_data_queue+0x226/0xb40
   tcp_rcv_established+0x220/0x620
   tcp_v4_do_rcv+0x12a/0x1e0
   tcp_v4_rcv+0xb05/0xc00
   ip_local_deliver_finish+0x69/0x210
   ip_local_deliver+0x6b/0xe0
   ip_rcv+0x273/0x362
   __netif_receive_skb_core+0xb35/0xc30
   netif_receive_skb_internal+0x3d/0xb0
   napi_gro_frags+0x13b/0x200
   t4_ethrx_handler+0x433/0x7d0 [cxgb4]
   process_responses+0x318/0x580 [cxgb4]
   napi_rx_handler+0x14/0x100 [cxgb4]
   net_rx_action+0x149/0x3b0
   __do_softirq+0xe3/0x30a
   irq_exit+0x100/0x110
   do_IRQ+0x7f/0xe0
   common_interrupt+0xf/0xf
   </IRQ>

Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:33:58 -04:00
Kamal Heib
beb205dd67 RDMA/siw: Fix setting active_mtu attribute
Make sure to set the active_mtu attribute to avoid report the following
invalid value:

$ ibv_devinfo -d siw0 | grep active_mtu
			active_mtu:		invalid MTU (0)

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20200205081354.30438-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:32:35 -04:00
Shiraz Saleem
9a4b24108d i40iw: Do an RCU lookup in i40iw_add_ipv4_addr
The in_dev_for_each_ifa_rtnl() iterator in i40iw_add_ipv4_addr requires
that the rtnl lock be held. But the rtnl_trylock/unlock scheme in this
function does not guarantee it.

Replace the rtnl locking with an RCU lookup using
in_dev_for_each_ifa_rcu()

Fixes: 8e06af711b ("i40iw: add main, hdr, status")
Link: https://lore.kernel.org/r/20200204223840.2151-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:31:11 -04:00
Krishnamraju Eraparaju
d219face90 RDMA/iw_cxgb4: initiate CLOSE when entering TERM
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
when entering into TERM state.

In c4iw_modify_qp(), disconnect operation should only be performed when
the modify_qp call is invoked from ib_core. And all other internal
modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should
call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
like below can occur:

 Call Trace:
  schedule+0x2f/0xa0
  schedule_preempt_disabled+0xa/0x10
  __mutex_lock.isra.5+0x2d0/0x4a0
  c4iw_ep_disconnect+0x39/0x430    => tries to reacquire ep lock again
  c4iw_modify_qp+0x468/0x10d0
  rx_data+0x218/0x570              => acquires ep lock
  process_work+0x5f/0x70
  process_one_work+0x1a7/0x3b0
  worker_thread+0x30/0x390
  kthread+0x112/0x130
  ret_from_fork+0x35/0x40

Fixes: d2c33370ae ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state")
Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:28:00 -04:00
Mark Zhang
10189e8e6f IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supported
When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.

This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.

Fixes: d14133dd41 ("IB/mlx5: Support set qp counter")
Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:23:01 -04:00
Avihai Horon
a72f4ac1d7 RDMA/core: Fix invalid memory access in spec_filter_size
Add a check that the size specified in the flow spec header doesn't cause
an overflow when calculating the filter size, and thus prevent access to
invalid memory.  The following crash from syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:memchr_inv+0xd3/0x330
  Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85
  ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01
  00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01
  RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04
  RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820
  RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009
  R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004
  R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040
  FS:  00007f9ee0e51700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   spec_filter_size.part.16+0x34/0x50
   ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770
   ib_uverbs_ex_create_flow+0x9ea/0x1b40
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
  RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004
  RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc
  R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff
  Modules linked in:
  Dumping ftrace buffer:
     (ftrace buffer empty)

Fixes: 94e03f11ad ("IB/uverbs: Add support for flow tag")
Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:14:52 -04:00
Xi Wang
d7e2d3432a RDMA/hns: Optimize eqe buffer allocation flow
The eqe has a private multi-hop addressing implementation, but there is
already a set of interfaces in the hns driver that can achieve this.

So, simplify the eqe buffer allocation process by using the mtr interface
and remove large amount of repeated logic.

Link: https://lore.kernel.org/r/20200126145835.11368-1-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:12:43 -04:00
Lang Cheng
b14c95bee8 RDMA/hns: Cleanups of magic numbers
Some magic numbers are hard to understand, so replace them with macros or
add some comments for them.

Link: https://lore.kernel.org/r/20200126145504.9700-1-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:10:20 -04:00
Parav Pandit
43fb5892cd RDMA/cma: Use refcount API to reflect refcount
Use a refcount_t for atomics being used as a refcount.

Link: https://lore.kernel.org/r/20200126142652.104803-8-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:00:40 -04:00
Parav Pandit
e368d23f57 RDMA/cma: Rename cma_device ref/deref helpers to to get/put
Helper functions which increment/decrement reference count of a
structure read better when they are named with the get/put suffix.

Hence, rename cma_ref/deref_id() to cma_id_get/put().  Also use
cma_get_id() wrapper to find the balancing put() calls.

Link: https://lore.kernel.org/r/20200126142652.104803-7-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:00:40 -04:00
Parav Pandit
be439912e7 RDMA/cma: Use refcount API to reflect refcount
Use the refcount variant to capture the reference counting of the cma
device structure.

Link: https://lore.kernel.org/r/20200126142652.104803-6-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:00:39 -04:00
Parav Pandit
5ff8c8fa44 RDMA/cma: Rename cma_device ref/deref helpers to to get/put
Helper functions which increment/decrement reference count of the
structure read better when they are named with the get/put suffix.

Hence, rename cma_ref/deref_dev() to cma_dev_get/put().

Link: https://lore.kernel.org/r/20200126142652.104803-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 13:55:13 -04:00
Parav Pandit
cc055dd3a7 RDMA/cma: Use RDMA device port iterator
Use RDMA device port iterator to avoid open coding.

Link: https://lore.kernel.org/r/20200126142652.104803-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 13:55:13 -04:00
Parav Pandit
081ea5195a RDMA/cma: Use a helper function to enqueue resolve work items
To avoid errors, with attaching ownership of work item and its cm_id
refcount which is decremented in work handler, tie them up in single
helper function. Also avoid code duplication.

Link: https://lore.kernel.org/r/20200126142652.104803-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 13:55:13 -04:00
Kaike Wan
f92e487188 IB/rdmavt: Reset all QPs when the device is shut down
When the hfi1 device is shut down during a system reboot, it is possible
that some QPs might have not not freed by ULPs. More requests could be
post sent and a lingering timer could be triggered to schedule more packet
sends, leading to a crash:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
  IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0
  PGD 0
  Oops: 0000 1 SMP
  Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
  sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
  CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
  Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018
  task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000
  RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0
  RSP: 0018:ffff88105df43d48 EFLAGS: 00010046
  RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000
  RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f
  RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000
  R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0
  R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000
  FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
  ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
  ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
  ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
  Call Trace:
  IRQ

  [ffffffff810a6b85] queue_work_on+0x45/0x50
  [ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
  [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
  [ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1]
  [ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt]
  [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
  [ffffffff81097316] call_timer_fn+0x36/0x110
  [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
  [ffffffff8109982d] run_timer_softirq+0x22d/0x310
  [ffffffff81090b3f] __do_softirq+0xef/0x280
  [ffffffff816b6a5c] call_softirq+0x1c/0x30
  [ffffffff8102d3c5] do_softirq+0x65/0xa0
  [ffffffff81090ec5] irq_exit+0x105/0x110
  [ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50
  [ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80
  EOI

  [ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0
  [ffffffff81527b48] cpuidle_idle_call+0xd8/0x210
  [ffffffff81034fee] arch_cpu_idle+0xe/0x30
  [ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0
  [ffffffff81051af6] start_secondary+0x1b6/0x230
  Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
  RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0
  RSP ffff88105df43d48
  CR2: 0000000000000102

The solution is to reset the QPs before the device resources are freed.
This reset will change the QP state to prevent post sends and delete
timers to prevent callbacks.

Fixes: 0acb0cc7ec ("IB/rdmavt: Initialize and teardown of qpn table")
Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 11:41:32 -04:00
Mike Marciniszyn
be8638344c IB/hfi1: Close window for pq and request coliding
Cleaning up a pq can result in the following warning and panic:

  WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
  list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100)
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
   nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.38.3.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  Call Trace:
   [<ffffffff90365ac0>] dump_stack+0x19/0x1b
   [<ffffffff8fc98b78>] __warn+0xd8/0x100
   [<ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80
   [<ffffffff8ff970c3>] __list_del_entry+0x63/0xd0
   [<ffffffff8ff9713d>] list_del+0xd/0x30
   [<ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110
   [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
   [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
   [<ffffffff8fe4519c>] __fput+0xec/0x260
   [<ffffffff8fe453fe>] ____fput+0xe/0x10
   [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
   [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
   [<ffffffff90379134>] int_signal+0x12/0x17
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
  IP: [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
  PGD 2cdab19067 PUD 2f7bfdb067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
   nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.38.3.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000
  RIP: 0010:[<ffffffff8fe1f93e>]  [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
  RSP: 0018:ffff88b5393abd60  EFLAGS: 00010287
  RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003
  RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800
  RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19
  R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000
  R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000
  FS:  00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   [<ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80
   [<ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110
   [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
   [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
   [<ffffffff8fe4519c>] __fput+0xec/0x260
   [<ffffffff8fe453fe>] ____fput+0xe/0x10
   [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
   [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
   [<ffffffff90379134>] int_signal+0x12/0x17
  Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39
  RIP  [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
   RSP <ffff88b5393abd60>
  CR2: 0000000000000010

The panic is the result of slab entries being freed during the destruction
of the pq slab.

The code attempts to quiesce the pq, but looking for n_req == 0 doesn't
account for new requests.

Fix the issue by using SRCU to get a pq pointer and adjust the pq free
logic to NULL the fd pq pointer prior to the quiesce.

Fixes: e87473bc1b ("IB/hfi1: Only set fd pointer when base context is completely initialized")
Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 11:41:31 -04:00
Kaike Wan
a70ed0f2e6 IB/hfi1: Acquire lock to release TID entries when user file is closed
Each user context is allocated a certain number of RcvArray (TID)
entries and these entries are managed through TID groups. These groups
are put into one of three lists in each user context: tid_group_list,
tid_used_list, and tid_full_list, depending on the number of used TID
entries within each group. When TID packets are expected, one or more
TID groups will be allocated. After the packets are received, the TID
groups will be freed. Since multiple user threads may access the TID
groups simultaneously, a mutex exp_mutex is used to synchronize the
access. However, when the user file is closed, it tries to release
all TID groups without acquiring the mutex first, which risks a race
condition with another thread that may be releasing its TID groups,
leading to data corruption.

This patch addresses the issue by acquiring the mutex first before
releasing the TID groups when the file is closed.

Fixes: 3abb33ac65 ("staging/hfi1: Add TID cache receive init and free funcs")
Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 11:41:31 -04:00
Kamal Heib
8a4f300b97 RDMA/hfi1: Fix memory leak in _dev_comp_vect_mappings_create
Make sure to free the allocated cpumask_var_t's to avoid the following
reported memory leak by kmemleak:

$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8897f812d6a8 (size 8):
  comm "kworker/1:1", pid 347, jiffies 4294751400 (age 101.703s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<00000000bff49664>] alloc_cpumask_var_node+0x4c/0xb0
    [<0000000075d3ca81>] hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1]
    [<0000000098d420df>] hfi1_init_dd+0x3311/0x4960 [hfi1]
    [<0000000071be7e52>] init_one+0x25e/0xf10 [hfi1]
    [<000000005483d4c2>] local_pci_probe+0xd4/0x180
    [<000000007c3cbc6e>] work_for_cpu_fn+0x51/0xa0
    [<000000001d626905>] process_one_work+0x8f0/0x17b0
    [<000000007e569e7e>] worker_thread+0x536/0xb50
    [<00000000fd39a4a5>] kthread+0x30c/0x3d0
    [<0000000056f2edb3>] ret_from_fork+0x3a/0x50

Fixes: 5d18ee67d4 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support")
Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 11:35:45 -04:00
Linus Torvalds
8fdd4019bc RDMA subsystem updates for 5.6
- Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe,
   i40iw
 
 - Larger series doing cleanup and rework for hns and hfi1.
 
 - Some general reworking of the CM code to make it a little more
   understandable
 
 - Unify the different code paths connected to the uverbs FD scheme
 
 - New UAPI ioctls conversions for get context and get async fd
 
 - Trace points for CQ and CM portions of the RDMA stack
 
 - mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs
 
 - verbs support for setting the PCI-E relaxed ordering bit on DMA traffic
   connected to a MR
 
 - A couple of bug fixes that came too late to make rc7
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl4zPwQACgkQOG33FX4g
 mxoURw//fuQmuJ7aTMH+0qrhaZUmzXOcI/WKvY0YMyYLvxolRcIO+uCL239wxezR
 9iTHPO7HeYXUQ4W8Hi/fTyuQ9hzaPOP3wgOJfQhm4QT/XDpRW0H3Mb+hTLHTUAcA
 rgKc9suAn+5BbIDOz7hEfeOTssx1wYrLsaHDc11NZ42JuG6uvPR33lhXiKWG+5tH
 2MpfeTU6BjL035dm3YZXCo+ouobpdMuvzJItYIsB2E5Nl0s91SMzsymIYiD0gb3t
 yUJ3wqPW3pchfAl8VEn+W5AHTUYYgGjmEblL8WdVq5JRrkQgQzj8QtCRT9NOPAT0
 LivCvgBrm0kscaQS2TjtG56Ojbwz8z1QPE/4shf0pj/G2lZfacYDAeaUc/2VafxY
 y/KG+3dB1DxtYY3eXJUxbB7Vpk7kfr35p5b75NdMhd2t49oPgV7EKoZMLYGzfX4S
 PtyNyNSiwx8qsRTr4lznOMswmrDLfG4XiywWgYo6NGOWyKYlARWIYBAEQZ0DPTiE
 9mqJ19gusdSdAgm8LGDInPmH6/AojGOVzYonJFWdlOtwCXGNXL4Gx02x4WYHykDG
 w+oy5NMJbU3b6+MWEagkuQNcrwqv02MT1mB/Lgv4GPm6rS0UXR7zUPDeccE50fSL
 X36k28UlftlPlaD7PeJdTOAhyBv5DxfpL5rbB2TfpUTpNxjayuU=
 =hepK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A very quiet cycle with few notable changes. Mostly the usual list of
  one or two patches to drivers changing something that isn't quite rc
  worthy. The subsystem seems to be seeing a larger number of rework and
  cleanup style patches right now, I feel that several vendors are
  prepping their drivers for new silicon.

  Summary:

   - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
     rxe, i40iw

   - Larger series doing cleanup and rework for hns and hfi1.

   - Some general reworking of the CM code to make it a little more
     understandable

   - Unify the different code paths connected to the uverbs FD scheme

   - New UAPI ioctls conversions for get context and get async fd

   - Trace points for CQ and CM portions of the RDMA stack

   - mlx5 driver support for virtio-net formatted rings as RDMA raw
     ethernet QPs

   - verbs support for setting the PCI-E relaxed ordering bit on DMA
     traffic connected to a MR

   - A couple of bug fixes that came too late to make rc7"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
  RDMA/core: Make the entire API tree static
  RDMA/efa: Mask access flags with the correct optional range
  RDMA/cma: Fix unbalanced cm_id reference count during address resolve
  RDMA/umem: Fix ib_umem_find_best_pgsz()
  IB/mlx4: Fix leak in id_map_find_del
  IB/opa_vnic: Spelling correction of 'erorr' to 'error'
  IB/hfi1: Fix logical condition in msix_request_irq
  RDMA/cm: Remove CM message structs
  RDMA/cm: Use IBA functions for complex structure members
  RDMA/cm: Use IBA functions for simple structure members
  RDMA/cm: Use IBA functions for swapping get/set acessors
  RDMA/cm: Use IBA functions for simple get/set acessors
  RDMA/cm: Add SET/GET implementations to hide IBA wire format
  RDMA/cm: Add accessors for CM_REQ transport_type
  IB/mlx5: Return the administrative GUID if exists
  RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
  IB/mlx4: Fix memory leak in add_gid error flow
  IB/mlx5: Expose RoCE accelerator counters
  RDMA/mlx5: Set relaxed ordering when requested
  RDMA/core: Add the core support field to METHOD_GET_CONTEXT
  ...
2020-01-31 14:40:36 -08:00
John Hubbard
f1f6a7dd9b mm, tree-wide: rename put_user_page*() to unpin_user_page*()
In order to provide a clearer, more symmetric API for pinning and
unpinning DMA pages.  This way, pin_user_pages*() calls match up with
unpin_user_pages*() calls, and the API is a lot closer to being
self-explanatory.

Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:38 -08:00
John Hubbard
dfa0a4fff1 IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
Convert infiniband to use the new pin_user_pages*() calls.

Also, revert earlier changes to Infiniband ODP that had it using
put_user_page().  ODP is "Case 3" in
Documentation/core-api/pin_user_pages.rst, which is to say, normal
get_user_pages() and put_page() is the API to use there.

The new pin_user_pages*() calls replace corresponding get_user_pages*()
calls, and set the FOLL_PIN flag.  The FOLL_PIN flag requires that the
caller must return the pages via put_user_page*() calls, but infiniband
was already doing that as part of an earlier commit.

Link: http://lkml.kernel.org/r/20200107224558.2362728-14-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:37 -08:00
John Hubbard
4789fcdd14 IB/umem: use get_user_pages_fast() to pin DMA pages
And get rid of the mmap_sem calls, as part of that.  Note that
get_user_pages_fast() will, if necessary, fall back to
__gup_longterm_unlocked(), which takes the mmap_sem as needed.

Link: http://lkml.kernel.org/r/20200107224558.2362728-10-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:37 -08:00
Jason Gunthorpe
8889f6fa35 RDMA/core: Make the entire API tree static
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates
the following error.

on x86_64:

 ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC':
 main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler'

This is happening because some parts of the UAPI description are not
static. This is a hold over from earlier code that relied on struct
pointers to refer to object types, now object types are referenced by
number. Remove the unused globals and add statics to the remaining UAPI
description elements.

Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete
mlx5_ib_get_devx_tree().

The compiler now trims alot more unused code, including the above
problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS.

Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-30 16:28:52 -04:00
Gal Pressman
ba19e16651 RDMA/efa: Mask access flags with the correct optional range
The uapi value IB_UVERBS_ACCESS_OPTIONAL_RANGE shouldn't be used inside
the driver, use IB_ACCESS_OPTIONAL instead.

Fixes: 86dd738cf2 ("RDMA/efa: Allow passing of optional access flags for MR registration")
Link: https://lore.kernel.org/r/20200129071803.40117-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-29 16:41:05 -04:00
Linus Torvalds
bd2463ac7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Add WireGuard

 2) Add HE and TWT support to ath11k driver, from John Crispin.

 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

 4) Add variable window congestion control to TIPC, from Jon Maloy.

 5) Add BCM84881 PHY driver, from Russell King.

 6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

 7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

 8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

 9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
  net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
  udp: segment looped gso packets correctly
  netem: change mailing list
  qed: FW 8.42.2.0 debug features
  qed: rt init valid initialization changed
  qed: Debug feature: ilt and mdump
  qed: FW 8.42.2.0 Add fw overlay feature
  qed: FW 8.42.2.0 HSI changes
  qed: FW 8.42.2.0 iscsi/fcoe changes
  qed: Add abstraction for different hsi values per chip
  qed: FW 8.42.2.0 Additional ll2 type
  qed: Use dmae to write to widebus registers in fw_funcs
  qed: FW 8.42.2.0 Parser offsets modified
  qed: FW 8.42.2.0 Queue Manager changes
  qed: FW 8.42.2.0 Expose new registers and change windows
  qed: FW 8.42.2.0 Internal ram offsets modifications
  MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
  Documentation: net: octeontx2: Add RVU HW and drivers overview
  octeontx2-pf: ethtool RSS config support
  octeontx2-pf: Add basic ethtool support
  ...
2020-01-28 16:02:33 -08:00
Parav Pandit
b4fb4cc5ba RDMA/cma: Fix unbalanced cm_id reference count during address resolve
Below commit missed the AF_IB and loopback code flow in
rdma_resolve_addr().  This leads to an unbalanced cm_id refcount in
cma_work_handler() which puts the refcount which was not incremented prior
to queuing the work.

A call trace is observed with such code flow:

 BUG: unable to handle kernel NULL pointer dereference at (null)
 [<ffffffff96b67e16>] __mutex_lock_slowpath+0x166/0x1d0
 [<ffffffff96b6715f>] mutex_lock+0x1f/0x2f
 [<ffffffffc0beabb5>] cma_work_handler+0x25/0xa0
 [<ffffffff964b9ebf>] process_one_work+0x17f/0x440
 [<ffffffff964baf56>] worker_thread+0x126/0x3c0

Hence, hold the cm_id reference when scheduling the resolve work item.

Fixes: 722c7b2bfe ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()")
Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28 14:15:23 -04:00
Artemy Kovalyov
36798d5ae1 RDMA/umem: Fix ib_umem_find_best_pgsz()
Except for the last entry, the ending iova alignment sets the maximum
possible page size as the low bits of the iova must be zero when starting
the next chunk.

Fixes: 4a35339958 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28 14:10:54 -04:00
Linus Torvalds
c0e809e244 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Ftrace is one of the last W^X violators (after this only KLP is
     left). These patches move it over to the generic text_poke()
     interface and thereby get rid of this oddity. This requires a
     surprising amount of surgery, by Peter Zijlstra.

   - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to
     count certain types of events that have a special, quirky hw ABI
     (by Kim Phillips)

   - kprobes fixes by Masami Hiramatsu

  Lots of tooling updates as well, the following subcommands were
  updated: annotate/report/top, c2c, clang, record, report/top TUI,
  sched timehist, tests; plus updates were done to the gtk ui, libperf,
  headers and the parser"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  perf/x86/amd: Add support for Large Increment per Cycle Events
  perf/x86/amd: Constrain Large Increment per Cycle events
  perf/x86/intel/rapl: Add Comet Lake support
  tracing: Initialize ret in syscall_enter_define_fields()
  perf header: Use last modification time for timestamp
  perf c2c: Fix return type for histogram sorting comparision functions
  perf beauty sockaddr: Fix augmented syscall format warning
  perf/ui/gtk: Fix gtk2 build
  perf ui gtk: Add missing zalloc object
  perf tools: Use %define api.pure full instead of %pure-parser
  libperf: Setup initial evlist::all_cpus value
  perf report: Fix no libunwind compiled warning break s390 issue
  perf tools: Support --prefix/--prefix-strip
  perf report: Clarify in help that --children is default
  tools build: Fix test-clang.cpp with Clang 8+
  perf clang: Fix build with Clang 9
  kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
  tools lib: Fix builds when glibc contains strlcpy()
  perf report/top: Make 'e' visible in the help and make it toggle showing callchains
  perf report/top: Do not offer annotation for symbols without samples
  ...
2020-01-28 09:44:15 -08:00
Linus Torvalds
634cd4b6af Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Cleanup of the GOP [graphics output] handling code in the EFI stub

   - Complete refactoring of the mixed mode handling in the x86 EFI stub

   - Overhaul of the x86 EFI boot/runtime code

   - Increase robustness for mixed mode code

   - Add the ability to disable DMA at the root port level in the EFI
     stub

   - Get rid of RWX mappings in the EFI memory map and page tables,
     where possible

   - Move the support code for the old EFI memory mapping style into its
     only user, the SGI UV1+ support code.

   - plus misc fixes, updates, smaller cleanups.

  ... and due to interactions with the RWX changes, another round of PAT
  cleanups make a guest appearance via the EFI tree - with no side
  effects intended"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  efi/x86: Disable instrumentation in the EFI runtime handling code
  efi/libstub/x86: Fix EFI server boot failure
  efi/x86: Disallow efi=old_map in mixed mode
  x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
  efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping
  efi: Fix handling of multiple efi_fake_mem= entries
  efi: Fix efi_memmap_alloc() leaks
  efi: Add tracking for dynamically allocated memmaps
  efi: Add a flags parameter to efi_memory_map
  efi: Fix comment for efi_mem_type() wrt absent physical addresses
  efi/arm: Defer probe of PCIe backed efifb on DT systems
  efi/x86: Limit EFI old memory map to SGI UV machines
  efi/x86: Avoid RWX mappings for all of DRAM
  efi/x86: Don't map the entire kernel text RW for mixed mode
  x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
  efi/libstub/x86: Fix unused-variable warning
  efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode
  efi/libstub/x86: Use const attribute for efi_is_64bit()
  efi: Allow disabling PCI busmastering on bridges during boot
  efi/x86: Allow translating 64-bit arguments for mixed mode calls
  ...
2020-01-28 09:03:40 -08:00
Linus Torvalds
6a1000bd27 ioremap changes for 5.6
- remove ioremap_nocache given that is is equivalent to
    ioremap everywhere
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl4vKHwLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMPGBAAuVNUZaZfWYHpiVP2oRcUQUguFiD3NTbknsyzV2oH
 J9P0GfeENSKwE9OOhZ7XIjnCZAJwQgTK/ppQY5yiQ/KAtYyyXjXEJ6jqqjiTDInr
 +3+I3t/LhkgrK7tMrb7ylTGa/d7KhaciljnOXC8+b75iddvM9I1z2pbHDbppZMS9
 wT4RXL/cFtRb85AfOyPLybcka3f5P2gGvQz38qyimhJYEzHDXZu9VO1Bd20f8+Xf
 eLBKX0o6yWMhcaPLma8tm0M0zaXHEfLHUKLSOkiOk+eHTWBZ3b/w5nsOQZYZ7uQp
 25yaClbameAn7k5dHajduLGEJv//ZjLRWcN3HJWJ5vzO111aHhswpE7JgTZJSVWI
 ggCVkytD3ESXapvswmACSeCIDMmiJMzvn6JvwuSMVB7a6e5mcqTuGo/FN+DrBF/R
 IP+/gY/T7zIIOaljhQVkiEIIwiD/akYo0V9fheHTBnqcKEDTHV4WjKbeF6aCwcO+
 b8inHyXZSKSMG//UlDuN84/KH/o1l62oKaB1uDIYrrL8JVyjAxctWt3GOt5KgSFq
 wVz1lMw4kIvWtC/Sy2H4oB+RtODLp6yJDqmvmPkeJwKDUcd/1JKf0KsZ8j3FpGei
 /rEkBEss0KBKyFAgBSRO2jIpdj2epgcBcsdB/r5mlhcn8L77AS6mHbA173kY4pQ/
 Kdg=
 =TUCJ
 -----END PGP SIGNATURE-----

Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap

Pull ioremap updates from Christoph Hellwig:
 "Remove the ioremap_nocache API (plus wrappers) that are always
  identical to ioremap"

* tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
  remove ioremap_nocache and devm_ioremap_nocache
  MIPS: define ioremap_nocache to ioremap
2020-01-27 13:03:00 -08:00
Håkon Bugge
ea660ad7c1 IB/mlx4: Fix leak in id_map_find_del
Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping in
a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when an
entry has to evicted from the cache. When a DREP is sent from
mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar when
a REJ is received by the mlx4_ib_demux_cm_handler(), id_map_find_del() is
called.

This function wipes out the TID in use from the IDR or XArray and removes
the id_map_entry from the table.

In short, it does everything except the topping of the cake, which is to
remove the entry from the list and free it. In other words, for the REJ
case enumerated above, one id_map_entry will be leaked.

For the other case above, a DREQ has been received first. The reception of
the DREQ will trigger queuing of a delayed work to delete the
id_map_entry, for the case where the VM doesn't send back a DREP.

In the normal case, the VM _will_ send back a DREP, and id_map_find_del()
will be called.

But this scenario introduces a secondary leak. First, when the DREQ is
received, a delayed work is queued. The VM will then return a DREP, which
will call id_map_find_del(). As stated above, this will free the TID used
from the XArray or IDR. Now, there is window where that particular TID can
be re-allocated, lets say by an outgoing REQ. This TID will later be wiped
out by the delayed work, when the function id_map_ent_timeout() is
called. But the id_map_entry allocated by the outgoing REQ will not be
de-allocated, and we have a leak.

Both leaks are fixed by removing the id_map_find_del() function and only
using schedule_delayed(). Of course, a check in schedule_delayed() to see
if the work already has been queued, has been added.

Another benefit of always using the delayed version for deleting entries,
is that we do get a TimeWait effect; a TID no longer in use, will occupy
the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability
of being re-used for that time period.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-27 16:46:53 -04:00
Linus Torvalds
54343d9518 SCSI fixes on 20200126
Two last minute fixes, both in drivers.  The fnic one is a highly
 unlikely condition, but the RDMA one is a recently introduced
 regression that causes a kernel warning to trigger in every RDMA
 logon, which would be unsightly if it got into the final release.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJsEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXi3VRyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbrpAP9I/pEp
 TWu/QkqFFrmuYbzuxtRML7X2T7+B96J/CRtQvQD3TAIW0gvw49Uj25yEwTRnVzCs
 1A+eELAahzBPW+rRBw==
 =C3yx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Two last minute fixes, both in drivers.

  The fnic one is a highly unlikely condition, but the RDMA one is a
  recently introduced regression that causes a kernel warning to trigger
  in every RDMA logon, which would be unsightly if it got into the final
  release"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: RDMA/isert: Fix a recently introduced regression related to logout
  scsi: fnic: do not queue commands during fwreset
2020-01-26 10:39:09 -08:00
Dillon Brock
7f04c71f1f IB/opa_vnic: Spelling correction of 'erorr' to 'error'
Correcting a minor spelling mistake in the comments.

Link: https://lore.kernel.org/r/20200118162542.15188-1-dab9861@gmail.com
Signed-off-by: Dillon Brock <dab9861@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:37:56 -04:00
Nathan Chancellor
79ba4f9310 IB/hfi1: Fix logical condition in msix_request_irq
Clang warns:

drivers/infiniband/hw/hfi1/msix.c:136:22: warning: overlapping
comparisons always evaluate to false [-Wtautological-overlap-compare]
        if (type < IRQ_SDMA && type >= IRQ_OTHER)
            ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
1 warning generated.

It is impossible for something to be less than 0 (IRQ_SDMA) and greater
than or equal to 3 (IRQ_OTHER) at the same time. A logical OR should
have been used to keep the same logic as before.

Link: https://lore.kernel.org/r/20200116222658.5285-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/841
Fixes: 13d2a8384b ("IB/hfi1: Decouple IRQ name from type")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:33:53 -04:00
Jason Gunthorpe
13e0af1801 RDMA/cm: Remove CM message structs
All accesses now use the new IBA acessor scheme, so delete the structs
entirely and generate the structures from the schema file.

Link: https://lore.kernel.org/r/20200116170037.30109-8-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:11:37 -04:00
Jason Gunthorpe
4ca662a30a RDMA/cm: Use IBA functions for complex structure members
Use a Coccinelle spatch to replace CM structure members used as
structures, arrays, or pointers with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression src;
expression len;
{struct} *msg;
@@
- memcpy(msg->{old_name}, src, len)
+ IBA_SET_MEM({new_name}, msg, src, len)
@@
{struct} *msg;
identifier x;
@@
- msg->{old_name}.x
+ IBA_GET_MEM_PTR({new_name}, msg)->x
@@
{struct} *msg;
@@
- &msg->{old_name}
+ IBA_GET_MEM_PTR({new_name}, msg)

For GIDs:
@@
{struct} *msg;
@@
- msg->{old_name}
+ *IBA_GET_MEM_PTR({new_name}, msg)

For non-GIDs:
@@
{struct} *msg;
@@
- msg->{old_name}
+ IBA_GET_MEM_PTR({new_name}, msg)

Iterated for every remaining IBA_CHECK_OFF()/IBA_CHECK_GET()
pairing. Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-7-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:06:01 -04:00
Jason Gunthorpe
91b60a7128 RDMA/cm: Use IBA functions for simple structure members
Use a Coccinelle spatch script to replace use of simple CM structure
members with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- msg->{old_name} = val
+ IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
@@
{struct} *msg;
@@
- msg->{old_name}
+ cpu_to_be{bits}(IBA_GET({new_name}, msg))

Iterated for every IBA_CHECK_OFF that isn't a CM_FIELD_MLOC.

And the below iterated over all byte sizes to remove doubled byte swaps:

@@
expression val;
@@
-be{bits}_to_cpu(cpu_to_be{bits}(val))
+val

(and __be_to_cpu and ntoh varients)

Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-6-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:06:00 -04:00
Jason Gunthorpe
01adb7f46f RDMA/cm: Use IBA functions for swapping get/set acessors
Use a Coccinelle spatch script to replace CM helper functions that
return/accept BE values with IBA_GET/SET versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- {old_setter}(msg, val)
+ IBA_SET({new_name}, msg, be{bits}_to_cpu(val))
@@
{struct} *msg;
@@
- {old_getter}(msg)
+ cpu_to_be{bits}(IBA_GET({new_name}, msg))

Iterated for every IBA_CHECK_GET_BE()/IBA_CHECK_SET_BE() pairing.

And the below iterated over all byte sizes to remove doubled byte swaps:

@@
expression val;
@@
-be{bits}_to_cpu(cpu_to_be{bits}(val))
+val

(and __be_to_cpu and ntoh varients)

Touched up with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-5-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:06:00 -04:00
Jason Gunthorpe
b6bbee6889 RDMA/cm: Use IBA functions for simple get/set acessors
Use a Coccinelle spatch to replace CM helper functions with IBA_GET/SET
versions. Applied with

$ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c

The spatch file was generated using the template pattern:

@@
expression val;
{struct} *msg;
@@
- {old_setter}
+ IBA_SET({new_name}, msg, val)
@@
{struct} *msg;
@@
- {old_getter}
+ IBA_GET({new_name}, msg)

Iterated for every IBA_CHECK_GET()/IBA_CHECK_GET() pairing. Touched up
with clang-format after.

Link: https://lore.kernel.org/r/20200116170037.30109-4-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:05:59 -04:00
Leon Romanovsky
d05d4ac4c9 RDMA/cm: Add SET/GET implementations to hide IBA wire format
There is no separation between RDMA-CM wire format as it is declared in
IBTA and kernel logic which implements needed support. Such situation
causes to many mistakes in conversion between big-endian (wire format)
and CPU format used by kernel. It also mixes RDMA core code with
combination of uXX and beXX variables.

The idea that all accesses to IBA definitions will go through special
GET/SET macros to ensure that no conversion mistakes are made. The
shifting and masking required to read the value is automatically deduced
using the field offset description from the tables in the IBA
specification.

This starts with the CM MADs described in IBTA release 1.3 volume 1.

To confirm that the new macros behave the same as the old accessors a
self-test is included in this patch.

Each macro replacing a straightforward struct field compile-time tests
that the new field has the same offsetof() and width as the old field.

For the fields with accessor functions a runtime test, the 'all ones'
value is placed in a dummy message and read back in several ways to
confirm that both approaches give identical results.

Later patches in this series delete the self test.

This creates a tested table of new field name, old field name(s) and some
meta information like BE coding for the functions which will be used in
the next patches.

Link: https://lore.kernel.org/r/20200116170037.30109-3-jgg@ziepe.ca
Link: https://lore.kernel.org/r/20191212093830.316934-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:05:59 -04:00
Jason Gunthorpe
792a7c1f2e RDMA/cm: Add accessors for CM_REQ transport_type
Access the two fields through wrappers, like all other fields, to make it
clearer what is happening.

Link: https://lore.kernel.org/r/20200116170037.30109-2-jgg@ziepe.ca
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 15:05:59 -04:00
Danit Goldberg
4bbd4923d1 IB/mlx5: Return the administrative GUID if exists
A user can change the operational GUID (a.k.a affective GUID) through
link/infiniband. Therefore it is preferred to return the currently set
GUID if it exists instead of the operational.

This way the PF can query which VF GUID will be set in the next bind.  In
order to align with MAC address, zero is returned if administrative GUID
is not set.

For example, before setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off

Then:

 $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
 $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00

After setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off

Fixes: 9c0015ef09 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 14:54:39 -04:00
Jason Gunthorpe
6b3712c024 RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
The set of entry->driver_removed is missing locking, protect it with
xa_lock() which is held by the only reader.

Otherwise readers may continue to see driver_removed = false after
rdma_user_mmap_entry_remove() returns and may continue to try and
establish new mmaps.

Fixes: 3411f9f01b ("RDMA/core: Create mmap database and cookie helper functions")
Link: https://lore.kernel.org/r/20200115202041.GA17199@ziepe.ca
Reviewed-by: Gal Pressman <galpress@amazon.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 14:48:33 -04:00
Jason Gunthorpe
e8b3a426fb Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
 user MRs through kernel ULPs as a proxy. The immediate use case is to
 allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
 MRs to be created and such MRs were not possible to create prior this
 series.
 
 The first part of this patchset extends RDMA to have special verb
 ib_reg_user_mr(). The common use case that uses this function is a
 userspace application that allocates memory for HCA access but the
 responsibility to register the memory at the HCA is on an kernel ULP.
 This ULP acts as an agent for the userspace application.
 
 The second part provides advise MR functionality for ULPs. This is
 integral part of ODP flows and used to trigger pagefaults in advance
 to prepare memory before running working set.
 
 The third part is actual user of those in-kernel APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
 scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
 +UcZb31auy5P8ueJYokRLhLAyRcOIAg=
 =yaHb
 -----END PGP SIGNATURE-----

Merge tag 'rds-odp-for-5.5' into rdma.git for-next

From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

* tag 'rds-odp-for-5.5':
  net/rds: Use prefetch for On-Demand-Paging MR
  net/rds: Handle ODP mr registration/unregistration
  net/rds: Detect need of On-Demand-Paging memory registration
  RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
  IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
  RDMA/mlx5: Don't fake udata for kernel path
  IB/mlx5: Add ODP WQE handlers for kernel QPs
  IB/core: Add interface to advise_mr for kernel users
  IB/core: Introduce ib_reg_user_mr
  IB: Allow calls to ib_umem_get from kernel ULPs

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21 09:55:04 -04:00
David S. Miller
ad063075d4 Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
 user MRs through kernel ULPs as a proxy. The immediate use case is to
 allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
 MRs to be created and such MRs were not possible to create prior this
 series.
 
 The first part of this patchset extends RDMA to have special verb
 ib_reg_user_mr(). The common use case that uses this function is a
 userspace application that allocates memory for HCA access but the
 responsibility to register the memory at the HCA is on an kernel ULP.
 This ULP acts as an agent for the userspace application.
 
 The second part provides advise MR functionality for ULPs. This is
 integral part of ODP flows and used to trigger pagefaults in advance
 to prepare memory before running working set.
 
 The third part is actual user of those in-kernel APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
 scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
 +UcZb31auy5P8ueJYokRLhLAyRcOIAg=
 =yaHb
 -----END PGP SIGNATURE-----

Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 10:22:51 +01:00
Bart Van Assche
04060db411 scsi: RDMA/isert: Fix a recently introduced regression related to logout
iscsit_close_connection() calls isert_wait_conn(). Due to commit
e9d3009cb9 both functions call target_wait_for_sess_cmds() although that
last function should be called only once. Fix this by removing the
target_wait_for_sess_cmds() call from isert_wait_conn() and by only calling
isert_wait_conn() after target_wait_for_sess_cmds().

Fixes: e9d3009cb9 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session").
Link: https://lore.kernel.org/r/20200116044737.19507-1-bvanassche@acm.org
Reported-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-21 00:24:46 -05:00
Ingo Molnar
cb6c82df68 Linux 5.5-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4k7i8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvk0IAKRenVOdiudY77SQ
 VZjsteyrYTTQtPPv494ToIRjR0XQ+gYp8vyWzXTUC5Nm9Y9U3VzDqUPUjWszrSXE
 6mU+tzcMc9qwuUxnIFn8zfg64ygw+37sn/w3xqeH4QmF9Z5Wl3EX3SdXTs7jp3RS
 VxiztkUNI5ZBV2GDtla5K/9qLPqCQnUYXIiyi5lAtBtiitZDVXFp7dy7hMgEiaEO
 +78K5Kh3xlt5ndDsBFOlwIb2Oof3KL7bBXntdbSBc/bjol6IRvAgln48HWCv59G2
 jzAp2tj2KobX9GRAEPj+v4TQZEW0SXDNDi8MgQsM+3DYVCTmANsv57CBKRuf01+F
 nB1kAys=
 =zSnJ
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-20 08:43:44 +01:00
David S. Miller
b3f7e3f23a Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
Saeed Mahameed
12e9e0d0d9 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
  net/mlx5e: Add discard counters per priority
  net/mlx5e: Expose FEC feilds and related capability bit
  net/mlx5: Add mlx5_ifc definitions for connection tracking support
  net/mlx5: Add copy header action struct layout
  net/mlx5: Expose resource dump register mapping
  net/mlx5: Add structures and defines for MIRC register
  net/mlx5: Read MCAM register groups 1 and 2
  net/mlx5: Add structures layout for new MCAM access reg groups
  net/mlx5: Expose vDPA emulation device capabilities
  net/mlx5: Add Virtio Emulation related device capabilities

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:48:24 -08:00
Paul Blakey
61dc7b0141 net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:41:59 -08:00
Jack Morgenstein
eaad647e5c IB/mlx4: Fix memory leak in add_gid error flow
In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW
gid table, there is a memory leak in the driver's copy of the gid table:
the gid entry's context buffer is not freed.

If such an error occurs, free the entry's context buffer, and mark the
entry as available (by setting its context pointer to NULL).

Fixes: e26be1bfef ("IB/mlx4: Implement ib_device callbacks")
Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 16:13:22 -04:00
Avihai Horon
d7fab91637 IB/mlx5: Expose RoCE accelerator counters
Introduce the following RoCE accelerator counters:
* roce_adp_retrans - number of adaptive retransmission for RoCE traffic.
* roce_adp_retrans_to - number of times RoCE traffic reached time out
  due to adaptive retransmission.
* roce_slow_restart - number of times RoCE slow restart was used.
* roce_slow_restart_cnps - number of times RoCE slow restart
  generate CNP packets.
* roce_slow_restart_trans - number of times RoCE slow restart change
  state to slow restart.

Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 16:11:52 -04:00
Michael Guralnik
d6de0bb185 RDMA/mlx5: Set relaxed ordering when requested
Enable relaxed ordering in the mkey context when requested. As relaxed
ordering is not currently supported in UMR, disable UMR usage for relaxed
ordering MRs.

Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:47 -04:00
Michael Guralnik
811646998e RDMA/core: Add the core support field to METHOD_GET_CONTEXT
Add the core support field to METHOD_GET_CONTEXT, this field should
represent capabilities that are not device-specific.

Return support for optional access flags for memory regions. User-space
will use this capability to mask the optional access flags for
unsupporting kernels.

Link: https://lore.kernel.org/r/1578506740-22188-10-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:46 -04:00
Michael Guralnik
86dd738cf2 RDMA/efa: Allow passing of optional access flags for MR registration
As part of adding a range of optional access flags that drivers need to be
able to accept, mask this range inside efa driver.  This will prevent the
driver from failing when an access flag from that range is passed.

Link: https://lore.kernel.org/r/1578506740-22188-8-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:46 -04:00
Jason Gunthorpe
a1123418ba RDMA/uverbs: Add ioctl command to get a device context
Allow future extensions of the get context command through the uverbs
ioctl kabi.

Unlike the uverbs version this does not return an async_fd as well, that
has to be done with another command.

Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:45 -04:00
Jason Gunthorpe
da57db2567 RDMA/core: Remove ucontext_lock from the uverbs_destry_ufile_hw() path
This lock only serializes ucontext creation. Instead of checking the
ucontext_lock during destruction hold the existing hw_destroy_rwsem during
creation, which is the standard pattern for object creation.

The simplification of locking is needed for the next patch.

Link: https://lore.kernel.org/r/1578506740-22188-4-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:45 -04:00
Jason Gunthorpe
d680e88e20 RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOC
Allow the async FD to be allocated separately from the context.

This is necessary to introduce the ioctl to create a context, as an ioctl
should only ever create a single uobject at a time.

If multiple async FDs are created then the first one is used to deliver
affiliated events from any ib_uevent_object, with all subsequent ones will
receive only unaffiliated events.

Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:45 -04:00
Jason Gunthorpe
8ffc324851 RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
Till recently it was not possible for userspace to specify a different
IOVA, but with the new ibv_reg_mr_iova() library call this can be done.

To compute the user_va we must compute:
  user_va = (iova - iova_start) + user_va_start

while being cautious of overflow and other math problems.

The iova is not reliably stored in the mmkey when the MR is created. Only
the cached creation path (the common one) set it, so it must also be set
when creating uncached MRs.

Fix the weird use of iova when computing the starting page index in the
MR. In the normal case, when iova == umem.address:
  iova & (~(BIT(page_shift) - 1)) ==
  ALIGN_DOWN(umem.address, odp->page_size) ==
  ib_umem_start(odp)

And when iova is different using it in math with a user_va is wrong.

Finally, do not allow an implicit ODP to be created with a non-zero IOVA
as we have no support for that.

Fixes: 7bdf65d411 ("IB/mlx5: Handle page faults")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:15:07 +02:00
Moni Shoua
a73a895588 IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
The ODP handler for WQEs in RQ or SRQ is not implented for kernel QPs.
Therefore don't report support in these if query comes from a kernel user.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:15:00 +02:00
Leon Romanovsky
4835709176 RDMA/mlx5: Don't fake udata for kernel path
Kernel paths must not set udata and provide NULL pointer,
instead of faking zeroed udata struct.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:53 +02:00
Moni Shoua
da9ee9d8a8 IB/mlx5: Add ODP WQE handlers for kernel QPs
One of the steps in ODP page fault handler for WQEs is to read a WQE
from a QP send queue or receive queue buffer at a specific index.

Since the implementation of this buffer is different between kernel and
user QP the implementation of the handler needs to be aware of that and
handle it in a different way.

ODP for kernel MRs is currently supported only for RDMA_READ
and RDMA_WRITE operations so change the handler to
- read a WQE from a kernel QP send queue
- fail if access to receive queue or shared receive queue is
  required for a kernel QP

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:47 +02:00
Moni Shoua
87d8069f6b IB/core: Add interface to advise_mr for kernel users
Allow ULPs to call advise_mr, so they can control ODP regions
in the same way as user space applications.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:42 +02:00
Moni Shoua
33006bd4f3 IB/core: Introduce ib_reg_user_mr
Add ib_reg_user_mr() for kernel ULPs to register user MRs.

The common use case that uses this function is a userspace application
that allocates memory for HCA access but the responsibility to register
the memory at the HCA is on an kernel ULP. This ULP that acts as an agent
for the userspace application.

This function is intended to be used without a user context so vendor
drivers need to be aware of calling reg_user_mr() device operation with
udata equal to NULL.

Among all drivers, i40iw is the only driver which relies on presence
of udata, so check udata existence for that driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:36 +02:00
Moni Shoua
c320e527e1 IB: Allow calls to ib_umem_get from kernel ULPs
So far the assumption was that ib_umem_get() and ib_umem_odp_get()
are called from flows that start in UVERBS and therefore has a user
context. This assumption restricts flows that are initiated by ULPs
and need the service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device
directly by relying on the fact that both UVERBS and ULPs sets that
field correctly.

Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:28 +02:00
Sergey Gorenko
0fbb37dd82 IB/srp: Never use immediate data if it is disabled by a user
Some SRP targets that do not support specification SRP-2, put the garbage
to the reserved bits of the SRP login response.  The problem was not
detected for a long time because the SRP initiator ignored those bits. But
now one of them is used as SRP_LOGIN_RSP_IMMED_SUPP. And it causes a
critical error on the target when the initiator sends immediate data.

The ib_srp module has a use_imm_date parameter to enable or disable
immediate data manually. But it does not help in the above case, because
use_imm_date is ignored at handling the SRP login response. The problem is
definitely caused by a bug on the target side, but the initiator's
behavior also does not look correct.  The initiator should not use
immediate data if use_imm_date is disabled by a user.

This commit adds an additional checking of use_imm_date at the handling of
SRP login response to avoid unexpected use of immediate data.

Fixes: 882981f4a4 ("RDMA/srp: Add support for immediate data")
Link: https://lore.kernel.org/r/20200115133055.30232-1-sergeygo@mellanox.com
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 16:37:37 -04:00
Rao Shoaib
363824f92a RDMA/rxe: Compute the maximum sges and inline size based on the WQE size
The SGE buffer size and max_inline data should be derived from the size of
the WQE. Each value individually sets the WQE size, so compute the actual
sizes based on the actual WQE size and configure the QP with the maximums.

Also fix the missing return of the actual maximum capability to the caller.

Link: https://lore.kernel.org/r/1578962480-17814-3-git-send-email-rao.shoaib@oracle.com
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 16:36:41 -04:00
Rao Shoaib
4e8d683f11 Introduce maximum WQE size to check limits
Introduce maximum WQE size to impose limits on max SGE's and inline data

Link: https://lore.kernel.org/r/1578962480-17814-2-git-send-email-rao.shoaib@oracle.com
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 16:27:01 -04:00
Gal Pressman
0428c6ef8a RDMA/efa: Remove unused ucontext parameter from efa_qp_user_mmap_entries_remove
The ucontext parameter is unused, remove it.

Link: https://lore.kernel.org/r/20200114085706.82229-6-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 15:32:33 -04:00
Gal Pressman
f5f5ddbe73 RDMA/efa: Remove {} brackets from single statement if
The {} brackets are not needed according to the Linux coding style.

Link: https://lore.kernel.org/r/20200114085706.82229-5-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 15:32:32 -04:00
Gal Pressman
57f63f371b RDMA/efa: Device definitions documentation updates
Various clarifications and updates to the documentation of the device
definitions.
No functional changes in this patch.

Link: https://lore.kernel.org/r/20200114085706.82229-4-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 15:32:32 -04:00
Jiaran Zhang
7db82697b8 RDMA/hns: Add support for extended atomic in userspace
To support extended atomic operations including cmp & swap and fetch & add
of 8 bytes, 16 bytes, 32 bytes, 64 bytes in userspace, some field in qpc
should be configured.

Link: https://lore.kernel.org/r/1579052546-11746-1-git-send-email-liweihang@huawei.com
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 15:27:13 -04:00
Lijun Ou
80a7857016 RDMA/hns: Get pf capabilities from firmware
Get pf capabilities from firmware according to different hardwares, if it
fails, all capabilities will be set with a default value.

Link: https://lore.kernel.org/r/1578738761-3176-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 14:11:34 -04:00
Lijun Ou
ba6bb7e974 RDMA/hns: Add interfaces to get pf capabilities from firmware
pf capabilities are set by default for hip08 previously which should
depends on different types of hardware. So add new interfaces to get them
from firmware.

Link: https://lore.kernel.org/r/1578738761-3176-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 14:11:34 -04:00
Weihang Li
a91e093cad RDMA/hns: Remove some redundant variables related to capabilities
In struct hns_roce_caps, max_srq_sg and max_srqwqes is unused, and
max_srqs has the same effect with num_srqs. So remove them from this
structrue.

Link: https://lore.kernel.org/r/1578738761-3176-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15 14:11:34 -04:00
Jason Gunthorpe
5c55cfd6a5 RDMA/core: Use READ_ONCE for ib_ufile.async_file
The writer for async_file holds the ucontext_lock, while the readers are
left unlocked. Most readers rely on an implicit locking, either by having
a uobject (which cannot be created before a context) or by holding the
ib_ufile kref.

However ib_uverbs_free_hw_resources() has no implicit lock and has a
possible race. Make this all clear and sane by using READ_ONCE
consistently.

Link: https://lore.kernel.org/r/1578504126-9400-15-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Jason Gunthorpe
3e032c0e92 RDMA/core: Make ib_uverbs_async_event_file into a uobject
This makes async events aligned with completion events as both are full
uobjects of FD type and use the same uobject lifecycle.

A bunch of duplicate code is consolidated and the general flow between the
two FDs is now very similar.

Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Jason Gunthorpe
39e83af817 RDMA/core: Remove the ufile arg from rdma_alloc_begin_uobject
Now that all callers provide a non-NULL attrs the ufile is redundant.
Adjust things so that the context handling is done inside alloc_uobj,
and the ib_uverbs_get_ucontext_file() is avoided if we already have the
context.

Link: https://lore.kernel.org/r/1578504126-9400-13-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Jason Gunthorpe
817d657650 RDMA/core: Simplify type usage for ib_uverbs_async_handler()
This function works on an ib_uverbs_async_file. Accept that as a parameter
instead of the struct ib_uverbs_file.

Consoldiate all the callers working from an ib_uevent_object to a single
function and locate the async_file directly from the struct ib_uobject
instead of using context_ptr.

Link: https://lore.kernel.org/r/1578504126-9400-11-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Jason Gunthorpe
e04dd13159 RDMA/core: Do not erase the type of ib_wq.uobject
This is a struct ib_uwq_object pointer, instead of using container_of()
all over the place just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-10-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Jason Gunthorpe
9fbe334c6a RDMA/core: Do not erase the type of ib_srq.uobject
This is a struct ib_usrq_object pointer, instead of using container_of()
all over the place just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-9-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:15 -04:00
Jason Gunthorpe
620d3f8176 RDMA/core: Do not erase the type of ib_qp.uobject
This is a struct ib_uqp_object pointer, instead of using container_of()
all over the place just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:15 -04:00
Jason Gunthorpe
5bd48c18c8 RDMA/core: Do not erase the type of ib_cq.uobject
This is a struct ib_ucq_object pointer, instead of using container_of()
all over the place just store it with its actual type.

Link: https://lore.kernel.org/r/1578504126-9400-7-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:15 -04:00
Jason Gunthorpe
4ec1dcfcdf RDMA/core: Make ib_ucq_object use ib_uevent_object
Any uobject that sends events into the async_event_file should be using
ib_uevent_object so it can use the standard uevent based helper
functions. CQ pushes events into both the async_event and the comp_channel
in an open coded way. Move the async events related stuff to
ib_uevent_object.

Link: https://lore.kernel.org/r/1578504126-9400-6-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:15 -04:00
Jason Gunthorpe
849e149063 RDMA/core: Do not allow alloc_commit to fail
This is a left over from an earlier version that creates a lot of
complexity for error unwind, particularly for FD uobjects.

The only reason this was done is so that anon_inode_get_file() could be
called with the final fops and a fully setup uobject. Both need to be
setup since unwinding anon_inode_get_file() via fput will call the
driver's release().

Now that the driver does not provide release, we no longer need to worry
about this complicated sequence, simply create the struct file at the
start and allow the core code's release function to deal with the abort
case.

This allows all the confusing error paths around commit to be removed.

Link: https://lore.kernel.org/r/1578504126-9400-5-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:15 -04:00
Jason Gunthorpe
93887e66ff RDMA/mlx5: Simplify devx async commands
With the new FD structure the async commands do not need to hold any
references while running. The existing mlx5_cmd_exec_cb() and
mlx5_cmd_cleanup_async_ctx() provide enough synchronization to ensure
that all outstanding commands are completed before the uobject can be
destructed.

Remove the now confusing get_file() and the type erasure of the
devx_async_cmd_event_file.

Link: https://lore.kernel.org/r/1578504126-9400-4-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Jason Gunthorpe
f7c8416cce RDMA/core: Simplify destruction of FD uobjects
FD uobjects have a weird split between the struct file and uobject
world. Simplify this to make them pure uobjects and use a generic release
method for all struct file operations.

This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always
called before erasing the linked list contents to make the concurrancy
simpler to understand.

For this to work the uobject destruction must fence anything that it is
cleaning up - the design must not rely on struct file lifetime.

Only deliver_event() relies on the struct file to when adding new events
to the queue, add a is_destroyed check under lock to block it.

Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Jason Gunthorpe
6898d1c661 RDMA/mlx5: Use RCU and direct refcounts to keep memory alive
dispatch_event_fd() runs from a notifier with minimal locking, and relies
on RCU and a file refcount to keep the uobject and eventfd alive.

As the next patch wants to remove the file_operations release function
from the drivers, re-organize things so that the devx_event_notifier()
path uses the existing RCU to manage the lifetime of the uobject and
eventfd.

Move the refcount puts to a call_rcu so that the objects are guaranteed to
exist and remove the indirect file refcount.

Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Jason Gunthorpe
8bdf9dd984 RDMA/uverbs: Remove needs_kfree_rcu from uverbs_obj_type_class
After device disassociation the uapi_objects are destroyed and freed,
however it is still possible that core code can be holding a kref on the
uobject. When it finally goes to uverbs_uobject_free() via the kref_put()
it can trigger a use-after-free on the uapi_object.

Since needs_kfree_rcu is a micro optimization that only benefits file
uobjects, just get rid of it. There is no harm in using kfree_rcu even if
it isn't required, and the number of involved objects is small.

Link: https://lore.kernel.org/r/20200113143306.GA28717@ziepe.ca
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:18 -04:00
Yishai Hadas
3f59b6c3e6 IB/mlx5: Add mmap support for VAR
Add mmap support for VAR, it uses the 'offset' command mode with
involvement of IB core APIs to find the previously allocated mmap entry.

Link: https://lore.kernel.org/r/20191212110928.334995-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Yishai Hadas
7be76bef32 IB/mlx5: Introduce VAR object and its alloc/destroy methods
Introduce VAR object and its alloc/destroy KABI methods. The internal
implementation uses the IB core API to manage mmap/munamp calls.

Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Yishai Hadas
f164be8c03 IB/mlx5: Extend caps stage to handle VAR capabilities
Extend caps stage to handle VAR capabilities.

Link: https://lore.kernel.org/r/20191212110928.334995-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Ingo Molnar
57ad87ddce Merge branch 'x86/mm' into efi/core, to pick up dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:53:14 +01:00
Guoqing Jiang
1e123d96b8 RDMA/core: Remove err in iw_query_port
Since we can return device->ops.query_port directly, so no need to keep
those lines.

Link: https://lore.kernel.org/r/20200109134043.15568-1-guoqing.jiang@cloud.ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 11:19:04 -04:00
Xi Wang
626903e935 RDMA/hns: Add support for reporting wc as software mode
When hardware is in resetting stage, we may can't poll back all the
expected work completions as the hardware won't generate cqe anymore.

This patch allows the driver to compose the expected wc instead of the
hardware during resetting stage. Once the hardware finished resetting, we
can poll cq from hardware again.

Link: https://lore.kernel.org/r/1578572412-25756-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 11:17:37 -04:00
Lijun Ou
468d020e2f RDMA/hns: Bugfix for posting a wqe with sge
Driver should first check whether the sge is valid, then fill the valid
sge and the caculated total into hardware, otherwise invalid sges will
cause an error.

Fixes: 52e3b42a2f ("RDMA/hns: Filter for zero length of sge in hip08 kernel mode")
Fixes: 7bdee4158b ("RDMA/hns: Fill sq wqe context of ud type in hip08")
Link: https://lore.kernel.org/r/1578571852-13704-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 11:17:27 -04:00
Mike Marciniszyn
2c9d4e26d1 IB/hfi1: Add RcvShortLengthErrCnt to hfi1stats
This counter, RxShrErr, is required for error analysis and debug.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20200106134235.119356.29123.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:17 -04:00
Mike Marciniszyn
5ffd048698 IB/hfi1: Add software counter for ctxt0 seq drop
All other code paths increment some form of drop counter.

This was missed in the original implementation.

Fixes: 82c2611daa ("staging/rdma/hfi1: Handle packets with invalid RHF on context 0")
Link: https://lore.kernel.org/r/20200106134228.119356.96828.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:17 -04:00
Grzegorz Andrejczuk
d791d294ed IB/hfi1: Return void in packet receiving functions
Packet receiving functions returns int value, and yet the return values
are not used at all.

This patch converts the functions to return void.

Link: https://lore.kernel.org/r/20200106134222.119356.84098.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:17 -04:00
Grzegorz Andrejczuk
13d2a8384b IB/hfi1: Decouple IRQ name from type
IRQ name was connected to IRQ type, this is not sufficient and it would be
better to use name as argument to msix_request_irq instead of assigning it
to variables when function is called.

Index argument was required to generate name and now it can be removed.

To generate name correctly helpers function were added and updated.

Link: https://lore.kernel.org/r/20200106134216.119356.44478.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:17 -04:00
Mike Marciniszyn
62661038c3 IB/hfi1: Create API for auto activate
Add an auto activate routine for use by the interrupt handler.

Link: https://lore.kernel.org/r/20200106134210.119356.43079.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:17 -04:00
Mike Marciniszyn
cd47b594db IB/hfi1: IB/hfi1: Add an API to handle special case drop
This patch pushes special case drop logic into an API to be shared by all
interrupt handlers.

Additionally, convert do_drop to a bool.

Link: https://lore.kernel.org/r/20200106134203.119356.36962.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:16 -04:00
Grzegorz Andrejczuk
7b8a8b72c9 IB/hfi1: Move common receive IRQ code to function
Tracing interrupts, incrementing interrupt counter and ASPM are part that
will be reused by HFI1 receive IRQ handlers.

Create common function to have shared code in one place.

Link: https://lore.kernel.org/r/20200106134157.119356.32656.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:16 -04:00
Mike Marciniszyn
01c7fc501b IB/hfi1: Add fast and slow handlers for receive context
This patch eliminate special cases by adding a fast_handler member to the
receive context and changes to the fast handler as specified in the new
variable. Initialize the variable as soon as the setting for dma tail is
known when the context is created.

Setting fast path is called every time when any context has entered slow
path. Add function to check if contexts is using fast path and do not set
fast path when it is already done to improve RCD fastpath setting.

Link: https://lore.kernel.org/r/20200106134150.119356.87558.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:16 -04:00
Mike Marciniszyn
de730f7191 IB/hfi1: Move chip specific functions to chip.c
Move routines and defines associated with hdrq size validation to a chip
specific routine since the limits are specific to the device.

Fix incorrect value for min size 2 -> 32

CSR writes should also be in chip.c.

Create a chip routine to write the hdrq specific CSRs and call as
appropriate.

Link: https://lore.kernel.org/r/20200106134144.119356.74312.stgit@awfm-01.aw.intel.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:57:16 -04:00
Jason Gunthorpe
14e23bd6d2 RDMA/core: Fix locking in ib_uverbs_event_read
This should not be using ib_dev to test for disassociation, during
disassociation is_closed is set under lock and the waitq is triggered.

Instead check is_closed and be sure to re-obtain the lock to test the
value after the wait_event returns.

Fixes: 036b106357 ("IB/uverbs: Enable device removal when there are active user space applications")
Link: https://lore.kernel.org/r/1578504126-9400-12-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10 10:52:56 -04:00
Arnd Bergmann
74f75cda75 IB/core: Fix build failure without hugepages
HPAGE_SHIFT is only defined on architectures that support hugepages:

drivers/infiniband/core/umem_odp.c: In function 'ib_umem_odp_get':
drivers/infiniband/core/umem_odp.c:245:26: error: 'HPAGE_SHIFT' undeclared (first use in this function); did you mean 'PAGE_SHIFT'?

Enclose this in an #ifdef.

Fixes: 9ff1b6466a ("IB/core: Fix ODP with IB_ACCESS_HUGETLB handling")
Link: https://lore.kernel.org/r/20200109084740.2872079-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-09 11:59:56 -04:00
Parav Pandit
40adf68612 IB/core: Rename event_handler_lock to qp_open_list_lock
This lock is used to protect the qp->open_list linked list. As a side
effect it seems to also globally serialize the qp event_handler, but it
isn't clear if that is a deliberate design.

Link: https://lore.kernel.org/r/20191212113024.336702-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 20:20:48 -04:00
Parav Pandit
17e1064632 IB/core: Cut down single member ib_cache structure
Given that ib_cache structure has only single member now, merge the cache
lock directly in the ib_device.

Link: https://lore.kernel.org/r/20191212113024.336702-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 20:18:11 -04:00
Parav Pandit
6b57cea922 IB/core: Let IB core distribute cache update events
Currently when the low level driver notifies Pkey, GID, and port change
events they are notified to the registered handlers in the order they are
registered.

IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey
change events.

Since all GID queries done by ULPs are serviced by IB core, and the IB
core deferes cache updates to a work queue, it is possible for other
clients to see stale cache data when they handle their own events.

For example, the below call tree shows how ipoib will call
rdma_query_gid() concurrently with the update to the cache sitting in the
WQ.

mlx5_ib_handle_event()
  ib_dispatch_event()
    ib_cache_event()
       queue_work() -> slow cache update

    [..]
    ipoib_event()
     queue_work()
       [..]
       work handler
         ipoib_ib_dev_flush_light()
           __ipoib_ib_dev_flush()
              ipoib_dev_addr_changed_valid()
                rdma_query_gid() <- Returns old GID, cache not updated.

Move all the event dispatch to a work queue so that the cache update is
always done before any clients are notified.

Fixes: f35faa4ba9 ("IB/core: Simplify ib_query_gid to always refer to cache")
Link: https://lore.kernel.org/r/20191212113024.336702-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 20:18:10 -04:00
Parav Pandit
4cca96a8d9 IB/mlx5: Do reverse sequence during device removal
When IB device profile initialization completes, device is marked as
active.

However, IB device is not marked inactive, during device removal flow. It
should be the mirror of the add flow.

Hence, mark it inactive during remove sequence.

Link: https://lore.kernel.org/r/20191212113024.336702-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 20:18:10 -04:00
Lijun Ou
60262b10a9 RDMA/hns: Fix coding style issues
Fix some coding style issuses without changing logic of codes, most of the
modification is unreasonable line breaks and alignments.

Link: https://lore.kernel.org/r/1578313276-29080-8-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:31:42 -04:00
Wenpeng Liang
d800c93bac RDMA/hns: Replace custom macros HNS_ROCE_ALIGN_UP
HNS_ROCE_ALIGN_UP can be replaced by round_up() which is defined in
kernel.h.

Link: https://lore.kernel.org/r/1578313276-29080-7-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:26:33 -04:00
Yixing Liu
0c53426c7c RDMA/hns: Remove redundant print information
There are already necessary prints in outer function, prints in
hns_roce_function_clear() may confuse users. So these prints is removed.

Link: https://lore.kernel.org/r/1578313276-29080-6-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:26:33 -04:00
Lijun Ou
032b057416 RDMA/hns: Delete unnessary parameters in hns_roce_v2_qp_modify()
Current state and new state of qp won't be configured when modifying qp,
so these two redundant parameters should be removed.

Link: https://lore.kernel.org/r/1578313276-29080-5-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:26:33 -04:00
Lijun Ou
5e049a5d6c RDMA/hns: Update the value of qp type
The values used to represent service type of RC and UD should be
interchanged according to design of hardware. And it's better to define
these types in enumeration than macros.

Link: https://lore.kernel.org/r/1578313276-29080-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:26:33 -04:00
Lijun Ou
58e4fc11c1 RDMA/hns: Remove unused function hns_roce_init_eq_table()
hns_roce_init_eq_table() is an unused function that only retains its
declaration in driver.

Link: https://lore.kernel.org/r/1578313276-29080-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:26:32 -04:00
Wenpeng Liang
eca44507c3 RDMA/hns: Avoid printing address of mtt page
Address of a page shouldn't be printed in case of security issues.

Link: https://lore.kernel.org/r/1578313276-29080-2-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:26:32 -04:00
Chuck Lever
622db5b643 RDMA/core: Add trace points to follow MR allocation
Track the lifetime of ib_mr objects. Here's sample output from a test run
with NFS/RDMA:

           <...>-361   [009] 79238.772782: mr_alloc:             pd.id=3 mr.id=11 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.772812: mr_alloc:             pd.id=3 mr.id=12 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.772839: mr_alloc:             pd.id=3 mr.id=13 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.772866: mr_alloc:             pd.id=3 mr.id=14 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.772893: mr_alloc:             pd.id=3 mr.id=15 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.772921: mr_alloc:             pd.id=3 mr.id=16 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.772947: mr_alloc:             pd.id=3 mr.id=17 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.772974: mr_alloc:             pd.id=3 mr.id=18 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.773001: mr_alloc:             pd.id=3 mr.id=19 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.773028: mr_alloc:             pd.id=3 mr.id=20 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79238.773055: mr_alloc:             pd.id=3 mr.id=21 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.270942: mr_alloc:             pd.id=3 mr.id=22 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.270975: mr_alloc:             pd.id=3 mr.id=23 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271007: mr_alloc:             pd.id=3 mr.id=24 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271036: mr_alloc:             pd.id=3 mr.id=25 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271067: mr_alloc:             pd.id=3 mr.id=26 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271095: mr_alloc:             pd.id=3 mr.id=27 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271121: mr_alloc:             pd.id=3 mr.id=28 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271153: mr_alloc:             pd.id=3 mr.id=29 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271181: mr_alloc:             pd.id=3 mr.id=30 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271208: mr_alloc:             pd.id=3 mr.id=31 type=MEM_REG max_num_sg=30 rc=0
           <...>-361   [009] 79240.271236: mr_alloc:             pd.id=3 mr.id=32 type=MEM_REG max_num_sg=30 rc=0
           <...>-4351  [001] 79242.299400: mr_dereg:             mr.id=32
           <...>-4351  [001] 79242.299467: mr_dereg:             mr.id=31
           <...>-4351  [001] 79242.299554: mr_dereg:             mr.id=30
           <...>-4351  [001] 79242.299615: mr_dereg:             mr.id=29
           <...>-4351  [001] 79242.299684: mr_dereg:             mr.id=28
           <...>-4351  [001] 79242.299748: mr_dereg:             mr.id=27
           <...>-4351  [001] 79242.299812: mr_dereg:             mr.id=26
           <...>-4351  [001] 79242.299874: mr_dereg:             mr.id=25
           <...>-4351  [001] 79242.299944: mr_dereg:             mr.id=24
           <...>-4351  [001] 79242.300009: mr_dereg:             mr.id=23
           <...>-4351  [001] 79242.300190: mr_dereg:             mr.id=22
           <...>-4351  [001] 79242.300263: mr_dereg:             mr.id=21
           <...>-4351  [001] 79242.300326: mr_dereg:             mr.id=20
           <...>-4351  [001] 79242.300388: mr_dereg:             mr.id=19
           <...>-4351  [001] 79242.300450: mr_dereg:             mr.id=18
           <...>-4351  [001] 79242.300516: mr_dereg:             mr.id=17
           <...>-4351  [001] 79242.300629: mr_dereg:             mr.id=16
           <...>-4351  [001] 79242.300718: mr_dereg:             mr.id=15
           <...>-4351  [001] 79242.300784: mr_dereg:             mr.id=14
           <...>-4351  [001] 79242.300879: mr_dereg:             mr.id=13
           <...>-4351  [001] 79242.300945: mr_dereg:             mr.id=12
           <...>-4351  [001] 79242.301012: mr_dereg:             mr.id=11

Some features of the output:
- The lifetime and owner PD of each MR is clearly visible.
- The type of MR is captured, as is the SGE array size.
- Failing MR allocation can be recorded.

Link: https://lore.kernel.org/r/20191218201820.30584.34636.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:10:53 -04:00
Chuck Lever
3e5901cbfc RDMA/core: Trace points for diagnosing completion queue issues
Sample trace events:

   kworker/u29:0-300   [007]   120.042217: cq_alloc:             cq.id=4 nr_cqe=161 comp_vector=2 poll_ctx=WORKQUEUE
          <idle>-0     [002]   120.056292: cq_schedule:          cq.id=4
    kworker/2:1H-482   [002]   120.056402: cq_process:           cq.id=4 wake-up took 109 [us] from interrupt
    kworker/2:1H-482   [002]   120.056407: cq_poll:              cq.id=4 requested 16, returned 1
          <idle>-0     [002]   120.067503: cq_schedule:          cq.id=4
    kworker/2:1H-482   [002]   120.067537: cq_process:           cq.id=4 wake-up took 34 [us] from interrupt
    kworker/2:1H-482   [002]   120.067541: cq_poll:              cq.id=4 requested 16, returned 1
          <idle>-0     [002]   120.067657: cq_schedule:          cq.id=4
    kworker/2:1H-482   [002]   120.067672: cq_process:           cq.id=4 wake-up took 15 [us] from interrupt
    kworker/2:1H-482   [002]   120.067674: cq_poll:              cq.id=4 requested 16, returned 1

 ...

         systemd-1     [002]   122.392653: cq_schedule:          cq.id=4
    kworker/2:1H-482   [002]   122.392688: cq_process:           cq.id=4 wake-up took 35 [us] from interrupt
    kworker/2:1H-482   [002]   122.392693: cq_poll:              cq.id=4 requested 16, returned 16
    kworker/2:1H-482   [002]   122.392836: cq_poll:              cq.id=4 requested 16, returned 16
    kworker/2:1H-482   [002]   122.392970: cq_poll:              cq.id=4 requested 16, returned 16
    kworker/2:1H-482   [002]   122.393083: cq_poll:              cq.id=4 requested 16, returned 16
    kworker/2:1H-482   [002]   122.393195: cq_poll:              cq.id=4 requested 16, returned 3

Several features to note in this output:
 - The WCE count and context type are reported at allocation time
 - The CPU and kworker for each CQ is evident
 - The CQ's restracker ID is tagged on each trace event
 - CQ poll scheduling latency is measured
 - Details about how often single completions occur versus multiple
   completions are evident
 - The cost of the ULP's completion handler is recorded

Link: https://lore.kernel.org/r/20191218201815.30584.3481.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:10:53 -04:00
Chuck Lever
ed999f820a RDMA/cma: Add trace points in RDMA Connection Manager
Record state transitions as each connection is established. The IP address
of both peers and the Type of Service is reported. These trace points are
not in performance hot paths.

Also, record each cm_event_handler call to ULPs. This eliminates the need
for each ULP to add its own similar trace point in its CM event handler
function.

These new trace points appear in a new trace subsystem called "rdma_cma".

Sample events:

           <...>-220   [004]   121.430733: cm_id_create:         cm.id=0
           <...>-472   [003]   121.430991: cm_event_handler:     cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ADDR_RESOLVED (0/0)
           <...>-472   [003]   121.430995: cm_event_done:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
           <...>-472   [003]   121.431172: cm_event_handler:     cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ROUTE_RESOLVED (2/0)
           <...>-472   [003]   121.431174: cm_event_done:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
           <...>-220   [004]   121.433480: cm_qp_create:         cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 pd.id=2 qp_type=RC send_wr=4091 recv_wr=256 qp_num=521 rc=0
           <...>-220   [004]   121.433577: cm_send_req:          cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 qp_num=521
     kworker/1:2-973   [001]   121.436190: cm_send_mra:          cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
     kworker/1:2-973   [001]   121.436340: cm_send_rtu:          cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
     kworker/1:2-973   [001]   121.436359: cm_event_handler:     cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ESTABLISHED (9/0)
     kworker/1:2-973   [001]   121.436365: cm_event_done:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
           <...>-1975  [005]   123.161954: cm_disconnect:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
           <...>-1975  [005]   123.161974: cm_sent_dreq:         cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
           <...>-220   [004]   123.162102: cm_disconnect:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0
     kworker/0:1-13    [000]   123.162391: cm_event_handler:     cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 DISCONNECTED (10/0)
     kworker/0:1-13    [000]   123.162393: cm_event_done:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0
           <...>-220   [004]   123.164456: cm_qp_destroy:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 qp_num=521
           <...>-220   [004]   123.165290: cm_id_destroy:        cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0

Some features to note:
- restracker ID of the rdma_cm_id is tagged on each trace event
- The source and destination IP addresses and TOS are reported
- CM event upcalls are shown with decoded event and status
- CM state transitions are reported
- rdma_cm_id lifetime events are captured
- The latency of ULP CM event handlers is reported
- Lifetime events of associated QPs are reported
- Device removal and insertion is reported

This patch is based on previous work by:

Saeed Mahameed <saeedm@mellanox.com>
Mukesh Kacker <mukesh.kacker@oracle.com>
Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Aron Silverton <aron.silverton@oracle.com>
Avinash Repaka <avinash.repaka@oracle.com>
Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Link: https://lore.kernel.org/r/20191218201810.30584.3052.stgit@manet.1015granger.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 16:10:53 -04:00
Shiraz Saleem
9554de394b i40iw: Remove setting of VMA private data and use rdma_user_mmap_io
vm_ops is now initialized in ib_uverbs_mmap() with the recent rdma mmap
API changes. Earlier it was done in rdma_umap_priv_init() which would not
be called unless a driver called rdma_user_mmap_io() in its mmap.

i40iw does not use the rdma_user_mmap_io API but sets the vma's
vm_private_data to a driver object. This now conflicts with the vm_op
rdma_umap_close as priv pointer points to the i40iw driver object instead
of the private data setup by core when rdma_user_mmap_io is called.  This
leads to a crash in rdma_umap_close with a mmap put being called when it
should not have.

Remove the redundant setting of the vma private_data in i40iw as it is not
used. Also move i40iw over to use the rdma_user_mmap_io API. This gives
the extra protection of having the mappings zapped when the context is
detsroyed.

  BUG: unable to handle page fault for address: 0000000100000001
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP PTI
  CPU: 6 PID: 9528 Comm: rping Kdump: loaded Not tainted 5.5.0-rc4+ #117
  Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H, BIOS F7 01/17/2014
  RIP: 0010:rdma_user_mmap_entry_put+0xa/0x30 [ib_core]
  RSP: 0018:ffffb340c04c7c38 EFLAGS: 00010202
  RAX: 00000000ffffffff RBX: ffff9308e7be2a00 RCX: 000000000000cec0
  RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000100000001
  RBP: ffff9308dc7641f0 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000001 R11: ffffffff8d4414d8 R12: ffff93075182c780
  R13: 0000000000000001 R14: ffff93075182d2a8 R15: ffff9308e2ddc840
  FS:  0000000000000000(0000) GS:ffff9308fdc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000100000001 CR3: 00000002e0412004 CR4: 00000000001606e0
  Call Trace:
   rdma_umap_close+0x40/0x90 [ib_uverbs]
   remove_vma+0x43/0x80
   exit_mmap+0xfd/0x1b0
   mmput+0x6e/0x130
   do_exit+0x290/0xcc0
   ? get_signal+0x152/0xc40
   do_group_exit+0x46/0xc0
   get_signal+0x1bd/0xc40
   ? prepare_to_wait_event+0x97/0x190
   do_signal+0x36/0x630
   ? remove_wait_queue+0x60/0x60
   ? __audit_syscall_exit+0x1d9/0x290
   ? rcu_read_lock_sched_held+0x52/0x90
   ? kfree+0x21c/0x2e0
   exit_to_usermode_loop+0x4f/0xc3
   do_syscall_64+0x1ed/0x270
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7fae715a81fd
  Code: Bad RIP value.
  RSP: 002b:00007fae6e163cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
  RAX: fffffffffffffe00 RBX: 00007fae6e163d30 RCX: 00007fae715a81fd
  RDX: 0000000000000010 RSI: 00007fae6e163cf0 RDI: 0000000000000003
  RBP: 00000000013413a0 R08: 00007fae68000000 R09: 0000000000000017
  R10: 0000000000000001 R11: 0000000000000293 R12: 00007fae680008c0
  R13: 00007fae6e163cf0 R14: 00007fae717c9804 R15: 00007fae6e163ed0
  CR2: 0000000100000001
  ---[ end trace b33d58d3a06782cb ]---
  RIP: 0010:rdma_user_mmap_entry_put+0xa/0x30 [ib_core]

Fixes: b86deba977 ("RDMA/core: Move core content from ib_uverbs to ib_core")
Link: https://lore.kernel.org/r/20200107162223.1745-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 15:07:37 -04:00
Christoph Hellwig
4bdc0d676a remove ioremap_nocache and devm_ioremap_nocache
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2020-01-06 09:45:59 +01:00
Leon Romanovsky
ad9efa05a0 RDMA/cm: Delete unused CM ARP functions
Clean the code by deleting ARP functions, which are not called anyway.

Link: https://lore.kernel.org/r/20191212093830.316934-46-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 21:07:21 -04:00
Leon Romanovsky
017d8eada8 RDMA/cm: Delete unused CM LAP functions
Clean the code by deleting LAP functions, which are not called anyway.

Link: https://lore.kernel.org/r/20191212093830.316934-43-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 21:06:21 -04:00
Xiyu Yang
04db1580b5 RDMA/i40iw: fix a potential NULL pointer dereference
A NULL pointer can be returned by in_dev_get(). Thus add a corresponding
check so that a NULL pointer dereference will be avoided at this place.

Fixes: 8e06af711b ("i40iw: add main, hdr, status")
Link: https://lore.kernel.org/r/1577672668-46499-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:59:34 -04:00
Jiewei Ke
6ca18d8927 RDMA/rxe: Fix error type of mmap_offset
The type of mmap_offset should be u64 instead of int to match the type of
mminfo.offset. If otherwise, after we create several thousands of CQs, it
will run into overflow issues.

Link: https://lore.kernel.org/r/20191227113613.5020-1-kejiewei.cn@gmail.com
Signed-off-by: Jiewei Ke <kejiewei.cn@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:43:51 -04:00
zhengbin
2ab367a70a RDMA/mlx5: use true,false for bool variable
Fixes coccicheck warning:

drivers/infiniband/hw/mlx5/mr.c:150:2-26: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/hw/mlx5/mr.c:1455:2-26: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/hw/mlx5/qp.c:1874:6-20: WARNING: Assignment of 0/1 to bool variable

Link: https://lore.kernel.org/r/1577176812-2238-6-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:14:00 -04:00
zhengbin
cf368beb81 RDMA/mlx4: use true,false for bool variable
Fixes coccicheck warning:

drivers/infiniband/hw/mlx4/qp.c:852:2-14: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/hw/mlx4/qp.c:3087:3-10: WARNING: Assignment of 0/1 to bool variable

Link: https://lore.kernel.org/r/1577176812-2238-5-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:13:59 -04:00
zhengbin
c934833e77 IB/iser: use true,false for bool variable
Fixes coccicheck warning:

drivers/infiniband/ulp/iser/iser_memory.c:530:2-21: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/ulp/iser/iser_verbs.c:1096:2-21: WARNING: Assignment of 0/1 to bool variable

Link: https://lore.kernel.org/r/1577176812-2238-4-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:13:59 -04:00
zhengbin
d09dbe74e9 IB/hfi1: use true,false for bool variable
Fixes coccicheck warning:

drivers/infiniband/hw/hfi1/rc.c:2602:1-8: WARNING: Assignment of 0/1 to bool variable

Link: https://lore.kernel.org/r/1577176812-2238-3-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:13:59 -04:00
zhengbin
5369b48289 RDMA/siw: use true,false for bool variable
Fixes coccicheck warning:

drivers/infiniband/sw/siw/siw_cm.c:32:18-41: WARNING: Assignment of 0/1 to bool variable

Link: https://lore.kernel.org/r/1577176812-2238-2-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:13:59 -04:00
Yishai Hadas
9ff1b6466a IB/core: Fix ODP with IB_ACCESS_HUGETLB handling
As VMAs for a given range might not be available as part of the
registration phase in ODP.

ib_init_umem_odp() considered the expected page shift value that was
previously set and initializes its internals accordingly.

If memory isn't backed by physical contiguous pages aligned to a hugepage
boundary an error will be set as part of the page fault flow and come back
to the user as some failed RDMA operation.

Fixes: 0008b84ea9 ("IB/umem: Add support to huge ODP")
Link: https://lore.kernel.org/r/20191222124649.52300-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 17:00:14 -04:00
Yishai Hadas
d07de8bd17 IB/core: Fix ODP get user pages flow
The nr_pages argument of get_user_pages_remote() should always be in terms
of the system page size, not the MR page size. Use PAGE_SIZE instead of
umem_odp->page_shift.

Fixes: 403cd12e2c ("IB/umem: Add contiguous ODP support")
Link: https://lore.kernel.org/r/20191222124649.52300-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 17:00:13 -04:00
Artemy Kovalyov
cbe4b8f0a5 IB/mlx5: Unify ODP MR code paths to allow extra flexibility
Building MR translation table in the ODP case requires additional
flexibility, namely random access to DMA addresses. Make both direct and
indirect ODP MR use same code path, separated from the non-ODP MR code
path.

With the restructuring the correct page_shift is now used around
__mlx5_ib_populate_pas().

Fixes: d2183c6f19 ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem")
Link: https://lore.kernel.org/r/20191222124649.52300-2-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 17:00:13 -04:00
Kaike Wan
b2ff0d5101 IB/hfi1: Adjust flow PSN with the correct resync_psn
When a TID RDMA ACK to RESYNC request is received, the flow PSNs for
pending TID RDMA WRITE segments will be adjusted with the next flow
generation number, based on the resync_psn value extracted from the flow
PSN of the TID RDMA ACK packet. The resync_psn value indicates the last
flow PSN for which a TID RDMA WRITE DATA packet has been received by the
responder and the requester should resend TID RDMA WRITE DATA packets,
starting from the next flow PSN.

However, if resync_psn points to the last flow PSN for a segment and the
next segment flow PSN starts with a new generation number, use of the old
resync_psn to adjust the flow PSN for the next segment will lead to
miscalculation, resulting in WARN_ON and sge rewinding errors:

  WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
  Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common
   crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt]
  CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.el7.x86_64 #1
  Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018
  Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
  Call Trace:
   <IRQ>  [<ffffffff9e361dc1>] dump_stack+0x19/0x1b
   [<ffffffff9dc97648>] __warn+0xd8/0x100
   [<ffffffff9dc9778d>] warn_slowpath_null+0x1d/0x20
   [<ffffffffc05d28c6>] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
   [<ffffffffc05c21cc>] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1]
   [<ffffffffc05c23ef>] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1]
   [<ffffffffc0574f15>] kdeth_process_eager+0x35/0x90 [hfi1]
   [<ffffffffc0575b5a>] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1]
   [<ffffffffc056a623>] receive_context_interrupt+0x23/0x40 [hfi1]
   [<ffffffff9dd4a294>] __handle_irq_event_percpu+0x44/0x1c0
   [<ffffffff9dd4a442>] handle_irq_event_percpu+0x32/0x80
   [<ffffffff9dd4a4cc>] handle_irq_event+0x3c/0x60
   [<ffffffff9dd4d27f>] handle_edge_irq+0x7f/0x150
   [<ffffffff9dc2e554>] handle_irq+0xe4/0x1a0
   [<ffffffff9e3795dd>] do_IRQ+0x4d/0xf0
   [<ffffffff9e36b362>] common_interrupt+0x162/0x162
   <EOI>  [<ffffffff9dfa0f79>] ? swiotlb_map_page+0x49/0x150
   [<ffffffffc05c2ed1>] hfi1_verbs_send_dma+0x291/0xb70 [hfi1]
   [<ffffffffc05c2c40>] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1]
   [<ffffffffc05c3f26>] hfi1_verbs_send+0x126/0x2b0 [hfi1]
   [<ffffffffc05ce683>] _hfi1_do_tid_send+0x1d3/0x320 [hfi1]
   [<ffffffff9dcb9d4f>] process_one_work+0x17f/0x440
   [<ffffffff9dcbade6>] worker_thread+0x126/0x3c0
   [<ffffffff9dcbacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
   [<ffffffff9dcc1c31>] kthread+0xd1/0xe0
   [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
   [<ffffffff9e374c1d>] ret_from_fork_nospec_begin+0x7/0x21
   [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40

This patch fixes the issue by adjusting the resync_psn first if the flow
generation has been advanced for a pending segment.

Fixes: 9e93e967f7 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Link: https://lore.kernel.org/r/20191219231920.51069.37147.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:48:01 -04:00
Mike Marciniszyn
4ad6429d27 IB/rdmavt: Correct comments in rdmavt_qp.h header
Comments need to be with the definition of rvt_restart_sge().

Other comments were duplicated in sw/rdmavt/rc.c and were removed.

Fixes: 385156c5f2 ("IB/hfi: Move RC functions into a header file")
Link: https://lore.kernel.org/r/20191219211934.58387.88014.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:44:50 -04:00
Michael J. Ruhl
44ec5aa3c6 IB/hfi1: List all receive contexts from debugfs
The current debugfs output for receive contexts (rcds), stops after the
kernel receive contexts have been displayed.  This is not enough
information to fully diagnose packet drops.

Display all of the receive contexts.

Augment the output with some more context information.

Limit the ring buffer header output to 5 entries to avoid overextending
the sequential file output.

Fixes: bf808b5039 ("IB/hfi1: Add kernel receive context info to debugfs")
Link: https://lore.kernel.org/r/20191219211928.58387.20737.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:44:50 -04:00
Mike Marciniszyn
2fb3b5ae1c IB/hfi1: Add accessor API routines to access context members
This patch adds a set of accessor routines to access context members.

Link: https://lore.kernel.org/r/20191219211922.58387.26548.stgit@awfm-01.aw.intel.com
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:44:49 -04:00
Kaike Wan
ca9033ba69 IB/hfi1: Don't cancel unused work item
In the iowait structure, two iowait_work entries were included to queue a
given object: one for normal IB operations, and the other for TID RDMA
operations. For non-TID RDMA operations, the iowait_work structure for TID
RDMA is initialized to contain a NULL function (not used). When the QP is
reset, the function iowait_cancel_work will be called to cancel any
pending work. The problem is that this function will call
cancel_work_sync() for both iowait_work entries, even though the one for
TID RDMA is not used at all. Eventually, the call cascades to
__flush_work(), wherein a WARN_ON will be triggered due to the fact that
work->func is NULL.

The WARN_ON was introduced in commit 4d43d395fe ("workqueue: Try to
catch flush_work() without INIT_WORK().")

This patch fixes the issue by making sure that a work function is present
for TID RDMA before calling cancel_work_sync in iowait_cancel_work.

Fixes: 4d43d395fe ("workqueue: Try to catch flush_work() without INIT_WORK().")
Fixes: 5da0fc9dbf ("IB/hfi1: Prepare resource waits for dual leg")
Link: https://lore.kernel.org/r/20191219211941.58387.39883.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:41:51 -04:00
Eugene Crosser
3593f69c55 RDMA/mlx4: Redo TX checksum offload in line with docs
Ingress checksum offload was not working for IPv6 frames because the
conditional expression that checks validation status passed from the
hardware was not matching the algorithm described in the documentation.

This patch defines L4_CSUM flag (which falls inside the badfcs_enc field
in the existing definition of the CQE layout) and replaces the conditional
expression with the one defined in the "ConnectX(r) Family Programmer's
Manual" document.

Link: https://lore.kernel.org/r/20191219134847.413582-1-leon@kernel.org
Signed-off-by: Eugene Crosser <evgenii.cherkashin@profitbricks.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:37:58 -04:00
Danit Goldberg
4d6e8a033f RDMA/cm: Use RCU synchronization mechanism to protect cm_id_private xa_load()
The RCU mechanism is optimized for read-mostly scenarios and therefore
more suitable to protect the cm_id_private to decrease "cm.lock"
congestion.

This patch replaces the existing spinlock locking mechanism and kfree with
RCU mechanism in places where spinlock(cm.lock) protected xa_load
returning the cm_id_priv

In addition, delete the cm_get_id() function as there is no longer a
distinction if the caller already holds the cm_lock.

Remove an open coded version of cm_get_id().

Link: https://lore.kernel.org/r/20191219134750.413429-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:29:44 -04:00
Aditya Pakki
9f48db0d4a RDMA/srpt: Remove unnecessary assertion in srpt_queue_response
Since ch has already been de-referenced by the time we get to the BUG_ON,
it is useless. The back trace alone is enough to tell what is going on,
delete the redundant BUG_ON.

Link: https://lore.kernel.org/r/20191217194437.25568-1-pakki001@umn.edu
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:11:47 -04:00
Håkon Bugge
a242c36951 RDMA/netlink: Do not always generate an ACK for some netlink operations
In rdma_nl_rcv_skb(), the local variable err is assigned the return value
of the supplied callback function, which could be one of
ib_nl_handle_resolve_resp(), ib_nl_handle_set_timeout(), or
ib_nl_handle_ip_res_resp(). These three functions all return skb->len on
success.

rdma_nl_rcv_skb() is merely a copy of netlink_rcv_skb(). The callback
functions used by the latter have the convention: "Returns 0 on success or
a negative error code".

In particular, the statement (equal for both functions):

   if (nlh->nlmsg_flags & NLM_F_ACK || err)

implies that rdma_nl_rcv_skb() always will ack a message, independent of
the NLM_F_ACK being set in nlmsg_flags or not.

The fix could be to change the above statement, but it is better to keep
the two *_rcv_skb() functions equal in this respect and instead change the
three callback functions in the rdma subsystem to the correct convention.

Fixes: 2ca546b92a ("IB/sa: Route SA pathrecord query through netlink")
Fixes: ae43f82867 ("IB/core: Add IP to GID netlink offload")
Link: https://lore.kernel.org/r/20191216120436.3204814-1-haakon.bugge@oracle.com
Suggested-by: Mark Haywood <mark.haywood@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Tested-by: Mark Haywood <mark.haywood@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:02:32 -04:00
Prabhath Sajeepa
b5671afe5e IB/mlx5: Fix outstanding_pi index for GSI qps
Commit b0ffeb537f ("IB/mlx5: Fix iteration overrun in GSI qps") changed
the way outstanding WRs are tracked for the GSI QP. But the fix did not
cover the case when a call to ib_post_send() fails and updates index to
track outstanding.

Since the prior commmit outstanding_pi should not be bounded otherwise the
loop generate_completions() will fail.

Fixes: b0ffeb537f ("IB/mlx5: Fix iteration overrun in GSI qps")
Link: https://lore.kernel.org/r/1576195889-23527-1-git-send-email-psajeepa@purestorage.com
Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 15:56:41 -04:00
Bernard Metzler
58fb0b5625 RDMA/siw: Simplify QP representation
Change siw_qp to contain ib_qp. Use rdma_is_kernel_res() on contained
ib_qp to distinguish kernel level from user level applications
resources. Apply same mechanism for kernel/user level application
detection to completion queues.

Link: https://lore.kernel.org/r/20191210161729.31598-1-bmt@zurich.ibm.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 15:54:09 -04:00
Yixian Liu
4768820243 RDMA/hns: Simplify the calculation and usage of wqe idx for post verbs
Currently, the wqe idx is calculated repeatly everywhere it is used.  This
patch defines wqe_idx and calculated it only once, then just use it as
needed.

Fixes: 2d40788825 ("RDMA/hns: Add support for processing send wr and receive wr")
Link: https://lore.kernel.org/r/1575981902-5274-1-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 15:48:48 -04:00
Selvin Xavier
53bb802315 RDMA/bnxt_re: Report more number of completion vectors
Report the the data path MSIx vectors allocated by driver as number of
completion vectors. One interrupt vector is used for Control path. So
reporting one less than the total number of MSIx vectors allocated by the
driver.

Link: https://lore.kernel.org/r/1574671174-5064-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 15:45:31 -04:00
Selvin Xavier
c527572358 RDMA/bnxt_re: Fix Send Work Entry state check while polling completions
Some adapters need a fence Work Entry to handle retransmission.  Currently
the driver checks for this condition, only if the Send queue entry is
signalled. Implement the condition check, irrespective of the signalled
state of the Work queue entries

Failure to add the fence can result in access to memory that is already
marked as completed, triggering data corruption, transmission failure,
IOMMU failures, etc.

Fixes: 9152e0b722 ("RDMA/bnxt_re: HW workarounds for handling specific conditions")
Link: https://lore.kernel.org/r/1574671174-5064-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 15:27:22 -04:00
Selvin Xavier
9a4467a6b2 RDMA/bnxt_re: Avoid freeing MR resources if dereg fails
The driver returns an error code for MR dereg, but frees the MR structure.
When the MR dereg is retried due to previous error, the system crashes as
the structure is already freed.

  BUG: unable to handle kernel NULL pointer dereference at 00000000000001b8
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 7 PID: 12178 Comm: ib_send_bw Kdump: loaded Not tainted 4.18.0-124.el8.x86_64 #1
  Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.1.10 03/10/2015
  RIP: 0010:__dev_printk+0x2a/0x70
  Code: 0f 1f 44 00 00 49 89 d1 48 85 f6 0f 84 f6 2b 00 00 4c 8b 46 70 4d 85 c0 75 04 4c 8b
46 10 48 8b 86 a8 00 00 00 48 85 c0 74 16 <48> 8b 08 0f be 7f 01 48 c7 c2 13 ac ac 83 83 ef 30 e9 10 fe ff ff
  RSP: 0018:ffffaf7c04607a60 EFLAGS: 00010006
  RAX: 00000000000001b8 RBX: ffffa0010c91c488 RCX: 0000000000000246
  RDX: ffffaf7c04607a68 RSI: ffffa0010c91caa8 RDI: ffffffff83a788eb
  RBP: ffffaf7c04607ac8 R08: 0000000000000000 R09: ffffaf7c04607a68
  R10: 0000000000000000 R11: 0000000000000001 R12: ffffaf7c04607b90
  R13: 000000000000000e R14: 0000000000000000 R15: 00000000ffffa001
  FS:  0000146fa1f1cdc0(0000) GS:ffffa0012fac0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000001b8 CR3: 000000007680a003 CR4: 00000000001606e0
  Call Trace:
   dev_err+0x6c/0x90
   ? dev_printk_emit+0x4e/0x70
   bnxt_qplib_rcfw_send_message+0x594/0x660 [bnxt_re]
   ? dev_err+0x6c/0x90
   bnxt_qplib_free_mrw+0x80/0xe0 [bnxt_re]
   bnxt_re_dereg_mr+0x2e/0xd0 [bnxt_re]
   ib_dereg_mr+0x2f/0x50 [ib_core]
   destroy_hw_idr_uobject+0x20/0x70 [ib_uverbs]
   uverbs_destroy_uobject+0x2e/0x170 [ib_uverbs]
   __uverbs_cleanup_ufile+0x6e/0x90 [ib_uverbs]
   uverbs_destroy_ufile_hw+0x61/0x130 [ib_uverbs]
   ib_uverbs_close+0x1f/0x80 [ib_uverbs]
   __fput+0xb7/0x230
   task_work_run+0x8a/0xb0
   do_exit+0x2da/0xb40
...
  RIP: 0033:0x146fa113a387
  Code: Bad RIP value.
  RSP: 002b:00007fff945d1478 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
  RAX: 0000000000000000 RBX: 000055a248908d70 RCX: 0000000000000000
  RDX: 0000146fa1f2b000 RSI: 0000000000000001 RDI: 000055a248906488
  RBP: 000055a248909630 R08: 0000000000010000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 000055a248906488
  R13: 0000000000000001 R14: 0000000000000000 R15: 000055a2489095f0

Do not free the MR structures, when driver returns error to the stack.

Fixes: 872f357824 ("RDMA/bnxt_re: Add support for MRs with Huge pages")
Link: https://lore.kernel.org/r/1574671174-5064-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 15:20:42 -04:00
Michal Kalderon
93a3d05f9d RDMA/qedr: Add kernel capability flags for dpm enabled mode
HW/FW support two types of latency enhancement features.  Until now
user-space implemented only edpm (enhanced dpm).  We add kernel capability
flags to differentiate between current FW in kernel that supports both
ldpm and edpm.  Since edpm is not yet supported for iWARP we add different
flags for iWARP + RoCE.  We also fix bad practice of defining sizes in
rdma-core and pass initialization to kernel, for forward compatibility.

The capability flags are added for backward-forward compatibility between
kernel and rdma-core for qedr.

Before this change there was a field called dpm_enabled which could hold
either 0 or 1 value, this indicated whether RoCE edpm was enabled or
not. We modified this field to be dpm_flags, and bit 1 still holds the
same meaning of RoCE edpm being enabled or not.

Link: https://lore.kernel.org/r/20191121112957.25162-1-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 12:37:00 -04:00
Ingo Molnar
46f5cfc13d Merge branch 'core/kprobes' into perf/core, to pick up a completed branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:43:08 +01:00
David S. Miller
ac80010fc9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Mere overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22 15:15:05 -08:00
Linus Torvalds
9603e22104 Pull request for 5.5-rc2
- Update Steve Wise info
 - Fix for soft-RoCE crc calculations (will break back compatibility, but
   only with the soft-RoCE driver, which has had this bug since it was
   introduced and it is an on-the-wire bug, but will make soft-RoCE fully
   compatible with real RoCE hardware)
 - cma init fixup
 - counters oops fix
 - fix for mlx4 init/teardown sequence
 - fix for mkx5 steering rules
 - introduce a cleanup API, which isn't a fix, but we want to use it in
   the next fix
 - fix for mlx5 memory management that uses API in previous patch
 
 Signed-off-by: Doug Ledford <dledford@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEErmsb2hIrI7QmWxJ0uCajMw5XL90FAl32qrIACgkQuCajMw5X
 L900Lw/+Noq2pY30TosVFd/f/8EyPH58QjsBe9UdOucYWdijD05WtjA56il8ef8b
 wsnJp48Qdo4PvhX1zvYPtV3iBlXcIAUDc0F3ZM9d1s5ppV3pvsAlSzZf4OC+yU+a
 qstoXuXyz6S7Oadja3Y94xZirIw9PWJ6MAEvlBa0ERufr42E/wdU1614I9XA88aQ
 RkbKsaCMMD68cKAUm/hjAxZef6iSya4/4xRI1lcCgJji2Qw6vDTDC6RRm2XHCKAi
 nr1D7fCIqEZikvAA+iCiw4kvTEwjwRc/igF5i9lftCfn3x118N/Kc9izswjg55l4
 Eukf9xHXXbZCfGed2a1+b6D7A0cRgrOrZkZ7FZkMOxu3eMRZUzNMd+xm8NQYi6u7
 UeXo4XtC5vfhlapqdGxHeVJnzDf3colRN0P9RkliSBmLYlXzPnyJ82leEK6P0xOh
 y2VluGkHCH/SV3rmP5TUZJGsnjPOlq+NMFOinFgjcjK8O4QXTE+4IU+66gI040dn
 wbFXeuQ1kashopr7W/cdJENvWFyl774X06XxIzIdoIyfi9TDTso2kVQJ7IiK193l
 WZe9gCfdkx+V8q8Z8INlDO4lzDmBpJszk9r7IpVPsdjuZjnUGq4H+DakP4y5cNyU
 Tj90y2NlduTVKzYMMrT4DSkiBKLHODwc9WT+5SKwp4NNzeVCTak=
 =UNwD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "A small collection of -rc fixes. Mostly. One API addition, but that's
  because we wanted to use it in a fix. There's also a bug fix that is
  going to render the 5.5 kernel's soft-RoCE driver incompatible with
  all soft-RoCE versions prior, but it's required to actually implement
  the protocol according to the RoCE spec and required in order for the
  soft-RoCE driver to be able to successfully work with actual RoCE
  hardware.

  Summary:

   - Update Steve Wise info

   - Fix for soft-RoCE crc calculations (will break back compatibility,
     but only with the soft-RoCE driver, which has had this bug since it
     was introduced and it is an on-the-wire bug, but will make
     soft-RoCE fully compatible with real RoCE hardware)

   - cma init fixup

   - counters oops fix

   - fix for mlx4 init/teardown sequence

   - fix for mkx5 steering rules

   - introduce a cleanup API, which isn't a fix, but we want to use it
     in the next fix

   - fix for mlx5 memory management that uses API in previous patch"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/mlx5: Fix device memory flows
  IB/core: Introduce rdma_user_mmap_entry_insert_range() API
  IB/mlx5: Fix steering rule of drop and count
  IB/mlx4: Follow mirror sequence of device add during device removal
  RDMA/counter: Prevent auto-binding a QP which are not tracked with res
  rxe: correctly calculate iCRC for unaligned payloads
  Update mailmap info for Steve Wise
  RDMA/cma: add missed unregister_pernet_subsys in init failure
2019-12-15 14:58:13 -08:00
Michael S. Tsirkin
0290bd291c netdev: pass the stuck queue to the timeout handler
This allows incrementing the correct timeout statistic without any mess.
Down the road, devices can learn to reset just the specific queue.

The patch was generated with the following script:

use strict;
use warnings;

our $^I = '.bak';

my @work = (
["arch/m68k/emu/nfeth.c", "nfeth_tx_timeout"],
["arch/um/drivers/net_kern.c", "uml_net_tx_timeout"],
["arch/um/drivers/vector_kern.c", "vector_net_tx_timeout"],
["arch/xtensa/platforms/iss/network.c", "iss_net_tx_timeout"],
["drivers/char/pcmcia/synclink_cs.c", "hdlcdev_tx_timeout"],
["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"],
["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"],
["drivers/message/fusion/mptlan.c", "mpt_lan_tx_timeout"],
["drivers/misc/sgi-xp/xpnet.c", "xpnet_dev_tx_timeout"],
["drivers/net/appletalk/cops.c", "cops_timeout"],
["drivers/net/arcnet/arcdevice.h", "arcnet_timeout"],
["drivers/net/arcnet/arcnet.c", "arcnet_timeout"],
["drivers/net/arcnet/com20020.c", "arcnet_timeout"],
["drivers/net/ethernet/3com/3c509.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c515.c", "corkscrew_timeout"],
["drivers/net/ethernet/3com/3c574_cs.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c589_cs.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"],
["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"],
["drivers/net/ethernet/3com/typhoon.c", "typhoon_tx_timeout"],
["drivers/net/ethernet/8390/8390.h", "ei_tx_timeout"],
["drivers/net/ethernet/8390/8390.h", "eip_tx_timeout"],
["drivers/net/ethernet/8390/8390.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/8390p.c", "eip_tx_timeout"],
["drivers/net/ethernet/8390/ax88796.c", "ax_ei_tx_timeout"],
["drivers/net/ethernet/8390/axnet_cs.c", "axnet_tx_timeout"],
["drivers/net/ethernet/8390/etherh.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/hydra.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/mac8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/mcf8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/lib8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/ne2k-pci.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/pcnet_cs.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/smc-ultra.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/wd.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/zorro8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/adaptec/starfire.c", "tx_timeout"],
["drivers/net/ethernet/agere/et131x.c", "et131x_tx_timeout"],
["drivers/net/ethernet/allwinner/sun4i-emac.c", "emac_timeout"],
["drivers/net/ethernet/alteon/acenic.c", "ace_watchdog"],
["drivers/net/ethernet/amazon/ena/ena_netdev.c", "ena_tx_timeout"],
["drivers/net/ethernet/amd/7990.h", "lance_tx_timeout"],
["drivers/net/ethernet/amd/7990.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/a2065.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/am79c961a.c", "am79c961_timeout"],
["drivers/net/ethernet/amd/amd8111e.c", "amd8111e_tx_timeout"],
["drivers/net/ethernet/amd/ariadne.c", "ariadne_tx_timeout"],
["drivers/net/ethernet/amd/atarilance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/au1000_eth.c", "au1000_tx_timeout"],
["drivers/net/ethernet/amd/declance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/lance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/mvme147.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/ni65.c", "ni65_timeout"],
["drivers/net/ethernet/amd/nmclan_cs.c", "mace_tx_timeout"],
["drivers/net/ethernet/amd/pcnet32.c", "pcnet32_tx_timeout"],
["drivers/net/ethernet/amd/sunlance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/xgbe/xgbe-drv.c", "xgbe_tx_timeout"],
["drivers/net/ethernet/apm/xgene-v2/main.c", "xge_timeout"],
["drivers/net/ethernet/apm/xgene/xgene_enet_main.c", "xgene_enet_timeout"],
["drivers/net/ethernet/apple/macmace.c", "mace_tx_timeout"],
["drivers/net/ethernet/atheros/ag71xx.c", "ag71xx_tx_timeout"],
["drivers/net/ethernet/atheros/alx/main.c", "alx_tx_timeout"],
["drivers/net/ethernet/atheros/atl1c/atl1c_main.c", "atl1c_tx_timeout"],
["drivers/net/ethernet/atheros/atl1e/atl1e_main.c", "atl1e_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl.c", "atlx_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl1.c", "atlx_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl2.c", "atl2_tx_timeout"],
["drivers/net/ethernet/broadcom/b44.c", "b44_tx_timeout"],
["drivers/net/ethernet/broadcom/bcmsysport.c", "bcm_sysport_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2.c", "bnx2_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnxt/bnxt.c", "bnxt_tx_timeout"],
["drivers/net/ethernet/broadcom/genet/bcmgenet.c", "bcmgenet_timeout"],
["drivers/net/ethernet/broadcom/sb1250-mac.c", "sbmac_tx_timeout"],
["drivers/net/ethernet/broadcom/tg3.c", "tg3_tx_timeout"],
["drivers/net/ethernet/calxeda/xgmac.c", "xgmac_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_main.c", "liquidio_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_vf_main.c", "liquidio_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c", "lio_vf_rep_tx_timeout"],
["drivers/net/ethernet/cavium/thunder/nicvf_main.c", "nicvf_tx_timeout"],
["drivers/net/ethernet/cirrus/cs89x0.c", "net_timeout"],
["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"],
["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"],
["drivers/net/ethernet/cortina/gemini.c", "gmac_tx_timeout"],
["drivers/net/ethernet/davicom/dm9000.c", "dm9000_timeout"],
["drivers/net/ethernet/dec/tulip/de2104x.c", "de_tx_timeout"],
["drivers/net/ethernet/dec/tulip/tulip_core.c", "tulip_tx_timeout"],
["drivers/net/ethernet/dec/tulip/winbond-840.c", "tx_timeout"],
["drivers/net/ethernet/dlink/dl2k.c", "rio_tx_timeout"],
["drivers/net/ethernet/dlink/sundance.c", "tx_timeout"],
["drivers/net/ethernet/emulex/benet/be_main.c", "be_tx_timeout"],
["drivers/net/ethernet/ethoc.c", "ethoc_tx_timeout"],
["drivers/net/ethernet/faraday/ftgmac100.c", "ftgmac100_tx_timeout"],
["drivers/net/ethernet/fealnx.c", "fealnx_tx_timeout"],
["drivers/net/ethernet/freescale/dpaa/dpaa_eth.c", "dpaa_tx_timeout"],
["drivers/net/ethernet/freescale/fec_main.c", "fec_timeout"],
["drivers/net/ethernet/freescale/fec_mpc52xx.c", "mpc52xx_fec_tx_timeout"],
["drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c", "fs_timeout"],
["drivers/net/ethernet/freescale/gianfar.c", "gfar_timeout"],
["drivers/net/ethernet/freescale/ucc_geth.c", "ucc_geth_timeout"],
["drivers/net/ethernet/fujitsu/fmvj18x_cs.c", "fjn_tx_timeout"],
["drivers/net/ethernet/google/gve/gve_main.c", "gve_tx_timeout"],
["drivers/net/ethernet/hisilicon/hip04_eth.c", "hip04_timeout"],
["drivers/net/ethernet/hisilicon/hix5hd2_gmac.c", "hix5hd2_net_timeout"],
["drivers/net/ethernet/hisilicon/hns/hns_enet.c", "hns_nic_net_timeout"],
["drivers/net/ethernet/hisilicon/hns3/hns3_enet.c", "hns3_nic_net_timeout"],
["drivers/net/ethernet/huawei/hinic/hinic_main.c", "hinic_tx_timeout"],
["drivers/net/ethernet/i825xx/82596.c", "i596_tx_timeout"],
["drivers/net/ethernet/i825xx/ether1.c", "ether1_timeout"],
["drivers/net/ethernet/i825xx/lib82596.c", "i596_tx_timeout"],
["drivers/net/ethernet/i825xx/sun3_82586.c", "sun3_82586_timeout"],
["drivers/net/ethernet/ibm/ehea/ehea_main.c", "ehea_tx_watchdog"],
["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"],
["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"],
["drivers/net/ethernet/ibm/ibmvnic.c", "ibmvnic_tx_timeout"],
["drivers/net/ethernet/intel/e100.c", "e100_tx_timeout"],
["drivers/net/ethernet/intel/e1000/e1000_main.c", "e1000_tx_timeout"],
["drivers/net/ethernet/intel/e1000e/netdev.c", "e1000_tx_timeout"],
["drivers/net/ethernet/intel/fm10k/fm10k_netdev.c", "fm10k_tx_timeout"],
["drivers/net/ethernet/intel/i40e/i40e_main.c", "i40e_tx_timeout"],
["drivers/net/ethernet/intel/iavf/iavf_main.c", "iavf_tx_timeout"],
["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"],
["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"],
["drivers/net/ethernet/intel/igb/igb_main.c", "igb_tx_timeout"],
["drivers/net/ethernet/intel/igbvf/netdev.c", "igbvf_tx_timeout"],
["drivers/net/ethernet/intel/ixgb/ixgb_main.c", "ixgb_tx_timeout"],
["drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c", "adapter->netdev->netdev_ops->ndo_tx_timeout(adapter->netdev);"],
["drivers/net/ethernet/intel/ixgbe/ixgbe_main.c", "ixgbe_tx_timeout"],
["drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c", "ixgbevf_tx_timeout"],
["drivers/net/ethernet/jme.c", "jme_tx_timeout"],
["drivers/net/ethernet/korina.c", "korina_tx_timeout"],
["drivers/net/ethernet/lantiq_etop.c", "ltq_etop_tx_timeout"],
["drivers/net/ethernet/marvell/mv643xx_eth.c", "mv643xx_eth_tx_timeout"],
["drivers/net/ethernet/marvell/pxa168_eth.c", "pxa168_eth_tx_timeout"],
["drivers/net/ethernet/marvell/skge.c", "skge_tx_timeout"],
["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"],
["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"],
["drivers/net/ethernet/mediatek/mtk_eth_soc.c", "mtk_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx5/core/en_main.c", "mlx5e_tx_timeout"],
["drivers/net/ethernet/micrel/ks8842.c", "ks8842_tx_timeout"],
["drivers/net/ethernet/micrel/ksz884x.c", "netdev_tx_timeout"],
["drivers/net/ethernet/microchip/enc28j60.c", "enc28j60_tx_timeout"],
["drivers/net/ethernet/microchip/encx24j600.c", "encx24j600_tx_timeout"],
["drivers/net/ethernet/natsemi/sonic.h", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/sonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/jazzsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/macsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/natsemi.c", "ns_tx_timeout"],
["drivers/net/ethernet/natsemi/ns83820.c", "ns83820_tx_timeout"],
["drivers/net/ethernet/natsemi/xtsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/neterion/s2io.h", "s2io_tx_watchdog"],
["drivers/net/ethernet/neterion/s2io.c", "s2io_tx_watchdog"],
["drivers/net/ethernet/neterion/vxge/vxge-main.c", "vxge_tx_watchdog"],
["drivers/net/ethernet/netronome/nfp/nfp_net_common.c", "nfp_net_tx_timeout"],
["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"],
["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"],
["drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c", "pch_gbe_tx_timeout"],
["drivers/net/ethernet/packetengines/hamachi.c", "hamachi_tx_timeout"],
["drivers/net/ethernet/packetengines/yellowfin.c", "yellowfin_tx_timeout"],
["drivers/net/ethernet/pensando/ionic/ionic_lif.c", "ionic_tx_timeout"],
["drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c", "netxen_tx_timeout"],
["drivers/net/ethernet/qlogic/qla3xxx.c", "ql3xxx_tx_timeout"],
["drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c", "qlcnic_tx_timeout"],
["drivers/net/ethernet/qualcomm/emac/emac.c", "emac_tx_timeout"],
["drivers/net/ethernet/qualcomm/qca_spi.c", "qcaspi_netdev_tx_timeout"],
["drivers/net/ethernet/qualcomm/qca_uart.c", "qcauart_netdev_tx_timeout"],
["drivers/net/ethernet/rdc/r6040.c", "r6040_tx_timeout"],
["drivers/net/ethernet/realtek/8139cp.c", "cp_tx_timeout"],
["drivers/net/ethernet/realtek/8139too.c", "rtl8139_tx_timeout"],
["drivers/net/ethernet/realtek/atp.c", "tx_timeout"],
["drivers/net/ethernet/realtek/r8169_main.c", "rtl8169_tx_timeout"],
["drivers/net/ethernet/renesas/ravb_main.c", "ravb_tx_timeout"],
["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"],
["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"],
["drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c", "sxgbe_tx_timeout"],
["drivers/net/ethernet/seeq/ether3.c", "ether3_timeout"],
["drivers/net/ethernet/seeq/sgiseeq.c", "timeout"],
["drivers/net/ethernet/sfc/efx.c", "efx_watchdog"],
["drivers/net/ethernet/sfc/falcon/efx.c", "ef4_watchdog"],
["drivers/net/ethernet/sgi/ioc3-eth.c", "ioc3_timeout"],
["drivers/net/ethernet/sgi/meth.c", "meth_tx_timeout"],
["drivers/net/ethernet/silan/sc92031.c", "sc92031_tx_timeout"],
["drivers/net/ethernet/sis/sis190.c", "sis190_tx_timeout"],
["drivers/net/ethernet/sis/sis900.c", "sis900_tx_timeout"],
["drivers/net/ethernet/smsc/epic100.c", "epic_tx_timeout"],
["drivers/net/ethernet/smsc/smc911x.c", "smc911x_timeout"],
["drivers/net/ethernet/smsc/smc9194.c", "smc_timeout"],
["drivers/net/ethernet/smsc/smc91c92_cs.c", "smc_tx_timeout"],
["drivers/net/ethernet/smsc/smc91x.c", "smc_timeout"],
["drivers/net/ethernet/stmicro/stmmac/stmmac_main.c", "stmmac_tx_timeout"],
["drivers/net/ethernet/sun/cassini.c", "cas_tx_timeout"],
["drivers/net/ethernet/sun/ldmvsw.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/niu.c", "niu_tx_timeout"],
["drivers/net/ethernet/sun/sunbmac.c", "bigmac_tx_timeout"],
["drivers/net/ethernet/sun/sungem.c", "gem_tx_timeout"],
["drivers/net/ethernet/sun/sunhme.c", "happy_meal_tx_timeout"],
["drivers/net/ethernet/sun/sunqe.c", "qe_tx_timeout"],
["drivers/net/ethernet/sun/sunvnet.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/sunvnet_common.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/sunvnet_common.h", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/synopsys/dwc-xlgmac-net.c", "xlgmac_tx_timeout"],
["drivers/net/ethernet/ti/cpmac.c", "cpmac_tx_timeout"],
["drivers/net/ethernet/ti/cpsw.c", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/cpsw_priv.c", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/cpsw_priv.h", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/davinci_emac.c", "emac_dev_tx_timeout"],
["drivers/net/ethernet/ti/netcp_core.c", "netcp_ndo_tx_timeout"],
["drivers/net/ethernet/ti/tlan.c", "tlan_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_net.h", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_net.c", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_wireless.c", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/spider_net.c", "spider_net_tx_timeout"],
["drivers/net/ethernet/toshiba/tc35815.c", "tc35815_tx_timeout"],
["drivers/net/ethernet/via/via-rhine.c", "rhine_tx_timeout"],
["drivers/net/ethernet/wiznet/w5100.c", "w5100_tx_timeout"],
["drivers/net/ethernet/wiznet/w5300.c", "w5300_tx_timeout"],
["drivers/net/ethernet/xilinx/xilinx_emaclite.c", "xemaclite_tx_timeout"],
["drivers/net/ethernet/xircom/xirc2ps_cs.c", "xirc_tx_timeout"],
["drivers/net/fjes/fjes_main.c", "fjes_tx_retry"],
["drivers/net/slip/slip.c", "sl_tx_timeout"],
["include/linux/usb/usbnet.h", "usbnet_tx_timeout"],
["drivers/net/usb/aqc111.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/ax88172a.c", "usbnet_tx_timeout"],
["drivers/net/usb/ax88179_178a.c", "usbnet_tx_timeout"],
["drivers/net/usb/catc.c", "catc_tx_timeout"],
["drivers/net/usb/cdc_mbim.c", "usbnet_tx_timeout"],
["drivers/net/usb/cdc_ncm.c", "usbnet_tx_timeout"],
["drivers/net/usb/dm9601.c", "usbnet_tx_timeout"],
["drivers/net/usb/hso.c", "hso_net_tx_timeout"],
["drivers/net/usb/int51x1.c", "usbnet_tx_timeout"],
["drivers/net/usb/ipheth.c", "ipheth_tx_timeout"],
["drivers/net/usb/kaweth.c", "kaweth_tx_timeout"],
["drivers/net/usb/lan78xx.c", "lan78xx_tx_timeout"],
["drivers/net/usb/mcs7830.c", "usbnet_tx_timeout"],
["drivers/net/usb/pegasus.c", "pegasus_tx_timeout"],
["drivers/net/usb/qmi_wwan.c", "usbnet_tx_timeout"],
["drivers/net/usb/r8152.c", "rtl8152_tx_timeout"],
["drivers/net/usb/rndis_host.c", "usbnet_tx_timeout"],
["drivers/net/usb/rtl8150.c", "rtl8150_tx_timeout"],
["drivers/net/usb/sierra_net.c", "usbnet_tx_timeout"],
["drivers/net/usb/smsc75xx.c", "usbnet_tx_timeout"],
["drivers/net/usb/smsc95xx.c", "usbnet_tx_timeout"],
["drivers/net/usb/sr9700.c", "usbnet_tx_timeout"],
["drivers/net/usb/sr9800.c", "usbnet_tx_timeout"],
["drivers/net/usb/usbnet.c", "usbnet_tx_timeout"],
["drivers/net/vmxnet3/vmxnet3_drv.c", "vmxnet3_tx_timeout"],
["drivers/net/wan/cosa.c", "cosa_net_timeout"],
["drivers/net/wan/farsync.c", "fst_tx_timeout"],
["drivers/net/wan/fsl_ucc_hdlc.c", "uhdlc_tx_timeout"],
["drivers/net/wan/lmc/lmc_main.c", "lmc_driver_timeout"],
["drivers/net/wan/x25_asy.c", "x25_asy_timeout"],
["drivers/net/wimax/i2400m/netdev.c", "i2400m_tx_timeout"],
["drivers/net/wireless/intel/ipw2x00/ipw2100.c", "ipw2100_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/main.c", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/orinoco_usb.c", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/orinoco.h", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_dev.c", "islpci_eth_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_eth.c", "islpci_eth_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_eth.h", "islpci_eth_tx_timeout"],
["drivers/net/wireless/marvell/mwifiex/main.c", "mwifiex_tx_timeout"],
["drivers/net/wireless/quantenna/qtnfmac/core.c", "qtnf_netdev_tx_timeout"],
["drivers/net/wireless/quantenna/qtnfmac/core.h", "qtnf_netdev_tx_timeout"],
["drivers/net/wireless/rndis_wlan.c", "usbnet_tx_timeout"],
["drivers/net/wireless/wl3501_cs.c", "wl3501_tx_timeout"],
["drivers/net/wireless/zydas/zd1201.c", "zd1201_tx_timeout"],
["drivers/s390/net/qeth_core.h", "qeth_tx_timeout"],
["drivers/s390/net/qeth_core_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"],
["drivers/staging/ks7010/ks_wlan_net.c", "ks_wlan_tx_timeout"],
["drivers/staging/qlge/qlge_main.c", "qlge_tx_timeout"],
["drivers/staging/rtl8192e/rtl8192e/rtl_core.c", "_rtl92e_tx_timeout"],
["drivers/staging/rtl8192u/r8192U_core.c", "tx_timeout"],
["drivers/staging/unisys/visornic/visornic_main.c", "visornic_xmit_timeout"],
["drivers/staging/wlan-ng/p80211netdev.c", "p80211knetdev_tx_timeout"],
["drivers/tty/n_gsm.c", "gsm_mux_net_tx_timeout"],
["drivers/tty/synclink.c", "hdlcdev_tx_timeout"],
["drivers/tty/synclink_gt.c", "hdlcdev_tx_timeout"],
["drivers/tty/synclinkmp.c", "hdlcdev_tx_timeout"],
["net/atm/lec.c", "lec_tx_timeout"],
["net/bluetooth/bnep/netdev.c", "bnep_net_timeout"]
);

for my $p (@work) {
	my @pair = @$p;
	my $file = $pair[0];
	my $func = $pair[1];
	print STDERR $file , ": ", $func,"\n";
	our @ARGV = ($file);
	while (<ARGV>) {
		if (m/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/) {
			print STDERR "found $1+$2 in $file\n";
		}
		if (s/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/$1, unsigned int txqueue$2/) {
			print STDERR "$func found in $file\n";
		}
		print;
	}
}

where the list of files and functions is simply from:

git grep ndo_tx_timeout, with manual addition of headers
in the rare cases where the function is from a header,
then manually changing the few places which actually
call ndo_tx_timeout.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Martin Habets <mhabets@solarflare.com>

changes from v9:
	fixup a forward declaration
changes from v9:
	more leftovers from v3 change
changes from v8:
        fix up a missing direct call to timeout
        rebased on net-next
changes from v7:
	fixup leftovers from v3 change
changes from v6:
	fix typo in rtl driver
changes from v5:
	add missing files (allow any net device argument name)
changes from v4:
	add a missing driver header
changes from v3:
        change queue # to unsigned
Changes from v2:
        added headers
Changes from v1:
        Fix errors found by kbuild:
        generalize the pattern a bit, to pick up
        a couple of instances missed by the previous
        version.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-12 21:38:57 -08:00
Yishai Hadas
dc2316eba7 IB/mlx5: Fix device memory flows
Fix device memory flows so that only once there will be no live mmaped
VA to a given allocation the matching object will be destroyed.

This prevents a potential scenario that existing VA that was mmaped by
one process might still be used post its deallocation despite that it's
owned now by other process.

The above is achieved by integrating with IB core APIs to manage
mmap/munmap. Only once the refcount will become 0 the DM object and its
underlay area will be freed.

Fixes: 3b113a1ec3 ("IB/mlx5: Support device memory type attribute")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 16:55:36 -05:00
Yishai Hadas
7a763d18ff IB/core: Introduce rdma_user_mmap_entry_insert_range() API
Introduce rdma_user_mmap_entry_insert_range() API to be used once the
required key for the given entry should be in a given range.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 16:55:36 -05:00
Maor Gottlieb
ed9085fed9 IB/mlx5: Fix steering rule of drop and count
There are two flow rule destinations: QP and packet. While users are
setting DROP packet rule, the QP should not be set as a destination.

Fixes: 3b3233fbf0 ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 15:38:16 -05:00
Parav Pandit
89f988d93c IB/mlx4: Follow mirror sequence of device add during device removal
Current code device add sequence is:

ib_register_device()
ib_mad_init()
init_sriov_init()
register_netdev_notifier()

Therefore, the remove sequence should be,

unregister_netdev_notifier()
close_sriov()
mad_cleanup()
ib_unregister_device()

However it is not above.
Hence, make do above remove sequence.

Fixes: fa417f7b52 ("IB/mlx4: Add support for IBoE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 15:38:15 -05:00
Mark Zhang
33df2f1929 RDMA/counter: Prevent auto-binding a QP which are not tracked with res
Some QPs (e.g. XRC QP) are not tracked in kernel, in this case they have
an invalid res and should not be bound to any dynamically-allocated
counter in auto mode.

This fixes below call trace:
BUG: kernel NULL pointer dereference, address: 0000000000000390
PGD 80000001a7233067 P4D 80000001a7233067 PUD 1a7215067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 24822 Comm: ibv_xsrq_pingpo Not tainted 5.4.0-rc5+ #21
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
RIP: 0010:rdma_counter_bind_qp_auto+0x142/0x270 [ib_core]
Code: e1 48 85 c0 48 89 c2 0f 84 bc 00 00 00 49 8b 06 48 39 42 48 75 d6 40 3a aa 90 00 00 00 75 cd 49 8b 86 00 01 00 00 48 8b 4a 28 <8b> 80 90 03 00 00 39 81 90 03 00 00 75 b4 85 c0 74 b0 48 8b 04 24
RSP: 0018:ffffc900003f39c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88820020ec00 RSI: 0000000000000004 RDI: ffffffffffffffc0
RBP: 0000000000000001 R08: ffff888224149ff0 R09: ffffc900003f3968
R10: ffffffffffffffff R11: ffff8882249c5848 R12: ffffffffffffffff
R13: ffff88821d5aca50 R14: ffff8881f7690800 R15: ffff8881ff890000
FS:  00007fe53a3e1740(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000390 CR3: 00000001a7292006 CR4: 00000000003606a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 _ib_modify_qp+0x3a4/0x3f0 [ib_core]
 ? lookup_get_idr_uobject.part.8+0x23/0x40 [ib_uverbs]
 modify_qp+0x322/0x3e0 [ib_uverbs]
 ib_uverbs_modify_qp+0x43/0x70 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs]
 ib_uverbs_run_method+0x6be/0x760 [ib_uverbs]
 ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x18d/0x3a0 [ib_uverbs]
 ? get_acl+0x1a/0x120
 ? __alloc_pages_nodemask+0x15d/0x2c0
 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs]
 do_vfs_ioctl+0xa5/0x610
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x48/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 99fa331dc8 ("RDMA/counter: Add "auto" configuration mode support")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Ido Kalir <idok@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 15:38:15 -05:00
Ingo Molnar
eb243d1d28 x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>
pat.h is a file whose main purpose is to provide the memtype_*() APIs.

PAT is the low level hardware mechanism - but the high level abstraction
is memtype.

So name the header <memtype.h> as well - this goes hand in hand with memtype.c
and memtype_interval.c.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
2040cf9f59 Linux 5.5-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3tf/0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlKwH/3fTToujuJfTx5E5
 mrARAP65J1L/DxpEKvKRt2bNZo6w13mNd8g7ZPmYChz90bYGvXQSG8hYTU9iAw3O
 yimSTJlNXDhVAluB53XnDdUxIWC4HUZsNxWJNCeXMuiMcGNsTGX+v3f+x7oHCT0P
 jI1RSIsFGjgr0RWqZ8U5aJckQo2xABC1TfYw53K66Oc/JLZpSFJFwMgjf1fD5diU
 HGDA8E2p0u1TQIyNzr86iqMvnlSRYBQwBQn6OgEKCG4Z0NLtXfDF4mqnxsXgLmIH
 oQoFfxaMKXyGWds7ZxwcGWntALCF41ThfpiJWDIyxjWxFEty4bqTCbDPwwyp7ip0
 iuASmTI=
 =YqO2
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc1' into core/kprobes, to resolve conflicts

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:11:00 +01:00
Steve Wise
2030abddec rxe: correctly calculate iCRC for unaligned payloads
If RoCE PDUs being sent or received contain pad bytes, then the iCRC
is miscalculated, resulting in PDUs being emitted by RXE with an incorrect
iCRC, as well as ingress PDUs being dropped due to erroneously detecting
a bad iCRC in the PDU.  The fix is to include the pad bytes, if any,
in iCRC computations.

Note: This bug has caused broken on-the-wire compatibility with actual
hardware RoCE devices since the soft-RoCE driver was first put into the
mainstream kernel.  Fixing it will create an incompatibility with the
original soft-RoCE devices, but is necessary to be compatible with real
hardware devices.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Link: https://lore.kernel.org/r/20191203020319.15036-2-larrystevenwise@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-09 13:55:26 -05:00
Pankaj Bharadiya
c593642c8b treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09 10:36:44 -08:00
Chuhong Yuan
44a7b67590 RDMA/cma: add missed unregister_pernet_subsys in init failure
The driver forgets to call unregister_pernet_subsys() in the error path
of cma_init().
Add the missed call to fix it.

Fixes: 4be74b42a6 ("IB/cma: Separate port allocation to network namespaces")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Link: https://lore.kernel.org/r/20191206012426.12744-1-hslester96@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-09 12:02:11 -05:00
Sabrina Dubroca
6c8991f415 net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup
ipv6_stub uses the ip6_dst_lookup function to allow other modules to
perform IPv6 lookups. However, this function skips the XFRM layer
entirely.

All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
which calls xfrm_lookup_route(). This patch fixes this inconsistent
behavior by switching the stub to ip6_dst_lookup_flow, which also calls
xfrm_lookup_route().

This requires some changes in all the callers, as these two functions
take different arguments and have different return types.

Fixes: 5f81bd2e5d ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04 12:27:13 -08:00
Linus Torvalds
0da522107e compat_ioctl: remove most of fs/compat_ioctl.c
As part of the cleanup of some remaining y2038 issues, I came to
 fs/compat_ioctl.c, which still has a couple of commands that need support
 for time64_t.
 
 In completely unrelated work, I spent time on cleaning up parts of this
 file in the past, moving things out into drivers instead.
 
 After Al Viro reviewed an earlier version of this series and did a lot
 more of that cleanup, I decided to try to completely eliminate the rest
 of it and move it all into drivers.
 
 This series incorporates some of Al's work and many patches of my own,
 but in the end stops short of actually removing the last part, which is
 the scsi ioctl handlers. I have patches for those as well, but they need
 more testing or possibly a rewrite.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1
 RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ
 +ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF
 lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL
 6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD
 dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH
 VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb
 YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT
 m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm
 TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n
 e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl
 bN65armTm7bFFR32Avnu
 =lgCl
 -----END PGP SIGNATURE-----

Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
 "As part of the cleanup of some remaining y2038 issues, I came to
  fs/compat_ioctl.c, which still has a couple of commands that need
  support for time64_t.

  In completely unrelated work, I spent time on cleaning up parts of
  this file in the past, moving things out into drivers instead.

  After Al Viro reviewed an earlier version of this series and did a lot
  more of that cleanup, I decided to try to completely eliminate the
  rest of it and move it all into drivers.

  This series incorporates some of Al's work and many patches of my own,
  but in the end stops short of actually removing the last part, which
  is the scsi ioctl handlers. I have patches for those as well, but they
  need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
  scsi: sd: enable compat ioctls for sed-opal
  pktcdvd: add compat_ioctl handler
  compat_ioctl: move SG_GET_REQUEST_TABLE handling
  compat_ioctl: ppp: move simple commands into ppp_generic.c
  compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
  compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
  compat_ioctl: unify copy-in of ppp filters
  tty: handle compat PPP ioctls
  compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
  compat_ioctl: handle SIOCOUTQNSD
  af_unix: add compat_ioctl support
  compat_ioctl: reimplement SG_IO handling
  compat_ioctl: move WDIOC handling into wdt drivers
  fs: compat_ioctl: move FITRIM emulation into file systems
  gfs2: add compat_ioctl support
  compat_ioctl: remove unused convert_in_user macro
  compat_ioctl: remove last RAID handling code
  compat_ioctl: remove /dev/raw ioctl translation
  compat_ioctl: remove PCI ioctl translation
  compat_ioctl: remove joystick ioctl translation
  ...
2019-12-01 13:46:15 -08:00
Linus Torvalds
aa32f11691 hmm related patches for 5.5
This is another round of bug fixing and cleanup. This time the focus is on
 the driver pattern to use mmu notifiers to monitor a VA range. This code
 is lifted out of many drivers and hmm_mirror directly into the
 mmu_notifier core and written using the best ideas from all the driver
 implementations.
 
 This removes many bugs from the drivers and has a very pleasing
 diffstat. More drivers can still be converted, but that is for another
 cycle.
 
 - A shared branch with RDMA reworking the RDMA ODP implementation
 
 - New mmu_interval_notifier API. This is focused on the use case of
   monitoring a VA and simplifies the process for drivers
 
 - A common seq-count locking scheme built into the mmu_interval_notifier
   API usable by drivers that call get_user_pages() or hmm_range_fault()
   with the VA range
 
 - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev
   drivers to the new API. This deletes a lot of wonky driver code.
 
 - Two improvements for hmm_range_fault(), from testing done by Ralph
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3cCjQACgkQOG33FX4g
 mxpp8xAAiR9iOdT28m/tx1GF31XludrMhRZVIiz0vmCIxIiAkWekWEfAEVm9PDnh
 wdrxTJohSs+B65AK3sfToOM3AIuNCuFVWmbbHI5qmOO76vaSvcZa905Z++pNsawO
 Bn8mgRCprYoFHcxWLvTvnA5U0g1S2BSSOwBSZI43CbEnVvHjYAR6MnvRqfGMk+NF
 bf8fTk/x+fl0DCemhynlBLuJkogzoE2Hgl0yPY5bFna4PktOxdpa1yPaQsiqZ7e6
 2s2NtM3pbMBJk0W42q5BU+aPhiqfxFFszasPSLBduXrD2xDsG76HJdHj5VydKmfL
 nelG4BvqJozXTEZWvTEePYhCqaZ41eJZ7Asw8BXtmacVqE5mDlTXo/Zdgbz7yEOR
 mI5MVyjD5rauZJldUOWXbwrPoWVFRvboauehiSgqvxvT9HvlFp9GKObSuu4gubBQ
 mzxs4t48tPhA7bswLmw0/pETSogFuVDfaB7hsyY0gi8EwxMFMpw2qFypm1PEEF+C
 BuUxCSShzvNKrraNe5PWaNNFd3AzIwAOWJHE+poH4bCoXQVr5nA+rq2gnHkdY5vq
 /xrBCyxkf0U05YoFGYembPVCInMehzp9Xjy8V+SueSvCg2/TYwGDCgGfsbe9dNOP
 Bc40JpS7BDn5w9nyLUJmOx7jfruNV6kx1QslA7NDDrB/rzOlsEc=
 =Hj8a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull hmm updates from Jason Gunthorpe:
 "This is another round of bug fixing and cleanup. This time the focus
  is on the driver pattern to use mmu notifiers to monitor a VA range.
  This code is lifted out of many drivers and hmm_mirror directly into
  the mmu_notifier core and written using the best ideas from all the
  driver implementations.

  This removes many bugs from the drivers and has a very pleasing
  diffstat. More drivers can still be converted, but that is for another
  cycle.

   - A shared branch with RDMA reworking the RDMA ODP implementation

   - New mmu_interval_notifier API. This is focused on the use case of
     monitoring a VA and simplifies the process for drivers

   - A common seq-count locking scheme built into the
     mmu_interval_notifier API usable by drivers that call
     get_user_pages() or hmm_range_fault() with the VA range

   - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen
     GntDev drivers to the new API. This deletes a lot of wonky driver
     code.

   - Two improvements for hmm_range_fault(), from testing done by Ralph"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap
  mm/hmm: make full use of walk_page_range()
  xen/gntdev: use mmu_interval_notifier_insert
  mm/hmm: remove hmm_mirror and related
  drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror
  drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror
  drm/amdgpu: Call find_vma under mmap_sem
  nouveau: use mmu_interval_notifier instead of hmm_mirror
  nouveau: use mmu_notifier directly for invalidate_range_start
  drm/radeon: use mmu_interval_notifier_insert
  RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
  RDMA/odp: Use mmu_interval_notifier_insert()
  mm/hmm: define the pre-processor related parts of hmm.h even if disabled
  mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror
  mm/mmu_notifier: add an interval tree notifier
  mm/mmu_notifier: define the header pre-processor parts even if disabled
  mm/hmm: allow snapshot of the special zero page
2019-11-30 10:33:14 -08:00
Linus Torvalds
9a3d7fd275 Driver core patches for 5.5-rc1
Here is the "big" set of driver core patches for 5.5-rc1
 
 There's a few minor cleanups and fixes in here, but the majority of the
 patches in here fall into two buckets:
   - debugfs api cleanups and fixes
   - driver core device link support for boot dependancy issues
 
 The debugfs api cleanups are working to slowly refactor the debugfs apis
 so that it is even harder to use incorrectly.  That work has been
 happening for the past few kernel releases and will continue over time,
 it's a long-term project/goal
 
 The driver core device link support missed 5.4 by just a bit, so it's
 been sitting and baking for many months now.  It's from Saravana Kannan
 to help resolve the problems that DT-based systems have at boot time
 with dependancy graphs and kernel modules.  Turns out that no one has
 actually tried to build a generic arm64 kernel with loads of modules and
 have it "just work" for a variety of platforms (like a distro kernel)
 The big problem turned out to be a lack of depandancy information
 between different areas of DT entries, and the work here resolves that
 problem and now allows devices to boot properly, and quicker than a
 monolith kernel.
 
 All of these patches have been in linux-next for a long time with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXd6m6Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yntJQCcCqg6RQ7LTdHuZv1ETeefXlsfk00An1Jtean6
 42bWGx52bGFvAcpjWy8R
 =P7hq
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "big" set of driver core patches for 5.5-rc1

  There's a few minor cleanups and fixes in here, but the majority of
  the patches in here fall into two buckets:

   - debugfs api cleanups and fixes

   - driver core device link support for boot dependancy issues

  The debugfs api cleanups are working to slowly refactor the debugfs
  apis so that it is even harder to use incorrectly. That work has been
  happening for the past few kernel releases and will continue over
  time, it's a long-term project/goal

  The driver core device link support missed 5.4 by just a bit, so it's
  been sitting and baking for many months now. It's from Saravana Kannan
  to help resolve the problems that DT-based systems have at boot time
  with dependancy graphs and kernel modules. Turns out that no one has
  actually tried to build a generic arm64 kernel with loads of modules
  and have it "just work" for a variety of platforms (like a distro
  kernel). The big problem turned out to be a lack of dependency
  information between different areas of DT entries, and the work here
  resolves that problem and now allows devices to boot properly, and
  quicker than a monolith kernel.

  All of these patches have been in linux-next for a long time with no
  reported issues"

* tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits)
  tracing: Remove unnecessary DEBUG_FS dependency
  of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
  debugfs: Fix !DEBUG_FS debugfs_create_automount
  of: property: Add device link support for "iommu-map"
  of: property: Fix the semantics of of_is_ancestor_of()
  i2c: of: Populate fwnode in of_i2c_get_board_info()
  drivers: base: Fix Kconfig indentation
  firmware_loader: Fix labels with comma for builtin firmware
  driver core: Allow device link operations inside sync_state()
  driver core: platform: Declare ret variable only once
  cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h>
  crypto: hisilicon: no need to check return value of debugfs_create functions
  driver core: platform: use the correct callback type for bus_find_device
  firmware_class: make firmware caching configurable
  driver core: Clarify documentation for fwnode_operations.add_links()
  mailbox: tegra: Fix superfluous IRQ error message
  net: caif: Fix debugfs on 64-bit platforms
  mac80211: Use debugfs_create_xul() helper
  media: c8sectpfe: no need to check return value of debugfs_create functions
  of: property: Add device link support for iommus, mboxes and io-channels
  ...
2019-11-27 11:06:20 -08:00
Linus Torvalds
d768869728 RDMA subsystem updates for 5.5
Mainly a collection of smaller of driver updates this cycle.
 
 - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr,
   iw_cxgb4, vmw_pvrdma, mlx5
 
 - Improvements in SRPT from working with iWarp
 
 - SRIOV VF support for bnxt_re
 
 - Skeleton kernel-doc files for drivers/infiniband
 
 - User visible counters for events related to ODP
 
 - Common code for tracking of mmap lifetimes so that drivers can link HW
   object liftime to a VMA
 
 - ODP bug fixes and rework
 
 - RDMA READ support for efa
 
 - Removal of the very old cxgb3 driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3dwFYACgkQOG33FX4g
 mxohHRAAnkUr0SKh1uV7vuhA8sUlejkwAaw2V2Sm3E1y4GiPpfRAak/FcbEu9F8l
 E7Q7VI04DRKTTpTRmREVyKYjlKr5QOA4mSryNMfBZobjkp+t++U1GC9YKL3zh65U
 odyJedtv3zKCLzV9RBwX06C8Vi1PQNQp7RbQ42xH/IyKXJIZPHeCUeKv7PsfTWVo
 vhuXFc9pPwqHDfRVTbj6s1g+OwVuRHc+SWep6eTKLGYvt8CqdN9WEpA0sJrlwPet
 NLTZTrFDpuC12RPc4Lo5c7R5MeAzHgojZCeSFaL2DhJLOx3kfmU30wG+rV94Lvsq
 Y2yt6BwKeLleKzpR5ApkuIVHQt2KECPrJbmVLDFqi+rT7yvzsd2AB+uiCS50+LPG
 h2UpWJdKWtrnSzTqbJQieXd+oT253Dk+ciy7zbdPSGPwz1dc/Cna9TTn26X2SezR
 bmRhHOykrh7LCNrv/7jiSq/zWApGTlR9YZ86x9Li+lJ3kkG+7d2DBEofDU+oKE5F
 p0+1Cer1td5IkN3N8oDpvyXAbCIv7qdWwXB2Cm7dSfUETIpAKnXiBxFCHyvmsuus
 7gGyKZh2490N3SoIL1muu12dIvhPC2Obg9ckCNU3ldw9+uPJkCzrs+n+JtkIW0mi
 i1fMjRQ6FumVT6/KdavYgeCsHmItVqv03SGdFsHF3SGwvWo6y8Y=
 =UV2u
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Again another fairly quiet cycle with few notable core code changes
  and the usual variety of driver bug fixes and small improvements.

   - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr,
     iw_cxgb4, vmw_pvrdma, mlx5

   - Improvements in SRPT from working with iWarp

   - SRIOV VF support for bnxt_re

   - Skeleton kernel-doc files for drivers/infiniband

   - User visible counters for events related to ODP

   - Common code for tracking of mmap lifetimes so that drivers can link
     HW object liftime to a VMA

   - ODP bug fixes and rework

   - RDMA READ support for efa

   - Removal of the very old cxgb3 driver"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits)
  RDMA/hns: Delete unnecessary callback functions for cq
  RDMA/hns: Rename the functions used inside creating cq
  RDMA/hns: Redefine the member of hns_roce_cq struct
  RDMA/hns: Redefine interfaces used in creating cq
  RDMA/efa: Expose RDMA read related attributes
  RDMA/efa: Support remote read access in MR registration
  RDMA/efa: Store network attributes in device attributes
  IB/hfi1: remove redundant assignment to variable ret
  RDMA/bnxt_re: Fix missing le16_to_cpu
  RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices
  RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series
  RDMA/bnxt_re: Fix Kconfig indentation
  IB/mlx5: Implement callbacks for getting VFs GUID attributes
  IB/ipoib: Add ndo operation for getting VFs GUID attributes
  IB/core: Add interfaces to get VF node and port GUIDs
  net/core: Add support for getting VF GUIDs
  RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset
  RDMA/cm: Use refcount_t type for refcount variable
  IB/mlx5: Support extended number of strides for Striding RQ
  IB/mlx4: Update HW GID table while adding vlan GID
  ...
2019-11-27 10:17:28 -08:00
Peter Zijlstra
04ae87a520 ftrace: Rework event_create_dir()
Rework event_create_dir() to use an array of static data instead of
function pointers where possible.

The problem is that it would call the function pointer on module load
before parse_args(), possibly even before jump_labels were initialized.
Luckily the generated functions don't use jump_labels but it still seems
fragile. It also gets in the way of changing when we make the module map
executable.

The generated function are basically calling trace_define_field() with a
bunch of static arguments. So instead of a function, capture these
arguments in a static array, avoiding the function call.

Now there are a number of cases where the fields are dynamic (syscall
arguments, kprobes and uprobes), in which case a static array does not
work, for these we preserve the function call. Luckily all these cases
are not related to modules and so we can retain the function call for
them.

Also fix up all broken tracepoint definitions that now generate a
compile error.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132458.342979914@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:25 +01:00
Yixian Liu
f295e4cece RDMA/hns: Delete unnecessary callback functions for cq
Currently, when cq event occurred, we first call our own callback
functions in the event process function, then call ib callback
functions. Actually, we can directly call ib callback functions.

Link: https://lore.kernel.org/r/1574044493-46984-5-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:48 -04:00
Yixian Liu
707783ab5f RDMA/hns: Rename the functions used inside creating cq
Current names of functions are not proper, such as hns_roce_free_cq,
actually it means free cqc, thus we rename them. Furthermore, functions
used inside one file can be named without the prefix hns_roce_ which will
make the functions for verbs symbols more eye-catching.

Link: https://lore.kernel.org/r/1574044493-46984-4-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:48 -04:00
Yixian Liu
18a96d25ce RDMA/hns: Redefine the member of hns_roce_cq struct
There is no need to package buf and mtt into hns_roce_cq_buf, which will
make code more complex, just delete this struct and move buf and mtt into
hns_roce_cq. Furthermore, we add size member for hns_roce_buf to avoid
repeatly calculating where needed it.

Link: https://lore.kernel.org/r/1574044493-46984-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:48 -04:00
Yixian Liu
e2b2744a06 RDMA/hns: Redefine interfaces used in creating cq
Some interfaces defined with unnecessary input parameters, such as "nent"
and "vector". This patch redefined these interfaces to make the code more
readable and simple.

Link: https://lore.kernel.org/r/1574044493-46984-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:48 -04:00
Daniel Kranzdorf
666e8ff535 RDMA/efa: Expose RDMA read related attributes
Query the device attributes for RDMA operations, including maximum
transfer size and maximum number of SGEs per RDMA WR, and report them
back to the userspace library.

Link: https://lore.kernel.org/r/20191121141509.59297-4-galpress@amazon.com
Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:48 -04:00
Daniel Kranzdorf
e6c4f3ff43 RDMA/efa: Support remote read access in MR registration
Enable remote read access for memory regions in order to support RDMA
operations.

Link: https://lore.kernel.org/r/20191121141509.59297-3-galpress@amazon.com
Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Gal Pressman
bcf7cc534c RDMA/efa: Store network attributes in device attributes
There's no reason to separate the network attributes from all other
device attributes. Embed the fields inside the device attributes and
query them all in one function.

Link: https://lore.kernel.org/r/20191121141509.59297-2-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Colin Ian King
25d24f4241 IB/hfi1: remove redundant assignment to variable ret
The variable ret is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Link: https://lore.kernel.org/r/20191122154814.87257-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Devesh Sharma
fca5b9dc09 RDMA/bnxt_re: Fix missing le16_to_cpu
From sparse:

drivers/infiniband/hw/bnxt_re/main.c:1274:18: warning: cast from restricted __le16
drivers/infiniband/hw/bnxt_re/main.c:1275:18: warning: cast from restricted __le16
drivers/infiniband/hw/bnxt_re/main.c:1276:18: warning: cast from restricted __le16
drivers/infiniband/hw/bnxt_re/main.c:1277:21: warning: restricted __le16 degrades to integer

Fixes: 2b827ea192 ("RDMA/bnxt_re: Query HWRM Interface version from FW")
Link: https://lore.kernel.org/r/1574317343-23300-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Devesh Sharma
98998ffe52 RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices
Due to recent advances in the firmware for Broadcom's gen p5 series of
adaptors the driver code to report hardware counters has been broken
w.r.t. roce devices.

The new firmware command expects dma length to be specified during stat
dma buffer allocation.

Fixes: 2792b5b95e ("bnxt_en: Update firmware interface spec. to 1.10.0.89.")
Link: https://lore.kernel.org/r/1574317343-23300-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Luke Starrett
e284b159c6 RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series
In the first version of Gen P5 ASIC, chip-id was always set to 0x1750 for
all adaptor port configurations. This has been fixed in the new chip rev.

Due to this missing fix users are not able to use adaptors based on latest
chip rev of Broadcom's Gen P5 adaptors.

Fixes: ae8637e131 ("RDMA/bnxt_re: Add chip context to identify 57500 series")
Link: https://lore.kernel.org/r/1574317343-23300-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Luke Starrett <luke.starrett@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Krzysztof Kozlowski
6e419e35e6 RDMA/bnxt_re: Fix Kconfig indentation
Adjust indentation from spaces to tab (+optional two spaces) as in coding
style with command like:
	$ sed -e 's/^        /\t/' -i */Kconfig

Link: https://lore.kernel.org/r/20191120134138.15245-1-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Jason Gunthorpe
3694e41e41 Merge branch 'ib-guids' into rdma.git for-next
Danit Goldberg says:

====================
This series extends RTNETLINK to provide IB port and node GUIDs, which
were configured for Infiniband VFs.

The functionality to set VF GUIDs already existed for a long time, and
here we are adding the missing "get" so that netlink will be symmetric and
various cloud orchestration tools will be able to manage such VFs more
naturally.

The iproute2 was extended too to present those GUIDs.

- ip link show <device>

For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
    ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'ib-guids': (35 commits)
  IB/mlx5: Implement callbacks for getting VFs GUID attributes
  IB/ipoib: Add ndo operation for getting VFs GUID attributes
  IB/core: Add interfaces to get VF node and port GUIDs
  net/core: Add support for getting VF GUIDs

  net/mlx5: Add new chain for netfilter flow table offload
  net/mlx5: Refactor creating fast path prio chains
  net/mlx5: Accumulate levels for chains prio namespaces
  net/mlx5: Define fdb tc levels per prio
  net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
  net/mlx5: Simplify fdb chain and prio eswitch defines
  IB/mlx5: Load profile according to RoCE enablement state
  IB/mlx5: Rename profile and init methods
  net/mlx5: Handle "enable_roce" devlink param
  net/mlx5: Document flow_steering_mode devlink param
  devlink: Add new "enable_roce" generic device param
  net/mlx5: fix spelling mistake "metdata" -> "metadata"
  net/mlx5: fix kvfree of uninitialized pointer spec
  IB/mlx5: Introduce and use mlx5_core_is_vf()
  net/mlx5: E-switch, Enable metadata on own vport
  net/mlx5: Refactor ingress acl configuration
  ...

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Jason Gunthorpe
3889551db2 RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
This converts one of the two users of mmu_notifiers to use the new API.
The conversion is fairly straightforward, however the existing use of
notifiers here seems to be racey.

Link: https://lore.kernel.org/r/20191112202231.3856-7-jgg@ziepe.ca
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23 19:56:44 -04:00
Jason Gunthorpe
f25a546e65 RDMA/odp: Use mmu_interval_notifier_insert()
Replace the internal interval tree based mmu notifier with the new common
mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
deadlock that can be triggered in ODP:

 zap_page_range()
  mmu_notifier_invalidate_range_start()
   [..]
    ib_umem_notifier_invalidate_range_start()
       down_read(&per_mm->umem_rwsem)
  unmap_single_vma()
    [..]
      __split_huge_page_pmd()
        mmu_notifier_invalidate_range_start()
        [..]
           ib_umem_notifier_invalidate_range_start()
              down_read(&per_mm->umem_rwsem)   // DEADLOCK

        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
  mmu_notifier_invalidate_range_end()
     up_read(&per_mm->umem_rwsem)

The umem_rwsem is held across the range_start/end as the ODP algorithm for
invalidate_range_end cannot tolerate changes to the interval
tree. However, due to the nested invalidation regions the second
down_read() can deadlock if there are competing writers. The new core code
provides an alternative scheme to solve this problem.

Fixes: ca748c39ea ("RDMA/umem: Get rid of per_mm->notifier_count")
Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23 19:56:44 -04:00
Taehee Yoo
ab818362c9 net: use rhashtable_lookup() instead of rhashtable_lookup_fast()
rhashtable_lookup_fast() internally calls rcu_read_lock() then,
calls rhashtable_lookup(). So if rcu_read_lock() is already held,
rhashtable_lookup() is enough.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23 12:15:01 -08:00
Danit Goldberg
9c0015ef09 IB/mlx5: Implement callbacks for getting VFs GUID attributes
Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW
for VFs attributes and return node and port GUIDs.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-22 18:17:24 +02:00
Danit Goldberg
2446887ed2 IB/ipoib: Add ndo operation for getting VFs GUID attributes
Add ndo operation to the network driver that enables configuring
ipoib_get_vf_guid operation. The operation allows to get a VF port
and node GUIDs.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-22 18:17:24 +02:00
Danit Goldberg
bfcb3c5d14 IB/core: Add interfaces to get VF node and port GUIDs
Provide ability to get node and port GUIDs of VFs to be symmetrical
to already existing set option.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-22 18:17:24 +02:00
Michal Kalderon
a25984f3ba RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset
When running against rdma-core that doesn't support doorbell recovery, the
rdma_user_mmap_entry won't be allocated for doorbell recovery related
mappings.

We have a flag indicating whether rdma-core supports doorbell recovery or
not which was used during initialization, however some cases didn't check
that the rdma_user_mmap_entry exists before attempting to acquire it's
offset.

Fixes: 97f6125092 ("RDMA/qedr: Add doorbell overflow recovery support")
Link: https://lore.kernel.org/r/20191118150645.26602-1-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19 16:15:36 -04:00
Danit Goldberg
0acc637dac RDMA/cm: Use refcount_t type for refcount variable
This atomic in struct cm_id_private is being used as a refcount, change it
to refcount_t for better clarity and to get the refcount protections.

Link: https://lore.kernel.org/r/1573997601-4502-1-git-send-email-danitg@mellanox.com
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19 16:10:04 -04:00
Mark Zhang
c16339b69c IB/mlx5: Support extended number of strides for Striding RQ
Extends the minimum single WQE strides from 64 to 8, which is exposed
by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps.
Choose right number of strides based on FW capability.

Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19 16:05:41 -04:00
Danit Goldberg
ff3195b3ed IB/mlx4: Update HW GID table while adding vlan GID
When adding a new GID compare the vlan along with the GID and type. This
allows vlan's to have GIDs that alias each other, such as the default
GID. Otherwise they the GID cache view can become inconsistent with the HW
view.

Link: https://lore.kernel.org/r/20191115154457.247763-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19 15:58:55 -04:00
Christophe JAILLET
9067f2f0b4 RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'
We should jump to fail3 in order to undo the 'xa_insert_irq()' call.

Link: https://lore.kernel.org/r/20190923190746.10964-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17 10:37:00 -04:00
Dag Moxnes
e1ee1e62be RDMA/cma: Use ACK timeout for RoCE packetLifeTime
The cma is currently using a hard-coded value, CMA_IBOE_PACKET_LIFETIME,
for the PacketLifeTime, as it can not be determined from the network.
This value might not be optimal for all networks.

The cma module supports the function rdma_set_ack_timeout to set the ACK
timeout for a QP associated with a connection. As per IBTA 12.7.34 local
ACK timeout = (2 * PacketLifeTime + Local CA’s ACK delay).  Assuming a
negligible local ACK delay, we can use PacketLifeTime = local ACK
timeout/2 as a reasonable approximation for RoCE networks.

Link: https://lore.kernel.org/r/1572439440-17416-1-git-send-email-dag.moxnes@oracle.com
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17 10:37:00 -04:00
Christoph Hellwig
72b894b09a IB/umem: remove the dmasync argument to ib_umem_get
The argument is always ignored, so remove it.

Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17 10:37:00 -04:00
David S. Miller
19b7e21c55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of overlapping changes and parallel additions, stuff
like that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16 21:51:42 -08:00
Christoph Hellwig
7283fff8b5 dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flag
This flag is not implemented by any backend and only set by the ib_umem
module in a single instance.

Link: https://lore.kernel.org/r/20191113073214.9514-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 12:01:54 -04:00
Gal Pressman
64c264872b RDMA/efa: Clear the admin command buffer prior to its submission
We cannot rely on the entry memcpy as we only copy the actual size of the
command, the rest of the bytes must be memset to zero.

Currently providing non-zero memory will not have any user visible impact.
However, since admin commands are extendable (in a backwards compatible
way) everything beyond the size of the command must be cleared to prevent
issues in the future.

Fixes: 0420e54256 ("RDMA/efa: Implement functions that submit and complete admin commands")
Link: https://lore.kernel.org/r/20191112092608.46964-1-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 11:57:33 -04:00
Bernard Metzler
289b20b2a5 RDMA/siw: Cleanup unused mmap structures.
Removes obsolete driver specific mmap information after
generalization of RDMA driver mmap service. Also removes
useless forward declaration of struct siw_mr.

Fixes: 11f1a75567 ("RDMA/siw: Use the common mmap_xa helpers")
Link: https://lore.kernel.org/r/20191113153404.7402-1-bmt@zurich.ibm.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 11:56:11 -04:00
Kamal Heib
9a5407d74c RDMA/qedr: Make qedr_iw_load_qp() static
The function qedr_iw_load_qp() is only used in qedr_iw_cm.c

Fixes: 82af6d19d8 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr")
Link: https://lore.kernel.org/r/20191110113645.20058-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 11:55:31 -04:00
Colin Ian King
6296665cee RDMA/ocrdma: Fix spelling mistake in variable name
There is a spelling mistake in the variable nak_invalid_requst_errors,
rename it to nak_invalid_request_errors.

Link: https://lore.kernel.org/r/20191107224855.417647-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 11:51:42 -04:00
Viresh Kumar
7ee23491b3 RDMA/qib: Validate ->show()/store() callbacks before calling them
The permissions of the read-only or write-only sysfs files can be
changed (as root) and the user can then try to read a write-only file or
write to a read-only file which will lead to kernel crash here.

Protect against that by always validating the show/store callbacks.

Link: https://lore.kernel.org/r/d45cc26361a174ae12dbb86c994ef334d257924b.1573096807.git.viresh.kumar@linaro.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 11:49:15 -04:00
Pan Bian
da046d5f89 RDMA/i40iw: Fix potential use after free
Release variable dst after logging dst->error to avoid possible use after
free.

Link: https://lore.kernel.org/r/1573022651-37171-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 11:47:18 -04:00
Pan Bian
960657b732 RDMA/qedr: Fix potential use after free
Move the release operation after error log to avoid possible use after
free.

Link: https://lore.kernel.org/r/1573021434-18768-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14 11:47:18 -04:00
Saeed Mahameed
c94ef13b04 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
1) New generic devlink param "enable_roce", for downstream devlink
   reload support

2) Do vport ACL configuration on per vport basis when
   enabling/disabling a vport. This enables to have vports enabled/disabled
   outside of eswitch config for future

3) Split the code for legacy vs offloads mode and make it clear

4) Tide up vport locking and workqueue usage

5) Fix metadata enablement for ECPF

6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION

7) E-Switch and flow steering core low level support and refactoring for
   netfilter flowtables offload

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-13 14:24:58 -08:00
Bart Van Assche
e88982ad1b RDMA/srpt: Report the SCSI residual to the initiator
The code added by this patch is similar to the code that already exists in
ibmvscsis_determine_resid(). This patch has been tested by running the
following command:

strace sg_raw -r 1k /dev/sdb 12 00 00 00 60 00 -o inquiry.bin |&
    grep resid=

Link: https://lore.kernel.org/r/20191105214632.183302-1-bvanassche@acm.org
Fixes: a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-13 15:50:41 -04:00
Yevgeny Kliteynik
208d70f562 IB/mlx5: Support flow counters offset for bulk counters
Add support for flow steering counters action with a non-base counter
ID (offset) for bulk counters.

When creating a flow counter object, save the bulk value.  This value is
used when a flow action with a non-base counter ID is requested - to
validate that the required offset is in the range of the allocated bulk.

Link: https://lore.kernel.org/r/20191103140723.77411-1-leon@kernel.org
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-13 15:42:36 -04:00
Leon Romanovsky
e26e7b88f6 RDMA: Change MAD processing function to remove extra casting and parameter
All users of process_mad() converts input pointers from ib_mad_hdr to be
ib_mad, update the function declaration to use ib_mad directly.

Also remove not used input MAD size parameter.

Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-12 20:20:15 -04:00
Leon Romanovsky
333ee7e2d0 RDMA/hfi1: Delete unreachable code
All callers allocate MAD structures with proper sizes, there is no need to
recheck it.

Link: https://lore.kernel.org/r/20191029062745.7932-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-12 20:20:14 -04:00
Michael Guralnik
94de879c28 IB/mlx5: Load profile according to RoCE enablement state
When RoCE is disabled load mlx5_ib in raw_eth profile.
Clean pf_profile roce capability checks as it will not be used without
roce capability.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11 12:15:29 -08:00
Michael Guralnik
b5a498baf9 IB/mlx5: Rename profile and init methods
Rename uplink_rep_profile and its unique init and cleanup stages to
suit its upcoming use as the profile when RoCE is disabled.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11 12:15:29 -08:00
Wenpeng Liang
d11769fdc1 RDMA/hns: Modify appropriate printings
Modify some printings that is not in uniformed style, non-standard or with
spelling errors.

Link: https://lore.kernel.org/r/1572952082-6681-10-git-send-email-liweihang@hisilicon.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:55 -04:00
Yixian Liu
1ceb0b11a8 RDMA/hns: Fix non-standard error codes
It is better to return a linux error code than define a private constant.

Link: https://lore.kernel.org/r/1572952082-6681-9-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:54 -04:00
Lang Cheng
301cc7eb2c RDMA/hns: Modify hns_roce_hw_v2_get_cfg to simplify the code
Merge base configuration of hr_dev into hns_roce_hw_v2_get_cfg(). In
addition, there is no need to return 0 at last, so we change return type
of it to void.

Link: https://lore.kernel.org/r/1572952082-6681-8-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:54 -04:00
Lang Cheng
880f133c60 RDMA/hns: Simplify doorbell initialization code
If a variable needs to be set to 0 before use, it can be directly
initialized to 0.

Link: https://lore.kernel.org/r/1572952082-6681-7-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:54 -04:00
Yixing Liu
6eef524201 RDMA/hns: Replace not intuitive function/macro names
Replace "sw2hw" and "hw2sw" which is hard to understand with "create" and
"destroy".

Link: https://lore.kernel.org/r/1572952082-6681-6-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:54 -04:00
Yixian Liu
d938d7856f RDMA/hns: Modify fields of struct hns_roce_srq
Use wqe_cnt instead of max which means the queue size of srq, and remove
wqe_ctr which is not used.

Link: https://lore.kernel.org/r/1572952082-6681-5-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:54 -04:00
Yixian Liu
03ccba5c2c RDMA/hns: Delete unnecessary uar from hns_roce_cq
The uar information is already recorded in priv_uar of hns_roce_dev, there
is no need to record it in hns_roce_cq again.

Link: https://lore.kernel.org/r/1572952082-6681-4-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:53 -04:00
Lang Cheng
16a11e0bff RDMA/hns: Remove unnecessary structure hns_roce_sqp
Special QP have no differences with normal qp in data structure, so
definition of struct hns_roce_sqp should be removed and replaced by struct
hns_roce_qp.

Link: https://lore.kernel.org/r/1572952082-6681-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:53 -04:00
Yixian Liu
ec6adad0a1 RDMA/hns: Delete unnecessary variable max_post
There is no need to define max_post in hns_roce_wq, as it does same thing
as wqe_cnt.

Link: https://lore.kernel.org/r/1572952082-6681-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-08 16:37:53 -04:00
Leon Romanovsky
ffa2fd1323 RDMA/mlx5: Rewrite MAD processing logic to be readable
Multiple "if"s and "||" make extension of process_mad() function
as a tedious task, rewrite that function to be more readable.

Link: https://lore.kernel.org/r/20191029062745.7932-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 16:00:06 -04:00
Leon Romanovsky
84b56d57cf RDMA/ocrdma: Simplify process_mad function
Change the switch with one case into a simple if statement so the code is
less confusing.

Link: https://lore.kernel.org/r/20191029062745.7932-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 16:00:06 -04:00
Leon Romanovsky
dd0b0159f7 RDMA/mad: Do not check MAD sizes in roce and ib drivers
All callers for process_mad allocate MAD structures with proper sizes,
there is no need to recheck it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:55:22 -04:00
Leon Romanovsky
6a42265c91 RDMA/ocrdma: Make ocrdma_pma_counters() return void
This function always returns 0, so just use void and remove the bogus
checking at the only call site.

Link: https://lore.kernel.org/r/20191029062745.7932-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:38:57 -04:00
Leon Romanovsky
be4a8d4673 RDMA/mad: Allocate zeroed MAD buffer
Ensure that MAD output buffer is zero-based allocated in all the callers
of process_mad and remove the various memset()'s from the drivers.

Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:38:28 -04:00
Leon Romanovsky
874e476ba9 RDMA/qib: Delete empty check_cc_key function
Function always returns zero, just delete it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:25:24 -04:00
Leon Romanovsky
688eec9d3d RDMA/qib: Delete extra line
Trivial cleanup to fix the following warning:
 drivers/infiniband/hw/qib/qib_iba6120.c:1420: warning: bad line:

Fixes: f931551baf ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Link: https://lore.kernel.org/r/20191029062745.7932-15-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:19:33 -04:00
Leon Romanovsky
8a80cf9310 RDMA/mad: Delete never implemented functions
Delete never implemented and used MAD functions.

Link: https://lore.kernel.org/r/20191029062745.7932-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:14:23 -04:00
Bart Van Assche
77cf98d4ec Revert "RDMA/srpt: Postpone HCA removal until after configfs directory removal"
Although the mentioned patch fixes a use-after-free bug, it introduces a
hang during shutdown. Since the latter is worse, revert this patch.

Link: https://lore.kernel.org/r/20191101204756.182162-1-bvanassche@acm.org
Reported-by: Honggang Li <honli@redhat.com>
Fixes: 9b64f7d0bb ("RDMA/srpt: Postpone HCA removal until after configfs directory removal")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:13:03 -04:00
Wenpeng Liang
411c1e6774 RDMA/hns: Correct the value of srq_desc_size
srq_desc_size should be rounded up to pow of two before used, or related
calculation may cause allocating wrong size of memory for srq buffer.

Fixes: c7bcb13442 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Link: https://lore.kernel.org/r/1572575610-52530-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:37:02 -04:00
Sirong Wang
531eb45b3d RDMA/hns: Correct the value of HNS_ROCE_HEM_CHUNK_LEN
Size of pointer to buf field of struct hns_roce_hem_chunk should be
considered when calculating HNS_ROCE_HEM_CHUNK_LEN, or sg table size will
be larger than expected when allocating hem.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/1572575610-52530-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Sirong Wang <wangsirong@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:37:02 -04:00
Kamal Heib
ad0593ec89 RDMA/qedr: Remove unsupported modify_port callback
There is no need to return always zero for function which is not
supported.

Fixes: ac1b36e55a ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/20191028155931.1114-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:30:24 -04:00
Kamal Heib
6135b71159 RDMA/ocrdma: Remove unsupported modify_port callback
There is no need to return always zero for function which is not
supported.

Fixes: fe2caefcdf ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Link: https://lore.kernel.org/r/20191028155931.1114-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:29:15 -04:00
Kamal Heib
25f3b49b92 RDMA/hns: Remove unsupported modify_port callback
There is no need to return always zero for function which is not
supported.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20191028155931.1114-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:29:15 -04:00
Kamal Heib
55bfe905fa RDMA/core: Fix return code when modify_port isn't supported
Improve return code from ib_modify_port() by doing the following:
 - Use "-EOPNOTSUPP" instead "-ENOSYS" which is the proper return code

 - Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that
   didn't implement the modify_port callback, otherwise return
   "-EOPNOTSUPP"

Fixes: 61e0962d52 ("IB: Avoid ib_modify_port() failure for RoCE devices")
Link: https://lore.kernel.org/r/20191028155931.1114-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:29:15 -04:00
Kaike Wan
ce8e8087cf IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR
Normal RDMA WRITE request never returns IB_WC_RNR_RETRY_EXC_ERR to ULPs
because it does not need post receive buffer on the responder side.
Consequently, as an enhancement to normal RDMA WRITE request inside the
hfi1 driver, TID RDMA WRITE request should not return such an error status
to ULPs, although it does receive RNR NAKs from the responder when TID
resources are not available. This behavior is violated when
qp->s_rnr_retry_cnt is set in current hfi1 implementation.

This patch enforces these semantics by avoiding any reaction to the updates
of the RNR QP attributes.

Fixes: 3c6cb20a0d ("IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs")
Link: https://lore.kernel.org/r/20191025195842.106825.71532.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:15:36 -04:00
Kaike Wan
c2be3865a1 IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA
For a TID RDMA WRITE request, a QP on the responder side could be put into
a queue when a hardware flow is not available. A RNR NAK will be returned
to the requester with a RNR timeout value based on the position of the QP
in the queue. The tid_rdma_flow_wt variable is used to calculate the
timeout value and is determined by using a MTU of 4096 at the module
loading time. This could reduce the timeout value by half from the desired
value, leading to excessive RNR retries.

This patch fixes the issue by calculating the flow weight with the real
MTU assigned to the QP.

Fixes: 07b923701e ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Link: https://lore.kernel.org/r/20191025195836.106825.77769.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:15:36 -04:00
Kaike Wan
c1abd865bd IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet
The index r_tid_ack is used to indicate the next TID RDMA WRITE request to
acknowledge in the ring s_ack_queue[] on the responder side and should be
set to a valid index other than its initial value before r_tid_tail is
advanced to the next TID RDMA WRITE request and particularly before a TID
RDMA ACK is built. Otherwise, a NULL pointer dereference may result:

  BUG: unable to handle kernel paging request at ffff9a32d27abff8
  IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
  PGD 2749032067 PUD 0
  Oops: 0000 1 SMP
  Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me
  sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt]
  CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
  Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
  Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
  task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000
  RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
  RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002
  RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff
  RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300
  RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00
  R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8
  FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0
  Call Trace:
  [<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1]
  [<ffffffffaf0b2dff>] process_one_work+0x17f/0x440
  [<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0
  [<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
  [<ffffffffaf0bae31>] kthread+0xd1/0xe0
  [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
  [<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21
  [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40
  hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1
  Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72
  RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1]
  RSP <ffff9a11776abd08>
  CR2: ffff9a32d27abff8

This problem can happen if a RESYNC request is received before r_tid_ack
is modified.

This patch fixes the issue by making sure that r_tid_ack is set to a valid
value before a TID RDMA ACK is built. Functions are defined to simplify
the code.

Fixes: 07b923701e ("IB/hfi1: Add functions to receive TID RDMA WRITE request")
Fixes: 7cf0ad679d ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet")
Link: https://lore.kernel.org/r/20191025195830.106825.44022.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:15:35 -04:00
James Erwin
a9c3c4c597 IB/hfi1: Ensure full Gen3 speed in a Gen4 system
If an hfi1 card is inserted in a Gen4 systems, the driver will avoid the
gen3 speed bump and the card will operate at half speed.

This is because the driver avoids the gen3 speed bump when the parent bus
speed isn't identical to gen3, 8.0GT/s.  This is not compatible with gen4
and newer speeds.

Fix by relaxing the test to explicitly look for the lower capability
speeds which inherently allows for gen4 and all future speeds.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20191101192059.106248.1699.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: James Erwin <james.erwin@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:13:43 -04:00
Michal Kalderon
b4bc766097 RDMA/qedr: Add iWARP doorbell recovery support
This patch adds the iWARP specific doorbells to the doorbell recovery
mechanism.

Link: https://lore.kernel.org/r/20191030094417.16866-9-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
97f6125092 RDMA/qedr: Add doorbell overflow recovery support
Use the doorbell recovery mechanism to register rdma related doorbells
that will be restored in case there is a doorbell overflow attention.

Link: https://lore.kernel.org/r/20191030094417.16866-8-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
4c6bb02d59 RDMA/qedr: Use the common mmap API
Remove all functions related to mmap from qedr and use the common API.

Link: https://lore.kernel.org/r/20191030094417.16866-7-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
11f1a75567 RDMA/siw: Use the common mmap_xa helpers
Remove the functions related to managing the mmap_xa database.  This code
is now common in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-6-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Tested-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
e84d3c184e RDMA/efa: Use the common mmap_xa helpers
Remove the functions related to managing the mmap_xa database.  This code
was replaced with common code in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-5-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
c043ff2cfb RDMA: Connect between the mmap entry and the umap_priv structure
The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.

However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped.  The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.

Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before.  Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.

Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
3411f9f01b RDMA/core: Create mmap database and cookie helper functions
Create some common API's for adding entries to a xa_mmap. Searching for
an entry and freeing one.

The general approach is copied from the EFA driver and improved to be more
general and do more to help the drivers. Integration with the core allows
a reference counted scheme with a free function so that the driver can
know when its mmaps are all gone.

This significant new functionality will be helpful for drivers to have the
correct lifetime model for mmap objects.

Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:00 -04:00
Michal Kalderon
b86deba977 RDMA/core: Move core content from ib_uverbs to ib_core
Move functionality that is called by the driver, which is
related to umap, to a new file that will be linked in ib_core.
This is a first step in later enabling ib_uverbs to be optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.

Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-05 09:59:26 -04:00
Greg Kroah-Hartman
09b0965ee8 IB: mlx5: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-04 08:38:58 +01:00
Parav Pandit
e53a9d26cf IB/mlx5: Introduce and use mlx5_core_is_vf()
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().

This enables to clearly identify PF, VF and non virtual functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:40:27 -07:00
Michael Guralnik
11f552e217 IB/mlx5: Test write combining support
Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.

Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine.  The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.

The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.

Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.

Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31 15:52:51 -03:00
Leon Romanovsky
546d30099e RDMA/mlx5: Return proper error value
Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31 15:31:49 -03:00
Arnd Bergmann
d5b60e26e8 RDMA/hns: Fix build error again
This is not the first attempt to fix building random configurations,
unfortunately the attempt in commit a07fc0bb48 ("RDMA/hns: Fix build
error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
CONFIG_INFINIBAND_HNS_HIP08=y:

drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'

Revert commits a07fc0bb48 ("RDMA/hns: Fix build error") and
a3e2d4c7e7 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
the previous state, then fix the issues described there differently, by
adding more specific dependencies: INFINIBAND_HNS can now only be built-in
if at least one of HNS or HNS3 are built-in, and the individual back-ends
are only available if that code is reachable from the main driver.

Fixes: a07fc0bb48 ("RDMA/hns: Fix build error")
Fixes: a3e2d4c7e7 ("RDMA/hns: remove obsolete Kconfig comment")
Fixes: dd74282df5 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb ("RDMA/hns: Split hw v1 driver from hns roce driver")
Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-29 16:16:54 -03:00
Jason Gunthorpe
bb3dba3300 Merge branch 'odp_rework' into rdma.git for-next
Jason Gunthorpe says:

====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.

This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.

The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.

This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================

Merge conflicts with the odp statistics patch resolved.

* branch 'odp_rework':
  RDMA/odp: Remove broken debugging call to invalidate_range
  RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
  RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
  RDMA/mlx5: Rework implicit ODP destroy
  RDMA/mlx5: Avoid double lookups on the pagefault path
  RDMA/mlx5: Reduce locking in implicit_mr_get_data()
  RDMA/mlx5: Use an xarray for the children of an implicit ODP
  RDMA/mlx5: Split implicit handling from pagefault_mr
  RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
  RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
  RDMA/mlx5: Rework implicit_mr_get_data
  RDMA/mlx5: Delete struct mlx5_priv->mkey_table
  RDMA/mlx5: Use a dedicated mkey xarray for ODP
  RDMA/mlx5: Split sig_err MR data into its own xarray
  RDMA/mlx5: Use SRCU properly in ODP prefetch

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:47:52 -03:00
Jason Gunthorpe
46870b2391 RDMA/odp: Remove broken debugging call to invalidate_range
invalidate_range() also obtains the umem_mutex which is being held at this
point, so if this path were was ever called it would deadlock. Thus
conclude the debugging never triggers and rework it into a simple WARN_ON
and leave things as they are.

While here add a note to explain how we could possibly get inconsistent
page pointers.

Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
09689703d2 RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.

Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.

The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.

For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.

Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
d561987f34 RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
These mkeys are entirely internal and are never used by the HW for
page fault. They should also never be used by userspace for prefetch.
Simplify & optimize things by not including them in the xarray.

Since the prefetch path can now never see a child mkey there is no need
for the second synchronize_srcu() during imr destroy.

Link: https://lore.kernel.org/r/20191009160934.3143-14-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
5256edcb98 RDMA/mlx5: Rework implicit ODP destroy
Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.

This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.

For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.

Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
b70d785d23 RDMA/mlx5: Avoid double lookups on the pagefault path
Now that the locking is simplified combine pagefault_implicit_mr() with
implicit_mr_get_data() so that we sweep over the idx range only once,
and do the single xlt update at the end, after the child umems are
setup.

This avoids double iteration/xa_loads plus the sketchy failure path if the
xa_load() fails.

Link: https://lore.kernel.org/r/20191009160934.3143-12-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
3389baa831 RDMA/mlx5: Reduce locking in implicit_mr_get_data()
Now that the child MRs are stored in an xarray we can rely on the SRCU
lock to protect the xa_load and use xa_cmpxchg on the slow allocation path
to resolve races with concurrent page fault.

This reduces the scope of the critical section of umem_mutex for implicit
MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if
the child MR is already in the xarray. This makes it consistent with the
normal ODP MR critical section for umem_lock, and the locking approach
used for destroying an unusued implicit child MR.

The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr()
since it is no longer called with any locks.

Link: https://lore.kernel.org/r/20191009160934.3143-11-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
423f52d650 RDMA/mlx5: Use an xarray for the children of an implicit ODP
Currently the child leaves are stored in the shared interval tree and
every lookup for a child must be done under the interval tree rwsem.

This is further complicated by dropping the rwsem during iteration (ie the
odp_lookup(), odp_next() pattern), which requires a very tricky an
difficult to understand locking scheme with SRCU.

Instead reserve the interval tree for the exclusive use of the mmu
notifier related code in umem_odp.c and give each implicit MR a xarray
containing all the child MRs.

Since the size of each child is 1GB of VA, a 1 level xarray will index 64G
of VA, and a 2 level will index 2TB, making xarray a much better
data structure choice than an interval tree.

The locking properties of xarray will be used in the next patches to
rework the implicit ODP locking scheme into something simpler.

At this point, the xarray is locked by the implicit MR's umem_mutex, and
read can also be locked by the odp_srcu.

Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
54375e7382 RDMA/mlx5: Split implicit handling from pagefault_mr
The single routine has a very confusing scheme to advance to the next
child MR when working on an implicit parent. This scheme can only be used
when working with an implicit parent and must not be triggered when
working on a normal MR.

Re-arrange things by directly putting all the single-MR stuff into one
function and calling it in a loop for the implicit case. Simplify some of
the error handling in the new pagefault_real_mr() to remove unneeded gotos.

Link: https://lore.kernel.org/r/20191009160934.3143-9-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
9162420dde RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
Instead of rewriting all the IOVA's to 0 as things progress down the tree
make the IOVA of the children equal to placement in the tree. This makes
things easier to understand by keeping mmkey.iova == HW configuration.

Link: https://lore.kernel.org/r/20191009160934.3143-8-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
c2edcd6935 RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
This makes the routines easier to understand, particularly with respect
the locking requirements of the entire sequence. The implicit_mr_alloc()
had a lot of ifs specializing it to each of the callers, and only a very
small amount of code was actually shared.

Following patches will cause the flow in the two functions to diverge
further.

Link: https://lore.kernel.org/r/20191009160934.3143-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
3d5f3c54e7 RDMA/mlx5: Rework implicit_mr_get_data
This function is intended to loop across each MTT chunk in the implicit
parent that intersects the range [io_virt, io_virt+bnct).  But it is has a
confusing construction, so:

- Consistently use imr and odp_imr to refer to the implicit parent
  to avoid confusion with the normal mr and odp of the child
- Directly compute the inclusive start/end indexes by shifting. This is
  clearer to understand the intent and avoids any errors from unaligned
  values of addr
- Iterate directly over the range of MTT indexes, do not make a loop
  out of goto
- Follow 'success oriented flow', with goto error unwind
- Directly calculate the range of idx's that need update_xlt
- Ensure that any leaf MR added to the interval tree always results in an
  update to the XLT

Link: https://lore.kernel.org/r/20191009160934.3143-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
74bddb3682 RDMA/mlx5: Delete struct mlx5_priv->mkey_table
No users are left, delete it.

Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
806b101b2b RDMA/mlx5: Use a dedicated mkey xarray for ODP
There is a per device xarray storing mkeys that is used to store every
mkey in the system. However, this xarray is now only read by ODP for
certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).

Create an xarray only for use by ODP, that only contains ODP related
MKeys. This xarray is protected by SRCU and all erases are protected by a
synchronize.

This improves performance:

 - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
   can be deleted. The xarray will also consume fewer nodes.

 - normal MR's are never mixed with ODP MRs in a SRCU data structure so
   performance sucking synchronize_srcu() on every MR destruction is not
   needed.

 - No smp_load_acquire(live) and xa_load() double barrier on read

Due to the SRCU locking scheme care must be taken with the placement of
the xa_store(). Once it completes the MR is immediately visible to other
threads and only through a xa_erase() & synchronize_srcu() cycle could it
be destroyed.

Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
50211ec944 RDMA/mlx5: Split sig_err MR data into its own xarray
The locking model for signature is completely different than ODP, do not
share the same xarray that relies on SRCU locking to support ODP.

Simply store the active mlx5_core_sig_ctx's in an xarray when signature
MRs are created and rely on trivial xarray locking to serialize
everything.

The overhead of storing only a handful of SIG related MRs is going to be
much less than an xarray full of every mkey.

Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
fb985e278a RDMA/mlx5: Use SRCU properly in ODP prefetch
When working with SRCU protected xarrays the xarray itself should be the
SRCU 'update' point. Instead prefetch is using live as the SRCU update
point and this prevents switching the locking design to use the xarray
instead.

To solve this the prefetch must only read from the xarray once, and hold
on to the actual MR pointer for the duration of the async
operation. Incrementing num_pending_prefetch delays destruction of the MR,
so it is suitable.

Prefetch calls directly to the pagefault_mr using the MR pointer and only
does a single xarray lookup.

All the testing if a MR is prefetchable or not is now done only in the
prefetch code and removed from the pagefault critical path.

Link: https://lore.kernel.org/r/20191009160934.3143-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
036313316d Linux 5.4-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl210Z8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv+kIAKRpO7EuDokQL4qp
 hxEEaCMJA1T055EMlNU6FVAq/ZbmapzreUyNYiRMpPWKGTWNMkhIcZQfysYeGZz5
 y/KRxAiVxlcB+3v3yRmoZd/XoQmhgvJmqD4zhaGI2Utonow4f/SGSEFFZqqs9WND
 4HJROjZHgQ4JBxg9Z+QMo0FxbV/DCZpEOUq51N9WJywyyDRb18zotE83stpU+pE2
 fjqT7mk0NLrnYXuDRAbFC1Aau9ed4H6LlwLmxaqxq/Pt5Rz7wIKwKL9HIT4Dm/0a
 qpani6phhHWL7MwUpa2wkEonFCD03rJFl3DUVJo64Ijh4up5D/jyXQ+GKV2P4WKJ
 275Rb5Q=
 =WiZZ
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc5' into rdma.git for-next

Linux 5.4-rc5

For dependencies in the next patches

Conflict resolved by keeping the delete of the unlock.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:36:29 -03:00
Bryan Tan
a52dc3a100 RDMA/vmw_pvrdma: Use resource ids from physical device if available
This change allows the RDMA stack to use physical resource numbers if they
are passed up from the device. This is accomplished by separating the
concept of the QP number from the QP handle. Previously, the two were the
same, as the QP number was exposed to the guest and also used to reference
a virtual QP in the device backend.

With physical resource numbers exposed, the QP number given to the guest
is the number assigned from the physical HCA's QP, while the QP handle is
still the internal handle used to reference a virtual QP. Regardless of
whether the device is exposing physical ids, the driver will still try to
pick up the QP handle from the backend if possible. The MR keys exposed to
the guest will also be the MR keys created by the physical HCA, instead of
virtual MR keys. The distinction between handle and keys is already
present for MRs so there is no need to do anything special here.

A new version of the create QP response has been added to the device API
to pass up the QP number and handle. The driver will also report these to
userspace in the udata response if userspace supports it or not create the
queuepair if not. I also had to do a refactor of the destroy qp code to
reuse it if we fail to copy to userspace.

Link: https://lore.kernel.org/r/20191028181444.19448-1-aditr@vmware.com
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:09:23 -03:00
Lijun Ou
b681a05299 RDMA/hns: Prevent memory leaks of eq->buf_list
eq->buf_list->buf and eq->buf_list should also be freed when eqe_hop_num
is set to 0, or there will be memory leaks.

Fixes: a5073d6054 ("RDMA/hns: Add eq support of hip08")
Link: https://lore.kernel.org/r/1572072995-11277-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 15:06:38 -03:00
Bart Van Assche
c9121262d5 RDMA/core: Set DMA parameters correctly
The dma_set_max_seg_size() call in setup_dma_device() does not have any
effect since device->dev.dma_parms is NULL. Fix this by initializing
device->dev.dma_parms first.

Link: https://lore.kernel.org/r/20191025225830.257535-5-bvanassche@acm.org
Fixes: d10bcf947a ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:52:04 -03:00
Bart Van Assche
a401fb819c RDMA/siw: Increase DMA max_segment_size parameter
Increase the DMA max_segment_size parameter from 64 KB to 2 GB.

Link: https://lore.kernel.org/r/20191025225830.257535-4-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:52:04 -03:00
Bart Van Assche
97458fd510 RDMA/rxe: Increase DMA max_segment_size parameter
Increase the DMA max_segment_size parameter from 64 KB to 2 GB.

Link: https://lore.kernel.org/r/20191025225830.257535-3-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:52:03 -03:00
Bernard Metzler
0edefddbae RDMA/siw: Fix post_recv QP state locking
Do not release qp state lock if not previously acquired.

Fixes: cf049bb31f ("RDMA/siw: Fix SQ/RQ drain logic")
Link: https://lore.kernel.org/r/20191025142903.20625-1-bmt@zurich.ibm.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:34:33 -03:00
Potnuri Bharat Teja
d4934f4569 RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
_put_ep_safe() and _put_pass_ep_safe() free the skb before it is freed by
process_work(). fix double free by freeing the skb only in process_work().

Fixes: 1dad0ebeea ("iw_cxgb4: Avoid touch after free error in ARP failure handlers")
Link: https://lore.kernel.org/r/1572006880-5800-1-git-send-email-bharat@chelsio.com
Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:13:33 -03:00
Potnuri Bharat Teja
5212c3fda2 RDMA/iw_cxgb4: Report correct port speed/width
Query speed/width from corresponding netdev.

Link: https://lore.kernel.org/r/1572001022-4533-1-git-send-email-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:11:46 -03:00
Jason Gunthorpe
1524b12a6e RDMA/mlx5: Use irq xarray locking for mkey_table
The mkey_table xarray is touched by the reg_mr_callback() function which
is called from a hard irq. Thus all other uses of xa_lock must use the
_irq variants.

  WARNING: inconsistent lock state
  5.4.0-rc1 #12 Not tainted
  --------------------------------
  inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
  python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
  ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
  {IN-HARDIRQ-W} state was registered at:
    lock_acquire+0xe1/0x200
    _raw_spin_lock_irqsave+0x35/0x50
    reg_mr_callback+0x2dd/0x450 [mlx5_ib]
    mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
    mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
   [..]

   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&xa->xa_lock)->rlock#3);
    <Interrupt>
      lock(&(&xa->xa_lock)->rlock#3);

   *** DEADLOCK ***

  2 locks held by python3/343:
   #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
   #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]

  stack backtrace:
  CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x86/0xca
   print_usage_bug.cold.50+0x2e5/0x355
   mark_lock+0x871/0xb50
   ? match_held_lock+0x20/0x250
   ? check_usage_forwards+0x240/0x240
   __lock_acquire+0x7de/0x23a0
   ? __kasan_check_read+0x11/0x20
   ? mark_lock+0xae/0xb50
   ? mark_held_locks+0xb0/0xb0
   ? find_held_lock+0xca/0xf0
   lock_acquire+0xe1/0x200
   ? xa_erase+0x12/0x30
   _raw_spin_lock+0x2a/0x40
   ? xa_erase+0x12/0x30
   xa_erase+0x12/0x30
   mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
   uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
   uverbs_free_mw+0x1a/0x20 [ib_uverbs]
   destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
   [..]

Fixes: 0417791536 ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:08:48 -03:00
Michal Kalderon
24e412c1e0 RDMA/qedr: Fix memory leak in user qp and mr
User QPs pbl's weren't freed properly.
MR pbls weren't freed properly.

Fixes: e0290cce6a ("qedr: Add support for memory registeration verbs")
Link: https://lore.kernel.org/r/20191027200451.28187-5-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:36 -03:00
Michal Kalderon
82af6d19d8 RDMA/qedr: Fix synchronization methods and memory leaks in qedr
Re-design of the iWARP CM related objects reference counting and
synchronization methods, to ensure operations are synchronized correctly
and that memory allocated for "ep" is properly released. Also makes sure
QP memory is not released before ep is finished accessing it.

Where as the QP object is created/destroyed by external operations, the ep
is created/destroyed by internal operations and represents the tcp
connection associated with the QP.

QP destruction flow:
- needs to wait for ep establishment to complete (either successfully or
  with error)
- needs to wait for ep disconnect to be fully posted to avoid a race
  condition of disconnect being called after reset.
- both the operations above don't always happen, so we use atomic flags to
  indicate whether the qp destruction flow needs to wait for these
  completions or not, if the destroy is called before these operations
  began, the flows will check the flags and not execute them ( connect /
  disconnect).

We use completion structure for waiting for the completions mentioned
above.

The QP refcnt was modified to kref object.  The EP has a kref added to it
to handle additional worker thread accessing it.

Memory Leaks - https://www.spinics.net/lists/linux-rdma/msg83762.html

Concurrency not managed correctly -
https://www.spinics.net/lists/linux-rdma/msg67949.html

Fixes: de0089e692 ("RDMA/qedr: Add iWARP connection management qp related callbacks")
Link: https://lore.kernel.org/r/20191027200451.28187-4-michal.kalderon@marvell.com
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Reported-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:35 -03:00
Michal Kalderon
5fdff18b4d RDMA/qedr: Fix qpids xarray api used
The qpids xarray isn't accessed from irq context and therefore there
is no need to use the xa_XXX_irq version of the apis.
Remove the _irq.

Fixes: b6014f9e5f ("qedr: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20191027200451.28187-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:35 -03:00
Michal Kalderon
73ab512f72 RDMA/qedr: Fix srqs xarray initialization
There was a missing initialization for the srqs xarray.
SRQs xarray can also be called from irq context when searching
for an element and uses the xa_XXX_irq apis, therefore should
be initialized with IRQ flags.

Fixes: 9fd15987ed ("qedr: Convert srqidr to XArray")
Link: https://lore.kernel.org/r/20191027200451.28187-2-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:27 -03:00
Colin Ian King
994195e153 RDMA/hns: Fix memory leak on 'context' on error return path
Currently, the error return path when the call to function
dev->dfx->query_cqc_info fails will leak object 'context'. Fix this by
making the error return path via 'err' return return codes rather than
-EMSGSIZE, set ret appropriately for all error return paths and for the
memory leak now return via 'err' rather than just returning without
freeing context.

Link: https://lore.kernel.org/r/20191024131034.19989-1-colin.king@canonical.com
Addresses-Coverity: ("Resource leak")
Fixes: e1c9a0dc29 ("RDMA/hns: Dump detailed driver-specific CQ")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:41:23 -03:00
Yangyang Li
887803db86 RDMA/hns: Bugfix for qpc/cqc timer configuration
qpc/cqc timer entry size needs one page, but currently they are fixedly
configured to 4096, which is not appropriate in 64K page scenarios. So
they should be modified to PAGE_SIZE.

Fixes: 0e40dc2f70 ("RDMA/hns: Add timer allocation support for hip08")
Link: https://lore.kernel.org/r/1571908917-16220-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:33:06 -03:00
Lijun Ou
5c7e76fb7c RDMA/hns: Fix to support 64K page for srq
SRQ's page size configuration of BA and buffer should depend on current
PAGE_SHIFT, or it can't work in scenario of 64K page.

Fixes: c7bcb13442 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Link: https://lore.kernel.org/r/1571908917-16220-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:33:06 -03:00
Bart Van Assche
79d81ef42c RDMA/srpt: Fix TPG creation
Unlike the iSCSI target driver, for the SRP target driver it is sufficient
if a single TPG can be associated with each RDMA port name. However, users
started associating multiple TPGs with RDMA port names. Support this by
converting the single TPG in struct srpt_port_id into a list. This patch
fixes the following list corruption issue:

 list_add corruption. prev->next should be next (ffffffffc0a080c0), but was ffffa08a994ce6f0. (prev=ffffa08a994ce6f0).
 WARNING: CPU: 2 PID: 2597 at lib/list_debug.c:28 __list_add_valid+0x6a/0x70
 CPU: 2 PID: 2597 Comm: targetcli Not tainted 5.4.0-rc1.3bfa3c9602a7 #1
 RIP: 0010:__list_add_valid+0x6a/0x70
 Call Trace:
  core_tpg_register+0x116/0x200 [target_core_mod]
  srpt_make_tpg+0x3f/0x60 [ib_srpt]
  target_fabric_make_tpg+0x41/0x290 [target_core_mod]
  configfs_mkdir+0x158/0x3e0
  vfs_mkdir+0x108/0x1a0
  do_mkdirat+0x77/0xe0
  do_syscall_64+0x55/0x1d0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: https://lore.kernel.org/r/20191023204106.23326-1-bvanassche@acm.org
Reported-by: Honggang LI <honli@redhat.com>
Fixes: a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:30:00 -03:00
Leon Romanovsky
f9e66db143 RDMA/hns: Delete BITS_PER_BYTE redefinition
HNS redefined available in bits.h define and didn't use it, we can safely
delete it.

Link: https://lore.kernel.org/r/20191023054239.31648-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:28:24 -03:00
Jason Gunthorpe
515f60004e RDMA/hns: Prevent undefined behavior in hns_roce_set_user_sq_size()
The "ucmd->log_sq_bb_count" variable is a user controlled variable in the
0-255 range.  If we shift more than then number of bits in an int then
it's undefined behavior (it shift wraps), and potentially the int could
become negative.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20190608092514.GC28890@mwanda
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
2019-10-28 11:20:34 -03:00
Leon Romanovsky
24f5214923 RDMA/cm: Update copyright together with SPDX tag
Add Mellanox to lust of copyright holders and replace copyright
boilerplate with relevant SPDX tag.

Link: https://lore.kernel.org/r/20191020071559.9743-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 10:15:11 -03:00
Leon Romanovsky
a916051191 RDMA/cm: Use specific keyword to check define
There is a specific define keyword to check if define exists or not,
let's use it instead of open-coded variant.

Link: https://lore.kernel.org/r/20191020071559.9743-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 10:15:10 -03:00
Leon Romanovsky
8d625101a7 RDMA/cm: Delete unused cm_is_active_peer function
Function cm_is_active_peer is not used, delete it.

Link: https://lore.kernel.org/r/20191020071559.9743-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 10:15:10 -03:00
Parav Pandit
549af00833 IB/core: Avoid deadlock during netlink message handling
When rdmacm module is not loaded, and when netlink message is received to
get char device info, it results into a deadlock due to recursive locking
of rdma_nl_mutex with the below call sequence.

[..]
  rdma_nl_rcv()
  mutex_lock()
   [..]
   rdma_nl_rcv_msg()
      ib_get_client_nl_info()
         request_module()
           iw_cm_init()
             rdma_nl_register()
               mutex_lock(); <- Deadlock, acquiring mutex again

Due to above call sequence, following call trace and deadlock is observed.

  kernel: __mutex_lock+0x35e/0x860
  kernel: ? __mutex_lock+0x129/0x860
  kernel: ? rdma_nl_register+0x1a/0x90 [ib_core]
  kernel: rdma_nl_register+0x1a/0x90 [ib_core]
  kernel: ? 0xffffffffc029b000
  kernel: iw_cm_init+0x34/0x1000 [iw_cm]
  kernel: do_one_initcall+0x67/0x2d4
  kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0
  kernel: do_init_module+0x5a/0x223
  kernel: load_module+0x1998/0x1e10
  kernel: ? __symbol_put+0x60/0x60
  kernel: __do_sys_finit_module+0x94/0xe0
  kernel: do_syscall_64+0x5a/0x270
  kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

  process stack trace:
  [<0>] __request_module+0x1c9/0x460
  [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core]
  [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core]
  [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core]
  [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core]
  [<0>] netlink_unicast+0x179/0x220
  [<0>] netlink_sendmsg+0x2f6/0x3f0
  [<0>] sock_sendmsg+0x30/0x40
  [<0>] ___sys_sendmsg+0x27a/0x290
  [<0>] __sys_sendmsg+0x58/0xa0
  [<0>] do_syscall_64+0x5a/0x270
  [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

To overcome this deadlock and to allow multiple netlink messages to
progress in parallel, following scheme is implemented.

1. Split the lock protecting the cb_table into a per-index lock, and make
   it a rwlock. This lock is used to ensure no callbacks are running after
   unregistration returns. Since a module will not be registered once it
   is already running callbacks, this avoids the deadlock.

2. Use smp_store_release() to update the cb_table during registration so
   that no lock is required. This avoids lockdep problems with thinking
   all the rwsems are the same lock class.

Fixes: 0e2d00eb6f ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload")
Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24 20:49:37 -03:00
Mark Zhang
a15542bb72 RDMA/nldev: Skip counter if port doesn't match
The counter resource should return -EAGAIN if it was requested for a
different port, this is similar to how QP works if the users provides a
port filter.

Otherwise port filtering in netlink will return broken counter nests.

Fixes: c4ffee7c9b ("RDMA/netlink: Implement counter dumpit calback")
Link: https://lore.kernel.org/r/20191020062800.8065-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24 15:31:23 -03:00
Leon Romanovsky
dc2f7edcc0 RDMA/rxe: Remove useless rxe_init_device_param assignments
IB devices are allocated with kzalloc and don't need explicit zero
assignments for their parameters. It can be removed safely.

Link: https://lore.kernel.org/r/20191020055724.7410-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24 15:20:18 -03:00
Leon Romanovsky
ac71ffcfb4 RDMA/core: Check that process is still alive before sending it to the users
The PID information can disappear asynchronously because the task can be
killed and moved to zombie state. In this case, PID will be zero in
similar way to the kernel tasks. Recognize such situation where we are
asking to return orphaned object and simply skip filling PID attribute.

As part of this change, document the same scenario in counter.c code.

Link: https://lore.kernel.org/r/20191010071105.25538-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23 16:02:12 -03:00
Leon Romanovsky
cf7e93c12f RDMA/restrack: Remove PID namespace support
IB resources are bounded to IB device and file descriptors, both entities
are unaware to PID namespaces and to task lifetime.

The difference in model caused to unpredictable behavior for the following
scenario:
 1. Create FD and context
 2. Share it with ephemeral child
 3. Create any object and exit that child

The end result of this flow, that those newly created objects will be
tracked by restrack, but won't be visible for users because task_struct
associated with them already exited.

The right thing is to rely on net namespace only for any filtering
purposes and drop PID namespace.

Link: https://lore.kernel.org/r/20191010071105.25538-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23 15:58:31 -03:00
Arnd Bergmann
1832f2d8ff compat_ioctl: move more drivers to compat_ptr_ioctl
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.

One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.

I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.

Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23 17:23:44 +02:00
Parav Pandit
c4c8aff5a9 IB/core: Do not notify GID change event of an unregistered device
When IB device is undergoing unregistration, the GID cache is always
cleaned up after all clients are unregistered with the below flow.

__ib_unregister_device()
  disable_device()
  ib_cache_cleanup_one()
    gid_table_cleanup_one()
      cleanup_gid_table_port()

There is no use in generating a GID change event at this stage, where
there is no active client of the device and device is nearly unregistered.

Link: https://lore.kernel.org/r/20191020065427.8772-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:58:22 -03:00
Michael Guralnik
3f89b01f4b IB/mlx5: Align usage of QP1 create flags with rest of mlx5 defines
There is little value in keeping separate function for one flag, provide
it directly like any other mlx5 define.

Link: https://lore.kernel.org/r/20191020064400.8344-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:39:49 -03:00
Ran Rozenstein
68abaa765e IB/mlx5: Remove dead code
mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the
dead code.

Fixes: a60109dc9a ("IB/mlx5: Add support for extended atomic operations")
Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org
Signed-off-by: Ran Rozenstein <ranro@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:37:48 -03:00
Chuhong Yuan
a29e1012c1 RDMA/uverbs: Add a check for uverbs_attr_get to uverbs_copy_to_struct_or_zero
All current callers for uverbs_copy_to_struct_or_zero() already check that
the attribute exists, but it make sense to verify the result like the
other functions do.

Link: https://lore.kernel.org/r/20191018081533.8544-1-hslester96@gmail.com
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:10:03 -03:00
Parav Pandit
d3bd939670 IB/cma: Honor traffic class from lower netdevice for RoCE
When a macvlan netdevice is used for RoCE, consider the tos->prio->tc
mapping as SL using its lower netdevice.

1. If the lower netdevice is a VLAN netdevice, consider the VLAN netdevice
   and it's parent netdevice for mapping
2. If the lower netdevice is not a VLAN netdevice, consider tc mapping
   directly from the lower netdevice

Link: https://lore.kernel.org/r/20191015072058.17347-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:56:22 -03:00
Erez Alfasi
4061ff7aa3 RDMA/nldev: Provide MR statistics
Add RDMA nldev netlink interface for dumping MR statistics information.

Output example:

$ ./ibv_rc_pingpong -o -P -s 500000000
  local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::

$ rdma stat show mr
dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0

Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:33:31 -03:00
Erez Alfasi
e1b95ae0b0 RDMA/mlx5: Return ODP type per MR
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show
mr' dump.

For example:

$ rdma -dd resource show mr
dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000
pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit

For non-ODP MRs, we won't print "drv_odp ..." at all.

Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Erez Alfasi
fb91069088 RDMA/nldev: Allow different fill function per resource
So far res_get_common_{dumpit, doit} was using the default resource fill
function which was defined as part of the nldev_fill_res_entry
fill_entries.

Add a fill function pointer as an argument allows us to use different fill
function in case we want to dump different values then 'rdma resource'
flow do, but still use the same existing general resources dumping flow.

Link: https://lore.kernel.org/r/20191016062308.11886-3-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Erez Alfasi
a3de94e3d6 IB/mlx5: Introduce ODP diagnostic counters
Introduce ODP diagnostic counters and count the following
per MR within IB/mlx5 driver:
 1) Page faults:
	Total number of faulted pages.
 2) Page invalidations:
	Total number of pages invalidated by the OS during all
	invalidation events. The translations can be no longer
	valid due to either non-present pages or mapping changes.

Link: https://lore.kernel.org/r/20191016062308.11886-2-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Dan Carpenter
a9018adfde RDMA/uverbs: Prevent potential underflow
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the
UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function.  We check that:

        if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {

But we don't check if "attr.comp_vector" is negative.  It could
potentially lead to an array underflow.  My concern would be where
cq->vector is used in the create_cq() function from the cxgb4 driver.

And really "attr.comp_vector" is appears as a u32 to user space so that's
the right type to use.

Fixes: 9ee79fce36 ("IB/core: Add completion queue (cq) object actions")
Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:05:36 -03:00
rd.dunlab@gmail.com
7c21072dde infiniband: fix sw/rdmavt/ kernel-doc notation
Add kernel-doc for missing function parameters.
Remove excess kernel-doc descriptions.
Fix expected kernel-doc formatting (use ':' instead of '-' after @funcarg).

../drivers/infiniband/sw/rdmavt/ah.c:138: warning: Excess function parameter 'udata' description in 'rvt_destroy_ah'
../drivers/infiniband/sw/rdmavt/vt.c:698: warning: Function parameter or member 'pkey_table' not described in 'rvt_init_port'
../drivers/infiniband/sw/rdmavt/cq.c:561: warning: Excess function parameter 'rdi' description in 'rvt_driver_cq_init'
../drivers/infiniband/sw/rdmavt/cq.c:575: warning: Excess function parameter 'rdi' description in 'rvt_cq_exit'
../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'qp' not described in 'rvt_add_rnr_timer'
../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'aeth' not described in 'rvt_add_rnr_timer'
../drivers/infiniband/sw/rdmavt/qp.c:2591: warning: Function parameter or member 'qp' not described in 'rvt_stop_rc_timers'
../drivers/infiniband/sw/rdmavt/qp.c:2624: warning: Function parameter or member 'qp' not described in 'rvt_del_timers_sync'
../drivers/infiniband/sw/rdmavt/qp.c:2697: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter_init'
../drivers/infiniband/sw/rdmavt/qp.c:2728: warning: Function parameter or member 'iter' not described in 'rvt_qp_iter_next'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'rdi' not described in 'rvt_qp_iter'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'v' not described in 'rvt_qp_iter'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter'

Link: https://lore.kernel.org/r/20191010035240.251184229@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
d6537c1a9c infiniband: fix core/ kernel-doc notation
Correct function parameter names (typos or renames).
Add kernel-doc notation for missing function parameters.

../drivers/infiniband/core/sa_query.c:1263: warning: Function parameter or member 'gid_attr' not described in 'ib_init_ah_attr_from_path'
../drivers/infiniband/core/sa_query.c:1263: warning: Excess function parameter 'sgid_attr' description in 'ib_init_ah_attr_from_path'

../drivers/infiniband/core/device.c:145: warning: Function parameter or member 'dev' not described in 'rdma_dev_access_netns'
../drivers/infiniband/core/device.c:145: warning: Excess function parameter 'device' description in 'rdma_dev_access_netns'
../drivers/infiniband/core/device.c:1333: warning: Function parameter or member 'name' not described in 'ib_register_device'
../drivers/infiniband/core/device.c:1461: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device'
../drivers/infiniband/core/device.c:1461: warning: Excess function parameter 'device' description in 'ib_unregister_device'
../drivers/infiniband/core/device.c:1483: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_and_put'
../drivers/infiniband/core/device.c:1550: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_queued'

Link: https://lore.kernel.org/r/20191010035240.191542461@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
b24da1a0d4 infiniband: fix ulp/iser/iser_initiator.c kernel-doc warnings
Add kernel-doc notation for missing function parameters:

../drivers/infiniband/ulp/iser/iser_initiator.c:365: warning: Function parameter or member 'conn' not described in 'iser_send_command'
../drivers/infiniband/ulp/iser/iser_initiator.c:365: warning: Function parameter or member 'task' not described in 'iser_send_command'
../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'conn' not described in 'iser_send_data_out'
../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'task' not described in 'iser_send_data_out'
../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'hdr' not described in 'iser_send_data_out'

Link: https://lore.kernel.org/r/20191010035240.132033937@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
134a42a66b infiniband: fix ulp/iser/iser_verbs.c kernel-doc notation
Various kernel-doc fixes:

- fix typos
- don't use /** for internal structs or functions
- fix Return: kernel-doc formatting
- add kernel-doc notation for missing function parameters

../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'ib_conn' not described in 'iser_alloc_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'cmds_max' not described in 'iser_alloc_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'size' not described in 'iser_alloc_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:221: warning: Function parameter or member 'ib_conn' not described in 'iser_free_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'ib_conn' not described in 'iser_alloc_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'cmds_max' not described in 'iser_alloc_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'size' not described in 'iser_alloc_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:338: warning: Function parameter or member 'ib_conn' not described in 'iser_free_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:568: warning: Function parameter or member 'iser_conn' not described in 'iser_conn_release'
../drivers/infiniband/ulp/iser/iser_verbs.c:603: warning: Function parameter or member 'iser_conn' not described in 'iser_conn_terminate'
../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'signal' not described in 'iser_post_send'
../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'ib_conn' not described in 'iser_post_send'
../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'tx_desc' not described in 'iser_post_send'

Link: https://lore.kernel.org/r/20191010035240.070520193@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
094c88f3c5 infiniband: fix core/verbs.c kernel-doc notation
Add missing function parameter descriptions:

../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'flags' not described in '__ib_alloc_pd'
../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'caller' not described in '__ib_alloc_pd'

Link: https://lore.kernel.org/r/20191010035240.011497492@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
96f4b0b68d infiniband: fix ulp/srpt/ib_srpt.h kernel-doc notation
Fix kernel-doc warnings (typos or renames) in ib_srpt.h:

../drivers/infiniband/ulp/srpt/ib_srpt.h:419: warning: Function parameter or member 'port_guid_id' not described in 'srpt_port'
../drivers/infiniband/ulp/srpt/ib_srpt.h:419: warning: Function parameter or member 'port_gid_id' not described in 'srpt_port'

Link: https://lore.kernel.org/r/20191010035239.950150496@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
dfa4344da3 infiniband: fix ulp/opa_vnic/opa_vnic_internal.h kernel-doc notation
Remove kernel-doc notation on 4 structs since they are internal and
none of the struct fields/members are described.
This removes 45 kernel-doc warnings.

Link: https://lore.kernel.org/r/20191010035239.818405496@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com
28f2a6aeed infiniband: fix ulp/iser/iscsi_iser.h kernel-doc warnings
Fix kernel-doc warnings and typos/spellos.

../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'dma_addr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'cqe' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'reg_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'send_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'inv_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:277: warning: Function parameter or member 'cqe' not described in 'iser_rx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:296: warning: Function parameter or member 'rsp' not described in 'iser_login_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:339: warning: Function parameter or member 'reg_mem' not described in 'iser_reg_ops'
../drivers/infiniband/ulp/iser/iscsi_iser.h:399: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:413: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'
../drivers/infiniband/ulp/iser/iscsi_iser.h:439: warning: Function parameter or member 'reg_cqe' not described in 'ib_conn'
../drivers/infiniband/ulp/iser/iscsi_iser.h:491: warning: Function parameter or member 'snd_w_inv' not described in 'iser_conn'

This leaves 2 "member not described" warnings that I don't know how to fix:

../drivers/infiniband/ulp/iser/iscsi_iser.h:401: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:415: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'

Link: https://lore.kernel.org/r/20191010035239.756365352@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com
526f2c5063 infiniband: fix core/ipwm_util.h kernel-doc warnings
Fix kernel-doc warnings and expected formatting.

../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'a_sockaddr' not described in 'iwpm_compare_sockaddr'
../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'b_sockaddr' not described in 'iwpm_compare_sockaddr'
../drivers/infiniband/core/iwpm_util.h:280: warning: Function parameter or member 'iwpm_pid' not described in 'iwpm_send_hello'

Link: https://lore.kernel.org/r/20191010035239.695604406@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com
df130f878e infiniband: fix ulp/iser/iscsi_iser.[hc] kernel-doc notation
Fix struct name in kernel-doc notation to match the struct name below it.
Fix one typo (spello).
Fix formatting as expected for kernel-doc notation.
Fix parameter name to match the function's parameter name to eliminate a
kernel-doc warning.

../drivers/infiniband/ulp/iser/iscsi_iser.c:815: warning: Function parameter or member 'non_blocking' not described in 'iscsi_iser_ep_connect'

Link: https://lore.kernel.org/r/20191010035239.623888112@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
Jason Gunthorpe
a2aca4d7f0 Merge branch 'mlx5-rd-sgl' into rdma.git for-next
From Yamin Friedman:

====================
This series from Yamin implements long standing "TODO" existed in rw.c. It
allows the driver to specify a cut-over point where it is faster to build
a lkey MR rather than do a large SGL for RDMA READ operations.

mlx5 HW gets a notable performane boost by switching to MRs.
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'mlx5-rd-sgl': (3 commits)
  RDMA/mlx5: Add capability for max sge to get optimized performance
  RDMA/rw: Support threshold for registration vs scattering to local pages
  net/mlx5: Expose optimal performance scatter entries capability
2019-10-22 14:27:25 -03:00
Yamin Friedman
366090564b RDMA/mlx5: Add capability for max sge to get optimized performance
Allows the IB device to provide a value of maximum scatter gather entries
per RDMA READ.

In certain cases it may be preferable for a device to perform UMR memory
registration rather than have many scatter entries in a single RDMA READ.
This provides a significant performance increase in devices capable of
using different memory registration schemes based on the number of scatter
gather entries. This general capability allows each device vendor to fine
tune when it is better to use memory registration.

Link: https://lore.kernel.org/r/20191007135933.12483-4-leon@kernel.org
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:27:00 -03:00
Yamin Friedman
00bd1439f4 RDMA/rw: Support threshold for registration vs scattering to local pages
If there are more scatter entries than the recommended limit provided by
the ib device, UMR registration is used. This will provide optimal
performance when performing large RDMA READs over devices that advertise
the threshold capability.

With ConnectX-5 running NVMeoF RDMA with FIO single QP 128KB writes:
Without use of cap: 70Gb/sec
With use of cap: 84Gb/sec

Link: https://lore.kernel.org/r/20191007135933.12483-3-leon@kernel.org
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:26:52 -03:00
Bernard Metzler
cf049bb31f RDMA/siw: Fix SQ/RQ drain logic
Storage ULPs (e.g. iSER & NVMeOF) use ib_drain_qp() to drain
QP/CQ. Current SIW's own drain routines do not properly wait until all
SQ/RQ elements are completed and reaped from the CQ. This may cause touch
after free issues.  New logic relies on generic
__ib_drain_sq()/__ib_drain_rq() posting a final work request, which SIW
immediately flushes to CQ.

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20191004125356.20673-1-bmt@zurich.ibm.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 13:43:10 -03:00
Donald Dutile
5a0d523781 ib/srp: Add missing new line after displaying fast_io_fail_tmo param
Long-time missing new-line in sysfs output.
Simply add it.

Signed-off-by: Donald Dutile <ddutile@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20191009164937.21989-1-ddutile@redhat.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 16:55:16 -04:00
Yangyang Li
d302c6e3a6 RDMA/hns: Release qp resources when failed to destroy qp
Even if no response from hardware, we should make sure that qp related
resources are released to avoid memory leaks.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1570584110-3659-1-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:45:22 -04:00
Yixing Liu
3dcad1f842 RDMA/hns: Fix a spelling mistake in a macro
HNS_ROCE_ALOGN_UP should be HNS_ROCE_ALIGN_UP, this patch fix it.

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-6-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:38 -04:00
Lang Cheng
cfd82da4e7 RDMA/hns: Modify return value of restrack functions
The restrack function return EINVAL instead of EMSGSIZE when the driver
operation fails.

Fixes: 4b42d05d0b ("RDMA/hns: Remove unnecessary kzalloc")
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-5-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:38 -04:00
Weihang Li
32883228b9 RDMA/hns: Modify variable/field name from vlan to vlan_id
The name of vlan and vlan_tag is not clear enough, it's actually means
vlan id.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-4-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:38 -04:00
Weihang Li
e8a07de57e RDMA/hns: Fix wrong parameters when initial mtt of srq->idx_que
The parameters npages used to initial mtt of srq->idx_que shouldn't be
same with srq's. And page_shift should be calculated from idx_buf_pg_sz.
This patch fixes above issues and use field named npage and page_shift
in hns_roce_buf instead of two temporary variables to let us use them
anywhere.

Fixes: 18df508c79 ("RDMA/hns: Remove if-else judgment statements for creating srq")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:37 -04:00
Weihang Li
9f7d706400 RDMA/hns: remove a redundant le16_to_cpu
Type of ah->av.vlan is u16, there will be a problem using le16_to_cpu
on it.

Fixes: 82e620d9c3 ("RDMA/hns: Modify the data structure of hns_roce_av")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:37 -04:00
Parav Pandit
777a8b32bc IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields
Current code tries to derive VLAN ID and compares it with GID
attribute for matching entry. This raw search fails on macvlan
netdevice as its not a VLAN device, but its an upper device of a VLAN
netdevice.

Due to this limitation, incoming QP1 packets fail to match in the
GID table. Such packets are dropped.

Hence, to support it, use the existing rdma_read_gid_l2_fields()
that takes care of diffferent device types.

Fixes: dbf727de74 ("IB/core: Use GID table in AH creation and dmac resolution")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:55:33 -04:00
Kamal Heib
b806c94ee4 RDMA/qedr: Fix reported firmware version
Remove spaces from the reported firmware version string.
Actual value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8. 37. 7. 0

Expected value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8.37.7.0

Fixes: ec72fce401 ("qedr: Add support for RoCE HW init")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Link: https://lore.kernel.org/r/20191007210730.7173-1-kamalheib1@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:50:14 -04:00
Krishnamraju Eraparaju
e17fa5c95e RDMA/siw: free siw_base_qp in kref release routine
As siw_free_qp() is the last routine to access 'siw_base_qp' structure,
freeing this structure early in siw_destroy_qp() could cause
touch-after-free issue.
Hence, moved kfree(siw_base_qp) from siw_destroy_qp() to siw_free_qp().

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Link: https://lore.kernel.org/r/20191007104229.29412-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:49:01 -04:00
Krishnamraju Eraparaju
54102dd410 RDMA/iwcm: move iw_rem_ref() calls out of spinlock
kref release routines usually perform memory release operations,
hence, they should not be called with spinlocks held.
one such case is: SIW kref release routine siw_free_qp(), which
can sleep via vfree() while freeing queue memory.

Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks.

Fixes: 922a8e9fb2 ("RDMA: iWARP Connection Manager.")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:40:01 -04:00
Potnuri Bharat Teja
612e0486ad iw_cxgb4: fix ECN check on the passive accept
pass_accept_req() is using the same skb for handling accept request and
sending accept reply to HW. Here req and rpl structures are pointing to
same skb->data which is over written by INIT_TP_WR() and leads to
accessing corrupt req fields in accept_cr() while checking for ECN flags.
Reordered code in accept_cr() to fetch correct req fields.

Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:35:19 -04:00
Mike Marciniszyn
22bb136534 IB/hfi1: Use a common pad buffer for 9B and 16B packets
There is no reason for a different pad buffer for the two
packet types.

Expand the current buffer allocation to allow for both
packet types.

Fixes: f8195f3b14 ("IB/hfi1: Eliminate allocation while atomic")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 16:32:25 -04:00
Kaike Wan
9ed5bd7d22 IB/hfi1: Avoid excessive retry for TID RDMA READ request
A TID RDMA READ request could be retried under one of the following
conditions:
- The RC retry timer expires;
- A later TID RDMA READ RESP packet is received before the next
  expected one.
For the latter, under normal conditions, the PSN in IB space is used
for comparison. More specifically, the IB PSN in the incoming TID RDMA
READ RESP packet is compared with the last IB PSN of a given TID RDMA
READ request to determine if the request should be retried. This is
similar to the retry logic for noraml RDMA READ request.

However, if a TID RDMA READ RESP packet is lost due to congestion,
header suppresion will be disabled and each incoming packet will raise
an interrupt until the hardware flow is reloaded. Under this condition,
each packet KDETH PSN will be checked by software against r_next_psn
and a retry will be requested if the packet KDETH PSN is later than
r_next_psn. Since each TID RDMA READ segment could have up to 64
packets and each TID RDMA READ request could have many segments, we
could make far more retries under such conditions, and thus leading to
RETRY_EXC_ERR status.

This patch fixes the issue by removing the retry when the incoming
packet KDETH PSN is later than r_next_psn. Instead, it resorts to
RC timer and normal IB PSN comparison for any request retry.

Fixes: 9905bf06e8 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 16:31:17 -04:00
Rafi Wiener
c8973df2da RDMA/mlx5: Clear old rate limit when closing QP
Before QP is closed it changes to ERROR state, when this happens
the QP was left with old rate limit that was already removed from
the table.

Fixes: 7d29f349a4 ("IB/mlx5: Properly adjust rate limit on QP state transitions")
Signed-off-by: Rafi Wiener <rafiw@mellanox.com>
Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 16:07:25 -04:00
Parav Pandit
03232cc43c IB/mlx5: Introduce and use mkey context setting helper routine
Introduce and use set_mkc_access_pd_addr_fields() which sets mkey
context's access rights, PD, address fields.  Thereby avoid the code
duplication.

Link: https://lore.kernel.org/r/20191006155443.31068-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 16:45:07 -03:00
Max Gurtovoy
3466c060ef RDMA/iser: Use iser_err instead of pr_err for logging
Make sure all the debug prints in ib_iser module use the common driver
logger.

Link: https://lore.kernel.org/r/1570366580-24097-1-git-send-email-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 16:35:52 -03:00
Devesh Sharma
39c48c5146 RDMA/bnxt_re: Enable SRIOV VF support on Broadcom's 57500 adapter series
Broadcom's 575xx adapter series has support for SRIOV VFs.  Making changes
to enable SRIOV VF support. There are two major area where changes are
done:

 - Added new DB location for control-path and data-path DB ring

 - New devices do not need to issue the sriov-config slow-path command
   thus, skipping to call that firmware command.

For now enabling support for 64 RoCE VFs.

Link: https://lore.kernel.org/r/1570081715-14301-1-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 16:22:24 -03:00
Honggang Li
b2e872f451 RDMA/srp: Calculate max_it_iu_size if remote max_it_iu length available
The default maximum immediate size is too big for old srp clients, which
do not support immediate data.

According to the SRP and SRP-2 specifications, the IOControllerProfile
attributes for SRP target ports contains the maximum initiator to target
iu length.

The maximum initiator to target iu length can be obtained by sending MAD
packets to query subnet manager port and SRP target ports. We should
calculate the max_it_iu_size instead of the default value, when remote
maximum initiator to target iu length available.

Link: https://lore.kernel.org/r/20190927174352.7800-2-honli@redhat.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 15:17:32 -03:00
Honggang Li
547ed331bb RDMA/srp: Add parse function for maximum initiator to target IU size
According to SRP specifications 'srp-r16a' and 'srp2r06',
IOControllerProfile attributes for SRP target port include the maximum
initiator to target IU size.

SRP connection daemons, such as srp_daemon, can get the value from the
subnet manager. The SRP connection daemon can pass this value to kernel.

This patch adds a parse function for it.

Upstream commit [1] enables the kernel parameter, 'use_imm_data', by
default. [1] also use (8 * 1024) as the default value for kernel parameter
'max_imm_data'. With those default values, the maximum initiator to target
IU size will be 8260.

In case the SRPT modules, which include the in-tree 'ib_srpt.ko' module,
do not support SRP-2 'immediate data' feature, the default maximum
initiator to target IU size is significantly smaller than 8260. For
'ib_srpt.ko' module, which built from source before [2], the default
maximum initiator to target IU is 2116.

[1] introduces a regression issue for old srp targets with default kernel
parameters, as the connection will be rejected because of a too large
maximum initiator to target IU size.

[1] commit 882981f4a4 ("RDMA/srp: Add support for immediate data")
[2] commit 5dabcd0456 ("RDMA/srpt: Add support for immediate data")

Link: https://lore.kernel.org/r/20190927174352.7800-1-honli@redhat.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 15:17:32 -03:00
Jason Gunthorpe
0417791536 RDMA/mlx5: Add missing synchronize_srcu() for MW cases
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.

Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.

This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.

Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()

Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
aa603815c7 RDMA/mlx5: Put live in the correct place for ODP MRs
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:

    CPU0                                     CPU1
mlx5_ib_alloc_implicit_mr()
 implicit_mr_alloc()
  live = 1
 imr->umem = umem
                                    num_pending_prefetch_inc()
                                      if (live)
				        atomic_inc(num_pending_prefetch)
 atomic_set(num_pending_prefetch,0) // Overwrites other thread's store

Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.

Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.

Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().

The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
aa116b810a RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:

    CPU0                                         CPU1
  dereg_mr()
                                          mlx5_ib_advise_mr_prefetch()
            				   srcu_read_lock()
                                            num_pending_prefetch_inc()
					      if (!live)
   live = 0
   atomic_read() == 0
     // skip flush_workqueue()
                                              atomic_inc()
 					      queue_work();
            				   srcu_read_unlock()
   WARN_ON(atomic_read())  // Fails

Swap the order so that the synchronize_srcu() prevents this.

Fixes: a6bc3875f1 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
9dc775e7f5 RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()
This fixes a race of the form:
    CPU0                               CPU1
mlx5_ib_invalidate_range()     mlx5_ib_invalidate_range()
				 // This one actually makes npages == 0
				 ib_umem_odp_unmap_dma_pages()
				 if (npages == 0 && !dying)
  // This one does nothing
  ib_umem_odp_unmap_dma_pages()
  if (npages == 0 && !dying)
     dying = 1;
                                    dying = 1;
				    schedule_work(&umem_odp->work);
     // Double schedule of the same work
     schedule_work(&umem_odp->work);  // BOOM

npages and dying must be read and written under the umem_mutex lock.

Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.

This avoids an expensive double critical section in
mlx5_ib_invalidate_range().

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Jason Gunthorpe
f28b1932ea RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR
mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:

    CPU0                               CPU1
  mlx5_ib_update_xlt()
   mlx5_odp_populate_klm()
     odp_lookup() == NULL
     pklm = ZAP
                                      implicit_mr_get_data()
 				        implicit_mr_alloc()
 					  <update interval tree>
					mlx5_ib_update_xlt
					  mlx5_odp_populate_klm()
					    odp_lookup() != NULL
					    pklm = VALID
					   mlx5_ib_post_send_wait()

    mlx5_ib_post_send_wait() // Replaces VALID with ZAP

This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.

The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.

The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Jason Gunthorpe
880505cfef RDMA/mlx5: Do not allow rereg of a ODP MR
This code is completely broken, the umem of a ODP MR simply cannot be
discarded without a lot more locking, nor can an ODP mkey be blithely
destroyed via destroy_mkey().

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Mohamad Heib
1cbe866cbc IB/core: Fix wrong iterating on ports
rdma_for_each_port is already incrementing the iterator's value it
receives therefore, after the first iteration the iterator is increased by
2 which eventually causing wrong queries and possible traces.

Fix the above by removing the old redundant incrementation that was used
before rdma_for_each_port() macro.

Cc: <stable@vger.kernel.org>
Fixes: ea1075edcb ("RDMA: Add and use rdma_for_each_port")
Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Reviewed-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:50:27 -03:00