linux/drivers/infiniband/hw
Håkon Bugge 4542e3c79a IB/mlx4: Fix CM REQ retries in paravirt mode
CM REQs cannot be successfully retried because a new pv_cm_id is
created for each request without checking whether one already exists.

Checking whether an id already exists before creating one fixes the bug.

This bug can be provoked by running an RDMA CM user-land application
with a five-second delay inserted before the rdma_accept() call on the
passive side. This delay is longer than the default CMA timeout and
triggers a retry from the active side. The retried REQ uses another
pv_cm_id (the cm_id on the wire), which confuses the CM protocol, and
the passive side sends two REJs.

Here is an excerpt from ibdump running without the patch:

3.285092       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.382711       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.382861       LID: 4 -> LID: 4       InfiniBand 290 CM: ConnectReject
7.387644       LID: 4 -> LID: 4       InfiniBand 290 CM: ConnectReject

and here is the same with the bug fix applied:

3.251010       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.349387       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
8.258443       LID: 4 -> LID: 4       SDP 290 CM: ConnectReply(SDP Hello)
8.259890       LID: 4 -> LID: 4       InfiniBand 290 CM: ReadyToUse

Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
bnxt_re     RDMA/bnxt_re: Fix the value reported for local ack delay       2017-07-20 11:20:50 -04:00
cxgb3       IB/cxgb3: Fix error codes in iwch_alloc_mr()                   2017-07-20 11:20:49 -04:00
cxgb4       cxgb4: Fix error codes in c4iw_create_cq()                     2017-07-20 11:20:49 -04:00
hfi1        IB/{rdmavt, qib, hfi1}: Remove gfp flags argument              2017-07-17 21:21:23 -04:00
hns         IB/hns: Fix for checkpatch.pl comment style warnings           2017-07-17 21:21:29 -04:00
i40iw       IB/i40iw: Fix error code in i40iw_create_cq()                  2017-07-20 11:20:49 -04:00
mlx4        IB/mlx4: Fix CM REQ retries in paravirt mode                   2017-07-20 11:20:50 -04:00
mlx5        IB/mlx5: Fix a warning message                                 2017-07-20 11:20:49 -04:00
mthca       IB/core: Define 'ib' and 'roce' rdma_ah_attr types             2017-05-01 14:32:43 -04:00
nes         IB: Convert msleep below 20ms to usleep_range                  2017-07-17 21:21:22 -04:00
ocrdma      RDMA/ocrdma: Fix error codes in ocrdma_create_srq()            2017-07-20 11:20:49 -04:00
qedr        Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  2017-06-21 17:35:22 -04:00
qib         IB/{rdmavt, qib, hfi1}: Remove gfp flags argument              2017-07-17 21:21:23 -04:00
usnic       IB/core: Rename struct ib_ah_attr to rdma_ah_attr              2017-05-01 14:32:43 -04:00
vmw_pvrdma  IB/core: Define 'ib' and 'roce' rdma_ah_attr types             2017-05-01 14:32:43 -04:00
Makefile    RDMA/bnxt_re: Add bnxt_re driver build support                 2017-02-14 09:51:28 -05:00