Commit Graph

241 Commits

Author SHA1 Message Date
Yishai Hadas
76dc5a8406 IB/mlx5: Manage device uid for DEVX white list commands
Manage device uid for DEVX white list commands.  The created device uid
will be used on white list commands if the user didn't supply its own uid.

This will enable the firmware to filter out non privileged functionality
as of the recognition of the uid.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-27 13:01:33 -06:00
Jason Gunthorpe
5a738b5d47 RDMA/drivers: Use dev_err/dbg/etc instead of pr_* + ibdev->name
Kernel convention is that a driver for a subsystem will print using
dev_* on the subsystem's struct device, or with dev_* on the physical
device. Drivers should rarely use a pr_* function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-26 13:51:48 -06:00
Yishai Hadas
d00614c057 IB/mlx5: Set uid as part of XRCD commands
Set uid as part of XRCD commands so that the firmware can manage the
XRCD object in a secured way.

That will enable using an XRCD that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
5deba86ee2 IB/mlx5: Set uid as part of RQT commands
Set uid as part of RQT commands so that the firmware can manage the
RQT object in a secured way.

That will enable using an RQT that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
a1069c1c75 IB/mlx5: Use uid as part of PD commands
Use uid as part of PD commands so that the firmware can manage the
PD object in a secured way.

For example when a QP is created its uid must match the CQ uid which it
uses.

Next patches in this series will use the uid from the PD, then will come
a patch to set the uid on the PD so that all objects will be properly
work in one change.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Doug Ledford
f9882bb506 Merge branch 'mlx5-vport-loopback' into rdma.get
For dependencies, branch based on 'mlx5-next' of
    git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

mlx5 mcast/ucast loopback control enhancements from Leon Romanovsky:

====================
This is short series from Mark which extends handling of loopback
traffic. Originally mlx5 IB dynamically enabled/disabled both unicast
and multicast based on number of users. However RAW ethernet QPs need
more granular access.
====================

Fixed failed automerge in mlx5_ib.h (minor context conflict issue)

mlx5-vport-loopback branch:
    RDMA/mlx5: Enable vport loopback when user context or QP mandate
    RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
    RDMA/mlx5: Refactor transport domain bookkeeping logic
    net/mlx5: Rename incorrect naming in IFC file

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:41:58 -04:00
Mark Bloch
0042f9e458 RDMA/mlx5: Enable vport loopback when user context or QP mandate
A user can create a QP which can accept loopback traffic, but that's not
enough. We need to enable loopback on the vport as well. Currently vport
loopback is enabled only when more than 1 users are using the IB device,
update the logic to consider whatever a QP which supports loopback was
created, if so enable vport loopback even if there is only a single user.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch
175edba856 RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
Expose two new flags:
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC

Those flags can be used at creation time in order to allow a QP
to be able to receive loopback traffic (unicast and multicast).
We store the state in the QP to be used on the destroy path
to indicate with which flags the QP was created with.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch
a560f1d9af RDMA/mlx5: Refactor transport domain bookkeeping logic
In preparation to enable loopback on a single user context move the logic
that enables/disables loopback to separate functions and group variables
under a single struct.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Jason Gunthorpe
b5231b019d RDMA/umem: Use ib_umem_odp in all function signatures connected to ODP
All of these functions already require the ODP version of the umem struct,
make this very clear by having the signature require it. This paves the
way to using the container_of() pattern to link umem_odp and umem
together.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe
e2cd1d1ad2 RDMA/mlx5: Use rdma_user_mmap_io
Rely on the new core code helper to map BAR memory from the driver.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Mark Bloch
b47fd4ffe2 RDMA/mlx5: Add NIC TX namespace when getting a flow table
Add the ability to get a NIC TX flow table when using _get_flow_table().
This will allow to create a matcher and a flow rule on the NIC TX path.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:07 -06:00
Mark Bloch
b823dd6d86 RDMA/mlx5: Refactor raw flow creation
Move struct mlx5_flow_act to be passed from the method entry point,
this will allow to add support for flow action for the raw create flow
path.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:07 -06:00
Mark Bloch
2ea2620390 RDMA/mlx5: Refactor flow action parsing to be more generic
Make the parsing of flow actions more generic so it could be used by
mlx5 raw create flow.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
78dd0c430f RDMA/mlx5: Add NIC TX steering support
Just like ingress steering, allow a user to create steering rules that
match egress vport traffic. We expose the same number of priorities as
the bypass (NIC RX) steering.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Jason Gunthorpe
af68ccbc11 Merge branch 'mlx5-flow-mutate' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Flow actions to mutate packets from Leon Romanovsky:

====================
This series exposes the ability to create flow actions which can
mutate packet headers. We do that by exposing two new verbs:
 * modify header - can change existing packet headers. packet
 * reformat - can encapsulate or decapsulate a packet.
              Once created a flow action must be attached to a steering
              rule for it to take effect.

The first 10 patches refactor mlx5_core code, rename internal structures
to better reflect their operation and export needed functions so the RDMA
side can allocate the action.

The last 5 patches expose via the IOCTL infrastructure mlx5_ib methods
which do the actual allocation of resources and return an handle to the
user. A user of this API is expected to know how to work with the device's
spec as the input to those function is HW depended.

An example usage of the modify header action is routing, A user can create
an action which edits the L2 header and decrease the TTL.

An example usage of the packet reformat action is VXLAN encap/decap which
is done by the HW.
====================

* branch 'mlx5-flow-mutate':
  RDMA/mlx5: Extend packet reformat verbs
  RDMA/mlx5: Add new flow action verb - packet reformat
  RDMA/uverbs: Add generic function to fill in flow action object
  RDMA/mlx5: Add a new flow action verb - modify header
  RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language
  net/mlx5: Export packet reformat alloc/dealloc functions
  net/mlx5: Pass a namespace for packet reformat ID allocation
  net/mlx5: Expose new packet reformat capabilities
  {net, RDMA}/mlx5: Rename encap to reformat packet
  net/mlx5: Move header encap type to IFC header file
  net/mlx5: Break encap/decap into two separated flow table creation flags
  net/mlx5: Add support for more namespaces when allocating modify header
  net/mlx5: Export modify header alloc/dealloc functions
  net/mlx5: Add proper NIC TX steering flow tables support
  net/mlx5: Cleanup flow namespace getter switch logic

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:24:58 -06:00
Mark Bloch
a090d0d859 RDMA/mlx5: Extend packet reformat verbs
We expose new actions:

L2_TO_L2_TUNNEL - A generic encap from L2 to L2, the data passed should
		  be the encapsulating headers.

L3_TUNNEL_TO_L2 - Will do decap where the inner packet starts from L3,
		  the data should be mac or mac + vlan (14 or 18 bytes).

L2_TO_L3_TUNNEL - Will do encap where is L2 of the original packet will
		  not be included, the data should be the encapsulating
		  header.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:23:59 -06:00
Mark Bloch
08aeb97cb8 RDMA/mlx5: Add new flow action verb - packet reformat
For now, only add L2_TUNNEL_TO_L2 option. This will allow to perform
generic decap operation if the encapsulating protocol is L2 based, and the
inner packet is also L2 based. For example this can be used to decap VXLAN
packets.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:23:59 -06:00
Mark Bloch
b4749bf256 RDMA/mlx5: Add a new flow action verb - modify header
Expose the ability to create a flow action which changes packet
headers. The data passed from userspace should be modify header actions as
defined by HW specification.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:23:58 -06:00
Majd Dibbiny
c6a21c3864 IB/mlx5: Change TX affinity assignment in RoCE LAG mode
In the current code, the TX affinity is per RoCE device, which can cause
unfairness between different contexts. e.g. if we open two contexts, and
each open 10 QPs concurrently, all of the QPs of the first context might
end up on the first port instead of distributed on the two ports as
expected

To overcome this unfairness between processes, we maintain per device TX
affinity, and per process TX affinity.

The allocation algorithm is as follow:

1. Hold two tx_port_affinity atomic variables, one per RoCE device and one
   per ucontext. Both initialized to 0.

2. In mlx5_ib_alloc_ucontext do:
 2.1. ucontext.tx_port_affinity = device.tx_port_affinity
 2.2. device.tx_port_affinity += 1

3. In modify QP INIT2RST:
 3.1. qp.tx_port_affinity = ucontext.tx_port_affinity % MLX5_PORT_NUM
 3.2. ucontext.tx_port_affinity += 1

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-04 16:26:14 -06:00
Jason Gunthorpe
7d96c9b176 IB/uverbs: Have the core code create the uverbs_root_spec
There is no reason for drivers to do this, the core code should take of
everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.

The core uses this to build up the runtime uapi for each device.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10 16:06:24 -06:00
Bart Van Assche
d34ac5cd3a RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:09:34 -06:00
Bart Van Assche
f696bf6d64 RDMA: Constify the argument of the work request conversion functions
When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:00:20 -06:00
Yishai Hadas
cb80fb1892 IB/mlx5: Enable driver uapi commands for flow steering
Expose the mlx5 flow steering parsing trees, exposing the functionality to
user space.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:33:52 -06:00
Yishai Hadas
d4be3f4466 IB/mlx5: Support adding flow steering rule by raw description
Add support to set a public flow steering rule when its destination is a
TIR by using raw specification data.

The logic follows the verbs API but instead of using ib_spec(s) the raw,
device specific, description is used.

This allows supporting specialty matchers without having to define new
matches in the verbs struct based language.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:56 -06:00
Yishai Hadas
3226944124 IB/mlx5: Introduce driver create and destroy flow methods
Introduce driver create and destroy flow methods on the uverbs flow
object.

This allows the driver to get its specific device attributes to match the
underlay specification while still using the generic ib_flow object for
cleanup and code sharing.

The IB object's attributes are set via the ib_set_flow() helper function.

The specific implementation for the given specification is added in
downstream patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:49 -06:00
Yishai Hadas
fd44e3853c IB/mlx5: Introduce flow steering matcher uapi object
Introduce flow steering matcher object and its create and destroy methods.

This matcher object holds some mlx5 specific driver properties that
matches the underlay device specification when an mlx5 flow steering group
is created.

It will be used in downstream patches to be part of mlx5 specific create
flow method.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 13:34:37 -06:00
Leon Romanovsky
05f58ceba1 RDMA/mlx5: Check that supplied blue flame index doesn't overflow
User's supplied index is checked again total number of system pages, but
this number already includes num_static_sys_pages, so addition of that
value to supplied index causes to below error while trying to access
sys_pages[].

BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
Read of size 4 at addr ffff880065561904 by task syz-executor446/314

CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xef/0x17e
 print_address_description+0x83/0x3b0
 kasan_report+0x18d/0x4d0
 bfregn_to_uar_index+0x34f/0x400
 create_user_qp+0x272/0x227d
 create_qp_common+0x32eb/0x43e0
 mlx5_ib_create_qp+0x379/0x1ca0
 create_qp.isra.5+0xc94/0x22d0
 ib_uverbs_create_qp+0x21b/0x2a0
 ib_uverbs_write+0xc2c/0x1010
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x433679
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006

Allocated by task 314:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc+0x1a9/0x510
 mlx5_ib_alloc_ucontext+0x966/0x2620
 ib_uverbs_get_context+0x23f/0xa60
 ib_uverbs_write+0xc2c/0x1010
 __vfs_write+0x10d/0x720
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1:
 __kasan_slab_free+0x12e/0x180
 kfree+0x159/0x630
 kvfree+0x37/0x50
 single_release+0x8e/0xf0
 __fput+0x2d8/0x900
 task_work_run+0x102/0x1f0
 exit_to_usermode_loop+0x159/0x1c0
 do_syscall_64+0x408/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff880065561100
 which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 2052 bytes inside of
 4096-byte region [ffff880065561100, ffff880065562100)
The buggy address belongs to the page:
page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 1ee47ab3e8 ("IB/mlx5: Enable QP creation with a given blue flame index")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:54:50 -06:00
Leon Romanovsky
ffaf58def0 RDMA/mlx5: Melt consecutive calls to alloc_bfreg() in one call
There is no need for three consecutive calls to alloc_bfreg(). It can be
implemented with one function.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:54:50 -06:00
Yishai Hadas
d0e84c0ad3 IB/mlx5: Add support for drain SQ & RQ
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.

Specifically,
Upon internal error state modify QP to an error state can be assumed to
be success as each in-progress WR going to be flushed in error in any
case as expected by that modify command.

In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.

In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.

As the above depends on scheduling the code takes the relevant locks and
actions to make sure that the completion handler for that WR will always
be called after that the post_send/recv were issued but not in parallel
to the other task that handles the error flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 14:32:36 -06:00
Jason Gunthorpe
4d7dff2b8b Merge branch 'icrc-counter' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull RoCE ICRC counters from Leon Romanovsky:

====================
This series exposes RoCE ICRC counter through existing RDMA hw_counters
sysfs interface.

The first patch has all HW definitions in mlx5_ifc.h file and second patch
is the actual counter implementation.
====================

* branch 'icrc-counter':
  IB/mlx5: Support RoCE ICRC encapsulated error counter
  net/mlx5: Add RoCE RX ICRC encapsulated counter
2018-06-22 08:53:27 -06:00
Talat Batheesh
9f876f3de6 IB/mlx5: Support RoCE ICRC encapsulated error counter
This patch adds support to query the counter that counts the
RoCE packets with corrupted ICRC (Invariant Cyclic Redundancy Code).

This counter will be under
/sys/class/infiniband/<mlx5-dev>/ports/<port>/hw_counters/

rx_icrc_encapsulated - The number of RoCE packets with ICRC
error.

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 08:51:14 -06:00
Yishai Hadas
c59450c463 IB/mlx5: Expose DEVX tree
Expose DEVX tree to be used by upper layers.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
7c043e908a IB/mlx5: Add support for DEVX query UAR
Return a device UAR index for a given user index via the DEVX interface.

Security note:
The hardware protection mechanism works like this: Each device object that
is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
the device specification manual) upon its creation. Then upon doorbell,
hardware fetches the object context for which the doorbell was rang, and
validates that the UAR through which the DB was rang matches the UAR ID
of the object.

If no match the doorbell is silently ignored by the hardware.  Of
course, the user cannot ring a doorbell on a UAR that was not mapped to
it.

Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
mailboxes (except tagging them with UID), we expose to the user its UAR
ID, so it can embed it in these objects in the expected specification
format. So the only thing the user can do is hurt itself by creating a
QP/SQ/CQ with a UAR ID other than his, and then in this case other users
may ring a doorbell on its objects.

The consequence of that will be that another user can schedule a QP/SQ
of the buggy user for execution (just insert it to the hardware schedule
queue or arm its CQ for event generation), no further harm is expected.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
a8b92ca1b0 IB/mlx5: Introduce DEVX
Introduce DEVX to enable direct device commands in downstream patches
from this series.

In that mode of work the firmware manages the isolation between
processes' resources and as such a DEVX user id is created and assigned
to the given user context upon allocation request.

A capability check is done to make sure that this feature is really
supported by the firmware prior to creating the DEVX user id.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Parav Pandit
47ec386662 RDMA: Convert drivers to use sgid_attr instead of sgid_index
The core code now ensures that all driver callbacks that receive an
rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present.

Drivers can use this pointer instead of calling a query function with
sgid_index. This simplifies the drivers and also avoids races where a
gid_index lookup may return different data if it is changed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Raed Salem
5e95af5f7b IB/mlx5: Add flow counters read support
Implements the flow counters read wrapper.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:37 +03:00
Raed Salem
3b3233fbf0 IB/mlx5: Add flow counters binding support
Associates a counters with a flow when IB_FLOW_SPEC_ACTION_COUNT is part
of the flow specifications.

The counters user space placements of location and description (index,
description) pairs are passed as private data of the counters flow
specification.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:32 +03:00
Raed Salem
b29e2a1309 IB/mlx5: Add counters create and destroy support
This patch implements the device counters create and destroy APIs and
introducing some internal management structures.

Downstream patches in this series will add the functionality to support
flow counters binding and reading.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:33:57 +03:00
Ariel Levkovich
6c29f57ea4 IB/mlx5: Device memory mr registration support
Adding mlx5_ib driver implementation for reg_dm_mr callback
which allows registering device memory (DM) as an MR for
local and remote access.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
24da00164f IB/mlx5: Device memory support in mlx5_ib
This patch adds the mlx5_ib driver implementation for the device
memory allocation API.
It implements the ib_device callbacks for allocation and deallocation
operations as well as a new mmap command support which allows mapping
an allocated device memory to a VMA.

The change also adds reporting of device memory maximum size and
alignment parameters reported in device capabilities.

The allocation/deallocation operations are using new firmware
commands to allocate MEMIC memory on the device.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Aviad Yehezkel
802c212568 IB/mlx5: Add IPsec support for egress and ingress
This commit introduces support for the esp_aes_gcm flow
specification for the Innova device. To that end we add
support for egress steering and some validations that an
IPsec rule is indeed valid.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Aviad Yehezkel
c6475a0bca IB/mlx5: Add implementation for create and destroy action_xfrm
Adding implementation in mlx5 driver to create and destroy action_xfrm
object. This merely call the accel layer.

A user may pass MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag which states
that [s]he expects a metadata header to be added to the payload. This
header represents information regarding the transformation's state.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Matan Barak
8c84660bb4 IB/mlx5: Initialize the parsing tree root without the help of uverbs
In order to have a custom parsing tree, a provider driver needs to
assign its parsing tree to ib_device specs_tree field. Otherwise, the
uverbs client assigns a common default parsing tree for it.
In downstream patches, the mlx5_ib driver gains a custom parsing tree,
which contains both the common objects and a new flags field for the
UVERBS_FLOW_ACTION_ESP_CREATE command.
This patch makes mlx5_ib assign its own tree to specs_root, which
later on will be extended.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:23 -06:00
Bodong Wang
61147f391a IB/mlx5: Packet packing enhancement for RAW QP
Enable RAW QP to be able to configure burst control by modify_qp. By
using burst control with rate limiting, user can achieve best
performance and accuracy. The burst control information is passed by
user through udata.

This patch also reports burst control capability for mlx5 related
hardwares, burst control is only marked as supported when both
packet_pacing_burst_bound and packet_pacing_typical_size are
supported.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:55:13 -06:00
Doug Ledford
2d873449a2 Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-next
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch.  This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.

Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
	(IB/mlx5: Fix cleanup order on unload) added to for-rc and
	commit b5ca15ad7e (IB/mlx5: Add proper representors support)
	add as part of the devel cycle both needed to modify the
	init/de-init functions used by mlx5.  To support the new
	representors, the new functions added by the cleanup patch
	needed to be made non-static, and the init/de-init list
	added by the representors patch needed to be modified to
	match the init/de-init list changes made by the cleanup
	patch.
Updates:
	drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
	prototypes added by representors patch to reflect new function
	names as changed by cleanup patch
	drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
	stage list to match new order from cleanup patch

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 19:28:58 -04:00
Mark Bloch
42cea83f95 IB/mlx5: Fix cleanup order on unload
On load we create private CQ/QP/PD in order to be used by UMR, we create
those resources after we register ourself as an IB device, and we destroy
them after we unregister as an IB device. This was changed by commit
16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove
stages") which moved the destruction before we unregistration. This
allowed to trigger an invalid memory access when unloading mlx5_ib while
there are open resources:

BUG: unable to handle kernel paging request at 00000001002c012c
...
Call Trace:
 mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib]
 __slab_free+0x9a/0x2d0
 delay_time_func+0x10/0x10 [mlx5_ib]
 unreg_umr.isra.15+0x4b/0x50 [mlx5_ib]
 mlx5_mr_cache_free+0x46/0x150 [mlx5_ib]
 clean_mr+0xc9/0x190 [mlx5_ib]
 dereg_mr+0xba/0xf0 [mlx5_ib]
 ib_dereg_mr+0x13/0x20 [ib_core]
 remove_commit_idr_uobject+0x16/0x70 [ib_uverbs]
 uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs]
 ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs]
 ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs]
 ib_unregister_device+0xd4/0x190 [ib_core]
 __mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
 mlx5_remove_device+0xf5/0x120 [mlx5_core]
 mlx5_unregister_interface+0x37/0x90 [mlx5_core]
 mlx5_ib_cleanup+0xc/0x225 [mlx5_ib]
 SyS_delete_module+0x153/0x230
 do_syscall_64+0x62/0x110
 entry_SYSCALL_64_after_hwframe+0x21/0x86
...

We restore the original behavior by breaking the UMR stage into two parts,
pre and post IB registration stages, this way we can restore the original
functionality and maintain clean separation of logic between stages.

Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:44:02 -04:00
Ilya Lesokhin
c44ef998f2 IB/mlx5: Maintain a single emergency page
The mlx5 driver needs to be able to issue invalidation to ODP MRs
even if it cannot allocate memory. To this end it preallocates
emergency pages to use when the situation arises.

This flow should be extremely rare enough, that we don't need
to worry about contention and therefore a single emergency page
is good enough.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:05:16 -04:00
Mark Bloch
b5ca15ad7e IB/mlx5: Add proper representors support
This commit adds full support for IB representor:

1) Representors profile, We add two new profiles:
   nic_rep_profile - This profile will be used to create an IB device that
   represents the PF/UPLINK.
   rep_profile - This profile will be used to create an IB device that
   represents VFs. Each VF will be its own representor.
2) Proper load/unload callbacks, Those are called by the E-Switch when
   moving to/from switchdev mode.
3) Different flow DB handling for when we in switchdev mode.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
b96c9dde17 IB/mlx5: E-Switch, Add rule to forward traffic to vport
In order to forward traffic from representor's SQ to the right virtual
function, every time an SQ is created also add the corresponding flow rule
to the FDB.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
8e6efa3a31 IB/mlx5: When in switchdev mode, expose only raw packet capabilities
Currently in switchdev mode we allow only for raw packet QPs.
Expose the right capabilities and set the gid table length to 0, also
make sure we don't try to enable RoCE, so split the function
to enable RoCE so representors can enable only the notifier needed for
net device events.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
9a4ca38d77 IB/mlx5: Allocate flow DB only on PF IB device
A flow DB is a shared resource between PF and representors,
need to allocate it only when creating the PF IB device.
Once we add IB representors, they will use the flow db which was
created by the PF.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
fc385b7ac4 IB/mlx5: Add basic regiser/unregister representors code
Create the basic infrastructure of registering and unregistering
IB representors. The load/unload callbacks are left empty and
proper implementation will be introduced in following patches.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Yonatan Cohen
388ca8be00 IB/mlx5: Implement fragmented completion queue (CQ)
The current implementation of create CQ requires contiguous
memory, such requirement is problematic once the memory is
fragmented or the system is low in memory, it causes for
failures in dma_zalloc_coherent().

This patch implements new scheme of fragmented CQ to overcome
this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
to allocate fragmented buffers, rather than contiguous ones.

Base the Completion Queues (CQs) on this new fragmented buffer.

It fixes following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15 00:30:03 -08:00
Jason Gunthorpe
beb801ac51 RDMA: Move enum ib_cq_creation_flags to uapi headers
The flags field the enum is used with comes directly from the uapi
so it belongs in the uapi headers for clarity and so userspace can
use it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 12:58:34 -07:00
Feras Daoud
5c99eaecb1 IB/mlx5: Mmap the HCA's clock info to user-space
This patch maps the new page to user space applications to
allow converting a user space completion timestamp to system wall
time at the lowest possible latency cost.
By using a versioning scheme we allow compatibility between current
and future userspace libraries.
The change moves mlx5_ib_mmap_cmd enum from mlx5_ib.h to the
abi header file mlx5-abi.h.

Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:21 -05:00
Daniel Jurgens
aac4492ef2 IB/mlx5: Update counter implementation for dual port RoCE
Update the counter interface for multiple ports. Some counter sets
always comes from the primary device.

Port specific counters should be accessed per mlx5_core_dev not always
through the IB master mdev.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Parav Pandit
a9e546e73a IB/mlx5: Change debugfs to have per port contents
When there are multiple ports for single IB(RoCE) device, support
debugfs entries to be available for each port.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
32f69e4be2 {net, IB}/mlx5: Manage port association for multiport RoCE
When mlx5_ib_add is called determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found place it on a list of unaffiliated ports. If a master is
found bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly when mlx5_ib_remove is called determine the port type. If it's
a slave port, unaffiliate it from the master device, otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration all IB object (QP, MR, PD, etc) related
commands should flow through the master mlx5_core_dev, other commands
must be sent to the slave port mlx5_core_mdev, an interface is provide
to get the correct mdev for non IB object commands.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
7fd8aefb7c IB/mlx5: Make netdev notifications multiport capable
When multiple RoCE ports are supported registration for events on
multiple netdevs is required. Refactor the event registration and
handling to support multiple ports.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:21 -07:00
Moni Shoua
776a3906b6 IB/mlx5: Add support for DC target QP
A DC Target (DCT) QP is represented in the hardware as a unique object.
This object is created by CREATE_DCT command and destroyed by DESTROY_DCT
command. However, in the driver we describe it as a QP.

The hardware command that creates a DCT needs parameters that the verb
create_qp() does not provide. Those remaining parameters are provided
with the call to the verb modify_qp(). Therefore we delay the actual
creation of a DCT in the hardware until the stage of modify_qp() to RTR.

A support for query_qp() was added as well. It uses QUERY_DCT command to
retrieve the applicable fields.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:51 -07:00
Moni Shoua
b4aaa1f0b4 IB/mlx5: Handle type IB_QPT_DRIVER when creating a QP
The QP type IB_QPT_DRIVER doesn't describe the transport or the service
that the QP provides but those are known only to the hardware driver.
The actual type of the QP is stored in the hardware driver context (i.e.
mlx5_qp) under the field qp_sub_type.

Take the real QP type and any extra data that is required to create the QP
from the driver channel and modify the QP initial attributes before continuing
with create_qp().

Downstream patches from this series will add support for both DCI and
DCT driver QPs.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:50 -07:00
Mark Bloch
3cc297db97 IB/mlx5: Move locks initialization to the corresponding stage
Unconditional locks/list and ODP srcu initialization should be done in
the INIT stage. Remove those from the CAPS stage and move them to the
proper stage.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch
c8b8992446 IB/mlx5: Move loopback initialization to the corresponding stage
The loopback stage only initializes a lock, move it to be in
the CAPS initialization phase and get rid loopback step completely.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch
16c1975f10 IB/mlx5: Create profile infrastructure to add and remove stages
Today we have single function which is used when we add an IB interface,
break this function into multiple functions.

Create stages and a generic mechanism to execute each stage.
This is in preparation for RDMA/IB representors which might not need
all stages or will do things differently in some of the stages.

This patch doesn't change any functionality.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:58 -07:00
Yishai Hadas
1ee47ab3e8 IB/mlx5: Enable QP creation with a given blue flame index
This patch enables QP creation with a given BF index, this allows the
user space driver to share same BF between few QPs or alternatively have
a dedicated BF per QP.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:37:46 -07:00
Yishai Hadas
4ed131d0bb IB/mlx5: Expose dynamic mmap allocation
This patch exposes the option to dynamic allocates a UAR, this
functionality will be used in downstream patch in this series as
part of QP creation.

Specifically, the user space driver asks for a UAR allocation in a given
page index, upon success this UAR and its bfregs can be used as part of
QP creation by the user space driver.

To enable allocating more than 256 UARs the page index is encoded in an
extra one byte just after the command byte.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:37:46 -07:00
Yishai Hadas
31a78a5a79 IB/mlx5: Extend UAR stuff to support dynamic allocation
This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.

Currently upon alloc context there is some fix size of UARs that are
allocated (named 'static allocation') and there is no option to user
application to ask for more or control which UAR will be used by which
QP.

In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations and let the user driver knows about
the max value of dynamic blue-flame registers that are allowed.

Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:33:14 -07:00
Majd Dibbiny
ad9a3668a4 IB/mlx5: Serialize access to the VMA list
User-space applications can do mmap and munmap directly at
any time.

Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.

Cc: <stable@vger.kernel.org> # v4.7
Fixes: 7c2344c3bb ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:24:40 -07:00
Noa Osherovich
b1383aa641 IB/mlx5: Add PCI write end padding support
Add the PCI write end padding flag to device_cap_flags enum and set it
during mlx5_ib_query_device so it will be reported to user-space.

During WQ/QP creation, set that capability for WQ/QP if user requested
it and HW supports it.

PCI write end padding modification is not supported for now. There's no
such flag for a QP but for a WQ, create and modify use the same flag.
Return an error if PCI write end padding flag is set during modify_wq.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:50:27 -05:00
Maor Gottlieb
f95ef6cbae IB/mlx5: Add tunneling offloads support
The device can support receive Stateless Offloads for the inner
packet's fields only when the packet is processed by TIR which is
enabled to support tunneling. Otherwise, the device treats the
packet as an ordinary non-tunneling packet and receive offloads
can be done only for the outer packet's field.
In order to enable receive Stateless Offloading support for incoming
tunneling traffic the TIR should be created with tunneled_offload_en.
Tunneling offloads is supported only be raw ethernet QP.

This patch includes:
* New QP creation flag for tunneling offloads.
* Reports device capabilities.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:19:31 -04:00
Guy Levi
7a0c8f4244 IB/mlx5: Support padded 128B CQE feature
In some benchmarks and some CPU architectures, writing the CQE on a full
cache line size improves performance by saving memory access operations
(read-modify-write) relative to partial cache line change. This patch
lets the user to configure the device to pad the CQE up to 128B in case
its content is less than 128B. Currently the driver supports only padding
for a CQE size of 128B.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:17:06 -04:00
Noa Osherovich
ccc8708790 IB/mlx5: Allow creation of a multi-packet RQ
Allow creation of a multi-packet receive queue.

In order to create a multi-packet RQ, the following fields in
the mlx5_ib_rwq should be set:
- log_num_strides: Log of number of strides per WQE
- single_stride_log_num_of_bytes: Log of a single stride size
- two_byte_shift_en: When enabled, hardware pads 2 bytes of zeros
  before writing the message to memory (e.g. for the IP alignment).

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:03:44 -04:00
Noa Osherovich
b4f34597a5 IB/mlx5: Expose multi-packet RQ capabilities
This patch reports the device's striding RQ capabilities to
the user-space:
- min/max_single_stride_log_num_of_bytes: Log of min/max number of
  bytes in a single stride.
- min/max_single_wqe_log_num_of_strides: Log of min/max number of
  strides in a single WQE.
- supported_qpts: A bit mask to know which QP types support multi-
  packet RQ, for now only Raw Packet QPs.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:03:44 -04:00
Artemy Kovalyov
eb76189435 IB/mlx5: Fill XRQ capabilities
Provide driver specific values for XRQ capabilities.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:19 -04:00
Ilya Lesokhin
8b7ff7f3b3 IB/mlx5: Enable UMR for MRs created with reg_create
This patch is the first step in decoupling UMR usage and
allocation from the MR cache. The only functional change
in this patch is to enables UMR for MRs created with
reg_create.

This change fixes a bug where ODP memory regions that
were not allocated from the MR cache did not have UMR
enabled.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:34 -04:00
Yishai Hadas
c2e53b2ce1 IB/mlx5: Add support for QP with a given source QPN
Allow user space applications to accelerate send and receive
traffic which is typically handled by IPoIB ULP by creating
a UD QP with a given source QPN of the IPoIB UD QP.

UD QP with a given source QPN should basically be similar to
RAW QP from point of view of its created resources.

However,
- Its TIS should point to the source QPN.
- Modify can be done only on its state as the transport attributes
  are managed by its source QP.

This patch manages below:
- Creating/destroying/modifying UD QP with a given source QPN.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:46 -04:00
Maor Gottlieb
fe248c3a58 IB/mlx5: Add delay drop configuration and statistics
Add debugfs interface for monitor the number of delay drop timeout
events and the number of existing dropless RQs in the system.

In addition add debugfs interface for configuring the global timeout value
which is used in the SET_DELAY_DROP command.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:22 -04:00
Maor Gottlieb
03404e8ae6 IB/mlx5: Add support to dropless RQ
RQs that were configured for "delay drop" will prevent packet drops
when their WQEs are depleted.
Marking an RQ to be drop-less is done by setting delay_drop_en in RQ
context using CREATE_RQ command.

Since this feature is globally activated/deactivated by using the
SET_DELAY_DROP command on all the marked RQs, we activated/deactivated
it according to the number of RQs with 'delay_drop' enabled.

When timeout is expired, then the feature is deactivated. Therefore
the driver handles the delay drop timeout event and reactivate it.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:39:53 -04:00
Parav Pandit
4a2da0b8c0 IB/mlx5: Add debug control parameters for congestion control
This patch adds debug control parameters for congestion control which
can be read or written through debugfs. They are for reaction point and
notification point nodes.

These control parameters are as below:
 +------------------------------+-----------------------------------------+
 |      Name                    |           Description                   |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate             | When set target rate is updated to      |
 |                              | current rate                            |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate_ati         | When set update target rate based on    |
 |                              | timer as well                           |
 |------------------------------+-----------------------------------------|
 |rp_time_reset                 | time between rate increase if no        |
 |                              | CNP is received unit in usec            |
 |------------------------------+-----------------------------------------|
 |rp_byte_reset                 | Number of bytes between rate inease if  |
 |                              | no CNP is received                      |
 |------------------------------+-----------------------------------------|
 |rp_threshold                  | Threshold for reaction point rate       |
 |                              | control                                 |
 |------------------------------+-----------------------------------------|
 |rp_ai_rate                    | Rate for target rate, unit in Mbps      |
 |------------------------------+-----------------------------------------|
 |rp_hai_rate                   | Rate for hyper increase state           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_min_dec_fac                | Minimum factor by which the current     |
 |                              | transmit rate can be changed when       |
 |                              | processing a CNP, unit is percerntage   |
 |------------------------------+-----------------------------------------|
 |rp_min_rate                   | Minimum value for rate limit,           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_to_set_on_first_cnp   | Rate that is set when first CNP is      |
 |                              | received, unit is Mbps                  |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_g                  | Used to calculate alpha                 |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_rtt                | Time between updates of alpha value,    |
 |                              | unit is usec                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_reduce_monitor_period | Minimum time between consecutive rate   |
 |                              | reductions                              |
 |------------------------------+-----------------------------------------|
 |rp_initial_alpha_value        | Initial value of alpha                  |
 |------------------------------+-----------------------------------------|
 |rp_gd                         | When CNP is received, flow rate is      |
 |                              | reduced based on gd, rp_gd is given as  |
 |                              | log2(rp_gd)                             |
 |------------------------------+-----------------------------------------|
 |np_cnp_dscp                   | dscp code point for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio_mode              | 802.1p priority for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio                   | cnp priority mode                       |
 +------------------------------+-----------------------------------------+

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:34:28 -04:00
Moni Shoua
fd65f1b8ee IB/mlx5: Change logic for dispatching IB events for port state
The old logic ignored link state. This led to missing IB events like
when link goes down on the switch while admin state is up or to redundant
events like when admin state goes up while link is down.
To fix that, probe the port state on NETDEV events and compare to last
known state to decide if IB events needs to be dispatched.

FIxes: 5ec8c83e3a ("IB/mlx5: Port events in RoCE now rely on netdev events")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:29:18 -04:00
Huy Nguyen
c85023e153 IB/mlx5: Add raw ethernet local loopback support
Currently, unicast/multicast loopback raw ethernet
(non-RDMA) packets are sent back to the vport.
A unicast loopback packet is the packet with destination
MAC address the same as the source MAC address.
For multicast, the destination MAC address is in the
vport's multicast filter list.

Moreover, the local loopback is not needed if
there is one or none user space context.

After this patch, the raw ethernet unicast and multicast
local loopback are disabled by default. When there is more
than one user space context, the local loopback is enabled.

Note that when local loopback is disabled, raw ethernet
packets are not looped back to the vport and are forwarded
to the next routing level (eswitch, or multihost switch,
or out to the wire depending on the configuration).

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:29:18 -04:00
Max Gurtovoy
6e8484c5cf RDMA/mlx5: set UMR wqe fence according to HCA cap
Cache the needed umr_fence and set the wqe ctrl segmennt
accordingly.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:19:57 -04:00
Dasaratharaman Chandramouli
90898850ec IB/core: Rename struct ib_ah_attr to rdma_ah_attr
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Artemy Kovalyov
db570d7dea IB/mlx5: Add ODP support to MW
Internally MW implemented as KLM MKey and filled by userspace UMR
postsends.  Handle pagefault trigered by operations on this MKeys.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Parav Pandit
e1f24a79f4 IB/mlx5: Support congestion related counters
This patch adds support to query the congestion related hardware counters
through new command and links them with other hw counters being available
in hw_counters sysfs location.

In order to reuse existing infrastructure it renames related q_counter
data structures to more generic counters to reflect q_counters and
congestion counters and maybe some other counters in the future.

New hardware counters:
 * rp_cnp_handled - CNP packets handled by the reaction point
 * rp_cnp_ignored - CNP packets ignored by the reaction point
 * np_cnp_sent    - CNP packets sent by notification point to respond to
                     CE marked RoCE packets
 * np_ecn_marked_roce_packets - CE marked RoCE packets received by
                                notification point

It also avoids returning ENOSYS which is specific for invalid
system call and produces the following checkpatch.pl warning.

WARNING: ENOSYS means 'invalid syscall nr' and nothing else
+		return -ENOSYS;

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Saeed Mahameed
258545449b net/mlx5e: IPoIB, Xmit flow
Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow, the only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Artemy Kovalyov
81713d3788 IB/mlx5: Add implicit MR support
Add implicit MR, covering entire user address space.
The MR is implemented as an indirect KSM MR consisting of
1GB direct MRs.
Pages and direct MRs are added/removed to MR by ODP.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:19 -05:00
Artemy Kovalyov
49780d42df IB/mlx5: Expose MR cache for mlx5_ib
Allow other parts of mlx5_ib to use MR cache mechanism.
* Add new functions mlx5_mr_cache_alloc and mlx5_mr_cache_free
* Traditional MTT MKey buckets are limited by MAX_UMR_CACHE_ENTRY
  Additinal buckets may be added above.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:18 -05:00
Noa Osherovich
e4cc4fa7cc IB/mlx5: Enable QP creation with cvlan offload
Enable creating a RAW Ethernet QP with cvlan stripping offload when
it's supported by the hardware.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:15 -05:00
Kamal Heib
7c16f47779 IB/mlx5: Expose Q counters groups only if they are supported by FW
This patch modify the Q counters implementation, so each one of the
three Q counters groups will be exposed by the driver only if they are
supported by the firmware.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:40:56 -05:00
Majd Dibbiny
ed88451e1f IB/mlx5: Assign DSCP for R-RoCE QPs Address Path
For Routable RoCE QPs, the DSCP should be set in the QP's
address path.

The DSCP's value is derived from the traffic class.

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Cc: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Eli Cohen
b037c29a80 IB/mlx5: Allow future extension of libmlx5 input data
Current check requests that new fields in struct
mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
This was introduced so new libraries passing additional information to
the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified
by old kernels that do not support their request by failing the
operation. This schecme is problematic since it requires libmlx5 to issue
the requests with descending input size for struct
mlx5_ib_alloc_ucontext_req_v2.

To avoid this, we require that new features that will obey the following
rules:
If the feature requires one or more fields in the response and the at
least one of the fields can be encoded such that a zero value means the
kernel ignored the request then this field will provide the indication
to the library. If no response is required or if zero is a valid
response, a new field should be added that indicates to the library
whether its request was processed.

Fixes: b368d7cb8c ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09 20:25:09 +02:00
Eli Cohen
5fe9dec0d0 IB/mlx5: Use blue flame register allocator in mlx5_ib
Make use of the blue flame registers allocator at mlx5_ib. Since blue
flame was not really supported we remove all the code that is related to
blue flame and we let all consumers to use the same blue flame register.
Once blue flame is supported we will add the code. As part of this patch
we also move the definition of struct mlx5_bf to mlx5_ib.h as it is only
used by mlx5_ib.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09 20:25:08 +02:00
Eli Cohen
2f5ff26478 mlx5: Fix naming convention with respect to UARs
This establishes a solid naming conventions for UARs. A UAR (User Access
Region) can have size identical to a system page or can be fixed 4KB
depending on a value queried by firmware. Each UAR always has 4 blue
flame register which are used to post doorbell to send queue. In
addition, a UAR has section used for posting doorbells to CQs or EQs. In
this patch we change names to reflect this conventions.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08 11:21:26 +02:00
Artemy Kovalyov
d9aaed8387 {net,IB}/mlx5: Refactor page fault handling
* Update page fault event according to last specification.
* Separate code path for page fault EQ, completion EQ and async EQ.
* Move page fault handling work queue from mlx5_ib static variable
  into mlx5_core page fault EQ.
* Allocate memory to store ODP event dynamically as the
  events arrive, since in atomic context - use mempool.
* Make mlx5_ib page fault handler run in process context.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
7d0cc6edcc IB/mlx5: Add MR cache for large UMR regions
In this change we turn mlx5_ib_update_mtt() into generic
mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions
supporting both atomic and process contexts and not limited by number
of modified entries.
Using this function we increase preallocated MRs up to 16GB.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
c438fde1c2 IB/mlx5: Add support for big MRs
Make use of extended UMR translation offset.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
3161625589 IB/mlx5: Refactor UMR post send format
* Update struct mlx5_wqe_umr_ctrl_seg.
* Currenlty UMR send_flags aim only certain use cases: enabled/disable
  cached MR, modifying XLT for ODP. By making flags independent make UMR
  more flexible allowing arbitrary manipulations.
* Since different UMR formats have different entry sizes UMR request
  should receive exact size of translation table update instead of
  number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr
  and update relevant code accordingly.
* Add support of length64 bit.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Linus Torvalds
4d5b57e05a Updates for 4.10 kernel merge window
- Shared mlx5 updates with net stack (will drop out on merge if Dave's
   tree has already been merged)
 - Driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
 - Debug cleanups
 - New connection rejection helpers
 - SRP updates
 - Various misc fixes
 - New paravirt driver from vmware
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYUbAPAAoJELgmozMOVy/dMXcP/iuG5MNzfN8Ny1JftyBQGWg3
 cqoQ2OLj9CsXjwVB+5EqbcZHRZY852lKONaLoDKkIOx4YAXO2YuIKOp944vN7EQx
 96wfqzT1F5jzAcy5mYZXgLaStGFDAwejKMqeHd0LfJj3OEtemGnVPWYzyqSQmSKo
 dzJraS1Z9GIRppzU5WaRpB9PtRBkqIqGJ5vZ0EKLGhed5hYY5r0iMJB0GfriMRDO
 lJ4UUVfpsAoLPnqDBFH6IMn2V2UeAw9IR5zNa1mrM1RBfvt/uYTxrw1w3p9WoaNs
 GRodhk4DCeAfeyqzVPNBLyXZ4Zq4FzGe3UWM4qysJ1RR4oFNw9Cuw0Fqk8mrfznr
 7hv5TpGIckRZiKf8l6e+qLirF0qGtXJg29j2vPVQI9i5nSj95g1agA81PnLQlLLb
 flWyxeMj81my7lfMHN1xcV6pqPEKMCOysZmfcvVfJd2XxpjuVD7ekl/YXWp8o8kU
 YPdQMqPD626XsD8VpPdMszb9FPmx0JD0HEv+Y1rIFX8JegEI+c3H2X0dqC27T/Ou
 FEPWOy025EgHm0Fh/7eIzkG6tjZ4JHoCugJAcxNZGj2XW4eB6r5vY8UwJ8iQRv+n
 PVYHiy0UoIRePh0mrdOSSphGZMi/GO/DsqKwCtAMEK43WqZQju6wR7QSIGkh66mp
 4uSHJqpf3YEYylxGMhk3
 =QeGy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is the complete update for the rdma stack for this release cycle.

  Most of it is typical driver and core updates, but there is the
  entirely new VMWare pvrdma driver. You may have noticed that there
  were changes in DaveM's pull request to the bnxt Ethernet driver to
  support a RoCE RDMA driver. The bnxt_re driver was tentatively set to
  be pulled in this release cycle, but it simply wasn't ready in time
  and was dropped (a few review comments still to address, and some
  multi-arch build issues like prefetch() not working across all
  arches).

  Summary:

   - shared mlx5 updates with net stack (will drop out on merge if
     Dave's tree has already been merged)

   - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe

   - debug cleanups

   - new connection rejection helpers

   - SRP updates

   - various misc fixes

   - new paravirt driver from vmware"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
  IB: Add vmw_pvrdma driver
  IB/mlx4: fix improper return value
  IB/ocrdma: fix bad initialization
  infiniband: nes: return value of skb_linearize should be handled
  MAINTAINERS: Update Intel RDMA RNIC driver maintainers
  MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
  IB/core: fix unmap_sg argument
  qede: fix general protection fault may occur on probe
  IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
  mlx5, calc_sq_size(): Make a debug message more informative
  mlx5: Remove a set-but-not-used variable
  mlx5: Use { } instead of { 0 } to init struct
  IB/srp: Make writing the add_target sysfs attr interruptible
  IB/srp: Make mapping failures easier to debug
  IB/srp: Make login failures easier to debug
  IB/srp: Introduce a local variable in srp_add_one()
  IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
  IB/multicast: Check ib_find_pkey() return value
  IPoIB: Avoid reading an uninitialized member variable
  IB/mad: Fix an array index check
  ...
2016-12-15 12:03:32 -08:00