linux

Author	SHA1	Message	Date
Leon Romanovsky	04b222f957	RDMA/mlx5: Remove IB representors dead code Delete dead code. Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:43:54 +02:00
Leon Romanovsky	e87114022e	net/mlx5: Simplify eswitch mode check Provide mlx5_core device instead of "priv" pointer while checking eswith mode. Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:43:54 +02:00
Leon Romanovsky	93f8244431	RDMA/mlx5: Convert mlx5_ib to use auxiliary bus The conversion to auxiliary bus solves long standing issue with existing mlx5_ib<->mlx5_core coupling. It required to have both modules in initramfs if one of them needed for the boot. Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2020-12-06 07:43:50 +02:00
Leon Romanovsky	2b1f747071	RDMA/core: Allow drivers to disable restrack DB Driver QP types are special case with no IBTA restrictions. For example, EFA implemented creation of this QP type as regular one, while mlx5 separated create to two step: create and modify. That separation causes to the situation where DC QP (mlx5) is always added to the same xarray index zero. This change allows to drivers like mlx5 simply disable restrack DB tracking, but it doesn't disable kref on the memory. Fixes: `52e0a118a2` ("RDMA/restrack: Track driver QP types in resource tracker") Link: https://lore.kernel.org/r/20201117070148.1974114-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-27 11:38:46 -04:00
Parav Pandit	7ec3df174f	RDMA/mlx5: Use PCI device for dma mappings DMA operation of the IB device is done using ib_device->dma_device. Instead of accessing parent of the IB device, use the PCI dma device which is setup to ib_device->dma_device during IB device registration. Link: https://lore.kernel.org/r/20201125064628.8431-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:49:05 -04:00
Leon Romanovsky	d4b2d19dc5	RDMA/mlx5: Silence the overflow warning while building offset mask Coverity reports "Potentially overflowing expression ..." warning, which is correct thing to complain from the compiler point of view, but this is not possible in the current code. Still, this is a small error as there are some future situations that might need to use a 32 bit offset. Use ULL so the calculation works up to 63. Fixes: `b045db62f6` ("RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ") Link: https://lore.kernel.org/r/20201125061704.6580-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:49:05 -04:00
Jason Gunthorpe	d0b7721c5e	RDMA/mlx5: Check for ERR_PTR from uverbs_zalloc() The return code from uverbs_zalloc() was wrongly checked, it is ERR_PTR not NULL like other allocators: drivers/infiniband/hw/mlx5/devx.c:2110 devx_umem_reg_cmd_alloc() warn: passing zero to 'PTR_ERR' Fixes: `878f7b31c3` ("RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx") Link: https://lore.kernel.org/r/0-v1-4d05ccc1c223+173-devx_err_ptr_jgg@nvidia.com Reported-by: kernel test robot <lkp@intel.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:49:04 -04:00
Avihai Horon	f957d4d09a	RDMA/mlx5: Enable querying AH for XRC QP types Address handle is set for connected QP types such as RC and UC, and thus can also be queried. Since XRC QP types INI and TGT are connected, it should be possible to query their address handle as well. Until now it was not the case, and although the firmware supported it, the driver allowed querying the address handle only for RC and UC. Hence, we enable it now for INI and TGT QPs as well. Link: https://lore.kernel.org/r/20201115121425.139833-2-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 11:58:19 -04:00
Gustavo A. R. Silva	808b2c925d	IB/mlx5: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding the new pseudo-keyword fallthrough; instead of letting the code fall through to the next case. Link: https://lore.kernel.org/r/2b0c87362bc86f6adfe56a5a6685837b71022bbf.1605896059.git.gustavoars@kernel.org Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-23 15:54:11 -04:00
Jason Gunthorpe	bf3b7b7ba9	Merge branch 'for-rc' into rdma.git From https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git The rc RDMA branch is needed due to dependencies on the next patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-17 15:20:26 -04:00
Jason Gunthorpe	8a7904a672	RDMA/mlx5: Lower setting the umem's PAS for SRQ Some of the SRQ types are created using a WQ, and the WQ requires a different parameter set to mlx5_umem_find_best_quantized_pgoff() as it has a 5 bit page_offset. Add the umem to the mlx5_srq_attr and defer computing the PAS data until the code has figured out what kind of mailbox to use. Compute the PAS directly from the umem for each of the four unique mailbox types. This also avoids allocating memory to store the user PAS, instead it is written directly to the mailbox as in most other cases. Fixes: `01949d0109` ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0") Link: https://lore.kernel.org/r/20201115114311.136250-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-16 16:53:30 -04:00
Jason Gunthorpe	878f7b31c3	RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx Since devx uses the new rdma_for_each_block() to fill the PAS it can also use ib_umem_find_best_pgsz(). However, the umem constructionin devx is complicated, the umem must still respect all the HW limits such as page_offset_quantized and the IOVA alignment. Since we don't know what the user intends to use the umem for we have to limit it to PAGE_SIZE. There are users trying to mix umem's with mkeys so this makes them work reliably, at least for an identity IOVA, by ensuring the IOVA matches the selected page size. Last user of mlx5_ib_get_buf_offset() so it can also be removed. Fixes: `aeae94579c` ("IB/mlx5: Add DEVX support for memory registration") Link: https://lore.kernel.org/r/20201115114311.136250-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-16 16:53:30 -04:00
Jason Gunthorpe	c08fbdc577	RDMA/mlx5: mlx5_umem_find_best_quantized_pgoff() for CQ This fixes a bug where the page_offset was not being considered when building a CQ. The HW specification says it 'must be zero', so use a variant of mlx5_umem_find_best_quantized_pgoff() with a 0 pgoff_bitmask to force this result. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20201115114311.136250-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-16 16:53:30 -04:00
Jason Gunthorpe	a59b7b05ef	RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for QP Delete custom logic in the QP in favor of more general variant. Link: https://lore.kernel.org/r/20201115114311.136250-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-16 16:53:30 -04:00
Jason Gunthorpe	7579dcdf73	RDMA/mlx5: Directly compute the PAS list for raw QP RQ's The RQ WQ created when making a raw ethernet QP copies the PAS list from a dummy QPC command created earlier in the flow. The WQC and QPC PAS lists are not fully compatible as the page_offset is a different size. Create the RQ WQ's PAS list directly and do not try to copy it from another command structure. Like the prior patch, this also means that badly aligned buffers were not correctly rejected. Link: https://lore.kernel.org/r/20201115114311.136250-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-16 16:53:29 -04:00
Jason Gunthorpe	ad480ea5d6	RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for WQ This fixes a subtle bug, the WQ mailbox has only 5 bits to describe the page_offset, while mlx5_ib_get_buf_offset() is hard wired to only work with 6 bit page_offsets. Thus it did not properly reject badly aligned buffers. Fixes: `79b20a6c30` ("IB/mlx5: Add receive Work Queue verbs") Fixes: `0fb2ed66a1` ("IB/mlx5: Add create and destroy functionality for Raw Packet QP") Link: https://lore.kernel.org/r/20201115114311.136250-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-16 16:53:29 -04:00
Jason Gunthorpe	b045db62f6	RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ SRQ uses a quantized and scaled page_offset, which is another variation of ib_umem_find_best_pgsz(). Add mlx5_umem_find_best_quantized_pgoff() to perform this calculation for each mailbox. A macro shows how the calculation is directly connected to the mailbox format. This new routine replaces the limited mlx5_ib_cont_pages() and mlx5_ib_get_buf_offset() pairing which would reject valid configurations rather than adjust the page_size to make it work. In turn this is much more aggressive about choosing large page sizes for these objects and when THP is enabled it will now often find a single page solution. Link: https://lore.kernel.org/r/20201115114311.136250-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-16 16:53:29 -04:00
Leon Romanovsky	c5633a72a1	RDMA/core: Make FD destroy callback void All FD object destroy implementations return 0, so declare this callback void. Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-12 12:32:17 -04:00
Leon Romanovsky	efa968ee20	RDMA/core: Postpone uobject cleanup on failure till FD close Remove the ib_is_destroyable_retryable() concept. The idea here was to allow the drivers to forcibly clean the HW object even if they otherwise didn't want to (eg because of usecnt). This was an attempt to clean up in a world where drivers were not allowed to fail HW object destruction. Now that we are going back to allowing HW objects to fail destroy this doesn't make sense. Instead if a uobject's HW object can't be destroyed it is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to clean it. Multiple passes over the uobject list allow hidden dependencies to be resolved. If that fails the HW driver is broken, throw a WARN_ON and leak the HW object memory. All the other tricky failure paths (eg on creation error unwind) have already been updated to this new model. Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-12 12:32:17 -04:00
Meir Lichtinger	f946e45f59	IB/mlx5: Add support for NDR link speed The IBTA specification has new speed - NDR. That speed supports signaling rate of 100Gb. mlx5 IB driver translates link modes reported by ConnectX device to IB speed and width. Added translation of new 100Gb, 200Gb and 400Gb link modes to NDR IB type and width of x1, x2 or x4 respectively. Link: https://lore.kernel.org/r/20201026133738.1340432-3-leon@kernel.org Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 15:48:57 -04:00
Jason Gunthorpe	d5c7916fe4	RDMA/mlx5: Use ib_umem_find_best_pgsz() for mkc's Now that all the PAS arrays or UMR XLT's for mkcs are filled using rdma_for_each_block() we can use the common ib_umem_find_best_pgsz() algorithm. Link: https://lore.kernel.org/r/20201026132314.1336717-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 15:10:50 -04:00
Jason Gunthorpe	f1eaac37da	RDMA/mlx5: Split mlx5_ib_update_xlt() into ODP and non-ODP cases Mixing these together is just a mess, make a dedicated version, mlx5_ib_update_mr_pas(), which directly loads the whole MTT for a non-ODP MR. The split out version can trivially use a simple loop with rdma_for_each_block() which allows using the core code to compute the MR pages and avoids seeking in the SGL list after each chunk as the __mlx5_ib_populate_pas() call required. Significantly speeds loading large MTTs. Link: https://lore.kernel.org/r/20201026132314.1336717-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 15:10:50 -04:00
Jason Gunthorpe	8010d74b99	RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt() The memory allocation is quite complicated, and makes this function hard to understand. Refactor things so that a function call sets up the WR, SG, DMA mapping and buffer, further splitting that into buffer and DMA/wr. This also slightly changes the buffer allocation logic to try an order 0 page allocation (with OOM warnings on) before going to the emergency page. Link: https://lore.kernel.org/r/20201026132314.1336717-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:53:55 -04:00
Jason Gunthorpe	f22c30aa6d	RDMA/mlx5: Move xlt_emergency_page_mutex into mr.c This is the only user, so remove the wrappers. Link: https://lore.kernel.org/r/20201026132314.1336717-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:53:54 -04:00
Jason Gunthorpe	aab8d3966d	RDMA/mlx5: Change mlx5_ib_populate_pas() to use rdma_for_each_block() This routine converts the umem SGL into a list of fixed pages for DMA, which is exactly what rdma_umem_for_each_dma_block() is for, use the common code directly. Link: https://lore.kernel.org/r/20201026132314.1336717-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:53:54 -04:00
Jason Gunthorpe	f8fb311063	RDMA/mlx5: Remove npages from mlx5_ib_cont_pages() Most callers don't need this, and the few that do can get it as ib_umem_num_pages(umem). Link: https://lore.kernel.org/r/20201026131936.1335664-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:52:26 -04:00
Jason Gunthorpe	7db0eea916	RDMA/mlx5: Remove ncont from mlx5_ib_cont_pages() This is the same as ib_umem_num_dma_blocks(umem, 1UL << page_shift), have the callers compute it directly. Link: https://lore.kernel.org/r/20201026131936.1335664-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:52:26 -04:00
Jason Gunthorpe	95741ee3f0	RDMA/mlx5: Remove order from mlx5_ib_cont_pages() Only alloc_mr_from_cache() needs order and can trivially compute it, so lift it to the one call site and remove the NULL arguments. Link: https://lore.kernel.org/r/20201026131936.1335664-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:52:26 -04:00
Jason Gunthorpe	f0093fb1a7	RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr For the user MR path, instead of calling this after getting the umem, call it as part of creating the struct mlx5_ib_mr and distill its output to a single page_shift stored inside the mr. This avoids passing around the tuple of its output. Based on the umem and page_shift, the output arguments can be computed using: count == ib_umem_num_pages(mr->umem) shift == mr->page_shift ncont == ib_umem_num_dma_blocks(mr->umem, 1 << mr->page_shift) order == order_base_2(ncont) And since mr->page_shift == umem_odp->page_shift then ncont == ib_umem_num_dma_blocks() == ib_umem_odp_num_pages() for ODP umems. Link: https://lore.kernel.org/r/20201026131936.1335664-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:52:26 -04:00
Jason Gunthorpe	1c3d247eee	RDMA/mlx5: Remove mlx5_ib_mr->npages This is the same value as ib_umem_num_pages(mr->umem), use that instead. Link: https://lore.kernel.org/r/20201026131936.1335664-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:52:26 -04:00
Jason Gunthorpe	fc3325701a	RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr() reg_pages should always contain mr->npage since when the mr is finally de-reg'd it is always subtracted out. If there were any error exits then mlx5_ib_rereg_user_mr() would leave the reg_pages adjusted and this will cause it to be double subtracted eventually. The manipulation of reg_pages is inherently connected to the umem, so lift it out of set_mr_fields() and only adjust it around creating/destroying a umem. reg_pages is only used for diagnostics in sysfs. Fixes: `7d0cc6edcc` ("IB/mlx5: Add MR cache for large UMR regions") Link: https://lore.kernel.org/r/20201026131936.1335664-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:31:40 -04:00
Jason Gunthorpe	b4d031cdae	RDMA/mlx5: Remove mlx5_ib_mr->order The is only ever set to non-zero if the MR is from the cache, and if it is cached then the order is in cached_ent->order. Make it clearer that use_umr_mtt_update() only returns true for cached MRs and remove the redundant data. Link: https://lore.kernel.org/r/20201026131936.1335664-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-02 14:31:40 -04:00
Joe Perches	1c7fd72687	RDMA: Convert sysfs device * show functions to use sysfs_emit() Done with cocci script: @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device dev, struct device_attribute attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:53:21 -03:00
Jason Gunthorpe	676a80adba	RDMA: Remove AH from uverbs_cmd_mask Drivers that need a uverbs AH should instead set the create_user_ah() op similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never implemented so are just deleted. Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:28:00 -03:00
Jason Gunthorpe	1f11a7610e	RDMA: Check create_flags during create_qp Each driver should check that the QP attrs create_flags is supported. Unfortuantely when create_flags was added to the QP attrs the drivers were not updated. uverbs_ex_cmd_mask was used to block it - even though kernel drivers use these flags too. Check that flags is zero in all drivers that don't use it, remove IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code to be EOPNOTSUPP. Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:27:59 -03:00
Jason Gunthorpe	1c407cb5d7	RDMA: Check flags during create_cq Each driver should check that the CQ attrs is supported. Unfortuantely when flags was added to the CQ attrs the drivers were not updated, uverbs_ex_cmd_mask was used to block it. This was missed when create CQ was converted to ioctl, so non-zero flags could have been passed into drivers. Check that flags is zero in all drivers that don't use it, remove IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask. Fixes: `41b2a71fc8` ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file") Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:27:59 -03:00
Jason Gunthorpe	26e990badd	RDMA: Check attr_mask during modify_qp Each driver should check that it can support the provided attr_mask during modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block modify_qp_ex because the driver didn't check RATE_LIMIT. Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:27:58 -03:00
Jason Gunthorpe	652caba5b5	RDMA: Check srq_type during create_srq uverbs was blocking srq_types the driver doesn't support based on the CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types during create_srq and move CREATE_XSRQ to the core code. Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:27:58 -03:00
Jason Gunthorpe	44ce37bc8b	RDMA: Move more uverbs_cmd_mask settings to the core These functions all depend on the driver providing a specific op: - REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op - ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this without providing the op - OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow. - OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd() - CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq() - QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but sometimes supplies a NULL op. - RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op - ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an (now deleted) implementation but no userspace All drivers were checked that no drivers provide the op without also setting uverbs_cmd_mask so this should have no functional change. Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:27:58 -03:00
Jason Gunthorpe	c074bb1e30	RDMA: Remove elements in uverbs_cmd_mask that all drivers set This is a step toward eliminating uverbs_cmd_mask. Preset this list in the core code. Only the op reg_user_mr wasn't already being required from the drivers. Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:27:57 -03:00
Jason Gunthorpe	b8e3130dd9	RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions Since a while now the uverbs layer checks if the driver implements a function before allowing the ucmd to proceed. This largely obsoletes the cmd_mask stuff, but there is some tricky bits in drivers preventing it from being removed. Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the core code. These are triggered soley based on the related ops function pointer. query_device_ex is not triggered based on an op, but all drivers already implement something compatible with the extension, so enable it globally too. Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:27:56 -03:00
Parav Pandit	fbdd0049d9	RDMA/mlx5: Fix devlink deadlock on net namespace deletion When a mlx5 core devlink instance is reloaded in different net namespace, its associated IB device is deleted and recreated. Example sequence is: $ ip netns add foo $ devlink dev reload pci/0000:00:08.0 netns foo $ ip netns del foo mlx5 IB device needs to attach and detach the netdevice to it through the netdev notifier chain during load and unload sequence. A below call graph of the unload flow. cleanup_net() down_read(&pernet_ops_rwsem); <- first sem acquired ops_pre_exit_list() pre_exit() devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one() [...] mlx5_ib_remove() mlx5_ib_unbind_slave_port() mlx5_remove_netdev_notifier() unregister_netdevice_notifier() down_write(&pernet_ops_rwsem);<- recurrsive lock Hence, when net namespace is deleted, mlx5 reload results in deadlock. When deadlock occurs, devlink mutex is also held. This not only deadlocks the mlx5 device under reload, but all the processes which attempt to access unrelated devlink devices are deadlocked. Hence, fix this by mlx5 ib driver to register for per net netdev notifier instead of global one, which operats on the net namespace without holding the pernet_ops_rwsem. Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:18:19 -03:00
Jason Gunthorpe	e0477b34d9	RDMA: Explicitly pass in the dma_device to ib_register_device The code in setup_dma_device has become rather convoluted, move all of this to the drivers. Drives now pass in a DMA capable struct device which will be used to setup DMA, or drivers must fully configure the ibdev for DMA and pass in NULL. Other than setting the masks in rvt all drivers were doing this already anyhow. mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for DMA based on their hardweare limits in: __mthca_init_one() dma_set_max_seg_size (1G) __mlx4_init_one() dma_set_max_seg_size (1G) mlx5_pci_init() set_dma_caps() dma_set_max_seg_size (2G) Other non software drivers (except usnic) were extended to UINT_MAX [1, 2] instead of 2G as was before. [1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ [2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-16 13:53:46 -03:00
Avihai Horon	1c15b4f2a4	RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so enum ib_gid_type will match the gid types of the new query GID table API which will be introduced in the following patches. This change in enum ib_gid_type requires to change also enum rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1 values. Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-01 21:20:11 -03:00
Yishai Hadas	a03bfc37d5	RDMA/mlx5: Sync device with CPU pages upon ODP MR registration Sync device with CPU pages upon ODP MR registration. mlx5 already has to zero the HW's version of the PAS list, may as well deliver a PAS list that matches the current CPU page tables configuration. Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-01 16:44:44 -03:00
Yishai Hadas	677cf51f71	RDMA/mlx5: Extend advice MR to support non faulting mode Extend advice MR to support non faulting mode, this can improve performance by increasing the populated page tables in the device. Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-01 16:44:05 -03:00
Yishai Hadas	8bfafde086	IB/core: Enable ODP sync without faulting Enable ODP sync without faulting, this improves performance by reducing the number of page faults in the system. The gain from this option is that the device page table can be aligned with the presented pages in the CPU page table without causing page faults. As of that, the overhead on data path from hardware point of view to trigger a fault which end-up by calling the driver to bring the pages will be dropped. Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-01 16:44:05 -03:00
Yishai Hadas	36f30e486d	IB/core: Improve ODP to use hmm_range_fault() Move to use hmm_range_fault() instead of get_user_pags_remote() to improve performance in a few aspects: This includes: - Dropping the need to allocate and free memory to hold its output - No need any more to use put_page() to unpin the pages - The logic to detect contiguous pages is done based on the returned order, no need to run per page and evaluate. In addition, moving to use hmm_range_fault() enables to reduce page faults in the system with it's snapshot mode, this will be introduced in next patches from this series. As part of this, cleanup some flows and use the required data structures to work with hmm_range_fault(). Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-01 16:39:54 -03:00
Leon Romanovsky	b925c555a1	RDMA/drivers: Remove udata check from special QP GSI QP can't be created from the user space, hence the udata check is always false (udata == NULL). Remove that check and simplify the flow. Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-29 13:11:06 -03:00
Leon Romanovsky	eebe580feb	RDMA/mlx5: Delete not needed GSI QP signal QP type GSI QP doesn't need signal QP type because it is initialized statically to zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too. This means that the GSI QP signal QP type can be removed. Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-29 13:09:49 -03:00

... 3 4 5 6 7 ...

1659 Commits