kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]
The use of kmap() in siw_tx_hdt() is all thread local therefore
kmap_local_page() is a sufficient replacement and will work with pgmap
protected pages when those are implemented.
siw_tx_hdt() tracks pages used in a page_array. It uses that array to
unmap pages which were mapped on function exit. Not all entries in the
array are mapped and this is tracked in kmap_mask.
kunmap_local() takes a mapped address rather than a page. Alter
siw_unmap_pages() to take the iov array to reuse the iov_base address of
each mapping. Use PAGE_MASK to get the proper address for kunmap_local().
kmap_local_page() mappings are tracked in a stack and must be unmapped in
the opposite order they were mapped in. Because segments are mapped into
the page array in increasing index order, modify siw_unmap_pages() to
unmap pages in decreasing order.
Use kmap_local_page() instead of kmap() to map pages in the page_array.
[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/
Link: https://lore.kernel.org/r/20210624174814.2822896-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]
These uses of kmap() in the SIW driver are thread local. Therefore
kmap_local_page() is sufficient to use and will work with pgmap protected
pages when those are implemnted.
There is one more use of kmap() in this driver which is split into its own
patch because kmap_local_page() has strict ordering rules and the use of
the kmap_mask over multiple segments must be handled carefully.
Therefore, that conversion is handled in a stand alone patch.
Use kmap_local_page() instead of kmap() in the 'easy' cases.
[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/
Link: https://lore.kernel.org/r/20210622061422.2633501-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
On idle session, because we do not do signal for heartbeat, it will
overflow the send queue after sometime.
To avoid that, we need to enable the signal for heartbeat. To do that, add
a new member signal_interval in rtrs_path, which will set min of
queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat
and IO, so the sq queue full accounting is correct.
Fixes: b38041d50a ("RDMA/rtrs: Do not signal for heatbeat")
Link: https://lore.kernel.org/r/20210712060750.16494-4-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull tracing updates from Steven Rostedt:
- Added option for per CPU threads to the hwlat tracer
- Have hwlat tracer handle hotplug CPUs
- New tracer: osnoise, that detects latency caused by interrupts,
softirqs and scheduling of other tasks.
- Added timerlat tracer that creates a thread and measures in detail
what sources of latency it has for wake ups.
- Removed the "success" field of the sched_wakeup trace event. This has
been hardcoded as "1" since 2015, no tooling should be looking at it
now. If one exists, we can revert this commit, fix that tool and try
to remove it again in the future.
- tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.
- New boot command line option "tp_printk_stop", as tp_printk causes
trace events to write to console. When user space starts, this can
easily live lock the system. Having a boot option to stop just after
boot up is useful to prevent that from happening.
- Have ftrace_dump_on_oops boot command line option take numbers that
match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.
- Bootconfig clean ups, fixes and enhancements.
- New ktest script that tests bootconfig options.
- Add tracepoint_probe_register_may_exist() to register a tracepoint
without triggering a WARN*() if it already exists. BPF has a path
from user space that can do this. All other paths are considered a
bug.
- Small clean ups and fixes
* tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
tracing: Simplify & fix saved_tgids logic
treewide: Add missing semicolons to __assign_str uses
tracing: Change variable type as bool for clean-up
trace/timerlat: Fix indentation on timerlat_main()
trace/osnoise: Make 'noise' variable s64 in run_osnoise()
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
Documentation: Fix a typo on trace/osnoise-tracer
trace/osnoise: Fix return value on osnoise_init_hotplug_support
trace/osnoise: Make interval u64 on osnoise_main
trace/osnoise: Fix 'no previous prototype' warnings
tracing: Have osnoise_main() add a quiescent state for task rcu
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
seq_buf: Fix overflow in seq_buf_putmem_hex()
trace/osnoise: Support hotplug operations
trace/hwlat: Support hotplug operations
trace/hwlat: Protect kdata->kthread with get/put_online_cpus
trace: Add timerlat tracer
trace: Add osnoise tracer
...
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, ibmvfc,
megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
elx and mpi3mr being new drivers.
The major core change is a rework to drop the status byte handling
macros and the old bit shifted definitions and the rest of the updates
are minor fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits)
scsi: aha1740: Avoid over-read of sense buffer
scsi: arcmsr: Avoid over-read of sense buffer
scsi: ips: Avoid over-read of sense buffer
scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe()
scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame()
scsi: elx: libefc: Fix less than zero comparison of a unsigned int
scsi: elx: efct: Fix pointer error checking in debugfs init
scsi: elx: efct: Fix is_originator return code type
scsi: elx: efct: Fix link error for _bad_cmpxchg
scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel()
scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session()
scsi: elx: efct: Fix error handling in efct_hw_init()
scsi: elx: efct: Remove redundant initialization of variable lun
scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected"
scsi: lpfc: Fix build error in lpfc_scsi.c
scsi: target: iscsi: Remove redundant continue statement
scsi: qla4xxx: Remove redundant continue statement
scsi: ppa: Switch to use module_parport_driver()
scsi: imm: Switch to use module_parport_driver()
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
...
Pull rdma updates from Jason Gunthorpe:
"This contains a replacement driver for Intel iWarp hardware. This new
driver supports the old ethernet hardware and also newer chips that
can do ROCE.
Other than that, this contains the typical mix of patches:
- Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
- Many static checker driven code clean ups, including a wide
refcount_t conversion
- Several series for the hns driver, more HIP09 HW capabilities,
migration to new HW register manipulators, and code cleanups
- Minor fixes and improvements in srp, rts, and cm
- Improvements throughout for sysfs related code to use
DEVICE_ATTR_*, make the ib_port sysfs first-class, and overall use
sysfs APIs properly
- Intel's new irdma driver replacing i40iw
- rxe general clean ups and Memory Window support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (211 commits)
RDMA/core: Always release restrack object
RDMA/mlx5: Don't access NULL-cleared mpi pointer
RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles
RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
RDMA/rxe: Missing unlock on error in get_srq_wqe()
RDMA/cma: Fix rdma_resolve_route() memory leak
RDMA/core/sa_query: Remove unused argument
RDMA/cma: Fix incorrect Packet Lifetime calculation
RDMA/cma: Protect RMW with qp_mutex
RDMA/cma: Remove unnecessary INIT->INIT transition
RDMA/hns: Add window selection field of congestion control
RDMA/hfi1: Remove use of kmap()
RDMA/irdma: Remove use of kmap()
RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1
IB/isert: Align target max I/O size to initiator size
RDMA/hns: Fix incorrect vlan enable bit in QPC
MAINTAINERS: Update Broadcom RDMA maintainers
RDMA/irdma: Use the queried port attributes
RDMA/rxe: Fix redundant skb_put_zero
RDMA/rxe: Fix extra copy in prepare_ack_packet
...
Pull drm updates from Dave Airlie:
"Highlights:
- AMD enables two more GPUs, with resulting header files
- i915 has started to move to TTM for discrete GPU and enable DG1
discrete GPU support (not by default yet)
- new HyperV drm driver
- vmwgfx adds arm64 support
- TTM refactoring ongoing
- 16bpc display support for AMD hw
Otherwise it's just the usual insane amounts of work all over the
place in lots of drivers and the core, as mostly summarised below:
Core:
- mark AGP ioctls as legacy
- disable force probing for non-master clients
- HDR metadata property helpers
- HDMI infoframe signal colorimetry support
- remove drm_device.pdev pointer
- remove DRM_KMS_FB_HELPER config option
- remove drm_pci_alloc/free
- drm_err_*/drm_dbg_* helpers
- use drm driver names for fbdev
- leaked DMA handle fix
- 16bpc fixed point format fourcc
- add prefetching memcpy for WC
- Documentation fixes
aperture:
- add aperture ownership helpers
dp:
- aux fixes
- downstream 0 port handling
- use extended base receiver capability DPCD
- Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec
- mst: use khz as link rate during init
- VCPI fixes for StarTech hub
ttm:
- provide tt_shrink file via debugfs
- warn about freeing pinned BOs
- fix swapping error handling
- move page alignment into BO
- cleanup ttm_agp_backend
- add ttm_sys_manager
- don't override vm_ops
- ttm_bo_mmap removed
- make ttm_resource base of all managers
- remove VM_MIXEDMAP usage
panel:
- sysfs_emit support
- simple: runtime PM support
- simple: power up panel when reading EDID + caching
bridge:
- MHDP8546: HDCP support + DT bindings
- MHDP8546: Register DP AUX channel with userspace
- TI SN65DSI83 + SN65DSI84: add driver
- Sil8620: Fix module dependencies
- dw-hdmi: make CEC driver loading optional
- Ti-sn65dsi86: refclk fixes, subdrivers, runtime pm
- It66121: Add driver + DT bindings
- Adv7511: Support I2S IEC958 encoding
- Anx7625: fix power-on delay
- Nwi-dsi: Modesetting fixes; Cleanups
- lt6911: add missing MODULE_DEVICE_TABLE
- cdns: fix PM reference leak
hyperv:
- add new DRM driver for HyperV graphics
efifb:
- non-PCI device handling fixes
i915:
- refactor IP/device versioning
- XeLPD Display IP preperation work
- ADL-P enablement patches
- DG1 uAPI behind BROKEN
- disable mmap ioctl for discerte GPUs
- start enabling HuC loading for Gen12+
- major GuC backend rework for new platforms
- initial TTM support for Discrete GPUs
- locking rework for TTM prep
- use correct max source link rate for eDP
- %p4cc format printing
- GLK display fixes
- VLV DSI panel power fixes
- PSR2 disabled for RKL and ADL-S
- ACPI _DSM invalid access fixed
- DMC FW path abstraction
- ADL-S PCI ID update
- uAPI headers converted to kerneldoc
- initial LMEM support for DG1
- x86/gpu: add Jasperlake to gen11 early quirks
amdgpu:
- Aldebaran updates + initial SR-IOV
- new GPU: Beige Goby and Yellow Carp support
- more LTTPR display work
- Vangogh updates
- SDMA 5.x GCR fixes
- PCIe ASPM support
- Renoir TMZ enablement
- initial multiple eDP panel support
- use fdinfo to track devices/process info
- pin/unpin TTM fixes
- free resource on fence usage query
- fix fence calculation
- fix hotunplug/suspend issues
- GC/MM register access macro cleanup for SR-IOV
- W=1 fixes
- ACPI ATCS/ATIF handling rework
- 16bpc fixed point format support
- Initial smartshift support
- RV/PCO power tuning fixes
- new INFO query for additional vbios info
amdkfd:
- SR-IOV aldebaran support
- HMM SVM support
radeon:
- SMU regression fixes
- Oland flickering fix
vmwgfx:
- enable console with fbdev emulation
- fix cpu updates of coherent multisample surfaces
- remove reservation semaphore
- add initial SVGA3 support
- support arm64
msm:
- devcoredump support for display errors
- dpu/dsi: yaml bindings conversion
- mdp5: alpha/blend_mode/zpos support
- a6xx: cached coherent buffer support
- gpu iova fault improvement
- a660 support
rockchip:
- RK3036 win1 scaling support
- RK3066/3188 missing register support
- RK3036/3066/3126/3188 alpha support
mediatek:
- MT8167 HDMI support
- MT8183 DPI dual edge support
tegra:
- fixed YUV support/scaling on Tegra186+
ast:
- use pcim_iomap
- fix DP501 EDID
bochs:
- screen blanking support
etnaviv:
- export more GPU ID values to userspace
- add HWDB entry for GPU on i.MX8MP
- rework linear window calcs
exynos:
- pm runtime changes
imx:
- Annotate dma_fence critical section
- fix PRG modifiers after drmm conversion
- Add 8 pixel alignment fix for 1366x768
- fix YUV advertising
- add color properties
ingenic:
- IPU planes fix
panfrost:
- Mediatek MT8183 support + DT bindings
- export AFBC_FEATURES register to userspace
simpledrm:
- %pr for printing resources
nouveau:
- pin/unpin TTM fixes
qxl:
- unpin shadow BO
virtio:
- create dumb BOs as guest blob
vkms:
- drmm_universal_plane_alloc
- add XRGB plane composition
- overlay support"
* tag 'drm-next-2021-07-01' of git://anongit.freedesktop.org/drm/drm: (1570 commits)
drm/i915: Reinstate the mmap ioctl for some platforms
drm/i915/dsc: abstract helpers to get bigjoiner primary/secondary crtc
Revert "drm/msm/mdp5: provide dynamic bandwidth management"
drm/msm/mdp5: provide dynamic bandwidth management
drm/msm/mdp5: add perf blocks for holding fudge factors
drm/msm/mdp5: switch to standard zpos property
drm/msm/mdp5: add support for alpha/blend_mode properties
drm/msm/mdp5: use drm_plane_state for pixel blend mode
drm/msm/mdp5: use drm_plane_state for storing alpha value
drm/msm/mdp5: use drm atomic helpers to handle base drm plane state
drm/msm/dsi: do not enable PHYs when called for the slave DSI interface
drm/msm: Add debugfs to trigger shrinker
drm/msm/dpu: Avoid ABBA deadlock between IRQ modules
drm/msm: devcoredump iommu fault support
iommu/arm-smmu-qcom: Add stall support
drm/msm: Improve the a6xx page fault handler
iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info
iommu/arm-smmu: Add support for driver IOMMU fault handlers
drm/msm: export hangcheck_period in debugfs
drm/msm/a6xx: add support for Adreno 660 GPU
...
Change location of rdma_restrack_del() to fix the bug where
task_struct was acquired but not released, causing to resource leak.
ucma_create_id() {
ucma_alloc_ctx();
rdma_create_user_id() {
rdma_restrack_new();
rdma_restrack_set_name() {
rdma_restrack_attach_task.part.0(); <--- task_struct was gotten
}
}
ucma_destroy_private_ctx() {
ucma_put_ctx();
rdma_destroy_id() {
_destroy_id() <--- id_priv was freed
}
}
}
Fixes: 889d916b6f ("RDMA/core: Don't access cm_id after its destruction")
Link: https://lore.kernel.org/r/073ec27acb943ca8b6961663c47c5abe78a5c8cc.1624948948.git.leonro@nvidia.com
Reported-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Fix a memory leak when "mda_resolve_route() is called more than once on
the same "rdma_cm_id".
This is possible if cma_query_handler() triggers the
RDMA_CM_EVENT_ROUTE_ERROR flow which puts the state machine back and
allows rdma_resolve_route() to be called again.
Link: https://lore.kernel.org/r/f6662b7b-bdb7-2706-1e12-47c61d3474b6@oracle.com
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
An approximation for the PacketLifeTime is half the local ACK timeout.
The encoding for both timers are logarithmic.
If the local ACK timeout is set, but zero, it means the timer is
disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value,
since 50% of infinite makes no sense.
Before this commit, the PacketLifeTime became 255 if local ACK
timeout was zero (not running).
Fixed by explicitly testing for timeout being zero.
Fixes: e1ee1e62be ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime")
Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The struct rdma_id_private contains three bit-fields, tos_set,
timeout_set, and min_rnr_timer_set. These are set by accessor functions
without any synchronization. If two or all accessor functions are invoked
in close proximity in time, there will be Read-Modify-Write from several
contexts to the same variable, and the result will be intermittent.
Fixed by protecting the bit-fields by the qp_mutex in the accessor
functions.
The consumer of timeout_set and min_rnr_timer_set is in
rdma_init_qp_attr(), which is called with qp_mutex held for connected
QPs. Explicit locking is added for the consumers of tos and tos_set.
This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT
transition"), since the call to rdma_init_qp_attr() from
cma_init_conn_qp() does not hold the qp_mutex.
Fixes: 2c1619edef ("IB/cma: Define option to set ack timeout and pack tos_set")
Fixes: 3aeffc46af ("IB/cma: Introduce rdma_set_min_rnr_timer()")
Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In rdma_create_qp(), a connected QP will be transitioned to the INIT
state.
Afterwards, the QP will be transitioned to the RTR state by the
cma_modify_qp_rtr() function. But this function starts by performing an
ib_modify_qp() to the INIT state again, before another ib_modify_qp() is
performed to transition the QP to the RTR state.
Hence, there is no need to transition the QP to the INIT state in
rdma_create_qp().
Link: https://lore.kernel.org/r/1624369197-24578-2-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The bit field rsvd1 in resp is not being initialized and garbage data is
being copied from the stack back to userspace via the ib_copy_to_udata
call. Fix this by setting the entire struct resp to zero; this will ensure
that further new bit fields in the future will be zero'd too.
Link: https://lore.kernel.org/r/20210623182437.163801-1-colin.king@canonical.com
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 879740517d ("RDMA/bnxt_re: Update ABI to pass wqe-mode to user space")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
[jgg: remove extra zeroing]
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently get_srq_wqe() in rxe_resp.c copies the maximum possible number
of bytes from the wqe into the QPs copy of the SRQ wqe. This is usually
extra work and risks reading past the end of the SRQ circular buffer if
the SRQ is configured with less than the maximum possible number of SGEs.
Check the number of SGEs is not too large.
Compute the actual number of bytes in the WR and copy only those.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For IPV4 packets sent on the wire the rxe driver calls ip_local_out()
which immediately calls __ip_local_out() which sets iph->tot_len and calls
ip_send_check(). This code is duplicated in prepare4(). On the loopback
path the IP header checksum and tot_len fields are not used so they do not
need to be set.
Remove this redundant code.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>