As more and more new asics start to reuse the old device IDs before
launch, there is a need to quickly override the existing asic type
corresponding to the reused device ID through a kernel parameter. With
this, engineers no longer need to rely on local hack patches,
facilitating cooperation across teams.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.
Temporary fix until proper solution in PSP/SMU is ready:
When uncorrectable error happens the DF will unconditionally
broadcast error event packets to all its clients/slave upon
receiving fatal error event and freeze all its outbound queues,
err_event_athub interrupt will be triggered.
In such case and we use this interrupt
to issue GPU reset. THe GPU reset code is modified for such case to avoid HW
reset, only stops schedulers, deatches all in progress and not yet scheduled
job's fences, set error code on them and signals.
Also reject any new incoming job submissions from user space.
All this is done to notify the applications of the problem.
v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count
v3:
Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive memebers.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
From rdma.git
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================
The branch is based on v5.3-rc5 due to dependencies, and is being taken
into hmm.git due to dependencies in the next patches.
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is a significant simplification, it eliminates all the remaining
'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
paths, and takes away all the ugly locking and abuse of page_table_lock.
mmu_notifier_get() provides the single struct hmm per struct mm which
eliminates mm->hmm.
It also directly guarantees that no mmu_notifier op callback is callable
while concurrent free is possible, this eliminates all the krefs inside
the mmu_notifier callbacks.
The remaining krefs in the range code were overly cautious, drivers are
already not permitted to free the mirror while a range exists.
Link: https://lore.kernel.org/r/20190806231548.25242-6-jgg@ziepe.ca
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The current code won't likely work on production hw when
it ships so leave it as experimental until it's ready.
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DC already handles this correctly since amdgpu minor version 31. Bump
the minor version again so that xf86-video-amdgpu can take advantage of
this working without DC as well now.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-next-5.4-2019-08-09:
Same as drm-next-5.4-2019-08-06, but with the
readq/writeq stuff fixed and 5.3-rc3 backmerged.
amdgpu:
- Add navi14 support
- Add navi12 support
- Add Arcturus support
- Enable mclk DPM for Navi
- Misc DC display fixes
- Add perfmon support for DF
- Add scatter/gather display support for Raven
- Improve SMU handling for GPU reset
- RAS support for GFX
- Drop last of drmP.h
- Add support for wiping memory on buffer release
- Allow cursor async updates for fb swaps
- Misc fixes and cleanups
amdkfd:
- Add navi14 support
- Add navi12 support
- Add Arcturus support
- CWSR trap handlers updates for gfx9, 10
- Drop last of drmP.h
- Update MAINTAINERS
radeon:
- Misc fixes and cleanups
- Make kexec more reliable by tearing down the GPU
ttm:
- Add release_notify callback
uapi:
- Add wipe memory on release flag for buffer creation
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: resolved conflicts with ttm resv moving]
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809184807.3381-1-alexander.deucher@amd.com
drm-misc-next for 5.4:
UAPI Changes:
- HDCP: Add a Content protection type property
Cross-subsystem Changes:
Core Changes:
- Continue to rework the include dependencies
- fb: Remove the unused drm_gem_fbdev_fb_create function
- drm-dp-helper: Make the link rate calculation more tolerant to
non-explicitly defined, yet supported, rates
- fb-helper: Map DRM client buffer only when required, and instanciate a
shadow buffer when the device has a dirty function or says so
- connector: Add a helper to link the DDC adapter used by that connector to
the userspace
- vblank: Switch from DRM_WAIT_ON to wait_event_interruptible_timeout
- dma-buf: Fix a stack corruption
- ttm: Embed a drm_gem_object struct to make ttm_buffer_object a
superclass of GEM, and convert drivers to use it.
- hdcp: Improvements to report the content protection type to the
userspace
Driver Changes:
- Remove drm_gem_prime_import/export from being defined in the drivers
- Drop DRM_AUTH usage from drivers
- Continue to drop drmP.h
- Convert drivers to the connector ddc helper
- ingenic: Add support for more panel-related cases
- komeda: Support for dual-link
- lima: Reduce logging
- mpag200: Fix the cursor support
- panfrost: Export GPU features register to userspace through an ioctl
- pl111: Remove the CLD pads wiring support from the DT
- rockchip: Rework to use DRM PSR helpers, fix a bug in the VOP_WIN_GET
macro
- sun4i: Improve support for color encoding and range
- tinydrm: Rework SPI support, improve MIPI-DBI support, move to drm/tiny
- vkms: Rework of the CRC tracking
- bridges:
- sii902x: Add support for audio graph card
- tc358767: Rework AUX data handling code
- ti-sn65dsi86: Add Debugfs and proper DSI mode flags support
- panels
- Support for GiantPlus GPM940B0, Sharp LQ070Y3DG3B, Ortustech
COM37H3M, Novatek NT39016, Sharp LS020B1DD01D, Raydium RM67191,
Boe Himax8279d, Sharp LD-D5116Z01B
- Conversion of the device tree bindings to the YAML description
- jh057n00900: Rework the enable / disable path
- fbdev:
- ssd1307fb: Support more devices based on that controller
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808121423.xzpedzkpyecvsiy4@flea
When doing a GPU reset or unloading the driver, we need to
put the SMU into the apprpriate state for the re-init after
the reset or unload to reliably work.
I don't think this is necessary for BACO because the SMU actually
controls the BACO state to it needs to be active.
For suspend (S3), the asic is put into D3 so the SMU would be
powered down so I don't think we need to put the SMU into
any special state.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Apply the same setting to SH_MEM_CONFIG and VM_CONTEXT1_CNTL. This
makes the noretry param no longer KFD-specific. On GFX10 I'm not
changing SH_MEM_CONFIG in this commit because GFX10 has different
retry behaviour in the SQ and I don't have a way to test it at the
moment.
Suggested-by: Christian König <Christian.Koenig@amd.com>
CC: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by : Shaoyun.liu < Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the IP discovery table rather than hardcoding the
settings in the driver.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Navi10 will use sw smu driver for dynamic power managment,
while vega20 could also use sw smu driver when amdgpu_dpm is
set to 2
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_mes, which is a driver scope parameter, is used
to whether enable mes or not.
MES (Micro Engine Scheduler) is the new on chip hw scheduling
microcontroller. It can be used to handle queue scheduling and
preemption and priorities.
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VDDGFX requires gfx queue to be installed via MAP_QUEUES packet.
Hence, enable async gfx ring by default.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-misc-next for v5.3:
UAPI Changes:
Cross-subsystem Changes:
- Add code to signal all dma-fences when freed with pending signals.
- Annotate reservation object access in CONFIG_DEBUG_MUTEXES
Core Changes:
- Assorted documentation fixes.
- Use irqsave/restore spinlock to add crc entry.
- Move code around to drm_client, for internal modeset clients.
- Make drm_crtc.h and drm_debugfs.h self-contained.
- Remove drm_fb_helper_connector.
- Add bootsplash to todo.
- Fix lock ordering in pan_display_legacy.
- Support pinning buffers to current location in gem-vram.
- Remove the now unused locking functions from gem-vram.
- Remove the now unused kmap-object argument from vram helpers.
- Stop checking return value of debugfs_create.
- Add atomic encoder enable/disable helpers.
- pass drm_atomic_state to atomic connector check.
- Add atomic support for bridge enable/disable.
- Add self refresh helpers to core.
Driver Changes:
- Add extra delay to make MTP SDM845 work.
- Small fixes to virtio, vkms, sii902x, sii9234, ast, mcde, analogix, rockchip.
- Add zpos and ?BGR8888 support to meson.
- More removals of drm_os_linux and drmP headers for amd, radeon, sti, r128, r128, savage, sis.
- Allow synopsis to unwedge the i2c hdmi bus.
- Add orientation quirks for GPD panels.
- Edid cleanups and fixing handling for edid < 1.2.
- Add runtime pm to stm.
- Handle s/r in dw-hdmi.
- Add hooks for power on/off to dsi for stm.
- Remove virtio dirty tracking code, done in drm core.
- Rework BO handling in ast and mgag200.
Tiny conflict in drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c,
needed #include <linux/slab.h> to make it compile.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0e01de30-9797-853c-732f-4a5bd6e61445@linux.intel.com
[Why]
It's non trivial to configure or specify an ABM reduction level for
userspace outside of X. There is also no method to specify the default
ABM value at boot time.
A parameter should be added to configure this.
[How]
Expose a module parameter that can specify the default ABM level to
use for eDP connectors on DC enabled hardware that loads the DMCU
firmware.
The default is still disabled (0), but levels can range from 1-4. Levels
control how much the backlight can be reduced, with being the least
amount of reduction and four being the most reduction.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: David Francis <david.francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This option is no longer needed. The default code paths
are now the only option.
v2: Add HPAGE support and a default for non contiguous maps
v3: Misread 512 pages as MiB ...
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add amdgpu_amdkfd interface to get num_gws and add num_gws
to /sys/class/kfd/kfd/topology/nodes/x/properties. Only report
num_gws if MEC FW support GWS barriers. Currently it is
determined by a module parameter which will be replaced
with MEC FW version check when firmware is ready.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We are getting a dma-buf implementation completely separate from drm prime,
so rename the files now and cleanup the code a bit.
No functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>