Stanley.Yang
8882f90a3f
drm/amdgpu: add new query interface for umc block v2
Add SMU message to query error information.
v2: rename message_smu to ecc_info
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:14 -05:00
Tao Zhou
aaca8c3861
drm/amdgpu: add poison mode query for UMC
Add ras poison mode query interface for UMC.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-28 09:30:06 -04:00
Mukul Joshi
719e433ed0
drm/amdgpu: Fix channel_index table layout for Aldebaran
Fix the channel_index table layout to fetch the correct
channel_index when calculating physical address from
normalized address during page retirement.
Also, fix the number of UMC instances and number of channels
within each UMC instance for Aldebaran.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05 21:17:58 -04:00
John Clements
186c8a8585
drm/amdgpu: initialize umc ras function
Support UMC RAS function initialization for Aldebaran.
v2: squash in compile fix
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-08 17:47:28 -04:00
Hawking Zhang
49070c4ea3
drm/amdgpu: split umc callbacks to ras and non-ras ones
UMC RAS is not managed by the GPU driver when the GPU is
connected to the CPU through XGMI. Split the UMC callbacks
into RAS and non-RAS ones so that the GPU driver only
initializes the UMC RAS callbacks when it manages
UMC RAS.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:11 -04:00
Hawking Zhang
87da0cc101
drm/amdgpu: implement query_ras_error_address callback
query_ras_error_address is invoked to query the bad
page address when GPU engines consume poisoned data
in HBM.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:01 -04:00
Hawking Zhang
878b9e944c
drm/amdgpu: implement umc query error count callback
The UMC query_ras_error_count callback is invoked to query
UMC correctable and uncorrectable errors. It resets
the UMC RAS error counters after the query.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:58 -04:00
Hawking Zhang
3f903560d1
drm/amdgpu: add helper function to query umc ras error
Add helper functions to query correctable and
uncorrectable UMC RAS errors.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:56 -04:00
Hawking Zhang
1696bf3589
drm/amdgpu: create umc_v6_7_funcs for aldebaran
umc_v6_7_funcs are the callbacks that support UMC RAS
functionality on Aldebaran.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:52 -04:00