Pull misc x86 updates from Borislav Petkov:
"A variety of fixes which don't fit any other tip bucket:
- Remove unnecessary function export
- Correct asm constraint
- Fix __setup handlers retval"
* tag 'x86_misc_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Cleanup the control_va_addr_alignment() __setup handler
x86: Fix return value of __setup handlers
x86/delay: Fix the wrong asm constraint in delay_loop()
x86/amd_nb: Unexport amd_cache_northbridges()
Pull EDAC updates from Borislav Petkov:
- Switch ghes_edac to use the CPER error reporting routines and
simplify the code considerably this way
- Rip out the silly edac_align_ptr() contraption which was computing
the size of the private structures of each driver and thus allowing
for a one-shot memory allocation. This was clearly unnecessary and
confusing so switch to simple and boring kmalloc* calls.
- Last but not least, the usual garden variety of fixes, cleanups and
improvements all over EDAC land
* tag 'edac_updates_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/xgene: Fix typo processsors -> processors
EDAC/i5100: Remove unused inline function i5100_nrecmema_dm_buf_id()
EDAC: Use kcalloc()
EDAC/ghes: Change ghes_hw from global to static
EDAC/armada_xp: Use devm_platform_ioremap_resource()
EDAC/synopsys: Add a SPDX identifier
EDAC/synopsys: Add driver support for i.MX platforms
EDAC/dmc520: Don't print an error for each unconfigured interrupt line
EDAC/mc: Get rid of edac_align_ptr()
EDAC/device: Sanitize edac_device_alloc_ctl_info() definition
EDAC/device: Get rid of the silly one-shot memory allocation in edac_device_alloc_ctl_info()
EDAC/pci: Get rid of the silly one-shot memory allocation in edac_pci_alloc_ctl_info()
EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc()
efi/cper: Reformat CPER memory error location to more readable
EDAC/ghes: Unify CPER memory error location reporting
efi/cper: Add a cper_mem_err_status_str() to decode error description
powerpc/85xx: Remove fsl,85... bindings
Combine all collected EDAC changes for submission into v5.19:
* edac-misc:
EDAC/xgene: Fix typo processsors -> processors
EDAC/i5100: Remove unused inline function i5100_nrecmema_dm_buf_id()
EDAC/ghes: Change ghes_hw from global to static
EDAC/armada_xp: Use devm_platform_ioremap_resource()
EDAC/synopsys: Add a SPDX identifier
EDAC/synopsys: Add driver support for i.MX platforms
EDAC/dmc520: Don't print an error for each unconfigured interrupt line
efi/cper: Reformat CPER memory error location to more readable
EDAC/ghes: Unify CPER memory error location reporting
efi/cper: Add a cper_mem_err_status_str() to decode error description
powerpc/85xx: Remove fsl,85... bindings
* edac-alloc-cleanup:
EDAC: Use kcalloc()
EDAC/mc: Get rid of edac_align_ptr()
EDAC/device: Sanitize edac_device_alloc_ctl_info() definition
EDAC/device: Get rid of the silly one-shot memory allocation in edac_device_alloc_ctl_info()
EDAC/pci: Get rid of the silly one-shot memory allocation in edac_pci_alloc_ctl_info()
EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc()
Signed-off-by: Borislav Petkov <bp@suse.de>
The dmc520 driver requires that at least one interrupt line, out of the
ten possible, is configured. The driver prints an error and returns
-EINVAL from its .probe function if there are no interrupt lines
configured.
Don't print a KERN_ERR level message for each interrupt line that's
unconfigured as that can confuse users into thinking that there is an
error condition.
Before this change, the following KERN_ERR level messages would be
reported if only dram_ecc_errc and dram_ecc_errd were configured in the
device tree:
dmc520 68000000.dmc: IRQ ram_ecc_errc not found
dmc520 68000000.dmc: IRQ ram_ecc_errd not found
dmc520 68000000.dmc: IRQ failed_access not found
dmc520 68000000.dmc: IRQ failed_prog not found
dmc520 68000000.dmc: IRQ link_err not
dmc520 68000000.dmc: IRQ temperature_event not found
dmc520 68000000.dmc: IRQ arch_fsm not found
dmc520 68000000.dmc: IRQ phy_request not found
Fixes: 1088750d78 ("EDAC: Add EDAC driver for DMC520")
Reported-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220111163800.22362-1-tyhicks@linux.microsoft.com
Use boring kzalloc() instead. Add pointers to the different allocated
members in struct edac_device_ctl_info for easier freeing later.
One of the reasons, perhaps, why it was done this way is to be able to
do a single kfree(ctl_info) without having to kfree() the other parts of
the struct too but that is not nearly a sensible reason as to why there
should be this obscure pointer alignment.
There should be no functional changes resulting from this.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220310095254.1510-4-bp@alien8.de
This has probably meant something at some point but there's no need for
it anymore - the struct mem_ctl_info allocation can happen with normal,
boring k*alloc() calls like everyone else does it.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220310095254.1510-2-bp@alien8.de
Switch the GHES EDAC memory error reporting functions to use the common
CPER ones and get rid of code duplication.
[ bp:
- rewrite commit message, remove useless text
- rip out useless reformatting
- align function params on the opening brace
- rename function to a more descriptive name
- drop useless function exports
- handle buffer lengths properly when printing other detail
- remove useless casting
]
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220308144053.49090-3-xueshuai@linux.alibaba.com
amd_cache_northbridges() is exported by amd_nb.c and is called by
amd64-agp.c and amd64_edac.c modules at module_init() time so that NB
descriptors are properly cached before those drivers can use them.
However, the init_amd_nbs() initcall already does call
amd_cache_northbridges() unconditionally and thus makes sure the NB
descriptors are enumerated.
That initcall is a fs_initcall type which is on the 5th group (starting
from 0) of initcalls that gets run in increasing numerical order by the
init code.
The module_init() call is turned into an __initcall() in the MODULE=n
case and those are device-level initcalls, i.e., group 6.
Therefore, the northbridges caching is already finished by the time
module initialization starts and thus the correct initialization order
is retained.
Unexport amd_cache_northbridges(), update dependent modules to
call amd_nb_num() instead. While at it, simplify the checks in
amd_cache_northbridges().
[ bp: Heavily massage and *actually* explain why the change is ok. ]
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324122729.221765-1-nchatrad@amd.com
A bug in legacy U-Boot causes a crash during SDRAM boot if ECC is not
enabled in the bitstream but enabled in the Linux config.
Memory mapped read of the ECC Enabled bit was only enabled if U-Boot
determined ECC was enabled in the bitstream.
The Linux driver checks the ECC enable bit using a memory map read.
In the ECC disabled bitstream case, U-Boot didn't enable ECC register
memory map reads and since they are not allowed this results in a crash.
Always read the ECC Enable register through a SMC call which is always
allowed and it works with legacy and current U-Boot.
[ bp: Massage commit message. ]
Signed-off-by: Rabara Niravkumar L <niravkumar.l.rabara@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Link: https://lore.kernel.org/r/20220305014118.4794-1-niravkumar.l.rabara@intel.com
Introduce a "family flags" bitmask that can be used to indicate any
special behavior needed on a per-family basis.
Add a flag to indicate a system uses the new register offsets introduced
with Family 19h Model 10h.
Use this flag to account for register offset changes, a new bitfield
indicating DDR5 use on a memory controller, and to set the proper number
of chip select masks.
Rework f17_addr_mask_to_cs_size() to properly handle the change in chip
select masks. And update code comments to reflect the updated Chip
Select, DIMM, and Mask relationships.
[uninitialized variable warning]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20220202144307.2678405-3-yazen.ghannam@amd.com
Current AMD systems allow mixing of DIMM types within a system. However,
DIMMs within a channel, i.e. managed by a single Unified Memory
Controller (UMC), must be of the same type.
Handle this possible configuration by checking and setting the memory
type for each individual "UMC" structure.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20220202144307.2678405-2-yazen.ghannam@amd.com
Do alignment logic properly and use the "ptr" local variable for
calculating the remainder of the alignment.
This became an issue because struct edac_mc_layer has a size that is not
zero modulo eight, and the next offset that was prepared for the private
data was unaligned, causing an alignment exception.
The patch in Fixes: which broke this actually wanted to "what we
actually care about is the alignment of the actual pointer that's about
to be returned." But it didn't check that alignment.
Use the correct variable "ptr" for that.
[ bp: Massage commit message. ]
Fixes: 8447c4d15e ("edac: Do alignment logic properly in edac_align_ptr()")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com
The driver overrides error codes returned by platform_get_irq_optional()
to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the
driver will fail the probe permanently instead of the deferred probing.
Switch to propagating the proper error codes to platform driver code
upwards.
[ bp: Massage commit message. ]
Fixes: 0d4429301c ("EDAC: Add APM X-Gene SoC EDAC driver")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220124185503.6720-3-s.shtylyov@omp.ru
The driver overrides the error codes returned by platform_get_irq() to
-ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the
driver will fail the probe permanently instead of the deferred probing.
Switch to propagating the proper error codes to platform driver code
upwards.
[ bp: Massage commit message. ]
Fixes: 71bcada88b ("edac: altera: Add Altera SDRAM EDAC support")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220124185503.6720-2-s.shtylyov@omp.ru
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the edac sysfs code to use default_groups field which has
been the preferred way since
aa30f47cf6 ("kobject: Add support for default attribute groups to kobj_type")
so that the obsolete default_attrs field can be removed soon.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220104112401.1067148-2-gregkh@linuxfoundation.org
The EDAC sysfs code is doing some crazy casting of the list of
attributes that is not necessary at all. Instead, properly point to the
correct attribute structure in the lists, which removes the need to
cast anything and the code is now properly typesafe (as much as sysfs
attribute logic is typesafe...)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220104112401.1067148-1-gregkh@linuxfoundation.org
Pull EDAC updates from Borislav Petkov:
- Add support for version 3 of the Synopsys DDR controller to
synopsys_edac
- Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD
family 0x19 CPUs to amd64_edac
- The usual set of fixes and cleanups
* tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Add support for family 19h, models 50h-5fh
EDAC/sb_edac: Remove redundant initialization of variable rc
RAS/CEC: Remove a repeated 'an' in a comment
EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh
EDAC: Add RDDR5 and LRDDR5 memory types
EDAC/sifive: Fix non-kernel-doc comment
dt-bindings: memory: Add entry for version 3.80a
EDAC/synopsys: Enable the driver on Intel's N5X platform
EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR
EDAC/synopsys: Use the quirk for version instead of ddr version
Pull RAS updates from Borislav Petkov:
"A relatively big amount of movements in RAS-land this time around:
- First part of a series to move the AMD address translation code
from arch/x86/ to amd64_edac as that is its only user anyway
- Some MCE error injection improvements to the AMD side
- Reorganization of the #MC handler code and the facilities it calls
to make it noinstr-safe
- Add support for new AMD MCA bank types and non-uniform banks layout
- The usual set of cleanups and fixes"
* tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/mce: Reduce number of machine checks taken during recovery
x86/mce/inject: Avoid out-of-bounds write when setting flags
x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
x86/mce: Check regs before accessing it
x86/mce: Mark mce_start() noinstr
x86/mce: Mark mce_timed_out() noinstr
x86/mce: Move the tainting outside of the noinstr region
x86/mce: Mark mce_read_aux() noinstr
x86/mce: Mark mce_end() noinstr
x86/mce: Mark mce_panic() noinstr
x86/mce: Prevent severity computation from being instrumented
x86/mce: Allow instrumentation during task work queueing
x86/mce: Remove noinstr annotation from mce_setup()
x86/mce: Use mce_rdmsrl() in severity checking code
x86/mce: Remove function-local cpus variables
x86/mce: Do not use memset to clear the banks bitmaps
x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
x86/mce/inject: Check if a bank is populated before injecting
x86/mce: Get rid of cpu_missing
...
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.
For example:
Bank # | CPUx | CPUy
0 LS LS
1 RAZ UMC
2 CS CS
3 SMU RAZ
Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.
For example:
Bank # | CPUx | CPUy
0 LS LS
1 RAZ UMC
2 CS NBIO
3 SMU RAZ
Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.
Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.
Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.
Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.
The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.
[ bp: Remove useless comments over hwid types. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
scripts/kernel-doc complains about a comment that begins with "/**"
but is not in kernel-doc format, so correct it.
Prevents this warning:
drivers/edac/sifive_edac.c:23: warning: This comment starts with '/**', \
but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* EDAC error callback
Fixes: 91abaeaaff ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211201030913.10283-1-rdunlap@infradead.org
Define an address translation context struct. This will hold values that
will be passed between multiple functions.
Save return address, Node ID, and the Instance ID number to start.
Currently, the UMC number is used as the Instance ID, but future DF
versions may use another value.
Also include a "tmp" field to use when reading registers. This is to
avoid having to define a temporary variable in multiple functions.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-5-yazen.ghannam@amd.com
The DF Indirect Access method allows for "Broadcast" accesses in which
case no specific instance is targeted. Add support using a reserved
instance ID of 0xFF to indicate a broadcast access. Set the FICAA
register appropriately.
Define helpers functions for instance and broadcast reads and use them
where appropriate.
Drop the "amd_" prefix since these functions are all static.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-4-yazen.ghannam@amd.com
Pull EDAC updates from Borislav Petkov:
"A small pile of EDAC updates which the autumn wind blew my way. :)
- amd64_edac: Add support for three-rank interleaving mode which is
present on AMD zen2 servers
- The usual fixes and cleanups all over EDAC land"
* tag 'edac_updates_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
EDAC/ti: Remove redundant error messages
EDAC/amd64: Handle three rank interleaving mode
EDAC/mc_sysfs: Print MC-scope sysfs counters unsigned
EDAC/al_mc: Make use of the helper function devm_add_action_or_reset()
EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf()
The number of correctable errors is displayed as uncorrectable
errors because the "SBE" error count is passed to both calls of
edac_mc_handle_error().
Pass the correct uncorrectable error count to the second
edac_mc_handle_error() call when logging uncorrectable errors.
[ bp: Massage commit message. ]
Fixes: 7f6998a412 ("ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC")
Signed-off-by: Hans Potsch <hans.potsch@nokia.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20211006121332.58788-1-hans.potsch@nokia.com